Filepaths
bnode_core.filepaths
Utilities for constructing, naming, and resolving files and directories used by bnode projects. This module centralizes path conventions and discovery for:
- Auto-recognition of the configuration directory based on the current working directory or CLI overrides. Marker ("hint") files let the code determine whether it is running inside a bnode package repository or a standalone project.
- Creation and discovery of data folders for raw data and datasets.
- Canonical naming for raw-data and dataset artifacts derived from the data generation configuration. The names encode the model name and version, whether initial states, parameters, and controls are sampled, the sampling strategies themselves, and the dataset size.
- Paths to Hydra runtime output artifacts (models, optimizers, datasets).
- Resolution of MLflow artifact URIs to local filesystem paths via the MLFLOW_ARTIFACTS_DESTINATION environment variable.
config_dir_auto_recognize() -> Path
Auto-recognize and return the project's configuration directory.
The current working directory is inspected for known locations of the configuration directory. CLI flags can be used to override discovery and delegate resolution to Hydra.
Returns:
| Type | Description |
|---|---|
| Path \| None | Path to the discovered configuration directory. Returns None when --help is requested, or when CLI flags indicate that a config path will be provided externally (so Hydra may handle it). |
Raises:
| Type | Description |
|---|---|
| ValueError | If no configuration directory can be found and no CLI flag suggests that the path will be provided manually. |
Notes
Search order:
1. If "--help" or "-h" is present, print help and return None.
2. If ".bnode_package_repo" exists and "resources/config/" exists, return "resources/config/".
3. Else, if "./config/" exists, return it.
4. Else, if "-cp"/"--config-path" and/or "-cn"/"--config-name" are present, return None to allow Hydra to handle the path.
5. Otherwise, log an error and raise ValueError.
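The search order can be sketched as a small standalone function. This is an illustrative reconstruction from the documented steps, not the module's actual source: the name find_config_dir and its explicit cwd and argv parameters are hypothetical, step 1 omits printing the help text, and the way CLI flags are matched is an assumption.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_config_dir(cwd: Path, argv: Sequence[str]) -> Optional[Path]:
    """Hypothetical sketch of the documented configuration-directory search order."""
    # 1. Help requested: nothing to resolve (the real function also prints help).
    if "--help" in argv or "-h" in argv:
        return None
    # 2. Package repository: marker file plus resources/config/.
    package_config = cwd / "resources" / "config"
    if (cwd / ".bnode_package_repo").exists() and package_config.is_dir():
        return package_config
    # 3. Standalone project: ./config/.
    project_config = cwd / "config"
    if project_config.is_dir():
        return project_config
    # 4. Config location given on the command line: defer to Hydra.
    #    Matching on the text before "=" is an assumption of this sketch.
    hydra_flags = ("-cp", "--config-path", "-cn", "--config-name")
    if any(arg.split("=")[0] in hydra_flags for arg in argv):
        return None
    # 5. Nothing found and no override given.
    raise ValueError(f"No configuration directory found under {cwd}")
```

Passing cwd and argv explicitly, instead of reading Path.cwd() and sys.argv inside the function, keeps the sketch easy to exercise against a temporary directory.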
Source code in src/bnode_core/filepaths.py
create_path(path: Path, log: bool) -> None
Create a directory if it does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Directory path to create. | required |
| log | bool | Whether to log the creation/exists message. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
log_overwriting_file(path: Path) -> None
Log whether a file will be written or overwritten.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | File path for which to log the action. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
raw_data_name(cfg: data_gen_config) -> str
Build the canonical base name for a raw-data artifact.
The name is derived from the model name/version and optional sampling strategy flags present in the configuration.
Behavior
- Always includes model name and version.
- Appends the initial-states sampling strategy if initial states are included (e.g., '_s-R').
- Appends the parameters sampling strategy if parameters are included (e.g., '_p-R').
- Appends the controls sampling strategy if controls are included (e.g., '_c-RROCS').
- Appends the '_c-Mo' suffix if controls are used only for sampling and the controls actually applied are extracted from the model.
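The suffix rules can be illustrated with a minimal sketch. Everything structural here is an assumption: NamingFields is a hypothetical, flattened stand-in for the relevant parts of data_gen_config, the "_" separator between model name and version is a guess, and the source does not say whether '_c-Mo' replaces or follows the controls suffix (this sketch simply applies the rules independently, in the listed order). Only the suffix patterns themselves come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamingFields:
    """Hypothetical stand-in for the naming-relevant parts of data_gen_config."""
    model_name: str
    version: str
    states_sampling: Optional[str] = None      # e.g. "R"; None if initial states are not included
    parameters_sampling: Optional[str] = None  # e.g. "R"; None if parameters are not included
    controls_sampling: Optional[str] = None    # e.g. "RROCS"; None if controls are not included
    controls_from_model: bool = False          # controls only sampled; applied ones come from the model


def sketch_raw_data_name(fields: NamingFields) -> str:
    # Model name and version are always present (the "_" separator is assumed).
    name = f"{fields.model_name}_{fields.version}"
    if fields.states_sampling is not None:
        name += f"_s-{fields.states_sampling}"
    if fields.parameters_sampling is not None:
        name += f"_p-{fields.parameters_sampling}"
    if fields.controls_sampling is not None:
        name += f"_c-{fields.controls_sampling}"
    if fields.controls_from_model:
        name += "_c-Mo"
    return name


# "pendulum" and "v1" are made-up values for illustration.
example = NamingFields("pendulum", "v1", states_sampling="R", controls_sampling="RROCS")
print(sketch_raw_data_name(example))  # pendulum_v1_s-R_c-RROCS
```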
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Base name for raw-data artifacts. |
dataset_name(cfg: data_gen_config, n_samples: int) -> str
Build the canonical dataset name for a given configuration and size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
| n_samples | int | Number of samples in the dataset. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Dataset name including sample count and optional suffix. |
dir_data(log: bool = False) -> Path
Resolve the root data directory for the project.
Search order: 1) If ".bnode_package_repo" exists: "./resources/data". 2) Else if "../../.surrogate_test_data_repo" exists: "../../data". 3) Else: "./data".
The directory is created if missing.
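A minimal sketch of this lookup, assuming the marker files are checked relative to an explicitly passed working directory. The function name resolve_data_dir and the cwd parameter are hypothetical; the real dir_data takes only a log flag.

```python
from pathlib import Path


def resolve_data_dir(cwd: Path) -> Path:
    """Hypothetical sketch of the documented data-directory search order."""
    if (cwd / ".bnode_package_repo").exists():
        # 1) Inside the package repository.
        data_dir = cwd / "resources" / "data"
    elif (cwd / ".." / ".." / ".surrogate_test_data_repo").exists():
        # 2) Two levels below a test-data repository.
        data_dir = cwd / ".." / ".." / "data"
    else:
        # 3) Standalone project.
        data_dir = cwd / "data"
    # The directory is created if missing.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

For example, resolve_data_dir(Path.cwd()) mirrors the documented behavior for the current working directory.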
Note
In the future, it might be a good idea to switch to a more flexible directory assignment mechanism, e.g., via environment variables or configuration files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log | bool | Whether to log directory creation/existence. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Root data directory. |
dir_raw_data(cfg: data_gen_config, log: bool = False) -> Path
Return the directory in which raw data for the config is stored.
The path includes a subdirectory named by raw_data_name(cfg) and is
created on demand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
| log | bool | Whether to log directory creation/existence. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Directory for raw data artifacts. |
filepath_raw_data(cfg: data_gen_config) -> Path
Return the path to the raw-data file for a configuration.
If cfg.pModel.RawData.raw_data_from_external_source is True, the path is
resolved inside the raw-data directory using the external file name. Otherwise,
a default file name of the form <raw_data_name>_raw_data.hdf5 is used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the raw-data file. |
filepath_raw_data_config(cfg: data_gen_config) -> Path
Return the path to the RawData configuration YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the YAML configuration stored alongside raw-data. |
dir_datasets(log: bool = False) -> Path
Return the root directory for datasets, creating it if missing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log | bool | Whether to log directory creation/existence. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Root datasets directory. |
dir_specific_dataset_from_cfg(cfg: data_gen_config, n_samples: int, log: bool = False) -> Path
Return the directory for a specific dataset derived from a config.
The directory name is computed via dataset_name and created if missing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
| n_samples | int | Number of samples in the dataset. | required |
| log | bool | Whether to log directory creation/existence. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Directory for the specific dataset. |
dir_specific_dataset_from_name(name: str, log: bool = False) -> Path
Return the directory for a specific dataset by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Dataset name. | required |
| log | bool | Unused flag kept for API symmetry; directory is not created here. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Directory path for the dataset name. |
filepath_dataset(cfg: data_gen_config, n_samples: int) -> Path
Return the path to a dataset file for a given config and size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
| n_samples | int | Number of samples in the dataset. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | HDF5 dataset file path. |
filepath_dataset_config(cfg: data_gen_config, n_samples: int) -> Path
Return the path to the pModel configuration YAML for the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | data_gen_config | Data generation configuration. | required |
| n_samples | int | Number of samples in the dataset. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | YAML configuration file path stored with the dataset. |
filepath_dataset_from_name(name: str) -> Path
Return the HDF5 dataset file path for a given dataset name. Assumes that the dataset is located in the standard dataset directory structure of this package.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Dataset name. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | HDF5 dataset file path. |
filepath_dataset_from_config(dataset_name: str, dataset_path: str) -> Path
Return the HDF5 dataset file path for a given dataset name or explicit path.
If dataset_path is provided, it is used directly. Otherwise, the path is
constructed based on the standard dataset directory structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_name | str | Dataset name. | required |
| dataset_path | str \| None | Optional explicit dataset file path. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | HDF5 dataset file path. |
filepath_dataset_config_from_name(name: str) -> Path
Return the pModel configuration YAML path for a dataset name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Dataset name. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | YAML configuration file path. |
dir_current_hydra_output() -> Path
Return the current Hydra runtime output directory.
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to Hydra's current run output directory. |
filepath_model_current_hydra_output(phase: int | None = None) -> Path
Return the model checkpoint path in the current Hydra output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| phase | int \| None | Optional training phase index. When provided, the filename is "model_phase_{phase}.pt"; otherwise "model.pt". | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Model checkpoint file path. |
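The phase-dependent naming can be sketched as follows. The helper checkpoint_filename and its stem parameter are hypothetical; only the filename patterns are taken from this page. The sketch compares against None explicitly so that phase 0 still yields a phase-specific filename, which is an assumption about the intended behavior.

```python
from typing import Optional


def checkpoint_filename(stem: str, phase: Optional[int] = None) -> str:
    """Hypothetical helper: '<stem>.pt' or '<stem>_phase_<phase>.pt'."""
    if phase is None:
        return f"{stem}.pt"
    # An explicit None check (not truthiness) keeps phase 0 distinct from "no phase".
    return f"{stem}_phase_{phase}.pt"


print(checkpoint_filename("model"))         # model.pt
print(checkpoint_filename("model", 0))      # model_phase_0.pt
print(checkpoint_filename("optimizer", 2))  # optimizer_phase_2.pt
```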
filepath_pretrained_model_current_hydra_output() -> Path
Return the pretrained model file path in the current Hydra output dir.
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | "model_pretrained.pt" path. |
filepath_dataset_current_hydra_output() -> Path
Return the dataset file path in the current Hydra output directory.
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | "dataset.hdf5" path. |
filepath_optimizer_current_hydra_output(phase: int | None = None) -> Path
Return the optimizer state dict path in the current Hydra output dir.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| phase | int \| None | Optional training phase index. When provided, the filename is "optimizer_phase_{phase}.pt"; otherwise "optimizer.pt". | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Optimizer checkpoint file path. |
filepath_from_ml_artifacts_uri(mlflow_uri: str) -> Path
Resolve an MLflow artifacts URI to a local filesystem path.
The base directory is read from the MLFLOW_ARTIFACTS_DESTINATION
environment variable. The leading "file://" is stripped from the env var
value if present. The "mlflow-artifacts:/" prefix in mlflow_uri is also
removed before joining.
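A minimal sketch of the described resolution, assuming Python 3.9+ for str.removeprefix. The function name resolve_artifacts_uri is hypothetical, and the error raised when the environment variable is unset is an assumption of this sketch rather than documented behavior.

```python
import os
from pathlib import Path


def resolve_artifacts_uri(mlflow_uri: str) -> Path:
    """Hypothetical sketch: map an 'mlflow-artifacts:/' URI to a local path."""
    destination = os.environ.get("MLFLOW_ARTIFACTS_DESTINATION")
    if destination is None:
        # Assumed failure mode; the documented error condition may differ.
        raise ValueError("MLFLOW_ARTIFACTS_DESTINATION is not set")
    # Strip a leading "file://" from the base directory, if present.
    base = destination.removeprefix("file://")
    # Drop the URI prefix; lstrip guards against a leftover leading slash,
    # which would otherwise make the joined path ignore the base directory.
    relative = mlflow_uri.removeprefix("mlflow-artifacts:/").lstrip("/")
    return Path(base) / relative
```

With MLFLOW_ARTIFACTS_DESTINATION set to a made-up value such as "file:///srv/mlartifacts", a URI whose remainder is "1/abc/model.pt" would resolve to /srv/mlartifacts/1/abc/model.pt.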
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mlflow_uri | str | An MLflow artifacts URI (e.g., "mlflow-artifacts:/..."). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Resolved local filesystem path. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
filepath_from_local_or_ml_artifacts(mlflow_path: str) -> Path
Return a local Path from either a local path or an MLflow artifacts URI.
If the provided path starts with "mlflow-artifacts:", it is resolved via
filepath_from_ml_artifacts_uri. Otherwise, it is treated as a local
filesystem path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mlflow_path | str | Local filesystem path or an MLflow artifacts URI beginning with "mlflow-artifacts:". | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Local filesystem path. |