flexcv.utilities

flexcv.utilities.add_model_to_keys(param_grid)

This function adds the string "model__" to every key of the param_grid dict.

Parameters:

Name Type Description Default
param_grid dict

A dictionary of parameters for a model.

required

Returns:

Type Description
dict

A dictionary of parameters for a model with the string "model__" added to each key.

Source code in flexcv/utilities.py
def add_model_to_keys(param_grid):
    """This function adds the string "model__" to every key of the param_grid dict.

    Args:
      param_grid (dict): A dictionary of parameters for a model.

    Returns:
      (dict): A dictionary of parameters for a model with the string "model__" added to each key.
    """
    return {f"model__{key}": value for key, value in param_grid.items()}
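The `model__` prefix follows the `<step>__<parameter>` naming that scikit-learn uses to address parameters of a pipeline step. That this is the reason the helper exists is an assumption, not something the source states. A minimal sketch of the transformation, using a made-up parameter grid:

```python
def add_model_to_keys(param_grid):
    # Same one-liner as in the source listing above.
    return {f"model__{key}": value for key, value in param_grid.items()}


# Hypothetical hyperparameter grid, for illustration only.
param_grid = {"max_depth": [3, 5], "n_estimators": [100, 200]}
prefixed = add_model_to_keys(param_grid)
print(prefixed)
```

The function builds a new dict, so the input grid is left unchanged.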

flexcv.utilities.add_module_handlers(logger)

Adds handlers to the logger for the module.

Parameters:

Name Type Description Default
logger Logger

The logger for the module.

required

Returns:

Type Description
None

(None)

Source code in flexcv/utilities.py
def add_module_handlers(logger: logging.Logger) -> None:
    """Adds handlers to the logger for the module.

    Args:
      logger (logging.Logger): The logger for the module.

    Returns:
      (None)
    """
    logger = logging.getLogger()  # Get the root logger
    logger.setLevel(logging.INFO)

    c_handler = logging.StreamHandler()
    c_format = logging.Formatter("%(module)s - %(levelname)s - %(message)s")
    c_handler.setFormatter(c_format)
    c_handler.setLevel(logging.INFO)
    logger.addHandler(c_handler)
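In the listing above, `logger` is rebound to the root logger on the first line of the body, so the handler ends up on the root logger regardless of the argument. The sketch below is a variant, not the library's code: a hypothetical `add_stream_handler` helper attaches the same formatter to the logger it receives and returns early if it has already done so, so repeated calls do not duplicate output. The `stream` argument, the marker attribute, and the logger name are assumptions added for the demonstration.

```python
import io
import logging


def add_stream_handler(logger: logging.Logger, stream=None) -> None:
    """Hypothetical variant: attach one INFO-level stream handler to `logger`."""
    logger.setLevel(logging.INFO)
    # Avoid stacking handlers when the function is called more than once.
    if any(getattr(h, "_flexcv_sketch", False) for h in logger.handlers):
        return
    handler = logging.StreamHandler(stream)  # stream=None falls back to sys.stderr
    handler.setFormatter(
        logging.Formatter("%(module)s - %(levelname)s - %(message)s")
    )
    handler.setLevel(logging.INFO)
    handler._flexcv_sketch = True  # marker attribute used only by this sketch
    logger.addHandler(handler)


buffer = io.StringIO()
demo_logger = logging.getLogger("flexcv_demo")  # hypothetical logger name
demo_logger.propagate = False  # keep the demo output out of the root logger
add_stream_handler(demo_logger, buffer)
add_stream_handler(demo_logger, buffer)  # second call is a no-op
demo_logger.info("fold 1 finished")
output = buffer.getvalue()
print(output, end="")
```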

flexcv.utilities.get_fixed_effects_formula(target_name, X_data)

Returns the fixed effects formula for the dataset.

Scheme: "target ~ column1 + column2 + ..."

Parameters:

Name Type Description Default
target_name str

The name of the target variable in the dataset.

required
X_data DataFrame

The feature matrix.

required

Returns:

Type Description
str

The fixed effects formula.

Source code in flexcv/utilities.py
def get_fixed_effects_formula(target_name, X_data) -> str:
    """Returns the fixed effects formula for the dataset.

    Scheme: "target ~ column1 + column2 + ..."

    Args:
      target_name (str): The name of the target variable in the dataset.
      X_data (pd.DataFrame): The feature matrix.

    Returns:
      (str): The fixed effects formula.
    """
    if X_data.shape[1] == 1:
        return f"{target_name} ~ {X_data.columns[0]}"
    start = f"{target_name} ~ {X_data.columns[0]} + "
    end = " + ".join(X_data.columns[1:])
    return start + end
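Both branches of the function produce the target name, a tilde, and the column names joined by ` + `. A minimal sketch of that scheme on a plain list of column names follows. The helper name and the column names are hypothetical, and the real function reads the names from `X_data.columns`.

```python
def formula_from_names(target_name, column_names):
    # Hypothetical stand-in for get_fixed_effects_formula that takes the
    # column names directly instead of a DataFrame.
    return f"{target_name} ~ " + " + ".join(column_names)


single = formula_from_names("y", ["age"])
several = formula_from_names("y", ["age", "weight", "height"])
print(single)   # y ~ age
print(several)  # y ~ age + weight + height
```

Column names that are not valid Python identifiers generally need quoting in formula interfaces; neither the function nor this sketch handles that case.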

flexcv.utilities.get_re_formula(random_slopes_data)

Returns a random effects formula for use in statsmodels. Scheme: ~ random_slope1 + random_slope2 + ... Returns an empty string if no random slopes are provided.

Parameters:

Name Type Description Default
random_slopes_data Series | DataFrame | None

The random slopes data, or None if there are no random slopes.

required

Returns:

Type Description
str

The random effects formula.

Source code in flexcv/utilities.py
def get_re_formula(random_slopes_data):
    """Returns a random effects formula for use in statsmodels. Scheme: ~ random_slope1 + random_slope2 + ...
    Returns an empty string if no random slopes are provided.

    Args:
      random_slopes_data (pd.Series | pd.DataFrame | None): The random slopes data.

    Returns:
      (str): The random effects formula.
    """
    if random_slopes_data is None:
        return ""
    elif isinstance(random_slopes_data, pd.DataFrame):
        return "~ " + " + ".join(random_slopes_data.columns)
    elif isinstance(random_slopes_data, pd.Series):
        return "~ " + random_slopes_data.name
    else:
        raise TypeError("Random slopes data type not recognized")
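The three accepted cases (no random slopes, a single Series, a DataFrame) map to an empty string, `~ name`, and `~ name1 + name2 + ...`. A sketch of that mapping, using plain names in place of pandas objects so it runs without dependencies; the helper and the slope names are hypothetical:

```python
def re_formula_from_names(slope_names):
    # None mirrors "no random slopes", a str mirrors a Series name,
    # and a list or tuple mirrors DataFrame columns.
    if slope_names is None:
        return ""
    if isinstance(slope_names, str):
        return "~ " + slope_names
    if isinstance(slope_names, (list, tuple)):
        return "~ " + " + ".join(slope_names)
    raise TypeError("Random slopes data type not recognized")


no_slopes = re_formula_from_names(None)
one_slope = re_formula_from_names("time")
two_slopes = re_formula_from_names(["time", "dose"])
print(repr(no_slopes), one_slope, two_slopes, sep="\n")
```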

flexcv.utilities.get_repeated_cv_metadata(str_children='Instance of repeated run ', api_dict=None)

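Fetches metadata from repeated cross-validation runs logged to Neptune.ai: the ids of the child runs of each host run, together with the host run's description. The result is written to repeated_cv_metadata.xlsx in the current working directory. A ValueError is raised if api_dict is not provided.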

Parameters:

Name Type Description Default
str_children str

The string that is prepended to the description of each child run.

'Instance of repeated run '
api_dict dict

A dictionary containing the Neptune.ai project name under the key "project" and the API token under the key "api_token".

None

Source code in flexcv/utilities.py
def get_repeated_cv_metadata(str_children="Instance of repeated run ", api_dict=None):
    """This function can be used to fetch metadata from repeated cross-validation runs.
    We use it to get the ids of the children runs and their descriptions.

    Args:
        str_children (str): The string that is prepended to the description of each child run.
        api_dict (dict): A dictionary containing the Neptune.ai project name and the api token.
    """
    if api_dict is None:
        raise ValueError("api_dict must be provided")

    # get a list of all runs in the project
    project = neptune.init_project(
        project=api_dict["project"],
        api_token=api_dict["api_token"],
        mode="read-only",
    )
    runs_table_df = project.fetch_runs_table().to_pandas()
    # use only rows where "sys/description" begins with "Instance"
    # group by run sys/description
    grouped = runs_table_df[
        runs_table_df["sys/description"].str.startswith(str_children)
    ].groupby("sys/description")
    # get sys/id for each group
    grouped_ids = grouped["sys/id"].apply(list)
    # remove the str_children prefix and any dots from the description;
    # regex=False treats both patterns as literal text ("." as a regex matches any character)
    grouped_ids.index = grouped_ids.index.str.replace(str_children, "", regex=False)
    grouped_ids.index = grouped_ids.index.str.replace(".", "", regex=False)
    # rename the index to "host id"
    grouped_ids.index.name = "host id"
    # rename the column to "children ids"
    grouped_ids.name = "children ids"
    metadata = pd.DataFrame(grouped_ids)
    # use the host ids to get their sys/description and make them a new column in the DataFrame
    host_ids = grouped_ids.index
    descriptions = runs_table_df[runs_table_df["sys/id"].isin(host_ids)][
        "sys/description"
    ]
    descriptions.index = host_ids
    descriptions.index.name = "host id"
    descriptions.name = "description"
    # join the two DataFrames
    metadata = metadata.join(pd.DataFrame(descriptions))
    # save to excel
    metadata.to_excel("repeated_cv_metadata.xlsx")
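The core of the function is a filter-and-group step: keep runs whose description starts with `str_children`, strip that prefix and the dot to recover the host id, and collect the child run ids per host. A dependency-free sketch of that step on hypothetical run records follows. The ids and descriptions are invented for illustration, whereas the real function reads them from the Neptune runs table.

```python
from collections import defaultdict

STR_CHILDREN = "Instance of repeated run "

# Hypothetical (sys/id, sys/description) pairs.
runs = [
    ("FLEX-1", "Baseline experiment"),
    ("FLEX-2", "Instance of repeated run FLEX-1."),
    ("FLEX-3", "Instance of repeated run FLEX-1."),
    ("FLEX-4", "Unrelated run"),
]

children_by_host = defaultdict(list)
for run_id, description in runs:
    if description.startswith(STR_CHILDREN):
        # Drop the prefix and the dot, leaving the host id.
        host_id = description[len(STR_CHILDREN):].replace(".", "")
        children_by_host[host_id].append(run_id)

# Look up each host run's own description by its id.
descriptions = dict(runs)
metadata = {
    host_id: {"children ids": ids, "description": descriptions.get(host_id)}
    for host_id, ids in children_by_host.items()
}
print(metadata)
```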

flexcv.utilities.handle_duplicate_kwargs(*args)

This function removes duplicate kwargs from multiple dicts. If a key is present in multiple dicts, we check if the values are the same. If they are, we keep the key-value pair. If they are not, we raise a ValueError.

Parameters:

Name Type Description Default
*args dict

Any number of dicts of kwargs to merge.

required

Returns:

Type Description
dict

The dict without duplicate kwargs.

Source code in flexcv/utilities.py
def handle_duplicate_kwargs(*args) -> dict:
    """This function removes duplicate kwargs from multiple dicts.
    If a key is present in multiple dicts, we check if the values are the same.
    If they are, we keep the key-value pair. If they are not, we raise a ValueError.

    Args:
        *args (dict): Any number of dicts of kwargs to merge.

    Returns:
        (dict): The dict without duplicate kwargs.
    """
    return_kwargs = {}
    for arg in args:
        if not isinstance(arg, dict):
            raise TypeError("All arguments must be of type dict")
        for key, value in arg.items():
            if key not in return_kwargs:
                return_kwargs[key] = value
                # compare values
            else:
                if arg[key] != return_kwargs[key]:
                    raise ValueError(
                        f"Duplicate key {key} found with different values."
                    )
                else:
                    logger.info(f"Duplicate key {key} found with same value. Keeping.")

    return return_kwargs
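A short usage sketch covering both outcomes: identical duplicates are merged silently, conflicting ones raise. The function body is a condensed copy of the listing above, and the keyword dictionaries are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def handle_duplicate_kwargs(*args) -> dict:
    # Condensed copy of the function documented above.
    merged = {}
    for arg in args:
        if not isinstance(arg, dict):
            raise TypeError("All arguments must be of type dict")
        for key, value in arg.items():
            if key not in merged:
                merged[key] = value
            elif value != merged[key]:
                raise ValueError(f"Duplicate key {key} found with different values.")
            else:
                logger.info(f"Duplicate key {key} found with same value. Keeping.")
    return merged


# Hypothetical keyword dictionaries.
merged = handle_duplicate_kwargs({"n_splits": 5}, {"n_splits": 5, "shuffle": True})
print(merged)

try:
    handle_duplicate_kwargs({"n_splits": 5}, {"n_splits": 10})
except ValueError as err:
    conflict = str(err)
    print(conflict)
```

Because values are compared with `!=`, values whose comparison does not yield a single boolean, such as multi-element NumPy arrays, can fail inside the comparison itself.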

flexcv.utilities.pformat_dict(d, indent='')

Pretty-format a dictionary, only printing values that are themselves dictionaries.

Parameters:

Name Type Description Default
d dict

dictionary to print

required
indent str

Level of indentation for use with recursion (Default value = "")

''

Returns:

Type Description
str

The formatted string.

Source code in flexcv/utilities.py
def pformat_dict(d, indent=""):
    """Pretty-format a dictionary, only printing values that are themselves dictionaries.

    Args:
      d (dict): dictionary to print
      indent (str): Level of indentation for use with recursion (Default value = "")

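    Returns:
      (str): The formatted string.

    """
    formatted = ""
    for key, value in d.items():
        # str.join returns a new string and never modifies its receiver,
        # so accumulate with += (the newline separator is an assumption)
        formatted += f"{indent}{key}\n"
        if isinstance(value, dict):
            next_layer = pformat_dict(value, indent + "  ")
            formatted += next_layer
    return formatted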
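A sketch of the apparent intent, assuming one key per line with two extra spaces of indentation per nesting level. It collects lines in a list and joins them once, which sidesteps the fact that `str.join` returns a new string instead of modifying its receiver. The name `pformat_keys`, the newline separator, and the sample dictionary are assumptions.

```python
def pformat_keys(d, indent=""):
    # Hypothetical variant of pformat_dict that collects lines in a list.
    lines = []
    for key, value in d.items():
        lines.append(f"{indent}{key}")
        if isinstance(value, dict):
            nested = pformat_keys(value, indent + "  ")
            if nested:  # skip empty nested dicts
                lines.append(nested)
    return "\n".join(lines)


config = {"model": {"params": {"alpha": 0.1}, "name": "ridge"}, "seed": 42}
formatted = pformat_keys(config)
print(formatted)
```

For the sample dictionary this prints the keys `model`, `params`, `alpha`, `name`, and `seed`, each on its own line and indented by nesting depth; the values themselves are not shown.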

flexcv.utilities.rm_model_from_keys(param_grid)

This function removes the string "model__" from every key of the param_grid dict.

Parameters:

Name Type Description Default
param_grid dict

A dictionary of parameters for a model.

required

Returns:

Type Description
dict

A dictionary of parameters for a model with the string "model__" removed from each key.

Source code in flexcv/utilities.py
def rm_model_from_keys(param_grid):
    """This function removes the string "model__" from every key of the param_grid dict.

    Args:
      param_grid (dict): A dictionary of parameters for a model.

    Returns:
      (dict): A dictionary of parameters for a model with the string "model__" removed from each key.
    """
    return {key.replace("model__", ""): value for key, value in param_grid.items()}
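`str.replace` removes every occurrence of `"model__"`, not only a leading one. For keys produced by `add_model_to_keys` from keys that do not themselves contain the substring, this makes no difference, but a key containing it elsewhere is altered too. The sketch below contrasts the documented behavior with a hypothetical prefix-only variant; `strip_model_prefix` and the sample keys are assumptions for illustration.

```python
PREFIX = "model__"


def rm_model_from_keys(param_grid):
    # Same logic as in the source listing above.
    return {key.replace(PREFIX, ""): value for key, value in param_grid.items()}


def strip_model_prefix(param_grid):
    # Hypothetical variant: only strip the prefix when the key starts with it.
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in param_grid.items()
    }


prefixed = {"model__alpha": [0.1, 1.0], "model__sub_model__depth": [2, 4]}
replaced = rm_model_from_keys(prefixed)
stripped = strip_model_prefix(prefixed)
print(replaced)
print(stripped)
```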

flexcv.utilities.run_padding(func)

Decorator to add padding to the output of a function. Helps to visually separate the output of different functions.

Parameters:

Name Type Description Default
func

Any callable.

required

Returns:

Type Description
Any

Return value of the passed callable.

Source code in flexcv/utilities.py
def run_padding(func):
    """Decorator to add padding to the output of a function.
    Helps to visually separate the output of different functions.

    Args:
      func: Any callable.

    Returns:
      (Any): Return value of the passed callable.
    """

    @wraps(func)
    def wrapper_function(*args, **kwargs):
        print()
        print("~" * 10, "STARTING RUN", "~" * 10)
        print()
        results = func(*args, **kwargs)
        print()
        print("~" * 10, " END OF RUN", "~" * 10)
        print()
        return results

    return wrapper_function
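A usage sketch of the decorator. The decorated `fit_model` function is a hypothetical workload, and standard output is captured only so the example can inspect what was printed.

```python
import contextlib
import io
from functools import wraps


def run_padding(func):
    # Condensed copy of the decorator documented above.
    @wraps(func)
    def wrapper_function(*args, **kwargs):
        print()
        print("~" * 10, "STARTING RUN", "~" * 10)
        print()
        results = func(*args, **kwargs)
        print()
        print("~" * 10, " END OF RUN", "~" * 10)
        print()
        return results

    return wrapper_function


@run_padding
def fit_model(n_folds):
    """Hypothetical workload standing in for a cross-validation run."""
    print(f"running {n_folds} folds")
    return n_folds * 2


captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = fit_model(5)
printed = captured.getvalue()
print(printed)
```

Because the wrapper uses `functools.wraps`, the decorated function keeps its original name and docstring, and its return value is passed through unchanged.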