Perform comprehensive uncertainty and sensitivity analysis on parameter estimation results.
This function evaluates parameter uncertainties using variance-covariance matrices,
computes confidence intervals, performs sensitivity analysis via finite-difference
methods, and assesses model quality through multiple statistical metrics (R², MSE,
likelihood). It supports Hessian-based, Jacobian-based, and bootstrap-based
covariance estimation.
Central: ( frac{partial y}{partial theta_i} approx frac{y(theta + h e_i) - y(theta - h e_i)}{2h} )
Central differences are more accurate (second-order) but require twice as many simulations.
Variance-Covariance Estimation Methods (from paper Table S3):
‘H’ (Hessian-based): Local uncertainty from optimizer’s Hessian matrix.
[
mathbf{V} = sigma^2 mathbf{H}^{-1}
]
where ( sigma^2 ) is estimated from residual variance.
Advantages: Fast, automatic from optimizer
Disadvantages: Assumes local linearity, may be inaccurate for ill-conditioned problems
‘J’ (Jacobian-based): Local uncertainty from Fisher Information Matrix.
[
mathbf{M} = mathbf{Q}^T mathbf{W} mathbf{Q}, quad mathbf{V} = mathbf{M}^{-1}
]
where ( mathbf{Q} ) is the sensitivity matrix and ( mathbf{W} = text{diag}(1/sigma_i^2) ).
Advantages: More robust than Hessian for parameter-sensitive models
Disadvantages: Requires explicit sensitivity computation
‘B’ (Bootstrap): Global uncertainty via residual resampling.
Accounts for parameter bounds via truncated normal correction.
Advantages: Most robust, accounts for non-linearity
Disadvantages: Computationally expensive, requires running parmest with var-cov=’B’
Confidence Intervals:
95% confidence intervals are computed using t-distribution:
[
CI_i = t_{0.975, nu} cdot sqrt{V_{ii}}
]
where ( nu = n_{obs} - n_{params} ) is degrees of freedom.
t-Statistics:
Parameter significance is tested via:
[
t_i = frac{theta_i}{sqrt{V_{ii}}}
]
Large ( |t_i| ) (typically > 1.96 for 95% confidence) indicates the parameter
is significantly different from zero.
Normalized Sensitivities (Z-scores):
Sensitivities are normalized by parameter uncertainties and measurement errors:
This provides model comparison based on goodness-of-fit (higher probability → better model).
Mesh-Independency Test:
If epsilon is not provided, it is automatically determined by varying ( h ) over
( [10^{-12}, 10^{-1}] ) and selecting the value that minimizes sensitivity matrix
variance. Results are saved in ‘./meshindep_plots/’.
Ill-Conditioning Diagnostics:
The function automatically logs conditioning metrics:
Rank of sensitivity matrix ( mathbf{Q} )
Condition numbers of ( mathbf{Q} ) and ( mathbf{M} )
Minimum eigenvalue of ( mathbf{M} )
Warnings if system is ill-conditioned (condition number > ( 10^8 ))
In-Place Updates (case=None):
When case=None, the function updates models dictionary:
models[‘theta’]: Updated with final estimates
models[‘V_matrix’]: Updated with covariance matrices
Plot the estimability of parameters across different rounds for a given model. Estimability plotter post analysis.
Parameters:
round_data (dict): Data related to each round of experiments.
path (str): Base path where the plot will be saved.
model (str): The name of the model for which the estimability is being plotted.
Plot parameter PDFs (on the diagonal) and confidence ellipses (off-diagonal) for
the winning model across selected rounds. Then call conf_plot_metrics and accvsprec_plot.
Parameters:
filename (str) – Filename for the main confidence plot figure.
ellipsoid_volume_filename (str, optional) – Filename for the ellipsoid volume plot.
std_devs_filename (str, optional) – Filename for the standard deviations plot.
parameter_estimates_filename (str, optional) – Filename for the parameter estimates plot.
Write an Excel file for each round and a summary text file for the last round.
Excel files contain experimental data, model simulations, and input data.
The text file contains scaled parameters, t-values, and R² metrics for the last round.
Parameters:
filename (str) – Base filename. Round numbers are appended to Excel files (e.g., “my_file_2.xlsx”).
The text summary is written to a file with “_summary.txt” appended.
A class to handle plotting of results while analysis.
Attributes:
path (str): Base path for saving plots.
case (str): Case identifier for the experiment.
mutation_settings (dict): Settings related to mutation in the modelling process.
modelling_folder (str): Folder name for storing modelling results.
Generate and save a confidence ellipsoid plot for estimated parameters, across different rounds of identification for different models.
Parameters:
parameters (dict): Model parameters.
cov_matrices (dict): Covariance matrices for the estimated parameters
confidence_intervals (dict): Confidence intervals for the parameters.
round (int): The round number.
Generate and save the model fitting to experimental data plot for the given data and results.
Parameters:
data (dict): Experimental data.
result (dict): Modelling results.
round (int): The round number.
case (str): The case type (e.g., ‘doe’ or ‘classic’).
Plot rCC values against k for a specific round and model. Estimability plotter while analysis.
Parameters:
x_values (list): List of x values (k values).
rCC_values (list): List of rCC values corresponding to x values.
round (int): The round number.
framework_settings (dict): Settings related to the framework, including paths and case information.
model (str): The name of the model for which the plot is being generated.
Plot Sobol sensitivity analysis results for a given model
Parameters:
time_samples (list): List of time for samples (time span)
sobol_analysis_results (dict): Results of the Sobol sensitivity analysis.
sobol_problem (dict): Problem definition for the Sobol analysis.
model (str): The name of the model for which the plot is being generated.
response_key (str): The response key for which the sensitivities are plotted.
framework_settings (dict): User provided - Settings related to the framework, including paths and case information.
Plot R² prediction, R² validation, and R² reference for each model as an envelope plot.
Also, create a bar plot with average R² values for prediction, validation, and all the data for each model.
Parameters:
prediction_R2 (dict): Dictionary containing R² prediction values for each fold and model.
validation_R2 (dict): Dictionary containing R² validation values for each fold and model.
R2_ref (dict): Dictionary containing R² reference values for each model.
Plot normalized parameters for each model in each fold, divided by the corresponding member in ref_params.
Parameters:
parameters (dict): Dictionary containing parameter values for each fold and model. (one data out at a time based cross-validation)
ref_params (dict): Dictionary containing reference parameter values for each model. (all data used estimation)
Perform Model-Based Design of Experiments for Model Discrimination (MBDoE-MD).
This function identifies experimental designs that maximise the ability to
discriminate between competing models. Optimisation can run in parallel or
single-core mode. Failed runs are discarded, and the best valid design is returned.
Maximises sum of squared differences between model predictions
across all time points and measured outputs. Suitable for
discriminating models with significant structural differences.
’BFF’: Buzzi-Ferraris-Forzatti T-optimality
Alternative T-optimality criterion with different weighting
scheme for model divergence computation. Accounts for signal
magnitude in discrimination objective.
RuntimeError – If all parallel runs fail or if the single-core run encounters an exception.
Notes
MBDoE-MD aims to design experiments that enhance the ability to reject
incorrect models by maximising discrimination criteria based on T-optimality.
The discrimination objective maximises the divergence between competing model
predictions, enabling statistical tests (P-test, t-test) to identify the most
representative model.
Workflow for Model Discrimination:
1. Calibrate all candidate models on available data
2. Design discriminative experiment using MBDoE-MD (HR or BFF criterion)
3. Conduct designed experiment
4. Re-calibrate models with new data
5. Compute model probabilities via P-test
6. Select model with highest posterior probability
7. Validate selected model
Supported constraints include:
Control level bounds and switching rules (increasing, decreasing, relaxed)
Minimum sampling intervals and total sample limits
Forced or forbidden sampling times
Synchronised sampling across outputs
Piecewise-constant (CPF) or piecewise-linear (LPF) control profiles
References
See also
_safe_run_md
Parallel optimisation wrapper with exception handling.
Configure a logger with standard output formatting for MBDoE operations.
This function creates or reconfigures a logger with a consistent format and
ensures output is directed to stdout (not stderr). Existing handlers are
removed to prevent duplicate log messages.
Parameters:
name (str, optional) – Logger name. If None, returns the root logger (default: None).
If the logger already has handlers, they are removed before adding a new one.
This prevents duplicate messages when the function is called multiple times.
Propagation:
Logger propagation is disabled (logger.propagate = False) to prevent messages
from being passed to parent loggers.
middoe.sc_sensa.sample(settings, N, calc_second_order=True, scramble=True, skip_values=0, seed=None)[source]
Generate structured Sobol sample matrix for variance-based sensitivity analysis.
Creates a low-discrepancy quasi-random sample matrix using Sobol sequences,
structured to enable efficient computation of first-order and total-order
sensitivity indices. Supports grouped analysis and various sampling options.
Perform global Sobol sensitivity analysis to quantify parameter and variable importance.
This function implements variance-based global sensitivity analysis using Sobol indices,
identifying which parameters and input variables most influence model predictions.
It supports both single-core and multi-core parallel execution, making it suitable
for computationally expensive models. The analysis generates time-varying sensitivity
indices showing how parameter importance evolves during dynamic experiments.
Sobol Sensitivity Indices:
Variance-based global sensitivity analysis decomposes output variance into contributions
from individual inputs and their interactions.
First-Order Index (S1):
Main effect of input i, excluding interactions:
Execute forward model simulation with flexible solver and interpolation support.
This is the main simulation kernel that orchestrates the complete forward modeling
workflow. It handles model loading (internal, external Python, or gPROMS), variable
scaling and interpolation, initial condition setup, and ODE integration. The function
supports multiple control variable parameterization (CVP) methods and integrates
seamlessly with parameter estimation and MBDoE routines.
Parameters:
t (list or np.ndarray) – Normalized time points for simulation [0, 1]. Will be scaled to physical time
internally using utsc.
Scaled and interpolated time-variant input profiles:
Keys: input variable names
Values: arrays of shape (len(t),) with input trajectories
Raises:
KeyError – If model_name is not found in models[‘krt’], or if required keys are missing
from swps or uphit for specified CVP methods.
ImportError – If required modules (krnl_models, pygpas) cannot be imported.
ValueError – If CVP method is unrecognized or if switching point arrays have mismatched lengths.
Notes
Simulation Workflow:
Model Loading: Determine model type and load model function
Variable Scaling: Convert normalized inputs to physical units
Interpolation: Construct time-variant input profiles via CVP
Initial Conditions: Extract from system[‘tvo’] definitions
Integration: Call solver_selector() with appropriate backend
Return: Scaled outputs and interpolated inputs
Model Types:
‘pym’ (Python internal): Model function defined in middoe.krnl_models.
Loaded via importlib and called directly.
‘pys’ (Python external): Model defined in external .py file.
Must contain solve_model(t, y0, tii, tvi, theta) function.
Loaded dynamically via importlib.util.
‘gpr’ (gPROMS): Model defined in gPROMS .gPJ file.
Requires pygpas package for interface.
Evaluated via gPAS client-server architecture.
callable: Model function passed directly in models[‘krt’][model_name].
Most flexible option for custom models.
Control Variable Parameterization (CVP):
CPF (Constant Piecewise): Zero-order hold interpolation.
Input held constant between switching points (step changes).
[
u(t) = u_k quad text{for } t in [t_k, t_{k+1})
]
LPF (Linear Piecewise): Linear interpolation between switching points.
Smooth transitions create linear ramps.
[
u(t) = u_k + (u_{k+1} - u_k) frac{t - t_k}{t_{k+1} - t_k}
]
no_CVP: No control optimization. Input remains constant at uphit value.
Normalization Convention:
All variables, parameters, and time are normalized internally:
This improves numerical conditioning and allows unified handling across different
scales. Scaling is reversed internally before calling the physical model.
Initial Conditions:
Each time-variant output can specify:
Numeric value: Used directly as ( y_0 )
‘variable’: References uphi[‘{var}_0’] (e.g., ‘CA_0’ for initial concentration)
Select and execute appropriate ODE solver backend with automatic retry logic.
This function dispatches to the correct integration backend based on model type,
handling SciPy solve_ivp, external Python scripts, and gPROMS models. It includes
automatic retry logic to handle transient numerical failures or connection issues.
Parameters:
model (callable) –
Model function to integrate. Signature depends on backend:
SciPy: dydt = model(t, y, phi, phit, theta, t_vec)
External: result = model(t, y0, phi, phit, theta)
t (list[float]) – Physical time points for evaluation.
y0 (list[float]) – Initial conditions for ODE states.
Random port selection (10000-90000) to avoid conflicts
Silent mode (stdout/stderr suppressed)
Automatic connection teardown via context manager
Requires valid credentials in models[‘creds’]
Error Handling:
If all retries fail:
Prints error message with attempt count
Returns empty dicts (simulation failed)
Calling code should check for empty results
Output Naming Convention:
For SciPy models, outputs are automatically named ‘y1’, ‘y2’, etc.
based on position in state vector. For external/gPROMS models,
output names must match keys in system[‘tvo’] and system[‘tio’].
Execute in-silico experiment and generate synthetic measurement data with noise.
This function orchestrates the complete in-silico experimentation workflow: setting up
simulation parameters, running the forward model, adding realistic measurement noise,
and saving results to Excel. It supports both preliminary (classic) designs and optimized
MBDoE designs. The function handles time-variant and time-invariant inputs/outputs,
control variable parameterizations, and switching profiles.
Negative values are clipped to zero (physical constraint).
DataFrame Structure:
The output DataFrame contains multiple sections separated by blank columns:
Per-variable measurements: Separate columns for each output variable
with sampling times (MES_X), noisy values (MES_Y), and uncertainties (MES_E)
Global simulation: Time vector (X:all), all inputs, time-invariant outputs
CVP metadata: Control variable parameterization methods
Switching points: Only for MBDoE designs with switching profiles
Excel Output:
Results are appended to ‘data.xlsx’ in the current directory. Each experiment
creates/updates a sheet named by the experiment number. This allows accumulation
of multiple experiments in a single file for batch processing.
Time Scaling:
All time values are normalized to [0, 1] internally, then scaled back to physical
units (tsc = system[‘t_s’][1]) for output. This improves numerical conditioning.
Initialize normalized parameter vectors for all models.
This utility creates ‘normalized_parameters’ entries (all ones) from the
‘theta’ parameter scaling factors. It’s used during setup to establish the
normalization convention used throughout the package.
Accumulate experimental data in memory during sequential experiment generation.
This function provides an alternative to Excel file accumulation for in-silico
experiments. It’s particularly useful for programmatic workflows where data
is generated and consumed entirely in Python without intermediate file I/O.
Parameters:
df_combined (pd.DataFrame) – DataFrame for current experiment with standardized structure (from expera()).
experiment_number (int or str) – Unique identifier for this experiment (converted to string internally).
DataFrame Structure:
The df_combined DataFrame should match the format expected by read_excel(),
with the same column naming conventions.
Memory Considerations:
Storing many experiments in memory can consume significant RAM. For large
campaigns (>100 experiments), consider periodic saving to disk.
Register a function in the kernel_simulator module namespace for dynamic access.
This utility enables runtime registration of functions in the kernel_simulator module,
facilitating dynamic model loading and callback mechanisms. It’s primarily used for
custom model functions that need to be accessible globally during simulation.
Parameters:
func_name (str) – Name under which to register the function in kernel_simulator’s namespace.
The function with this name must exist in the current global scope.
Returns:
Function is registered as a side effect.
Return type:
None
Notes
Use Case:
When defining custom ODE models in a script, this function allows them to be
dynamically registered in the kernel_simulator module, making them accessible
to the simulation engine without modifying the core package code.
Registration Mechanism:
Uses setattr() to add a lambda wrapper that calls the local function:
Namespace Management:
The function must exist in globals() at the time of registration. This typically
means it should be defined in the same script before calling fun_globalizer().
Warning:
This modifies the kernel_simulator module’s namespace at runtime. Use with caution
in production environments. Prefer direct model registration via models[‘krt’].
See also
simula
Uses registered functions for model execution.
Examples
>>> # Define a custom model>>> defmy_custom_model(t,y,tii,tvi,theta,t_vec):... dydt=[-theta[0]*y[0],theta[0]*y[0]]... returndydt>>>>>> # Register it globally>>> fun_globalizer('my_custom_model')>>>>>> # Now accessible in kernel_simulator>>> # (used internally by simulation engine)
Load previously saved identification and sensitivity analysis results.
This function attempts to deserialize .jac files from the current directory,
providing easy access to archived results without re-running expensive
computations. It handles missing files gracefully and reports status for
each expected file.
Returns:
loaded_data –
Dictionary with keys ‘iden’ and ‘sensa’:
loaded_data[‘iden’]: Identification results (round_data) or None
loaded_data[‘sensa’]: Sensitivity results (sobol_results) or None
If both files exist: Returns dict with both keys populated
If one file exists: Returns dict with one key populated, other is None
If neither exists: Returns None
Error Handling:
If a file exists but cannot be loaded (corrupt, version mismatch, etc.),
the function prints an error message and sets that key to None, but continues
loading other files.
Usage Patterns:
Resume workflow: Load previous rounds and continue
Post-processing only: Load results for plotting/analysis
Comparison: Load multiple result sets for comparison
Pickle Security:
As with save_to_jac(), only load .jac files from trusted sources.
Read experimental data from ‘data.xlsx’ in the current working directory.
This is the standard data loading function used throughout the MBDoE package.
It reads all sheets from an Excel file containing experimental measurements,
input conditions, and metadata. The file must follow the standardized format
with specific column naming conventions.
FileNotFoundError – If ‘data.xlsx’ does not exist in current working directory.
Notes
File Location:
The function looks for ‘data.xlsx’ in Path.cwd() (current working directory).
Ensure you run your script from the project root or navigate to the correct
directory before calling this function.
Data Format:
The Excel file should contain multiple sheets, each representing one experiment.
Within each sheet:
Time column: ‘X:all’ (full simulation time vector)
Input columns: Time-invariant (e.g., ‘P’, ‘T0’) and time-variant (e.g., ‘T’)
Metadata: ‘CVP:{var}’ for control parameterization info
Switching points: ‘{var}t’ (times) and ‘{var}l’ (levels)
Usage Context:
Called by:
parmest(): To load data for parameter estimation
uncert(): To load data for uncertainty analysis
validation(): To split data for cross-validation
expera(): Writes to this file during in-silico experiments
Multiple Sheets:
Each sheet represents an independent experiment. Parameter estimation typically
uses all sheets simultaneously to estimate a single set of parameters. The
sheet names are arbitrary but should be unique identifiers.
Pandas read_excel Options:
Uses sheet_name=None to read all sheets at once, returning a dictionary.
This is more efficient than reading sheets individually.
Save and organize results from a single experimental round for post-processing.
This function consolidates all results from one round of the MBDoE workflow
(design → experiment → estimation → analysis) into a structured dictionary.
It computes reference t-values, saves parameter estimates and uncertainties,
generates plots, and prepares data for multi-round comparison.
obs (int) – Total number of observations (measurements) across all experiments.
system (dict) – System configuration with variable definitions.
ranking (list[str], optional) – Parameter ranking from estimability analysis (most to least estimable).
k_optimal_value (int, optional) – Optimal number of parameters to estimate based on rCC analysis.
rCC_values (dict[str, list[float]], optional) – Revised Critical Ratio values for each model: {solver: [rCC_1, rCC_2, …]}.
J_k_values (dict[str, list[float]], optional) – Objective function values for each k: {solver: [J_1, J_2, …]}.
best_uncert_result (dict, optional) – If estimability analysis was performed, the uncertainty results for the
optimal parameter subset. Overrides ‘result’ if provided.
Plots are saved to folders named by round and solver.
Estimability Analysis Integration:
If best_uncert_result is provided, it indicates that estimability analysis
was performed to select a subset of parameters. This result is used instead
of the full uncertainty analysis.
Mutation Tracking:
Parameter activity masks (mutation) track which parameters are estimated
(True) vs fixed (False) in each round. This is crucial for sequential
parameter estimation strategies.
Serialize and save results to a binary .jac file using pickle.
This function provides persistent storage for estimation or sensitivity analysis
results, enabling later retrieval without re-running expensive computations.
The .jac extension is a custom convention for “Job Archive” files.
’iden’: Saves to ‘iden_results.jac’ (identification results)
’sensa’: Saves to ‘sensa_results.jac’ (sensitivity results)
Returns:
File is written to disk as a side effect.
Return type:
None
Raises:
ValueError – If purpose is neither ‘iden’ nor ‘sensa’.
Exception – If file writing fails (permissions, disk space, etc.).
Notes
Pickle Protocol:
Uses pickle.HIGHEST_PROTOCOL for maximum compatibility with future Python
versions while maintaining backward compatibility.
File Location:
Files are saved in Path.cwd() (current working directory). Ensure you have
write permissions in this directory.
Security Warning:
Pickle files can execute arbitrary code during deserialization. Only load
.jac files from trusted sources. For untrusted data, consider JSON or HDF5.
File Size:
Results can be large (MB-GB) if they contain many rounds with extensive
covariance matrices, LSA matrices, and simulation trajectories. Monitor
disk space usage.
Use Cases:
Checkpoint: Save after each round for crash recovery
Archive: Preserve results for reproducibility
Sharing: Distribute results to collaborators
Post-processing: Load results for plotting without re-estimation
>>> # After completing all estimation rounds>>> save_to_jac(round_data,purpose='iden')# [INFO] Results saved to: .../iden_results.jac>>>>>> # After sensitivity analysis>>> save_to_jac(sobol_results,purpose='sensa')# [INFO] Results saved to: .../sensa_results.jac>>>>>> # Invalid purpose>>> save_to_jac(results,purpose='invalid')# ValueError: Purpose must be 'iden' or 'sensa'
Export Sobol sensitivity analysis results to Excel for visualization and reporting.
This function converts nested Sobol analysis results (time-varying sensitivity
indices for multiple models and responses) into a well-organized Excel workbook
with separate sheets for each model-response combination.
None – Excel file is written to disk as a side effect.
Side Effects
————
Creates ‘sobol_results.xlsx’ in current working directory.
Notes
Excel Structure:
Each sheet contains:
Column 1: Time (or time point index)
Columns 2+: Total Sobol indices (ST) for each parameter
Sheet names follow the pattern: ‘{model}_{response}’
(truncated to 31 chars for Excel compatibility).
Total Sobol Index (ST):
The function exports ST (total-order indices) rather than S1 (first-order)
because ST captures both main effects and all interactions, providing the
most comprehensive sensitivity information.
Time-Varying Sensitivity:
Each row represents one time point. This captures how parameter importance
evolves during a dynamic experiment:
Early times: Initial condition parameters dominate
Middle times: Kinetic parameters dominate
Late times: Equilibrium parameters dominate
Missing Data Handling:
Skips invalid or empty data
Uses default parameter names (‘Param_1’, ‘Param_2’, …) if names unavailable
Handles mismatched data gracefully
Visualization:
The exported Excel file can be:
Plotted directly in Excel
Imported into plotting software
Used for publication-quality figures
See also
plot_sobol_results
Visualization function from iden_utils.
SALib.analyze.sobol
Sobol analysis computation.
Examples
>>> # After running Sobol analysis>>> save_to_xlsx(sensa_results,sobol_problem)# Sobol results saved to: .../sobol_results.xlsx>>>>>> # Result: Excel file with sheets like:>>> # - M1_CA (Concentration A sensitivity for model 1)>>> # - M1_CB (Concentration B sensitivity for model 1)>>> # - M2_T (Temperature sensitivity for model 2)>>> # Each with columns: Time, k1, k2, Ea, ...