MIDDoE: Model-(based) Identification, Discrimination, and Design of Experiments
MIDDoE is an open-source Python package for systematic model identification workflows. It provides an integrated, physics-aware framework that combines Global Sensitivity Analysis (GSA), Estimability Analysis (EA), Parameter Estimation, Uncertainty Quantification, Model-Based Design of Experiments (MBDoE), and cross validation for dynamic systems.
MIDDoE unifies the complete identification pipeline from pre-experimental diagnostics through post-analysis reporting—suitable for experimentalists and engineers with limited programming expertise.
1. Installation
Install via pip:
pip install middoe
MIDDoE requires Python ≥ 3.9 and automatically installs dependencies: NumPy, SciPy, Pandas, Matplotlib, and Pymoo.
2. Mathematical Framework
MIDDoE operates on lumped dynamic systems governed by differential-algebraic equations:
2.1 MIMO model structure
Variables and parameters:
\(\boldsymbol{\theta} \in \mathbb{R}^{N}\): model parameters to be estimated
\(\mathbf{x}(t) \in \mathbb{R}^{N_x}\): time-variant differential states
\(\mathbf{z}(t) \in \mathbb{R}^{N_z}\): time-variant algebraic states
\(\mathbf{u}(t) \in \mathbb{R}^{N_u}\): time-variant controls (manipulated variables)
\(\mathbf{w} \in \mathbb{R}^{N_w}\): time-invariant controls (design variables, initial conditions)
\(\mathbf{y}(t) \in \mathbb{R}^{N_r}\): measured outputs
\(\boldsymbol{\varepsilon}(t)\): measurement error (noise)
2.2 Sobol Sensitivity Analysis
Two sample matrices \(\mathbf{A}, \mathbf{B} \in [0,1]^{N \times d}\) are generated from the scrambled Sobol sequence, where \(N\) is the number of quasi-random base samples. For each variable \(i \in \{1,...,d\}\) a hybrid matrix \(\mathbf{AB}_i\) is generated by replacing the \(i\)-th column of \(\mathbf{A}\) with the corresponding column from matrix \(\mathbf{B}\), keeping the remaining columns unchanged. The total number of model evaluations required is:
This includes model evaluations of all \(N\) samples of matrices \(\mathbf{A}\), \(\mathbf{B}\), and each \(\mathbf{AB}_i\). First-order Sobol indices, measuring the main effect of each input on output variance, are estimated using Jansen’s estimator:
Total-order Sobol indices, which include all interactions involving input \(i\), are computed using the Jansen estimator:
where \(\text{Var}(Y)\) represents the total variance of the model output, estimated from the pooled responses across all sample matrices.
2.3 Fisher Information Matrix
The local sensitivity matrix \(\mathbf{Q}_r\) for each measured response \(r\) is defined as:
where \(\hat{y}_r(\theta, t)\) denotes model predictions, \(N_{sp}\) is the number of sampling points, and \(N_{\theta}\) is the number of active parameters. This matrix enables construction of the Fisher information matrix \(\mathbf{I}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) as:
where prior information of parameters can be neglected (i.e., \(\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{-1} = 0\)), and \(\sigma_{r,r'}\) are elements from the inverse of the variance-covariance matrix of measurements errors. If the condition number of the Fisher matrix, defined as \(\kappa(\mathbf{I})\), exceeds a critical threshold (typically \(10^3\)), the model is considered sloppy and if this number is extremely high, the matrix is ill-conditioned.
2.4 Confidence Intervals and t-Tests
The resulting covariance matrix \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) is used to compute Confidence Interval \(CI_i\) and t-value of estimated parameter \(i\) at a confidence level of \(1 - \alpha\) (95%):
where \(v_{\theta,ii}\) is the \(i\)-th diagonal element of \(\mathbf{V}\), and \(t_{1 - \alpha/2, \text{DoF}}\) is the upper \(1 - \alpha/2\) critical value of a Student’s t-distribution with \(\text{DoF} = N_{sp} - N_{\theta}\) degrees of freedom. The t-test provides a quantitative measure of estimation precision by statistically assessing whether each estimated parameter \(\hat{\theta}_i\) is significantly different from the parameter noise with a zero mean. This is verified by checking whether the computed \(t_i\)-value exceeds the reference threshold \(t_{\text{ref}}\), corresponding to a Student t-value at the same confidence level.
2.5 MBDoE-MD: T-Optimality Criteria
The general form of the T-optimal design problem is expressed as:
where \(N_m\) denotes the number of competing models, \(\boldsymbol{\theta}_l\) and \(\boldsymbol{\theta}_{l'}\) are the respective preliminary estimated parameter vectors for models \(l\) and \(l'\), \(\Omega\) defines the feasible design space, and \(\mathbf{F}_{l,l'}(t)\) is a weight factor based on the uncertainties in observation and parameter estimation at different sampling times (which is 1 for HR). For the BFF, \(\mathbf{F}_{l,l'}(t)\) incorporates model sensitivity and covariance terms, and is defined at each sampling time \(t\) as:
where \(\boldsymbol{\Sigma}_y\) is the variance-covariance matrix of measurement errors, and \(\mathbf{W}_l(t)\) and \(\mathbf{W}_{l'}(t)\) are representing the modeling error contributions, derived from the sensitivity of each model to its parameters. Each \(\mathbf{W}_l(t)\) is constructed as:
where \(\mathbf{Q}_l(t)\) is the sensitivity matrix of model \(l\) with respect to its estimated parameters \(\boldsymbol{\theta}_l\) from the preliminary estimations, evaluated at time \(t\):
2.6 MBDoE-PP: Alphabetical Criteria
MBDoE-PP aims to minimise parameter uncertainty by reducing the size of the confidence hyperellipsoid through optimisation of a scalar metric \(\Phi(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\). The MBDoE-PP problem is formulated as:
Criterion |
Definition |
Mathematical form |
|---|---|---|
D |
Reducing the confidence region size by minimising the determinant of the \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) |
\(\Phi_D = \det(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\) |
A |
Lowering the average variance across all parameter estimates by minimising the trace of \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) |
\(\Phi_A = \text{tr}(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\) |
E |
Reducing the variance of the most uncertain parameter by minimising the biggest eigenvalue of \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) |
\(\Phi_E = \lambda_{\max}(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\) |
ME |
Ensure uniform precision across parameters by minimises the condition number of the \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) |
\(\Phi_{ME} = \kappa(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\) |
2.7 Estimability Analysis
A scaled sensitivity matrix \(\mathbf{Q}^S \in \mathbb{R}^{N_{tsp} \times N_{\theta}}\) is constructed, where \(N_{tsp}\) is the total number of responses samples (i.e., the sum of \(N_{sp}\) across all measured outputs), and \(N_{\theta}\) is the number of model parameters. Each element of \(\mathbf{Q}^S\), denoted as \(q_{ij}^S\), is computed by scaling the local sensitivities as:
where \(\frac{\partial y_i}{\partial \theta_j}\) is the local sensitivity of the \(i\)-th model output to the \(j\)-th parameter, \(\sigma_{y_i}\) and \(\sigma_{\theta_j}\) are the standard deviations of the measurement error for \(i\)-th model output and the prior uncertainty of \(j\)-th model parameter. An iterative deflation algorithm is then applied to rank parameters. Once the parameters are ranked, the optimal subset is selected by estimating nested top-\(k\) subsets for \(k \in \{1,2,...,N_{\theta} - 1\}\) and evaluating their WLS performance. For each \(k\), compute the critical ratio \(r_{C,k}\):
where \(J_{N_{\theta}}\) is the objective value for the full model in which all parameters are estimated. To account for statistical bias and sample size effects, compute the corrected form of the critical ratio \(r_{CC,k}\):
Determine the optimal number of parameters \(k^*\) to estimate by identifying the value of \(k\) that minimises \(r_{CC,k}\).
2.8 P-Test for Model Selection
After calibration of the candidate models, P-test quantifies the relative likelihood that each candidate model provides the best fit to the observed data. This method assigns a probability score \(P_i\) to each model \(i\), reflecting its relative quality based on goodness-of-fit metrics. The probability of the \(i\)-th model being the best among \(N_m\) candidates is defined as:
where \(\chi_i^2\) values are obtained after parameter estimation for each model. Models with lower \(\chi^2\) values are considered better descriptors of the system. A higher \(P_i\) indicates stronger evidence in favour of model \(i\).
3. Workflow Conceptual Steps
A complete MIDDoE identification workflow for Basic user follows this sequence:
I. System and model establishment
System structure Define the experimental and operational space in the
systemdictionary, including all time-variant inputs, time-invariant inputs, measured outputs, and their physical/operational constraints (bounds, sampling limits, dead-times, CVP structure).Candidate models Implement candidate models as black-box simulators and register them in the
modelsdictionary, specifying parameter vectors, feasible bounds, estimation masks (mutation), and the model interface type (e.g.'pys','pym', function name, or'gpr').Global Sensitivity Analysis Configure the
gsadictionary and callsensa()frommiddoe.sc_sensato analyse the influence of parameters and/or controls on the outputs, providing a ranking that guides model reduction and design choices.
II. Data creation
Preliminary data Either create a
data.xlsxfile in the project repository containing your experimental measurements, or define theinsilicosdictionary (true model, true parameters, noise type) and generate synthetic data by callingexpera()frommiddoe.krnl_expera.MBDoE for discrimination Set up the
des_optdictionary for model discrimination (md_obset to'HR'or'BFF') and runmbdoe_md()frommiddoe.des_mdto design experiments that maximally separate the predictions of rival models.MBDoE for precision Configure
des_optfor parameter precision (pp_obset to'D','A','E', or'ME') and runmbdoe_pp()frommiddoe.des_ppto obtain experiments that minimise parameter uncertainty for the selected model(s).Append data After executing the designed experiments (in the lab or in-silico via
expera()), updatedata.xlsxor your in-memory data structure so that all newly collected experiments are available for calibration.
III. Model identification
Parameter estimation With the current dataset and model definitions, run
parmest()frommiddoe.iden_parmestusing theiden_optdictionary to estimate parameters, compute goodness-of-fit metrics, and obtain convergence information.Uncertainty analysis Pass the estimation results to
uncert()frommiddoe.iden_uncertto evaluate variance–covariance matrices, confidence intervals, t-values, and prediction metrics, characterising the precision of the estimated parameters.Estimability analysis Run
estima()frommiddoe.sc_estimato rank parameters by estimability and determine the optimal subset that can be reliably estimated with the current data, informing which parameters should remain active or be fixed in subsequent rounds.Round storage Treat the complete sequence (design → experiment → estimation → uncertainty → estimability) as the i-th round and save it into a
round_datadictionary by callingsave_rounds()frommiddoe.log_utils.Sequential rounds Repeat steps 4–11, updating experimental designs, data, and parameter masks round by round until the desired discrimination and calibration performance indicators (e.g. model selection metrics, confidence intervals, t-tests) are achieved.
IV. Post analysis
Model validation Perform cross-validation over the full dataset by calling
validation()frommiddoe.iden_valida, obtaining calibration/validation statistics (e.g. per-response and global (R^2), residual analysis) for the final selected model(s).Save results Persist all identification and design results to disk by calling
save_to_jac()frommiddoe.log_utils(e.g. with purposes such as"iden"or"sensa"for later reuse).Load results Reload previously saved workflows and analysis outputs by calling
load_from_jac()frommiddoe.log_utils, enabling further inspection, reporting, or continuation of the workflow.Post-processing Generate global reports and visualisations by calling
run_postprocessing()frommiddoe.iden_utilson the storedround_data, producing plots (parameter trajectories, confidence ellipsoids, p/t-tests, estimability evolution) and Excel summaries for documentation and decision-making.
4. Project structure
Organise your project as follows:
my_project/
├── data.xlsx # Your experimental measurements
│
├── model1.py # Python file defining your model 1
├── model2.py # Python file defining your model 2
├── modeli.py # Python file defining your model i
│
└── workflow.py/ # analysis and dentification workflow
5. Data structure
MIDDoE reads data from a data.xlsx file in the project repository, Each sheet in data.xlsx represents one batch/experiment and contains measured outputs (MES), time-invariant controls \(\mathbf{w}\), and time-variant controls \(\mathbf{u}(t)\) in a single table.
MES_X:y1 |
MES_Y:y1 |
MES_E:y1 |
MES_X:y2 |
MES_Y:y2 |
MES_E:y2 |
X:all |
w1 |
u1 |
|---|---|---|---|---|---|---|---|---|
\(t_{\mathrm{sp},1}^{(y_1)}\) |
\(y_1\!\big(t_{\mathrm{sp},1}^{(y_1)}\big)\) |
\(\sigma_{y_1}\) |
\(t_{\mathrm{sp},1}^{(y_2)}\) |
\(y_2\!\big(t_{\mathrm{sp},1}^{(y_2)}\big)\) |
\(\sigma_{y_2}\) |
\(t_{\mathrm{all},1}\) |
\(w_1\) |
\(u_1\!\big(t_{\mathrm{all},1}\big)\) |
\(t_{\mathrm{sp},2}^{(y_1)}\) |
\(y_1\!\big(t_{\mathrm{sp},2}^{(y_1)}\big)\) |
\(\sigma_{y_1}\) |
\(t_{\mathrm{sp},2}^{(y_2)}\) |
\(y_2\!\big(t_{\mathrm{sp},2}^{(y_2)}\big)\) |
\(\sigma_{y_2}\) |
\(t_{\mathrm{all},2}\) |
\(w_1\) |
\(u_1\!\big(t_{\mathrm{all},2}\big)\) |
\(\vdots\) |
\(\vdots\) |
\(\vdots\) |
\(\vdots\) |
\(\vdots\) |
\(\vdots\) |
\(\vdots\) |
\(w_1\) |
\(\vdots\) |
``MES_X:y1``, ``MES_X:y2``, …, ``MES_X:yi``: Sampling times \(t_{\mathrm{sp}}\) used for measurements of \(y_1\), \(y_2\), …, \(y_i\) (system measured response sampled times).
``MES_Y:y1``, ``MES_Y:y2``, …, ``MES_Y:yi``: Measured responses \(y_r(t_{\mathrm{sp}})\) for measurements of \(y_1\), \(y_2\), …, \(y_i\) (system measured responses), column names must match
system['tvo']keys.``MES_E:y1``, ``MES_E:y2``, …, ``MES_E:yi``: Measurement standard deviations \(\sigma_{y_r}\).
``X:all``: Global time grid \(t_{\mathrm{all}}\) on which the time-variant inputs \(u_i(t)\) are tabulated (should match the solver precision, stepped by
system['t_r']).``w1``, ``w2``, … : Time-invariant controls \(w_j\), constant over the experiment; column names must match
system['tii']keys.``u1``, ``u2``, … : Time-variant control profiles \(u_i(t_{\mathrm{all}})\); column names must match
system['tvi']keys so that MIDDoE can reconstruct the control trajectories fromX:allandu_i.
6. model.py structure
MIDDoE can unboard and tag models in several ways:
Type |
Configuration |
When to use |
|---|---|---|
External Python script |
|
You have your own ODE solver or external simulator (Suggested over global method and suggested for interfacing external tools) |
Built-in models |
|
For testing; MIDDoE provides basic test models (typically solid-fluid reaction models) |
Global Python function |
|
Function already defined in your script, in a global space. |
gPROMS coupling |
|
gPAS file created by gPROMS, to couple with gPROMS models. |
In general it treats models as a black box and requires a standard model interface. Create a Python file with a function following this signature (‘pys’) approach:
def solve_model(t, y0, tii, tvi, theta):
"""
Standard MIDDoE model interface.
Parameters
----------
t : list
Time vector for evaluation.
y0 : dict
Initial conditions for differential states.
Keys are state names, values are initial values.
Example: {'y1': 0.0, 'y2': 0.5}
tii : dict
Time-invariant inputs (design variables, initial concentrations).
Example: {'w1': 0.5, 'w2': 1.0}
tvi : dict
Time-variant inputs (controls as functions of time).
Each key maps to a list matching time vector length.
Example: {'u1': [296.15, 297.5, 298.0, ...]}
theta : list
Model parameters to be estimated.
Example: [50000, 75000, 0.4116, 111900, 9905, 30000]
Returns
-------
dict
Dictionary with response trajectories.
Keys match measured output names.
Values are lists matching time vector length.
Example: {'y1': [0.0, 0.01, 0.02, ...], 'y2': [0.0, 0.005, 0.010, ...]}
"""
# Example: ODE integration using SciPy
from scipy.integrate import odeint
# Extract parameters
k1, k2, k3, k4, k5, k6 = theta
# Define differential equations
def system_odes(state, t_point, u_t, w):
C_A, C_B = state
# Reaction kinetics (example)
u_value = u_t(t_point) # Get control value at time t_point
dCA_dt = -k1 * C_A + k2 * w['w1']
dCB_dt = k1 * C_A - k3 * C_B
return [dCA_dt, dCB_dt]
# Interpolate time-variant controls
from scipy.interpolate import interp1d
u1_func = interp1d(t, tvi['u1'], kind='linear', fill_value='extrapolate')
# Integrate
y0_list = [y0.get('CA', 0.0), y0.get('CB', 0.0)]
solution = odeint(system_odes, y0_list, t, args=(u1_func, tii))
# Return in MIDDoE format
return {
'y1': solution[:, 0].tolist(),
'y2': solution[:, 1].tolist()
}
7. Configuration Dictionaries
system Dictionary Definition
The system dictionary is a central hub for all MIDDoE operations. It specify system characteristics.
system = {
# ========== TIME CONFIGURATION ==========
't_s': [0, 16], # Time span: [start, end]
't_r': 0.02, # Time resolution: every 0.02
# Smaller → better accuracy but slower (it must match with the global time grid in data.xlsx
't_d': 0.3, # Dead-time fraction: exclude first 30% and last 30%
# for controls and sampling (typical batch startup/cooldown)
# ========== TIME-VARIANT INPUTS: u(t) ==========
'tvi': {
'u1': { # Example: Temperature profile
'min': 296.15, # Lower bound (K)
'max': 306.15, # Upper bound (K)
'stps': 6, # Number of CVP segments (switching points)
# stps=6 → 5 segments
'cvp': 'LPF', # Control profile type:
# 'CPF' = constant piecewise (steps)
# 'LPF' = linear piecewise (ramps)
'const': 'inc', # Constraint on signal shape:
# 'rel' = no constraint (relaxed)
# 'inc' = monotonically increasing
# 'dec' = monotonically decreasing
'offl': 0.01, # Min level change (fraction): 1% per segment
'offt': 0.3 # Min time spacing (fraction): 30% of time span
},
},
# ========== TIME-VARIANT OUTPUTS: y(t) ==========
'tvo': {
'y1': { # Example: Product concentration
'init': 0, # Initial value (can be number or 'variable')
# 'variable' = design decision (in tii)
'meas': True, # Measurable? (True) or virtual output (False)
'sp': 17, # Sampling points: max 17 measurements per batch
'unc': 0.005, # Measurement uncertainty (std dev): ±0.005 units (homoskedastic error regime)
'offt': 0.3, # Min sampling interval (fraction): 30% spacing
'samp_s': 1, # Synchronisation group: 1 = sync with other samp_s:1
'samp_f': [0, 16] # Forced sampling times: always sample at t=0 and t=16
},
'y2': {
'init': 0,
'meas': True,
'sp': 17,
'unc': 0.005,
'offt': 0.3,
'samp_s': 1, # Same group as y1 → synchronized sampling
'samp_f': [0, 16]
},
'y3': {
'init': 0,
'meas': True,
'sp': 17,
'unc': 0.005,
'offt': 0.3,
'samp_s': 1,
'samp_f': [0, 16]
},
},
# ========== TIME-INVARIANT INPUTS: w ==========
'tii': {
'y10': { # Example: Initial concentration of reactant
'min': 0.3,
'max': 1.0
},
'y20': { # Example: Initial concentration of catalyst
'min': 0.19,
'max': 1.0
},
'y20': { # Example: Initial concentration of catalyst
'min': 0.19,
'max': 1.0
},
},
# ========== TIME-INVARIANT OUTPUTS ==========
'tio': {} # Empty for this example; used for steady-state responses
}
Model Dictionary Definition
The model dictionary is a central hub for all MIDDoE operations. It specify model characteristics.
# Parameter vectors for in-silico validation (true values, nominal, and bounds)
theta = [50000, 75000, 0.4116, 111900, 9905, 30000] # Ground truth (insilico studies)
theta_n = [100000, 100000, 1, 100000, 100, 10000] # Initial guess
theta_mins = [10000, 0, 0.1, 50000, 10, 10000] # Lower bounds
theta_maxs = [1000000, 200000, 10, 200000, 10000, 200000] # Upper bounds
models = {
# ========== ACTIVE CANDIDATE MODELS ==========
'can_m': ['M'], # List of model names to use
# Example with multiple models: ['M1', 'M2', 'M3'] in case of need for model discrimination
# ========== MODEL INTERFACE TYPE ==========
'krt': {
'M': 'pys' # 'pys' = external Python script
# 'pym' = built-in MIDDoE models
# 'gpr' = gPROMS coupling
# 'function_name' = global function in namespace
},
# ========== MODEL SOURCE PATH ==========
'src': {
'M': r'C:/.../model.py'
# Only needed for 'pys' and 'gpr' type
# Use raw string (r'...') for Windows paths
},
# ========== CREDENTIALS FOR gPROMS ==========
'creds': {
'M': '' # Empty for non-gPROMS models
},
# ========== PARAMETER VECTORS AND BOUNDS ==========
'theta': { # Starting/nominal values
'M': theta_n
},
't_l': { # Lower bounds
'M': theta_mins
},
't_u': { # Upper bounds
'M': theta_maxs
},
# ========== ESTIMATION MASK ==========
'mutation': {
'M': [False, False, True, True, True, True]
# False = fix at nominal value (don't estimate)
# True = estimate from data
# Use False for poorly identifiable parameters identified by EA
}
}
insilicos Dictionary Definition
The insilicos dictionary is a settings dictionary to perform insilico experiment data generation.
# In-silico experimenter (used for synthetic data generation)
insilicos = {
'tr_m': 'M', # True model (ground truth)
'theta': theta, # True parameter vector
'errt': 'abs', # Error type:
# 'abs' = absolute noise
# 'rel' = relative (proportional to signal)
'prels': {} # Classic pre-defined designs (empty here) (iso-valued controls)
}
GSA Dictionary Definition
The gsa dictionary is a settings dictionary to perform Global Sensitivity Analysis with Sobol method.
# Global Sensitivity Analysis configuration gsa = {
‘var_s’: False, # Sensitivity w.r.t. controls? (u, w) ‘par_s’: True, # Sensitivity w.r.t. parameters? (θ) ‘var_d’: False, # Control sampling space:
# False = system-defined bounds # float > 1 (e.g., 1.1) = ±10% around nominal
- ‘par_d’: False, # Parameter sampling space:
# False = models-defined bounds
- ‘samp’: 2**12, # Sobol sample size: use powers of 2
# 2^12 = 4096 base samples # Total evaluations ≈ (2·N_vars + 2) × base
‘multi’: 0.7, # Parallel execution: 0.7 = use 70% of CPU cores ‘tii_n’: [0.508, 0.3925], # Nominal w values ‘tvi_n’: [301.15], # Nominal u values ‘plt’: True # Generate sensitivity plots?
}
MBDoE Dictionary Definition
The des_opt dictionary is a settings dictionary to conduct MBDoE-MD adn MBDoE-PP optimisations.
des_opt = {
# ========== FINITE DIFFERENCE SETUP ==========
'eps': 1e-5, # FDM perturbation size (normalised space)
# Smaller → better accuracy, larger → robustness
# None = auto mesh-independency test
# Typical: 1e-5 to 1e-3
# ========== MBDoE-MD OBJECTIVE (Model Discrimination) ==========
'md_ob': 'BFF', # T-optimality criterion:
# 'HR' = Hunter–Reiner (focuses on divergence)
# 'BFF' = Buzzi–Ferraris–Forzatti (includes error weighting)
# ========== MBDoE-PP OBJECTIVE (Parameter Precision) ==========
'pp_ob': 'D', # Alphabetical optimality criterion:
# 'D' = D-optimality (minimize det of covariance volume)
# 'A' = A-optimality (minimize average variance)
# 'E' = E-optimality (minimize maximum variance)
# 'ME' = Modified E (promote uniform precision)
# ========== OPTIMISATION METHOD ==========
'meth': 'PS', # Solver type:
# 'PS' = Pattern Search (local, derivative-free)
# 'DE' = Differential Evolution (global)
# 'DEPS' = DE + PS (hybrid: global then local)
# ========== ITERATION SETTINGS ==========
'itr': {
'pps': 100, # Population size (for DE/PS)
'maxmd': 5, # Max iterations for MD objective
'tolmd': 1, # MD convergence tolerance
'maxpp': 100, # Max iterations for PP objective
'tolpp': 1 # PP convergence tolerance
},
'plt': True # Generate design plots?
}
Calibration Dictionary Definition
The iden_opt dictionary is a settings dictionary to conduct parameter estimation and uncertainty analysis.
iden_opt = {
# ========== OPTIMISATION SOLVER ==========
'meth': 'LBFGSB', # Parameter estimation method:
# 'LBFGSB' = Limited-memory BFGS (local, fast)
# 'SLSQP' = Sequential Least Squares (local, constrained)
# 'DE' = Differential Evolution (global, slow)
# 'NMS' = Nelder–Mead Simplex (local, derivative-free)
# 'BFGS' = Standard BFGS (local)
# 'TC' = Trust-region Constrained (local, smooth)
'ms': True, # Multi-start? (True = try multiple random starts)
'maxit': 500, # Maximum iterations per solver attempt
'tol': 0.1, # Convergence tolerance
# ========== FINITE DIFFERENCE METHOD ==========
'sens_m': 'central',# FDM scheme:
# 'central' = central differences (2nd order, more accurate)
# 'forward' = forward differences (1st order)
'eps': 1e-5, # FDM perturbation size
# None = auto mesh-independency test
# ========== OBJECTIVE FUNCTION ==========
'ob': 'WLS', # Cost function:
# 'LS' = Least Squares
# 'WLS' = Weighted LS (accounts for measurement uncertainty)
# 'MLE' = Maximum Likelihood (if noise is known)
# 'CS' = Chi-Square
# ========== UNCERTAINTY QUANTIFICATION ==========
'var-cov': 'B', # Variance-covariance method:
# 'H' = Hessian-based (fast, local, assumes linearity)
# 'J' = Jacobian-based (faster)
# 'B' = Bootstrap (robust, global, slow)
'nboot': 50, # Bootstrap samples (if var-cov='B')
# ========== INITIALISATION ==========
'init': None, # Starting point:
# None = use theta from models dict
# 'rand' = random initialization
# ========== VISUALISATION ==========
'c_plt': False, # Confidence ellipsoid plots?
'f_plt': True, # Fitting comparison plots?
'plt_s': True, # Show plots while saving?
'log': False # Verbose logging?
}
8. Practical Workflow
This section demonstrates the execution sequence for a complete MIDDoE insilico identification workflow. All configuration dictionaries (system, models, gsa, insilicos, des_opt, iden_opt) are assumed to be defined as shown in Section 7.
8.1 Import Required Modules
from middoe.sc_sensa import sensa
from middoe.des_pp import mbdoe_pp
from middoe.des_md import mbdoe_md
from middoe.krnl_expera import expera
from middoe.iden_parmest import parmest
from middoe.iden_uncert import uncert
from middoe.sc_estima import estima
from middoe.iden_valida import validation
from middoe.log_utils import save_rounds, save_to_jac, load_from_jac, save_to_xlsx
from middoe.iden_utils import run_postprocessing
8.2 Global Sensitivity Analysis (Optional)
Identify influential parameters before designing experiments:
# Run Sobol sensitivity analysis
sobol_results = sensa(gsa, models, system)
# Save results
save_to_jac(sobol_results, purpose="sensa")
# Export to Excel
results = load_from_jac()
sensa_data = results['sensa']
save_to_xlsx(sensa_data)
Output: First-order and total-order Sobol indices for each parameter, plus plots if gsa['plt'] = True.
8.3 Round 1: Initial Design and Estimation
Step 1: Design the first experiment
round = 1
designs = mbdoe_pp(des_opt, system, models, round=1, num_parallel_runs=16)
Returns: Optimal control profiles (u(t)), time-invariant inputs (w), and sampling times that minimize parameter uncertainty according to the selected criterion (des_opt['pp_ob']).
Step 2: Generate data
# For insilico studies:
expera(system, models, insilicos, designs, expr=1, swps=designs['swps'])
# For real experiments: manually add data to data.xlsx following the design
If you don’t add the swps from MBDoE designs, based on the expr number, it executes the insilicos[‘prels’][expr] iso-value insilico experiments.
Result: New sheet in data.xlsx containing measurements for experiment 1.
Step 3: Estimate parameters
resultpr = parmest(system, models, iden_opt, case='strov')
None(default): Standard estimation, usesmodels['theta']as starting values and fixed parameter fallback.'freeze': Keep existing mutation masks unchanged for sequential estimation (preserves which parameters are active/frozen between rounds).'strov': **ST**art from **R**eference **O**ptimal **V**alues. Usesmodels['thetastart']as starting values instead ofmodels['theta']. Useful when continuing from a previous calibration round with improved initial guesses.
Returns: Estimated parameters \(\hat{\boldsymbol{\theta}}\), objective function value, convergence status, and computational time for each model in models['can_m'].
Step 4: Quantify uncertainty
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']
Returns: Variance-covariance matrix \(\mathbf{V}\), confidence intervals \(CI_i\), t-values \(t_i\), correlation matrix, and bootstrap statistics.
Step 5: Evaluate estimability
ranking, k_optimal_value, rCC_values, J_k_values, best_uncert_result = \
estima(resultun, system, models, iden_opt, round)
Returns: Parameter ranking by estimability, optimal number of estimable parameters \(k^*\), and corrected critical ratio curve \(r_{CC,k}\).
Step 6: Save round results
round_data = {}
save_rounds(round, resultun, 'preliminary', round_data, models, iden_opt, obs, system,
ranking=ranking, k_optimal_value=k_optimal_value, rCC_values=rCC_values,
J_k_values=J_k_values, best_uncert_result=best_uncert_result)
Result: Round 1 data stored in round_data dictionary for later analysis.
8.4 Sequential Rounds (2, 3, …, N)
Repeat the workflow to progressively reduce parameter uncertainty:
for round in range(2, 5): # Example: rounds 2-4
# Design next experiment
designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)
# Collect data
expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])
# Estimate with cumulative dataset
resultpr = parmest(system, models, iden_opt, case='strov')
# Quantify uncertainty
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']
# Check estimability
ranking, k_optimal_value, rCC_values, J_k_values, best_uncert_result = \
estima(resultun, system, models, iden_opt, round)
# Save round
save_rounds(round, resultun, 'sequential', round_data, models, iden_opt, obs, system,
ranking=ranking, k_optimal_value=k_optimal_value, rCC_values=rCC_values,
J_k_values=J_k_values, best_uncert_result=best_uncert_result)
Optional: Update models['mutation'] between rounds to manually fix poorly estimable parameters instead of using estima().
Stopping criteria:
All t-values exceed threshold: \(t_i > t_{\text{ref}}\)
8.5 Model Discrimination (Multi-Model Case)
If multiple candidate models compete, design discriminating experiments:
# Configure for discrimination
des_opt['md_ob'] = 'BFF' # or 'HR'
# Design experiment maximizing model separation
md_designs = mbdoe_md(des_opt, system, models, round=round)
# Execute and analyze
expera(system, models, insilicos, md_designs, expr=round, swps=md_designs['swps'])
resultpr = parmest(system, models, iden_opt, case='discrimination')
Model selection: Compare \(\chi^2\) values and P-test probabilities (Section 2.8) to identify the best model.
8.6 Cross-Validation
Validate the calibrated model across all experiments:
validres = validation(system, models, iden_opt, round_data)
Returns: Calibration vs. validation metrics (\(R^2_{\text{cal}}\), \(R^2_{\text{val}}\), \(\text{RMSE}_{\text{cal}}\), \(\text{RMSE}_{\text{val}}\)), residual plots, and leave-one-out statistics.
8.7 Save Final Results
Persist all workflow data:
# Save identification results
save_to_jac(round_data, purpose="iden")
Result: Binary file results/iden_YYYYMMDD_HHMMSS.jac containing all rounds.
8.8 Load and Post-Process
Reload saved data and generate reports:
# Load results
results = load_from_jac()
iden = results['iden']
# Generate comprehensive reports
run_postprocessing(
round_data=iden,
solvers=['M'], # Model identifiers
selected_rounds=[1, 2, 3, 4], # Rounds to analyze
plot_global_p_and_t=True, # Parameter evolution + t-tests
plot_confidence_spaces=True, # 2D confidence ellipses
plot_p_and_t_tests=True, # t-value bar charts
export_excel_reports=True, # Summary tables
plot_estimability=True # Estimability ranking evolution
)
Outputs:
results/figures/: Parameter trajectories, confidence regions, t-test evolution, fitting plotsresults/reports/: Excel files with parameter estimates, uncertainties, correlations, and validation metrics
8.9 Complete Example: Discrimination to Calibration
A realistic workflow integrating model discrimination, candidate elimination, and sequential precision refinement:
from middoe.des_md import mbdoe_md
from middoe.des_pp import mbdoe_pp
from middoe.krnl_expera import expera
from middoe.iden_parmest import parmest
from middoe.iden_uncert import uncert
from middoe.sc_estima import estima
from middoe.iden_valida import validation
from middoe.log_utils import save_rounds, save_to_jac, load_from_jac
from middoe.iden_utils import run_postprocessing
# Assume system, models, insilicos, iden_opt, des_opt defined (Section 7)
# models['can_m'] = ['M1', 'M2', 'M3'] # Multiple competing models
round_data = {}
round = 1
# ========== PHASE 1: MODEL DISCRIMINATION ==========
while len(models['can_m']) > 1:
# Design discriminating experiment
des_opt['md_ob'] = 'BFF' # Buzzi-Ferraris-Forzatti criterion
md_designs = mbdoe_md(des_opt, system, models, round=round)
# Execute experiment
expera(system, models, insilicos, md_designs, expr=round, swps=md_designs['swps'])
# Estimate parameters for all candidate models
resultpr = parmest(system, models, iden_opt, case=None)
# Uncertainty quantification
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']
# Estimability analysis
ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)
# Save discrimination round
save_rounds(round, resultun, 'discrimination', round_data, models, iden_opt, obs, system,
ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
J_k_values=J_k, best_uncert_result=best_res)
# Compute P-test values for model selection
import numpy as np
chi2_values = {solver: resultun[solver]['chi2'] for solver in models['can_m']}
P_values = {}
denom = sum(np.exp(-chi2/2) for chi2 in chi2_values.values())
for solver, chi2 in chi2_values.items():
P_values[solver] = (1 - np.exp(-chi2/2) / denom) * 100
print(f"\nRound {round} - P-test results:")
for solver in models['can_m']:
print(f" {solver}: P = {P_values[solver]:.2f}%, χ² = {chi2_values[solver]:.4f}")
# Remove models with low discrimination probability (P < 20%)
threshold = 20.0
models_to_remove = [s for s in models['can_m'] if P_values[s] < threshold]
if models_to_remove:
print(f"Removing models {models_to_remove} (P < {threshold}%)")
for solver in models_to_remove:
models['can_m'].remove(solver)
# Exit discrimination if only one model remains or max rounds reached
if len(models['can_m']) == 1 or round >= 5:
print(f"\nDiscrimination complete. Selected model: {models['can_m'][0]}")
break
round += 1
# ========== PHASE 2: PARAMETER PRECISION REFINEMENT ==========
# Switch to parameter precision optimization
des_opt['pp_ob'] = 'D' # D-optimal design for final calibration
selected_model = models['can_m'][0] # Best model from discrimination
# Single precision round (or continue for multiple rounds if needed)
round += 1
designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)
expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])
# Re-estimate with discriminated model using previous optimal as starting point
models['thetastart'] = {selected_model: resultun[selected_model]['theta']}
resultpr = parmest(system, models, iden_opt, case='strov')
# Final uncertainty analysis
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']
# Estimability check
ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)
# Save precision round
save_rounds(round, resultun, 'precision', round_data, models, iden_opt, obs, system,
ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
J_k_values=J_k, best_uncert_result=best_res)
# Check stopping criteria
t_values = resultun[selected_model]['t_values']
t_ref = resultun[selected_model]['t_ref']
CI_relative = resultun[selected_model]['CI'] / np.abs(resultun[selected_model]['theta'])
print(f"\nRound {round} - Convergence check:")
print(f" All t-values > t_ref: {np.all(t_values > t_ref)}")
print(f" Max relative CI: {np.max(CI_relative):.3f} (target < 0.1)")
# Continue precision rounds if needed
max_rounds = 8
while round < max_rounds:
if np.all(t_values > t_ref) and np.max(CI_relative) < 0.1:
print("Convergence criteria met. Stopping.")
break
round += 1
designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)
expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])
resultpr = parmest(system, models, iden_opt, case='strov')
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun, obs = uncert_results['results'], uncert_results['obs']
ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)
save_rounds(round, resultun, 'sequential', round_data, models, iden_opt, obs, system,
ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
J_k_values=J_k, best_uncert_result=best_res)
# Update convergence metrics
t_values = resultun[selected_model]['t_values']
CI_relative = resultun[selected_model]['CI'] / np.abs(resultun[selected_model]['theta'])
print(f"Round {round}: max CI_rel = {np.max(CI_relative):.3f}")
# ========== PHASE 3: VALIDATION AND REPORTING ==========
# Cross-validation
validres = validation(system, models, iden_opt, round_data)
print(f"\nValidation R²: {validres[selected_model]['R2_val']:.4f}")
# Save all results
save_to_jac(round_data, purpose="iden")
# Generate comprehensive reports
results = load_from_jac()
run_postprocessing(
round_data=results['iden'],
solvers=[selected_model],
selected_rounds=list(range(1, round + 1)),
plot_global_p_and_t=True,
plot_confidence_spaces=True,
plot_p_and_t_tests=True,
export_excel_reports=True,
plot_estimability=True
)
print(f"\n{'='*60}")
print(f"Workflow complete: {round} rounds executed")
print(f"Selected model: {selected_model}")
print(f"Final parameters: {resultun[selected_model]['theta']}")
print(f"{'='*60}")
Workflow summary:
Discrimination phase (rounds 1–N): MBDoE-MD designs maximize model separation. After each round, P-test identifies and removes poorly performing models (P < 20%).
Precision phase (rounds N+1–M): Once a single model remains, MBDoE-PP designs minimize parameter uncertainty using D-optimal criterion.
Validation: Cross-validation assesses generalization performance across all experiments.
Reporting: Automated generation of parameter trajectories, confidence regions, and Excel summaries.
Key decision points:
P-value threshold: Adjust
thresholdbased on confidence level (typical: 15–25%).Stopping criteria: Customize based on application requirements (t-values, relative CI, condition number).
Max rounds: Prevents infinite loops; adjust based on computational budget.
9. Citation
If you use MIDDoE in your research, please cite:
- [1]. Tabrizi, Z., Barbera, E., Leal da Silva, W.R., & Bezzo, F. (2025).
MIDDoE: An MBDoE Python package for model identification, discrimination, and calibration. Digital Chemical Engineering, 17, 100276. https://doi.org/10.1016/j.dche.2025.100276
- [2]. Tabrizi, S.Z.B., Barbera, E., Da Silva, W.R.L, & Bezzo, F. (2025).
A Python/Numpy-based package to support model discrimination and identification. J.F.M. Van Impe, G. Léonard, S.S. Bhonsale, M.E. Polańska, F. Logist (Eds.). Systems and Control Transactions, 4 , 1282-1287. https://doi.org/10.69997/sct.192104
10. Applied to
MIDDoE has been applied in various EU reports, research works and projects, e.g. :
- [1]. Tabrizi, Z., Rodriguez, C., Barbera, E., Leal da Silva, W.R., & Bezzo, F. (2025).
Wet Carbonation of Industrial Recycled Concrete Fines: Experimental Study and Reaction Kinetic Modeling. Ind Eng Chem Res, 64, 45, 21412–21425. https://doi.org/10.1021/acs.iecr.5c02835
11. Documentation Contents
12. Resources
Issues & Support: https://github.com/zuhairblr/middoe/issues
Case Studies: https://github.com/zuhairblr/middoe/tree/main/tests/paper