MIDDoE: Model-(based) Identification, Discrimination, and Design of Experiments

MIDDoE is an open-source Python package for systematic model identification workflows. It provides an integrated, physics-aware framework that combines Global Sensitivity Analysis (GSA), Estimability Analysis (EA), Parameter Estimation, Uncertainty Quantification, Model-Based Design of Experiments (MBDoE), and cross validation for dynamic systems.

MIDDoE unifies the complete identification pipeline from pre-experimental diagnostics through post-analysis reporting—suitable for experimentalists and engineers with limited programming expertise.

1. Installation

Install via pip:

pip install middoe

MIDDoE requires Python ≥ 3.9 and automatically installs dependencies: NumPy, SciPy, Pandas, Matplotlib, and Pymoo.

2. Mathematical Framework

MIDDoE operates on lumped dynamic systems governed by differential-algebraic equations:

2.1 MIMO model structure

\[\begin{split}\begin{align} \dot{\mathbf{x}}(t) &= \mathbf{f}(\mathbf{x}, \mathbf{z}, \mathbf{u}(t), \mathbf{w}, \boldsymbol{\theta})\\ \mathbf{0} &= \mathbf{g}(\mathbf{x}, \mathbf{z}, \mathbf{u}(t), \mathbf{w}, \boldsymbol{\theta})\\ \mathbf{y}(t) &= \mathbf{h}(\mathbf{x}, \mathbf{z}, \mathbf{u}(t), \mathbf{w}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon}(t) \end{align}\end{split}\]

Variables and parameters:

\(\boldsymbol{\theta} \in \mathbb{R}^{N}\): model parameters to be estimated
\(\mathbf{x}(t) \in \mathbb{R}^{N_x}\): time-variant differential states
\(\mathbf{z}(t) \in \mathbb{R}^{N_z}\): time-variant algebraic states
\(\mathbf{u}(t) \in \mathbb{R}^{N_u}\): time-variant controls (manipulated variables)
\(\mathbf{w} \in \mathbb{R}^{N_w}\): time-invariant controls (design variables, initial conditions)
\(\mathbf{y}(t) \in \mathbb{R}^{N_r}\): measured outputs
\(\boldsymbol{\varepsilon}(t)\): measurement error (noise)

2.2 Sobol Sensitivity Analysis

Two sample matrices \(\mathbf{A}, \mathbf{B} \in [0,1]^{N \times d}\) are generated from the scrambled Sobol sequence, where \(N\) is the number of quasi-random base samples. For each variable \(i \in \{1,...,d\}\) a hybrid matrix \(\mathbf{AB}_i\) is generated by replacing the \(i\)-th column of \(\mathbf{A}\) with the corresponding column from matrix \(\mathbf{B}\), keeping the remaining columns unchanged. The total number of model evaluations required is:

\[N_{\text{eval}} = N(2d + 2)\]

This includes model evaluations of all \(N\) samples of matrices \(\mathbf{A}\), \(\mathbf{B}\), and each \(\mathbf{AB}_i\). First-order Sobol indices, measuring the main effect of each input on output variance, are estimated using Jansen’s estimator:

\[s_i = 1 - \frac{1}{2N \text{Var}(Y)} \sum_{k=1}^{N} [M(\mathbf{B}_k) - M(\mathbf{AB}_i^k)]^2\]

Total-order Sobol indices, which include all interactions involving input \(i\), are computed using the Jansen estimator:

\[S_{Ti} = \frac{1}{2N \text{Var}(Y)} \sum_{k=1}^{N} [M(\mathbf{A}_k) - M(\mathbf{AB}_i^k)]^2\]

where \(\text{Var}(Y)\) represents the total variance of the model output, estimated from the pooled responses across all sample matrices.

2.3 Fisher Information Matrix

The local sensitivity matrix \(\mathbf{Q}_r\) for each measured response \(r\) is defined as:

\[\begin{split}\mathbf{Q}_r = \begin{bmatrix} \frac{\partial y_r(\theta_1, t_1)}{\partial \theta_1} & \cdots & \frac{\partial y_r(\theta_N, t_1)}{\partial \theta_N}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_r(\theta_1, t_{N_{sp}})}{\partial \theta_1} & \cdots & \frac{\partial y_r(\theta_N, t_{N_{sp}})}{\partial \theta_N} \end{bmatrix}\end{split}\]

where \(\hat{y}_r(\theta, t)\) denotes model predictions, \(N_{sp}\) is the number of sampling points, and \(N_{\theta}\) is the number of active parameters. This matrix enables construction of the Fisher information matrix \(\mathbf{I}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) as:

\[\mathbf{I}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}) = \sum_{r} \sum_{r'} \sigma_{r,r'}^{-2} \mathbf{Q}_r^T \mathbf{Q}_r + \boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{-1}\]

where prior information of parameters can be neglected (i.e., \(\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{-1} = 0\)), and \(\sigma_{r,r'}\) are elements from the inverse of the variance-covariance matrix of measurements errors. If the condition number of the Fisher matrix, defined as \(\kappa(\mathbf{I})\), exceeds a critical threshold (typically \(10^3\)), the model is considered sloppy and if this number is extremely high, the matrix is ill-conditioned.

2.4 Confidence Intervals and t-Tests

The resulting covariance matrix \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\) is used to compute Confidence Interval \(CI_i\) and t-value of estimated parameter \(i\) at a confidence level of \(1 - \alpha\) (95%):

\[CI_i = \sqrt{v_{\theta,ii}} \cdot t_{1 - \alpha/2, \text{DoF}} \quad \text{for } i = 1, ..., N_{\theta}\]

\[t_i = \frac{\hat{\theta}_i}{\sqrt{v_{\theta,ii}}}, \quad i = 1, ..., N_{\theta}\]

where \(v_{\theta,ii}\) is the \(i\)-th diagonal element of \(\mathbf{V}\), and \(t_{1 - \alpha/2, \text{DoF}}\) is the upper \(1 - \alpha/2\) critical value of a Student’s t-distribution with \(\text{DoF} = N_{sp} - N_{\theta}\) degrees of freedom. The t-test provides a quantitative measure of estimation precision by statistically assessing whether each estimated parameter \(\hat{\theta}_i\) is significantly different from the parameter noise with a zero mean. This is verified by checking whether the computed \(t_i\)-value exceeds the reference threshold \(t_{\text{ref}}\), corresponding to a Student t-value at the same confidence level.

2.5 MBDoE-MD: T-Optimality Criteria

The general form of the T-optimal design problem is expressed as:

\[\boldsymbol{\xi}_{\text{OPT}} = \arg \max_{\boldsymbol{\xi} \in \Omega} \sum_{k=1}^{N_{sp}} \sum_{l=1}^{N_m} \sum_{l'=l+1}^{N_m} T_{l,l'}(\boldsymbol{\xi}, \boldsymbol{\theta}_l, \boldsymbol{\theta}_{l'}, t_k)\]

\[T_{l,l'}(\boldsymbol{\xi}, \boldsymbol{\theta}_l, \boldsymbol{\theta}_{l'}, t) = [\mathbf{y}_l(\boldsymbol{\xi}, t) - \mathbf{y}_{l'}(\boldsymbol{\xi}, t)]^T \mathbf{F}_{l,l'}(t) [\mathbf{y}_l(\boldsymbol{\xi}, t) - \mathbf{y}_{l'}(\boldsymbol{\xi}, t)]\]

where \(N_m\) denotes the number of competing models, \(\boldsymbol{\theta}_l\) and \(\boldsymbol{\theta}_{l'}\) are the respective preliminary estimated parameter vectors for models \(l\) and \(l'\), \(\Omega\) defines the feasible design space, and \(\mathbf{F}_{l,l'}(t)\) is a weight factor based on the uncertainties in observation and parameter estimation at different sampling times (which is 1 for HR). For the BFF, \(\mathbf{F}_{l,l'}(t)\) incorporates model sensitivity and covariance terms, and is defined at each sampling time \(t\) as:

\[\mathbf{F}_{l,l'}(t) = [\boldsymbol{\Sigma}_y + \mathbf{W}_l(t) + \mathbf{W}_{l'}(t)]^{-1}\]

where \(\boldsymbol{\Sigma}_y\) is the variance-covariance matrix of measurement errors, and \(\mathbf{W}_l(t)\) and \(\mathbf{W}_{l'}(t)\) are representing the modeling error contributions, derived from the sensitivity of each model to its parameters. Each \(\mathbf{W}_l(t)\) is constructed as:

\[\mathbf{W}_l(t) = \mathbf{Q}_l(t) \boldsymbol{\Sigma}_{\boldsymbol{\theta}_l}^{-1} \mathbf{Q}_l^T(t)\]

where \(\mathbf{Q}_l(t)\) is the sensitivity matrix of model \(l\) with respect to its estimated parameters \(\boldsymbol{\theta}_l\) from the preliminary estimations, evaluated at time \(t\):

\[\begin{split}\mathbf{Q}_l(t) = \begin{bmatrix} \frac{\partial y_{l,1}(t, \boldsymbol{\theta}_{l})}{\partial \theta_{l,1}} & \cdots & \frac{\partial y_{l,1}(t, \boldsymbol{\theta}_{l})}{\partial \theta_{l,N}}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_{l,M}(t, \boldsymbol{\theta}_{l})}{\partial \theta_{l,1}} & \cdots & \frac{\partial y_{l,M}(t, \boldsymbol{\theta}_{l})}{\partial \theta_{l,N}} \end{bmatrix}\end{split}\]

2.6 MBDoE-PP: Alphabetical Criteria

MBDoE-PP aims to minimise parameter uncertainty by reducing the size of the confidence hyperellipsoid through optimisation of a scalar metric \(\Phi(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\). The MBDoE-PP problem is formulated as:

\[\boldsymbol{\xi}_{\text{OPT}} = \arg \min_{\boldsymbol{\xi} \in \Omega} \Phi(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\]

MBDoE-PP alphabetical criteria
Criterion	Definition	Mathematical form
D	Reducing the confidence region size by minimising the determinant of the \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\)	\(\Phi_D = \det(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\)
A	Lowering the average variance across all parameter estimates by minimising the trace of \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\)	\(\Phi_A = \text{tr}(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\)
E	Reducing the variance of the most uncertain parameter by minimising the biggest eigenvalue of \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\)	\(\Phi_E = \lambda_{\max}(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\)
ME	Ensure uniform precision across parameters by minimises the condition number of the \(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w})\)	\(\Phi_{ME} = \kappa(\mathbf{V}(\boldsymbol{\theta}, \mathbf{U}, \mathbf{w}))\)

2.7 Estimability Analysis

A scaled sensitivity matrix \(\mathbf{Q}^S \in \mathbb{R}^{N_{tsp} \times N_{\theta}}\) is constructed, where \(N_{tsp}\) is the total number of responses samples (i.e., the sum of \(N_{sp}\) across all measured outputs), and \(N_{\theta}\) is the number of model parameters. Each element of \(\mathbf{Q}^S\), denoted as \(q_{ij}^S\), is computed by scaling the local sensitivities as:

\[q_{ij}^S = \frac{\partial y_i}{\partial \theta_j} \cdot \frac{\sigma_{\theta_j}}{\sigma_{y_i}}\]

where \(\frac{\partial y_i}{\partial \theta_j}\) is the local sensitivity of the \(i\)-th model output to the \(j\)-th parameter, \(\sigma_{y_i}\) and \(\sigma_{\theta_j}\) are the standard deviations of the measurement error for \(i\)-th model output and the prior uncertainty of \(j\)-th model parameter. An iterative deflation algorithm is then applied to rank parameters. Once the parameters are ranked, the optimal subset is selected by estimating nested top-\(k\) subsets for \(k \in \{1,2,...,N_{\theta} - 1\}\) and evaluating their WLS performance. For each \(k\), compute the critical ratio \(r_{C,k}\):

\[r_{C,k} = \frac{J_k - J_{N_{\theta}}}{N_{\theta} - k}\]

where \(J_{N_{\theta}}\) is the objective value for the full model in which all parameters are estimated. To account for statistical bias and sample size effects, compute the corrected form of the critical ratio \(r_{CC,k}\):

\[r_{CC,k} = \frac{N_{\theta} - k}{N_{tsp}} \max\left( r_{C,k} - 1, \frac{2r_{C,k}}{(N_{\theta}-k)^2} - 1 \right)\]

Determine the optimal number of parameters \(k^*\) to estimate by identifying the value of \(k\) that minimises \(r_{CC,k}\).

2.8 P-Test for Model Selection

After calibration of the candidate models, P-test quantifies the relative likelihood that each candidate model provides the best fit to the observed data. This method assigns a probability score \(P_i\) to each model \(i\), reflecting its relative quality based on goodness-of-fit metrics. The probability of the \(i\)-th model being the best among \(N_m\) candidates is defined as:

\[P_i = \left( 1 - \frac{\exp(-\chi_i^2/2)}{\sum_{i=1}^{N_M} \exp(-\chi_i^2/2)} \right) \times 100\]

where \(\chi_i^2\) values are obtained after parameter estimation for each model. Models with lower \(\chi^2\) values are considered better descriptors of the system. A higher \(P_i\) indicates stronger evidence in favour of model \(i\).

3. Workflow Conceptual Steps

A complete MIDDoE identification workflow for Basic user follows this sequence:

I. System and model establishment

System structure Define the experimental and operational space in the system dictionary, including all time-variant inputs, time-invariant inputs, measured outputs, and their physical/operational constraints (bounds, sampling limits, dead-times, CVP structure).
Candidate models Implement candidate models as black-box simulators and register them in the models dictionary, specifying parameter vectors, feasible bounds, estimation masks (mutation), and the model interface type (e.g. 'pys', 'pym', function name, or 'gpr').
Global Sensitivity Analysis Configure the gsa dictionary and call sensa() from middoe.sc_sensa to analyse the influence of parameters and/or controls on the outputs, providing a ranking that guides model reduction and design choices.

II. Data creation

Preliminary data Either create a data.xlsx file in the project repository containing your experimental measurements, or define the insilicos dictionary (true model, true parameters, noise type) and generate synthetic data by calling expera() from middoe.krnl_expera.
MBDoE for discrimination Set up the des_opt dictionary for model discrimination (md_ob set to 'HR' or 'BFF') and run mbdoe_md() from middoe.des_md to design experiments that maximally separate the predictions of rival models.
MBDoE for precision Configure des_opt for parameter precision (pp_ob set to 'D', 'A', 'E', or 'ME') and run mbdoe_pp() from middoe.des_pp to obtain experiments that minimise parameter uncertainty for the selected model(s).
Append data After executing the designed experiments (in the lab or in-silico via expera()), update data.xlsx or your in-memory data structure so that all newly collected experiments are available for calibration.

III. Model identification

Parameter estimation With the current dataset and model definitions, run parmest() from middoe.iden_parmest using the iden_opt dictionary to estimate parameters, compute goodness-of-fit metrics, and obtain convergence information.
Uncertainty analysis Pass the estimation results to uncert() from middoe.iden_uncert to evaluate variance–covariance matrices, confidence intervals, t-values, and prediction metrics, characterising the precision of the estimated parameters.
Estimability analysis Run estima() from middoe.sc_estima to rank parameters by estimability and determine the optimal subset that can be reliably estimated with the current data, informing which parameters should remain active or be fixed in subsequent rounds.
Round storage Treat the complete sequence (design → experiment → estimation → uncertainty → estimability) as the i-th round and save it into a round_data dictionary by calling save_rounds() from middoe.log_utils.
Sequential rounds Repeat steps 4–11, updating experimental designs, data, and parameter masks round by round until the desired discrimination and calibration performance indicators (e.g. model selection metrics, confidence intervals, t-tests) are achieved.

IV. Post analysis

Model validation Perform cross-validation over the full dataset by calling validation() from middoe.iden_valida, obtaining calibration/validation statistics (e.g. per-response and global (R^2), residual analysis) for the final selected model(s).
Save results Persist all identification and design results to disk by calling save_to_jac() from middoe.log_utils (e.g. with purposes such as "iden" or "sensa" for later reuse).
Load results Reload previously saved workflows and analysis outputs by calling load_from_jac() from middoe.log_utils, enabling further inspection, reporting, or continuation of the workflow.
Post-processing Generate global reports and visualisations by calling run_postprocessing() from middoe.iden_utils on the stored round_data, producing plots (parameter trajectories, confidence ellipsoids, p/t-tests, estimability evolution) and Excel summaries for documentation and decision-making.

4. Project structure

Organise your project as follows:

my_project/
├── data.xlsx                           # Your experimental measurements
│
├── model1.py                           # Python file defining your model 1
├── model2.py                           # Python file defining your model 2
├── modeli.py                           # Python file defining your model i
│
└── workflow.py/                        # analysis and dentification workflow

5. Data structure

MIDDoE reads data from a data.xlsx file in the project repository, Each sheet in data.xlsx represents one batch/experiment and contains measured outputs (MES), time-invariant controls \(\mathbf{w}\), and time-variant controls \(\mathbf{u}(t)\) in a single table.

MES_X:y1	MES_Y:y1	MES_E:y1	MES_X:y2	MES_Y:y2	MES_E:y2	X:all	w1	u1
\(t_{\mathrm{sp},1}^{(y_1)}\)	\(y_1\!\big(t_{\mathrm{sp},1}^{(y_1)}\big)\)	\(\sigma_{y_1}\)	\(t_{\mathrm{sp},1}^{(y_2)}\)	\(y_2\!\big(t_{\mathrm{sp},1}^{(y_2)}\big)\)	\(\sigma_{y_2}\)	\(t_{\mathrm{all},1}\)	\(w_1\)	\(u_1\!\big(t_{\mathrm{all},1}\big)\)
\(t_{\mathrm{sp},2}^{(y_1)}\)	\(y_1\!\big(t_{\mathrm{sp},2}^{(y_1)}\big)\)	\(\sigma_{y_1}\)	\(t_{\mathrm{sp},2}^{(y_2)}\)	\(y_2\!\big(t_{\mathrm{sp},2}^{(y_2)}\big)\)	\(\sigma_{y_2}\)	\(t_{\mathrm{all},2}\)	\(w_1\)	\(u_1\!\big(t_{\mathrm{all},2}\big)\)
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(w_1\)	\(\vdots\)

``MES_X:y1``, ``MES_X:y2``, …, ``MES_X:yi``: Sampling times \(t_{\mathrm{sp}}\) used for measurements of \(y_1\), \(y_2\), …, \(y_i\) (system measured response sampled times).
``MES_Y:y1``, ``MES_Y:y2``, …, ``MES_Y:yi``: Measured responses \(y_r(t_{\mathrm{sp}})\) for measurements of \(y_1\), \(y_2\), …, \(y_i\) (system measured responses), column names must match system['tvo'] keys.
``MES_E:y1``, ``MES_E:y2``, …, ``MES_E:yi``: Measurement standard deviations \(\sigma_{y_r}\).
``X:all``: Global time grid \(t_{\mathrm{all}}\) on which the time-variant inputs \(u_i(t)\) are tabulated (should match the solver precision, stepped by system['t_r']).
``w1``, ``w2``, … : Time-invariant controls \(w_j\), constant over the experiment; column names must match system['tii'] keys.
``u1``, ``u2``, … : Time-variant control profiles \(u_i(t_{\mathrm{all}})\); column names must match system['tvi'] keys so that MIDDoE can reconstruct the control trajectories from X:all and u_i.

6. model.py structure

MIDDoE can unboard and tag models in several ways:

Type	Configuration	When to use
External Python script	`'krt': {'M': 'pys'}, 'src': {'M': 'models/my_model.py'}`	You have your own ODE solver or external simulator (Suggested over global method and suggested for interfacing external tools)
Built-in models	`'krt': {'M': 'pym'}`	For testing; MIDDoE provides basic test models (typically solid-fluid reaction models)
Global Python function	`'krt': {'M': 'my_solve_function'}`	Function already defined in your script, in a global space.
gPROMS coupling	`'krt': {'M': 'gpr'}, 'creds': {'M': 'user:pass@server'}`	gPAS file created by gPROMS, to couple with gPROMS models.

In general it treats models as a black box and requires a standard model interface. Create a Python file with a function following this signature (‘pys’) approach:

def solve_model(t, y0, tii, tvi, theta):
    """
    Standard MIDDoE model interface.

    Parameters
    ----------
    t : list
        Time vector for evaluation.
    y0 : dict
        Initial conditions for differential states.
        Keys are state names, values are initial values.
        Example: {'y1': 0.0, 'y2': 0.5}

    tii : dict
        Time-invariant inputs (design variables, initial concentrations).
        Example: {'w1': 0.5, 'w2': 1.0}

    tvi : dict
        Time-variant inputs (controls as functions of time).
        Each key maps to a list matching time vector length.
        Example: {'u1': [296.15, 297.5, 298.0, ...]}

    theta : list
        Model parameters to be estimated.
        Example: [50000, 75000, 0.4116, 111900, 9905, 30000]

    Returns
    -------
    dict
        Dictionary with response trajectories.
        Keys match measured output names.
        Values are lists matching time vector length.
        Example: {'y1': [0.0, 0.01, 0.02, ...], 'y2': [0.0, 0.005, 0.010, ...]}
    """

    # Example: ODE integration using SciPy
    from scipy.integrate import odeint

    # Extract parameters
    k1, k2, k3, k4, k5, k6 = theta

    # Define differential equations
    def system_odes(state, t_point, u_t, w):
        C_A, C_B = state
        # Reaction kinetics (example)
        u_value = u_t(t_point)  # Get control value at time t_point
        dCA_dt = -k1 * C_A + k2 * w['w1']
        dCB_dt = k1 * C_A - k3 * C_B
        return [dCA_dt, dCB_dt]

    # Interpolate time-variant controls
    from scipy.interpolate import interp1d
    u1_func = interp1d(t, tvi['u1'], kind='linear', fill_value='extrapolate')

    # Integrate
    y0_list = [y0.get('CA', 0.0), y0.get('CB', 0.0)]
    solution = odeint(system_odes, y0_list, t, args=(u1_func, tii))

    # Return in MIDDoE format
    return {
        'y1': solution[:, 0].tolist(),
        'y2': solution[:, 1].tolist()
    }

7. Configuration Dictionaries

system Dictionary Definition

The system dictionary is a central hub for all MIDDoE operations. It specify system characteristics.

system = {
    # ========== TIME CONFIGURATION ==========
    't_s': [0, 16],        # Time span: [start, end]
    't_r': 0.02,           # Time resolution: every 0.02
                           # Smaller → better accuracy but slower (it must match with the global time grid in data.xlsx
    't_d': 0.3,            # Dead-time fraction: exclude first 30% and last 30%
                           # for controls and sampling (typical batch startup/cooldown)

    # ========== TIME-VARIANT INPUTS: u(t) ==========
    'tvi': {
        'u1': {  # Example: Temperature profile
            'min': 296.15,          # Lower bound (K)
            'max': 306.15,          # Upper bound (K)
            'stps': 6,              # Number of CVP segments (switching points)
                                    # stps=6 → 5 segments
            'cvp': 'LPF',           # Control profile type:
                                    # 'CPF' = constant piecewise (steps)
                                    # 'LPF' = linear piecewise (ramps)
            'const': 'inc',         # Constraint on signal shape:
                                    # 'rel' = no constraint (relaxed)
                                    # 'inc' = monotonically increasing
                                    # 'dec' = monotonically decreasing
            'offl': 0.01,           # Min level change (fraction): 1% per segment
            'offt': 0.3             # Min time spacing (fraction): 30% of time span
        },
    },

    # ========== TIME-VARIANT OUTPUTS: y(t) ==========
    'tvo': {
        'y1': {  # Example: Product concentration
            'init': 0,              # Initial value (can be number or 'variable')
                                    # 'variable' = design decision (in tii)
            'meas': True,           # Measurable? (True) or virtual output (False)
            'sp': 17,               # Sampling points: max 17 measurements per batch
            'unc': 0.005,           # Measurement uncertainty (std dev): ±0.005 units (homoskedastic error regime)
            'offt': 0.3,            # Min sampling interval (fraction): 30% spacing
            'samp_s': 1,            # Synchronisation group: 1 = sync with other samp_s:1
            'samp_f': [0, 16]       # Forced sampling times: always sample at t=0 and t=16
        },
        'y2': {
            'init': 0,
            'meas': True,
            'sp': 17,
            'unc': 0.005,
            'offt': 0.3,
            'samp_s': 1,            # Same group as y1 → synchronized sampling
            'samp_f': [0, 16]
        },
        'y3': {
            'init': 0,
            'meas': True,
            'sp': 17,
            'unc': 0.005,
            'offt': 0.3,
            'samp_s': 1,
            'samp_f': [0, 16]
        },
    },

    # ========== TIME-INVARIANT INPUTS: w ==========
    'tii': {
        'y10': {  # Example: Initial concentration of reactant
            'min': 0.3,
            'max': 1.0
        },
        'y20': {  # Example: Initial concentration of catalyst
            'min': 0.19,
            'max': 1.0
        },
        'y20': {  # Example: Initial concentration of catalyst
            'min': 0.19,
            'max': 1.0
        },
    },

    # ========== TIME-INVARIANT OUTPUTS ==========
    'tio': {}  # Empty for this example; used for steady-state responses
}

Model Dictionary Definition

The model dictionary is a central hub for all MIDDoE operations. It specify model characteristics.

# Parameter vectors for in-silico validation (true values, nominal, and bounds)
theta      = [50000, 75000, 0.4116, 111900, 9905, 30000]   # Ground truth (insilico studies)
theta_n    = [100000, 100000, 1, 100000, 100, 10000]       # Initial guess
theta_mins = [10000, 0, 0.1, 50000, 10, 10000]             # Lower bounds
theta_maxs = [1000000, 200000, 10, 200000, 10000, 200000]  # Upper bounds

models = {
    # ========== ACTIVE CANDIDATE MODELS ==========
    'can_m': ['M'],  # List of model names to use
                     # Example with multiple models: ['M1', 'M2', 'M3'] in case of need for model discrimination

    # ========== MODEL INTERFACE TYPE ==========
    'krt': {
        'M': 'pys'  # 'pys' = external Python script
                    # 'pym' = built-in MIDDoE models
                    # 'gpr' = gPROMS coupling
                    # 'function_name' = global function in namespace
    },

    # ========== MODEL SOURCE PATH ==========
    'src': {
        'M': r'C:/.../model.py'
            # Only needed for 'pys' and 'gpr' type
            # Use raw string (r'...') for Windows paths
    },

    # ========== CREDENTIALS FOR gPROMS ==========
    'creds': {
        'M': ''  # Empty for non-gPROMS models
    },

    # ========== PARAMETER VECTORS AND BOUNDS ==========
    'theta': {  # Starting/nominal values
        'M': theta_n
    },
    't_l': {    # Lower bounds
        'M': theta_mins
    },
    't_u': {    # Upper bounds
        'M': theta_maxs
    },

    # ========== ESTIMATION MASK ==========
    'mutation': {
        'M': [False, False, True, True, True, True]
            # False = fix at nominal value (don't estimate)
            # True = estimate from data
            # Use False for poorly identifiable parameters identified by EA
    }
}

insilicos Dictionary Definition

The insilicos dictionary is a settings dictionary to perform insilico experiment data generation.

# In-silico experimenter (used for synthetic data generation)
insilicos = {
    'tr_m': 'M',        # True model (ground truth)
    'theta': theta,     # True parameter vector
    'errt': 'abs',      # Error type:
                        # 'abs' = absolute noise
                        # 'rel' = relative (proportional to signal)
    'prels': {}         # Classic pre-defined designs (empty here) (iso-valued controls)
}

GSA Dictionary Definition

The gsa dictionary is a settings dictionary to perform Global Sensitivity Analysis with Sobol method.

# Global Sensitivity Analysis configuration gsa = {

‘var_s’: False, # Sensitivity w.r.t. controls? (u, w) ‘par_s’: True, # Sensitivity w.r.t. parameters? (θ) ‘var_d’: False, # Control sampling space:

# False = system-defined bounds # float > 1 (e.g., 1.1) = ±10% around nominal

‘par_d’: False, # Parameter sampling space:
# False = models-defined bounds

‘samp’: 2**12, # Sobol sample size: use powers of 2
# 2^12 = 4096 base samples # Total evaluations ≈ (2·N_vars + 2) × base

‘multi’: 0.7, # Parallel execution: 0.7 = use 70% of CPU cores ‘tii_n’: [0.508, 0.3925], # Nominal w values ‘tvi_n’: [301.15], # Nominal u values ‘plt’: True # Generate sensitivity plots?

}

MBDoE Dictionary Definition

The des_opt dictionary is a settings dictionary to conduct MBDoE-MD adn MBDoE-PP optimisations.

des_opt = {
    # ========== FINITE DIFFERENCE SETUP ==========
    'eps': 1e-5,   # FDM perturbation size (normalised space)
                   # Smaller → better accuracy, larger → robustness
                   # None = auto mesh-independency test
                   # Typical: 1e-5 to 1e-3

    # ========== MBDoE-MD OBJECTIVE (Model Discrimination) ==========
    'md_ob': 'BFF', # T-optimality criterion:
                    # 'HR' = Hunter–Reiner (focuses on divergence)
                    # 'BFF' = Buzzi–Ferraris–Forzatti (includes error weighting)

    # ========== MBDoE-PP OBJECTIVE (Parameter Precision) ==========
    'pp_ob': 'D',   # Alphabetical optimality criterion:
                    # 'D' = D-optimality (minimize det of covariance volume)
                    # 'A' = A-optimality (minimize average variance)
                    # 'E' = E-optimality (minimize maximum variance)
                    # 'ME' = Modified E (promote uniform precision)

    # ========== OPTIMISATION METHOD ==========
    'meth': 'PS',   # Solver type:
                    # 'PS' = Pattern Search (local, derivative-free)
                    # 'DE' = Differential Evolution (global)
                    # 'DEPS' = DE + PS (hybrid: global then local)

    # ========== ITERATION SETTINGS ==========
    'itr': {
        'pps': 100,     # Population size (for DE/PS)
        'maxmd': 5,     # Max iterations for MD objective
        'tolmd': 1,     # MD convergence tolerance
        'maxpp': 100,   # Max iterations for PP objective
        'tolpp': 1      # PP convergence tolerance
    },

    'plt': True     # Generate design plots?
}

Calibration Dictionary Definition

The iden_opt dictionary is a settings dictionary to conduct parameter estimation and uncertainty analysis.

iden_opt = {
    # ========== OPTIMISATION SOLVER ==========
    'meth': 'LBFGSB',   # Parameter estimation method:
                        # 'LBFGSB' = Limited-memory BFGS (local, fast)
                        # 'SLSQP' = Sequential Least Squares (local, constrained)
                        # 'DE' = Differential Evolution (global, slow)
                        # 'NMS' = Nelder–Mead Simplex (local, derivative-free)
                        # 'BFGS' = Standard BFGS (local)
                        # 'TC' = Trust-region Constrained (local, smooth)

    'ms': True,         # Multi-start? (True = try multiple random starts)
    'maxit': 500,       # Maximum iterations per solver attempt
    'tol': 0.1,         # Convergence tolerance

    # ========== FINITE DIFFERENCE METHOD ==========
    'sens_m': 'central',# FDM scheme:
                        # 'central' = central differences (2nd order, more accurate)
                        # 'forward' = forward differences (1st order)
    'eps': 1e-5,        # FDM perturbation size
                        # None = auto mesh-independency test

    # ========== OBJECTIVE FUNCTION ==========
    'ob': 'WLS',        # Cost function:
                        # 'LS' = Least Squares
                        # 'WLS' = Weighted LS (accounts for measurement uncertainty)
                        # 'MLE' = Maximum Likelihood (if noise is known)
                        # 'CS' = Chi-Square

    # ========== UNCERTAINTY QUANTIFICATION ==========
    'var-cov': 'B',     # Variance-covariance method:
                        # 'H' = Hessian-based (fast, local, assumes linearity)
                        # 'J' = Jacobian-based (faster)
                        # 'B' = Bootstrap (robust, global, slow)
    'nboot': 50,        # Bootstrap samples (if var-cov='B')

    # ========== INITIALISATION ==========
    'init': None,       # Starting point:
                        # None = use theta from models dict
                        # 'rand' = random initialization

    # ========== VISUALISATION ==========
    'c_plt': False,     # Confidence ellipsoid plots?
    'f_plt': True,      # Fitting comparison plots?
    'plt_s': True,      # Show plots while saving?
    'log': False        # Verbose logging?
}

8. Practical Workflow

This section demonstrates the execution sequence for a complete MIDDoE insilico identification workflow. All configuration dictionaries (system, models, gsa, insilicos, des_opt, iden_opt) are assumed to be defined as shown in Section 7.

8.1 Import Required Modules

from middoe.sc_sensa import sensa
from middoe.des_pp import mbdoe_pp
from middoe.des_md import mbdoe_md
from middoe.krnl_expera import expera
from middoe.iden_parmest import parmest
from middoe.iden_uncert import uncert
from middoe.sc_estima import estima
from middoe.iden_valida import validation
from middoe.log_utils import save_rounds, save_to_jac, load_from_jac, save_to_xlsx
from middoe.iden_utils import run_postprocessing

8.2 Global Sensitivity Analysis (Optional)

Identify influential parameters before designing experiments:

# Run Sobol sensitivity analysis
sobol_results = sensa(gsa, models, system)

# Save results
save_to_jac(sobol_results, purpose="sensa")

# Export to Excel
results = load_from_jac()
sensa_data = results['sensa']
save_to_xlsx(sensa_data)

Output: First-order and total-order Sobol indices for each parameter, plus plots if gsa['plt'] = True.

8.3 Round 1: Initial Design and Estimation

Step 1: Design the first experiment

round = 1
designs = mbdoe_pp(des_opt, system, models, round=1, num_parallel_runs=16)

Returns: Optimal control profiles (u(t)), time-invariant inputs (w), and sampling times that minimize parameter uncertainty according to the selected criterion (des_opt['pp_ob']).

Step 2: Generate data

# For insilico studies:
expera(system, models, insilicos, designs, expr=1, swps=designs['swps'])

# For real experiments: manually add data to data.xlsx following the design

If you don’t add the swps from MBDoE designs, based on the expr number, it executes the insilicos[‘prels’][expr] iso-value insilico experiments.

Result: New sheet in data.xlsx containing measurements for experiment 1.

Step 3: Estimate parameters

resultpr = parmest(system, models, iden_opt, case='strov')

None (default): Standard estimation, uses models['theta'] as starting values and fixed parameter fallback.
'freeze': Keep existing mutation masks unchanged for sequential estimation (preserves which parameters are active/frozen between rounds).
'strov': **ST**art from **R**eference **O**ptimal **V**alues. Uses models['thetastart'] as starting values instead of models['theta']. Useful when continuing from a previous calibration round with improved initial guesses.

Returns: Estimated parameters \(\hat{\boldsymbol{\theta}}\), objective function value, convergence status, and computational time for each model in models['can_m'].

Step 4: Quantify uncertainty

uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']

Returns: Variance-covariance matrix \(\mathbf{V}\), confidence intervals \(CI_i\), t-values \(t_i\), correlation matrix, and bootstrap statistics.

Step 5: Evaluate estimability

ranking, k_optimal_value, rCC_values, J_k_values, best_uncert_result = \
    estima(resultun, system, models, iden_opt, round)

Returns: Parameter ranking by estimability, optimal number of estimable parameters \(k^*\), and corrected critical ratio curve \(r_{CC,k}\).

Step 6: Save round results

round_data = {}
save_rounds(round, resultun, 'preliminary', round_data, models, iden_opt, obs, system,
            ranking=ranking, k_optimal_value=k_optimal_value, rCC_values=rCC_values,
            J_k_values=J_k_values, best_uncert_result=best_uncert_result)

Result: Round 1 data stored in round_data dictionary for later analysis.

8.4 Sequential Rounds (2, 3, …, N)

Repeat the workflow to progressively reduce parameter uncertainty:

for round in range(2, 5):  # Example: rounds 2-4
    # Design next experiment
    designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)

    # Collect data
    expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])

    # Estimate with cumulative dataset
    resultpr = parmest(system, models, iden_opt, case='strov')

    # Quantify uncertainty
    uncert_results = uncert(resultpr, system, models, iden_opt)
    resultun = uncert_results['results']
    obs = uncert_results['obs']

    # Check estimability
    ranking, k_optimal_value, rCC_values, J_k_values, best_uncert_result = \
        estima(resultun, system, models, iden_opt, round)

    # Save round
    save_rounds(round, resultun, 'sequential', round_data, models, iden_opt, obs, system,
                ranking=ranking, k_optimal_value=k_optimal_value, rCC_values=rCC_values,
                J_k_values=J_k_values, best_uncert_result=best_uncert_result)

Optional: Update models['mutation'] between rounds to manually fix poorly estimable parameters instead of using estima().

Stopping criteria:

All t-values exceed threshold: \(t_i > t_{\text{ref}}\)

8.5 Model Discrimination (Multi-Model Case)

If multiple candidate models compete, design discriminating experiments:

# Configure for discrimination
des_opt['md_ob'] = 'BFF'  # or 'HR'

# Design experiment maximizing model separation
md_designs = mbdoe_md(des_opt, system, models, round=round)

# Execute and analyze
expera(system, models, insilicos, md_designs, expr=round, swps=md_designs['swps'])
resultpr = parmest(system, models, iden_opt, case='discrimination')

Model selection: Compare \(\chi^2\) values and P-test probabilities (Section 2.8) to identify the best model.

8.6 Cross-Validation

Validate the calibrated model across all experiments:

validres = validation(system, models, iden_opt, round_data)

Returns: Calibration vs. validation metrics (\(R^2_{\text{cal}}\), \(R^2_{\text{val}}\), \(\text{RMSE}_{\text{cal}}\), \(\text{RMSE}_{\text{val}}\)), residual plots, and leave-one-out statistics.

8.7 Save Final Results

Persist all workflow data:

# Save identification results
save_to_jac(round_data, purpose="iden")

Result: Binary file results/iden_YYYYMMDD_HHMMSS.jac containing all rounds.

8.8 Load and Post-Process

Reload saved data and generate reports:

# Load results
results = load_from_jac()
iden = results['iden']

# Generate comprehensive reports
run_postprocessing(
    round_data=iden,
    solvers=['M'],                    # Model identifiers
    selected_rounds=[1, 2, 3, 4],     # Rounds to analyze
    plot_global_p_and_t=True,         # Parameter evolution + t-tests
    plot_confidence_spaces=True,      # 2D confidence ellipses
    plot_p_and_t_tests=True,          # t-value bar charts
    export_excel_reports=True,        # Summary tables
    plot_estimability=True            # Estimability ranking evolution
)

Outputs:

results/figures/: Parameter trajectories, confidence regions, t-test evolution, fitting plots
results/reports/: Excel files with parameter estimates, uncertainties, correlations, and validation metrics

8.9 Complete Example: Discrimination to Calibration

A realistic workflow integrating model discrimination, candidate elimination, and sequential precision refinement:

from middoe.des_md import mbdoe_md
from middoe.des_pp import mbdoe_pp
from middoe.krnl_expera import expera
from middoe.iden_parmest import parmest
from middoe.iden_uncert import uncert
from middoe.sc_estima import estima
from middoe.iden_valida import validation
from middoe.log_utils import save_rounds, save_to_jac, load_from_jac
from middoe.iden_utils import run_postprocessing

# Assume system, models, insilicos, iden_opt, des_opt defined (Section 7)
# models['can_m'] = ['M1', 'M2', 'M3']  # Multiple competing models

round_data = {}
round = 1

# ========== PHASE 1: MODEL DISCRIMINATION ==========
while len(models['can_m']) > 1:
    # Design discriminating experiment
    des_opt['md_ob'] = 'BFF'  # Buzzi-Ferraris-Forzatti criterion
    md_designs = mbdoe_md(des_opt, system, models, round=round)

    # Execute experiment
    expera(system, models, insilicos, md_designs, expr=round, swps=md_designs['swps'])

    # Estimate parameters for all candidate models
    resultpr = parmest(system, models, iden_opt, case=None)

    # Uncertainty quantification
    uncert_results = uncert(resultpr, system, models, iden_opt)
    resultun = uncert_results['results']
    obs = uncert_results['obs']

    # Estimability analysis
    ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)

    # Save discrimination round
    save_rounds(round, resultun, 'discrimination', round_data, models, iden_opt, obs, system,
                ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
                J_k_values=J_k, best_uncert_result=best_res)

    # Compute P-test values for model selection
    import numpy as np
    chi2_values = {solver: resultun[solver]['chi2'] for solver in models['can_m']}
    P_values = {}
    denom = sum(np.exp(-chi2/2) for chi2 in chi2_values.values())
    for solver, chi2 in chi2_values.items():
        P_values[solver] = (1 - np.exp(-chi2/2) / denom) * 100

    print(f"\nRound {round} - P-test results:")
    for solver in models['can_m']:
        print(f"  {solver}: P = {P_values[solver]:.2f}%, χ² = {chi2_values[solver]:.4f}")

    # Remove models with low discrimination probability (P < 20%)
    threshold = 20.0
    models_to_remove = [s for s in models['can_m'] if P_values[s] < threshold]

    if models_to_remove:
        print(f"Removing models {models_to_remove} (P < {threshold}%)")
        for solver in models_to_remove:
            models['can_m'].remove(solver)

    # Exit discrimination if only one model remains or max rounds reached
    if len(models['can_m']) == 1 or round >= 5:
        print(f"\nDiscrimination complete. Selected model: {models['can_m'][0]}")
        break

    round += 1

# ========== PHASE 2: PARAMETER PRECISION REFINEMENT ==========
# Switch to parameter precision optimization
des_opt['pp_ob'] = 'D'  # D-optimal design for final calibration
selected_model = models['can_m'][0]  # Best model from discrimination

# Single precision round (or continue for multiple rounds if needed)
round += 1
designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)
expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])

# Re-estimate with discriminated model using previous optimal as starting point
models['thetastart'] = {selected_model: resultun[selected_model]['theta']}
resultpr = parmest(system, models, iden_opt, case='strov')

# Final uncertainty analysis
uncert_results = uncert(resultpr, system, models, iden_opt)
resultun = uncert_results['results']
obs = uncert_results['obs']

# Estimability check
ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)

# Save precision round
save_rounds(round, resultun, 'precision', round_data, models, iden_opt, obs, system,
            ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
            J_k_values=J_k, best_uncert_result=best_res)

# Check stopping criteria
t_values = resultun[selected_model]['t_values']
t_ref = resultun[selected_model]['t_ref']
CI_relative = resultun[selected_model]['CI'] / np.abs(resultun[selected_model]['theta'])

print(f"\nRound {round} - Convergence check:")
print(f"  All t-values > t_ref: {np.all(t_values > t_ref)}")
print(f"  Max relative CI: {np.max(CI_relative):.3f} (target < 0.1)")

# Continue precision rounds if needed
max_rounds = 8
while round < max_rounds:
    if np.all(t_values > t_ref) and np.max(CI_relative) < 0.1:
        print("Convergence criteria met. Stopping.")
        break

    round += 1
    designs = mbdoe_pp(des_opt, system, models, round=round, num_parallel_runs=16)
    expera(system, models, insilicos, designs, expr=round, swps=designs['swps'])

    resultpr = parmest(system, models, iden_opt, case='strov')
    uncert_results = uncert(resultpr, system, models, iden_opt)
    resultun, obs = uncert_results['results'], uncert_results['obs']
    ranking, k_opt, rCC, J_k, best_res = estima(resultun, system, models, iden_opt, round)

    save_rounds(round, resultun, 'sequential', round_data, models, iden_opt, obs, system,
                ranking=ranking, k_optimal_value=k_opt, rCC_values=rCC,
                J_k_values=J_k, best_uncert_result=best_res)

    # Update convergence metrics
    t_values = resultun[selected_model]['t_values']
    CI_relative = resultun[selected_model]['CI'] / np.abs(resultun[selected_model]['theta'])
    print(f"Round {round}: max CI_rel = {np.max(CI_relative):.3f}")

# ========== PHASE 3: VALIDATION AND REPORTING ==========
# Cross-validation
validres = validation(system, models, iden_opt, round_data)
print(f"\nValidation R²: {validres[selected_model]['R2_val']:.4f}")

# Save all results
save_to_jac(round_data, purpose="iden")

# Generate comprehensive reports
results = load_from_jac()
run_postprocessing(
    round_data=results['iden'],
    solvers=[selected_model],
    selected_rounds=list(range(1, round + 1)),
    plot_global_p_and_t=True,
    plot_confidence_spaces=True,
    plot_p_and_t_tests=True,
    export_excel_reports=True,
    plot_estimability=True
)

print(f"\n{'='*60}")
print(f"Workflow complete: {round} rounds executed")
print(f"Selected model: {selected_model}")
print(f"Final parameters: {resultun[selected_model]['theta']}")
print(f"{'='*60}")

Workflow summary:

Discrimination phase (rounds 1–N): MBDoE-MD designs maximize model separation. After each round, P-test identifies and removes poorly performing models (P < 20%).
Precision phase (rounds N+1–M): Once a single model remains, MBDoE-PP designs minimize parameter uncertainty using D-optimal criterion.
Validation: Cross-validation assesses generalization performance across all experiments.
Reporting: Automated generation of parameter trajectories, confidence regions, and Excel summaries.

Key decision points:

P-value threshold: Adjust threshold based on confidence level (typical: 15–25%).
Stopping criteria: Customize based on application requirements (t-values, relative CI, condition number).
Max rounds: Prevents infinite loops; adjust based on computational budget.

9. Citation

If you use MIDDoE in your research, please cite:

[1]. Tabrizi, Z., Barbera, E., Leal da Silva, W.R., & Bezzo, F. (2025).: MIDDoE: An MBDoE Python package for model identification, discrimination, and calibration. Digital Chemical Engineering, 17, 100276. https://doi.org/10.1016/j.dche.2025.100276
[2]. Tabrizi, S.Z.B., Barbera, E., Da Silva, W.R.L, & Bezzo, F. (2025).: A Python/Numpy-based package to support model discrimination and identification. J.F.M. Van Impe, G. Léonard, S.S. Bhonsale, M.E. Polańska, F. Logist (Eds.). Systems and Control Transactions, 4 , 1282-1287. https://doi.org/10.69997/sct.192104

10. Applied to

MIDDoE has been applied in various EU reports, research works and projects, e.g. :

[1]. Tabrizi, Z., Rodriguez, C., Barbera, E., Leal da Silva, W.R., & Bezzo, F. (2025).: Wet Carbonation of Industrial Recycled Concrete Fines: Experimental Study and Reaction Kinetic Modeling. Ind Eng Chem Res, 64, 45, 21412–21425. https://doi.org/10.1021/acs.iecr.5c02835

11. Documentation Contents

API Reference

12. Resources

GitHub: https://github.com/zuhairblr/middoe
PyPI: https://pypi.org/project/middoe/
Paper: https://doi.org/10.1016/j.dche.2025.100276
Issues & Support: https://github.com/zuhairblr/middoe/issues
Case Studies: https://github.com/zuhairblr/middoe/tree/main/tests/paper