Parameter Identification
(Bioprocess Model)

(Version: June 2022)

The application presented here demonstrates a strategy for the identification of parameters using the example of a simplified UpStream process for the production of a recombinant bio product. The objective of the parameter identification is the parameterisation of a mathematical bioprocess model, which represents the essential system behaviour of the bioprocess. In the considered case, the bioproduct is formed using a genetically modified organism. Subsequently, the parameterised bioprocess model can be used for analysis (e.g. sensitivity analysis) or optimisation (e.g. yield maximisation) of the existing bioprocess.

Application Example


The cultivation of the genetically modified organism $\color{#4daf4a}{X}$ (cf. E. coli) is carried out using an ideally mixed classical stirred tank reactor. The formation of the recombinant bioproduct $\color{#377eb8}{P}$ is induced by the addition of an activator $a$ (cf.  IPTG) at the time $t_{\text{a}}$ [1]. Here, biomass growth and product formation each compete for the same substrate$\color{#ff7f00}{S}$ (cf. glucose). At the end of the cultivation time $t_{\text{harvest}}$, the harvesting and processing of the formed bioproduct is carried out. Furthermore, it is known that high substrate concentrations $\color{#ff7f00}{S}$ inhibit biomass growth. Following the addition of the activator, a large fraction of the substrate is consumed in product formation. In addition, there is a noticeable inhibition of biomass growth due to the increasing product concentration. After depletion of the substrate, product formation stagnates. Likewise, the decay of biomass can be observed.

A sufficient experimental dataset and a suitable mathematical bioprocess model are prerequisites for carrying out the parameter identification. The dataset must represent the investigated system with at least all essential properties. For this purpose, a large number of experiments must be carried out in order to acquire data for the parameter identification. During the experiments, predetermined control variables are varied in a suitable combination. Control variables are those variables that allow an external manipulation of the respective system. In the example considered, the start biomass concentration $X_0$, the start substrate concentration $S_0$, the activation time of the product formation $t_{\text{a}}$ and the harvest time $t_{\text{harvest}}$ are determined as control variables.

Relevant experiments can be planned and carried out by means of experimental design including the variation of control variables. In the application example considered, the data basis is given by 9 experiments, which are visualised in the following figure (click to expand):

Measurement data

Furthermore, the required mathematical bioprocess model is known and given in the form of the following differential equation system [2,3]:

\begin{align} \dot{X}=&\,\mu(S,P)\cdot X -k_{\text{d}}\cdot X \label{eq:ode_cX} \\[10pt] \dot{S}=&\,-\dfrac{\mu(S,P)\cdot X}{Y_{\text{X,S}}} -\dfrac{\rho(S,a)\cdot X}{Y_{\text{P,S}}} - m \cdot X \label{eq:ode_cS} \\[10pt] \dot{P}=&\,\rho(S,a)\cdot X \label{eq:ode_cP} \end{align}

The parameters $k_{\text{d}}$ and $m$ correspond to the rates of biomass decay as well as the consumption of substrate for the maintenance metabolism of the cell. The two yield coefficients $Y_{\text{X,S}}$ and $Y_{\text{P,S}}$ reflect the ratio between substrate consumed and biomass and product formed, respectively. The following equations \eqref{eq:mu_kinetics} and  \eqref{eq:pi_kinetics} represent the model kinetics for the description of biomass growth and decay as well as product formation [2].

\begin{eqnarray} \mu(S,P) & = & \frac{\mu_{\max}\cdot S}{K_{\text{S}}+S+\frac{S^2}{K_{\text{I,S}}}+\frac{S\cdot P}{K_{\text{I,P}}}}\label{eq:mu_kinetics}\\[10pt] \rho(S,a) & = & \begin{cases} 0 & \text{for }a=0\\ \frac{\rho_{\max}\cdot S}{K_{\text{P}}+S} & \text{for }a=1 \end{cases}\label{eq:pi_kinetics} \end{eqnarray}

Methodical Background


The parameter identification, i.e. the determination of all model parameter values, is implemented as a mathematical minimisation. The aim is to identify those values for the model parameters which minimise the deviation (error) between the experimental data and the simulated processes. A number of established methods are available for this purpose. In this application example, the least squares method is used together with a direct heuristic search method (downhill-simplex [4]). Furthermore, it is given that all model parameters are structurally as well as practically identifiable [5].

The following diagram visualises the process of parameter identification and its essential components, the interaction of which is described below:

Parameter Identification Process

The initialisation of the parameter identification requires suitable bounds (constraints) and start values $\theta_0$ for the vectorised model parameters $\theta$. In most applications, the bounds can be restricted based on preliminary knowledge as well as formal kinetic or mathematical conditions. The following table lists the model parameters used in this application example and their corresponding value ranges:

Symbol Description Bounds Unit
lower upper
$\mu_{\max}$ maximum specific growth rate $0.1$ $0.7$ $\text{h}^{-1}$
$\rho_{\max}$ maximum specific product formation rate $0.01$ $0.05$ $\text{h}^{-1}$
$K_{\text{S}}$ half-saturation constant for substrate $0.5$ $5$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{P}}$ half-saturation constant for product $0.5$ $5$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{I,S}}$ inhibition constant for substrate $5$ $50$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{I,P}}$ inhibition constant for product $0.05$ $2$ $\text{g}_{\text{P}} \cdot \text{L}^{-1}$
$k_{\text{d}}$ specific biomass decay rate $0$ $0.1$ $\text{h}^{-1}$
$m$ specific substrate maintenance rate $0$ $0.1$ $\text{g}_{\text{S}} \cdot \text{g}_{\text{X}}^{-1} \cdot \text{h}^{-1}$
$Y_{\text{X,S}}$ yield coefficient biomass/substrate $0.1$ $1$ $\text{g}_{\text{X}} \cdot \text{g}_{\text{S}}^{-1}$
$Y_{\text{P,S}}$ yield coefficient product/substrate $0.01$ $0.1$ $\text{g}_{\text{P}} \cdot \text{g}_{\text{S}}^{-1}$

The determination of suitable start values is generally a great challenge. Ideally, these starting values are already given by existing preliminary knowledge, or can be determined empirically for simple models with very few parameters. If, as in the present application example, the model is a dynamic non-linear model with a large number of model parameters, suitable starting values $\theta_0$ are determined using various methods. The generation of these start-value combinations can, for example, be done (fully) combinatorially or using various stochastic or probabilistic sampling strategies [6-8].

The optimal values of the model parameters $\Omega$ are then obtained by minimising (Eq. \ref{eq:optimisation}) the following objective function $J\left(\theta\right)$ (Eq. \ref{eq:costfun}).

\begin{equation} \Omega=\underset{\theta}{\mathrm{argmin}}\left(J\left(\theta\right)\right) \label{eq:optimisation} \end{equation} \begin{equation} J\left(\theta\right)=\sum_{j=1}^{k}\sum_{i=1}^{n}\left(y_\text{data}\left(t_{j,i}\right)-y_\text{model}\left(\theta,t_{j,i}\right)\right)^{2} \label{eq:costfun} \end{equation}

The objective function above sums the squared deviations between the observed variable $y_{\text{data}}\left(t_{j,i}\right)$ and the simulated variable $y_{\text{model}}\left(\theta,t_{j,i}\right)$. The indices $i$ ($i=1,\ldots,n$) and $j$ ($j=1,\ldots,k$) referencing to the discrete value ($i$) within the time series of the respective experiment under consideration ($j$).

Results


For this application example, $106$ start-value combinations have been generated according to the Saltelli sampling scheme within given model parameter bounds and evaluated using the objective function $J\left(\theta\right)$ (Eq. \ref{eq:costfun}). The subsequent parameter identification by minimising the objective function starting from suitable start-value combinations finally converged to the parameter values listed in the following table.

Symbol Description Value Unit
$\mu_{\max}$ maximum specific growth rate $0.32$ $\text{h}^{-1}$
$\rho_{\max}$ maximum specific product formation rate $0.020$ $\text{h}^{-1}$
$K_{\text{S}}$ half-saturation constant substrate $2.3$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{P}}$ half-saturation constant in product formation $1.6$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{I,S}}$ inhibition constant substrate $9.1$ $\text{g}_{\text{S}} \cdot \text{L}^{-1}$
$K_{\text{I,P}}$ inhibition constant product $0.10$ $\text{g}_{\text{P}} \cdot \text{L}^{-1}$
$k_{\text{d}}$ specific biomass decay rate $0.024$ $\text{h}^{-1}$
$m$ specific substrate maintenance rate $0.034$ $\text{g}_{\text{S}} \cdot \text{g}_{\text{X}}^{-1} \cdot \text{h}^{-1}$
$Y_{\text{X,S}}$ yield coefficient biomass/substrate 0.60 $\text{g}_{\text{X}} \cdot \text{g}_{\text{S}}^{-1}$
$Y_{\text{P,S}}$ yield coefficient product/substrate 0.053 $\text{g}_{\text{P}} \cdot \text{g}_{\text{S}}^{-1}$

The figure below visualises how the experimental dataset can be represented by the parameterised bioprocess model in all essential system properties (click to expand):

Comparison between measurement and simulation data

Subsequently, the parameterised bioprocess model can be used to analyse (e.g. sensitivity analysis) or optimise (e.g. yield maximisation) the existing bioprocess.

References


  1. [1] L. Gao, Y. Ren, Y. Ma, J. Lin, J. Lin: Modeling and simulation of production of metallothionein and red fluorescent fusion protein by recombinant Escherichia coli using graphical programming. In: Modeling, Programming and Simulations Using LabVIEW Software., IntechOpen, London, United Kingdom, 2011. doi: 10.5772/14091
  2. [2] D. Voet, J. G. Voet: Biochemistry. John Wiley & Sons, New York, 1990. ISBN: 0-471-61769-5.
  3. [3] A. Kremling: Grundlagen der mathematischen Modellierung. In: Kompendium Systembiologie, Vieweg+Teubner Verlag, 2012. doi: 10.1007/978-3-8348-8607-1_3.
  4. [4] J. A. Nelder, R. Mead: A simplex method for function minimization. In: The Computer Journal, 7(4):308–313, 1965. doi: 10.1093/comjnl/7.4.308
  5. [5] A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmüller, J. Timmer: Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. In: Bioinformatics, 25(15):1923–1929, 2009. doi: 10.1093/bioinformatics/btp358
  6. [6] B. Tang: Orthogonal array-based latin hypercubes. In: Journal of the American Statistical Association, 88(424):1392–1397, 1993. doi: 10.2307/2291282
  7. [7] I. M. Sobol: Distribution of points in a cube and approximate evaluation of integrals. Zh. Vych. Mat. Mat. Fiz., 7:784–802, 1967.
  8. [8] A. Saltelli: Making best use of model evaluations to compute sensitivity indices. In: Computer Physics Communications, 145:280–297, 1967. doi: 10.1016/S0010-4655(02)00280-1