In silico Analysis of 1,4-butanediol Heterologous Pathway Impact on Escherichia coli Metabolism

ZSOLT BODOR1, LEHEL TOMPOS1, AURELIA CRISTINA NECHIFOR2, KATALIN BODOR3* 1Sapientia Hungarian University of Transylvania, Faculty of Economics, Socio-Human Sciences and Engineering, Department of Bioengineering, 1 Libertatii Sq., 530104, Miercurea Ciuc, Romania 2Politehnica University of Bucharest, Faculty of Applied Chemistry and Material Science, Analytical Chemistry and Environmental Engineering Department, 1-7 Gheorghe Polizu Str., 011061, Bucharest, Romania 3University of Pecs, Faculty of Natural Sciences, Doctoral School of Chemistry, Ifjusag 6, 7624, Pecs, Hungary

Constraint Based Reconstruction and Analysis (COBRA) methods are widely used to analyze genome-scale metabolic networks and to guide metabolic engineering efforts [1][2][3][4][5][6]. The literature is replete with studies describing rational metabolic engineering approaches to generate strains with improved functions to produce native and nonnative chemicals [1,[7][8][9][10][11][12]. Current metabolic engineering relies on genome-scale models (GEMs) in the process of cell factories design for production of fine and commodity chemicals.
For example, the GEM for E. coli, S. cerevisiae or Basfia succiniciproducens was used to produce succinic acid [12][13][14], 1,4-butanediol [9,11], erythromycin [15], or ethanol [16], respectively. The number of metabolic reconstructions is increasing day by day and using the GEMs the microorganisms are modified to produce value added chemicals from renewable raw materials such as agriculture and industrial waste. For example, a huge amount of glycerol is produced as a by-product of biodiesel production which is an inexpensive and abundant carbon source [17][18][19] as well as glucose from agriculture waste. On the other hand it is mandatory to understand the impact of air pollution on the environment as well as on the rainwater chemistry [20][21][22][23][24][25].
E. coli is often used in fermentation processes to produce different compounds, because it is a well known organism and it has the ability to grow in low cost mineral medium. Experiments were carried out in order to produce nonnatural metabolites [9][10][11], while a new metabolic pathway has been incorporated into the organism. Previous studies have shown that the huge complexity and number of interactions can only be addressed in detail by metabolic models [26,27]. BDO is an organic compound used industrially as a solvent and furthermore it is used as an intermediate in the synthesis of plastic and other chemicals [28]. BDO production is strongly related to acetylene and formaldehyde, which is the well-known Reppe chemistry, a petroleum-based chemistry. Genomatica [29] is the only company with commercial scale production, hence further analysis are necessary to develop new heterologous pathways and analyze in detail the biochemical processes, interactions between native and non-native reactions taking place inside the microorganisms. Using genomescale metabolic models the metabolic processes can be simulated and analyzed in silico in order to better understand the connections taking place.
In silico methods were successfully used to design, analyze and create strains being capable to produce the target metabolite [9,11,15,[30][31][32][33][34]. The new heterologous pathway has not been analyzed in detail before. In order to evaluate the production potential, the impact of mutations the following analyzes were carried out: flux balance analysis (FBA) to estimate the maximum production potential and the changes occurred in reaction fluxes under given conditions [15,30], flux variability analysis (FVA) [35] to calculate the range of flux variability for wild-type and mutant strains that achieves optimal objective states. Flux correlations were estimated using random sampling to characterize the space of all flux distributions [31, [36][37][38]; flux coupling analysis (FCA) [39,40] for classifying the reactions depending of the coupling status [15,36,41,42]. Simulations were carried out using the most complex GEM of E. coli published in 2011 [43] and completed by the biosynthetic pathway necessary for BDO synthesis.
This study provides an example of metabolic network analysis using the tools of systems biology, which may provide an important contribution to decipher and understand the metabolic changes taking place in metabolically engineered strain for non-native compounds production.

Experimental part
During the research different methods were used, which are described in detail below. All studies were done by using the Constraint-Based Reconstruction and Analysis (COBRA) Matlab Toolbox (Mathworks Inc., http:// www.mathworks.com/) [36,41]. Flux distribution, optimization was solved using TOMLAB CPLEX (Tomlab Optimization Inc., San Diego, CA, USA). Computations were carried out according to the methods described by [36,41]. The genome scale metabolic networks of E. coli used through this work are as follows: wild-type model iJO1366 (WT) [43], model with BDO pathway (WT+BDO) [10] and optimized model -knocked out reactions-for BDO production (WT+BDO+KO) [10].
Flux balance analysis (FBA) for optimal flux distribution To evaluate the heterologous pathway impact on cell metabolism reactions necessary for BDO synthesis from succynil-CoA were added into this model (WT+BDO). Two reactions were added as follows adhE1BUT (double specificity alcohol/aldehyde dehydrogenase) to convert succinyl-CoA to 4-hydroxybutyrate and in the final step 4hydroxybutyryl-CoA to 1,4-butanediol and sucCD or cat (succinyl-CoA synthetase or catalase) to activate the carboxylic group to obtain 4-hydroxybutyryl-CoA. Steps involved in BDO production are outlined in figure 1 [10]. Furthermore, a preliminary optimization was carried out by eliminating the competing pathways, formate, lactate and ethanol for glucose and one more reaction was eliminated in case of glycerol, glutamate dehydrogenase, to block the formation of glutamate from 2-Oxoglutarate, respectively. During FBA [44] analysis the objective function was biomass formation and the solved linear programming (LP) is summarized below: where Z is the objective function, c is a vector of weights (the contribution of each reaction to the objective function), S is the stoichiometric matrix with m metabolites and n reactions, flux vector with n elements, v lb and v ub represent the lower and upper limits on the fluxes determined by physiological and thermodynamic constraints, respectively.
The uptake rate of the main carbon source was constrained in each simulation to a maximum of 20 mmol gDW -1 h -1 [30]. To simulate microaerobic conditions the oxygen uptake rate was set to -5 mmol gDW -1 h -1 , while 0 mmol gDW -1 h -1 was set for anaerobic conditions. This method has been widely used to study the capabilities of different organisms to synthesize different molecules, biofuels or value-added chemicals [9,15,[45][46][47].

Flux variability analysis (FVA)
FVA is a method widely used to identify the maximum and minimum possible fluxes through each reaction of the genome-scale metabolic network for a given maximum objective value [31,35]. Calculations were carried out by following the descriptions presented by [36]. Briefly, the optimal predicted growth rate was constrained to 100% of the optimal value under various environmental conditions and the minimum and maximum values were estimated to decipher the redundancy of reactions in the network. Two optimization problems need to be solved during FVA simulations for each flux v j of interest, where Z biomass is biomass optimal solution. Considering n as the number of reactions then FVA requires the 2n LP problems to be solved as mentioned previously. This method was employed to determine flux ranges and an attempt was made to classify the coupled reactions in the wild-type+BDO and mutant metabolic networks [48]. Reactions are identified and categorized into four categories by calculating the minimum and maximum flux ratio for i.e. respectively.
Analyzing flux correlations wild-type and mutant strains using sampling Uniform random sampling was performed in order to characterize the entire solution space of all flux distributions that are allowed by constraints [31, 36,49]. The method is based on the artificially centered hit and run (ACHR) algorithm. Characterization of solution spaces of strains harboring the biosynthetic BDO pathway and mutant strains was carried out using ACHR with a total of 10 million steps [38]. Reactions to be sampled were selected considering the central carbon metabolism pathways and the correlations between pairs of reactions were compared under different environmental and genetic conditions. The correlations between reaction pairs obtained for metabolic network with BDO pathway were compared with optimized genome-scale metabolic model.

Results and discussions
Modified GEM of E. coli with BDO pathway The recently published GEM of E. coli [43] was modified by introduction the biosynthetic pathway of BDO (Fig. 1). This model has already been used to experimental studies (10), however the possible complex interactions, impact that take place between native and heterologous pathways are yet to be discovered. In our previous study in silico optimization was used under various environmental conditions (i.e. carbon source was glucose or glycerol, microaerobic and anaerobic conditions were tested with multiple knockout) to reroute the carbon flux toward BDO [50].
Flux changes during theoretical maximum production of biomass and BDO Changes in flux distributions can be addressed by first, optimization of biomass production and then simulating the maximum production potential of the desired product while the growth rate is fixed to a minimum value of 0.1 h -1 . Number of reactions that showed a flux change at least 1 mmol gDW -1 h -1 for glucose and glycerol are as follows: aerobic conditions 100, microaerobic 76, anaerobic 63 and 85, 67 and 52, respectively. During the evaluation of the maximum production potential reactions with highest flux changes were identified for environmental conditions specified earlier. To understand the importance of conditions on 1,4-butanediol production the most important 100 reactions were selected and compared, the results from this analysis are given in Fig. 2.
A number of conclusions can be drawn, even if the oxygenation level was modified in case of glucose more than 80% of the reactions with significant flux changes considered in this analysis did not change. Similar tendency was observed for glycerol, with 79% agreement. The highest production rate was predicted for aerobic conditions (maximum theoretical production), however to obtain in vivo a stable strain for this condition is difficult.
Flux variability analysis within the metabolic network for wild-type and mutant strains FVA was carried by setting growth rate as the objective function and the maximum and minimum possible fluxes through each reaction were identified. The main difference between FVA and FBA is that in the former the biological objective is a constraint not the value to be optimized. Briefly, the optimized objective value obtained with FBA was used as a constraint during calculations of feasible ranges of reaction fluxes (minimization and maximization of each flux value). Substrate consumption rates were set to 20 mmol gDW -1 h -1 , while the oxygen uptake rate was modified to create microaerobic or anaerobic conditions. The genetically modified strains for BDO production were simulated by knocking out the pflB, adhE and ldhA reactions, and one more reaction was blocked (upper and lower bounds were set to 0), gludy, on glycerol in order to reroute the carbon flux toward BDO. To analyze the impact of environmental conditions and the genetic modifications on flux ranges the flux variations were calculated for each condition to decipher the feasible ranges of metabolic fluxes. In Fig. 3 are presented only reactions with at least 1 mmol gDW -1 h -1 flux change and those reactions with biologically irrelevant high fluxes are not presented here. Results are presented for wild-type and mutant strains on glucose and glycerol under microaerobic and anaerobic conditions where possible.
The reactions affected due to metabolic engineering were the same as obtained for wild-type strains under microaerobic conditions on glucose, there is 100% agreement. Differences were observed only in flux change values (Fig. 3. A). However, by changing the environmental conditions, blocking oxygen uptake, in mutant strains a significant flux change was predicted for FRD2, FRD3 (fumarate reductase), NADH18pp (NADH dehydrogenase (demethylmenaquinone-8; 3 protons) (periplasm)) and NADH17pp (NADH dehydrogenase (menaquinone-8; 3 protons) (periplasm)) which were not identified in wildtype. On the other hand, by changing the substrate to glycerol and the Ägludy the number of reactions for strains was very different: 7 WT and 28 for mutant strains.
Flux span differences between wild-type and mutant models were also estimated [31] for reactions and grouped into different categories depending on the flux span change (SC): glucose microaerobic -SC>10, 1<SC<10 and 0.01<SC<1 (reactions with SC=0 were not included), with 8.89%, 13.33% and 77.78% of a total of 45 reactions, respectively. During anaerobic conditions flux span changes were identified in case of 46 reactions. The percentage distribution was 15.22%, 10.87% and 73.91%, respectively. On the other hand, the reactions affected due to metabolic engineering in case of glycerol under microaerobic conditions are presented below: 24.69%, 13.58% and 61.73%. Significant span change was observed in 81 reactions, almost double compared to glucose. In both cases the majority of the affected reactions lie between 0.01 -1, representing over 70% for glucose and over 60% for glycerol.
Reactions with SC of more than 1 mol gDW -1 h -1 were chosen for further analysis. Were selected those reactions with most changed fluxes among wild-type and mutant models under oxygen limited conditions. Reactions most affected by genetic engineering are involved in different metabolic pathways: on glucose under microaerobic conditions the reactions (FRD3, NADH12pp, NADH18pp, FRD2, ACKr, ACS, PTAr, DHAPT, PYK, F6PA) are from citric acid cycle, oxidative phosphor ylation, pyruvate metabolism, alternate carbon metabolism and glycolysis pathways. The results obtained for anaerobic conditions were the same, only the ACt2rpp and ACt4pp transport reactions flux span was higher than the defined threshold. On the other hand, by changing the carbon source from glucose to glycerol 31 reaction was most significantly affected. Figure 4 shows the pathways in which selected reactions are involved. Flux span changes obtained for wildtype models usually are less than that for the mutant models, the opposite was observed during glucose growth for three reactions (DHAPT, PYK, F6PA) under microarobic and five reactions (DHAPT, F6PA, PYK, ACt2rpp, ACT4pp) under anaerobic conditions, respectively. The majority of high flux variability reactions are involved in redox balance maintenance during steady-state growth.
Analyzing flux correlations using sampling Sampling the allowed flux distributions is a simple way to characterize the solution space, to identify the most likely flux value through a reaction or to estimate the correlation coefficient between pairs of reactions within the network [36]. This method is an efficient approach to analyze the perturbations caused by the introduction of heterologous pathways or by the elimination of genes [31, 37,51]. On the other hand, the changes in the relationships between fluxes before and after metabolic engineering can also be studied. During uniform random sampling the substrate consumption rates were fixed to 20 mmol gDW -1 h -1 in both models (wild-type and mutant) and the correlation coefficient between reaction fluxes with SC over 1 was evaluated. This analysis was carried out to decipher the correlation patterns for wild-type and mutant strains. The correlation coefficients, probability distribution of fluxes and pairwise correlation are calculated on glucose and glycerol to identify how interdependent these reactions are.
Environmental conditions for simulations were set to anaerobic for glucose and microaerobic for glycerol (Fig.  5). The diagonal histogram indicates the magnitude of the flux and the pairwise scatter plots on the off-diagonal elements show the relationships between these reactions. It is obvious that in both models the highly correlated fluxes were almost the same, however significant differences were identified in six pairwise (N17/FRD3, N18/N17, FRD2/ FRD3, FRD2/N18, PYK/DHAPT, PYK/F6PA) (Fig. 5). with magenta rectangular boxes. For example, the first four pairwise are almost fully correlated (negative correlation) in the engineered metabolic model (correlation coefficients (cf) -0.9), but low correlations were observed in the wildtype model (cf ~-0.01) -these reactions are from oxidative phosphorylation and TCA pathways. The opposite was observed for the last two reactions, namely good correlation in wild-type (cf -0.96) and less correlations in modified model (cf -0.47) -reactions from glycolysis and alternate carbon metabolism.
Analyzing the scatter plots, it is obvious that the production strain (model) is more scattered compared to the original model. The BDO production optimization during anaerobic conditions leads to higher correlation between selected reactions, hence reactions flexibility is influenced by genetic modifications. We can also identify that in the engineered model, the probability distribution of BDO fluxes show a significant increase.
The correlation coefficients and flux distributions obtained for glycerol under microaerobic conditions are presented in Fig. 6. The probability distribution of fluxes and pairwise correlation presented here are for reactions with SC>10. Reaction pairs with high differences in correlation coefficient are represented by magenta boxes, and those with at least 0.3 change are shown by a turquoise rectangular boxes. For example, a good correlation was observed between BDO production and P (cf 0.75) in engineered model, and no correlation was predicted in wildtype model (cf -0.02). Reaction pairs with high or low correlation coefficients (magenta boxes) are involved in amino acid synthesis, pyruvate, electron transport and glycolytic pathways. On the other hand, pairwise scatter plots with correlation coefficient differences between wildtype and engineered model at least 0.3 includes reactions from alternate carbon metabolism, such as GLYK or GLYCDx, oxidative phosphor ylation, amino acid metabolism, pyruvate metabolism, from pentose phosphate pathway and electron transport pathways. In this case, opposite to glucose many selected pairs of reactions have 0 correlation coefficients, indicating that these reactions in the network are not interdependent.
Flux coupling analysis of wild-type and mutant models GEMs allow us to broadly decipher and classify the functionally related coupled reactions found in the network under various genetic and environmental conditions [39] using flux coupling analysis. Briefly, the degree of dependency between any two reactions can be determined by minimization and maximization of flux ratios (FVA). Then reactions are categorized based on the relations they have with biomass for example. FVA was applied during coupling analyses and the model was set to 100% of the optimal growth rate obtained for the conditions and the classifications used in this study are as follow: a) hard coupled to biomass (no differences in reaction and biomass flux variation); b) partially coupled to biomass (more flexible in the range); c) not coupled to biomass (zero or non-zero flux while maintaining the biomass); and d) zero flux reactions (cannot carry flux in growth conditions) [52] Fig. 7.
The number of hard-coupled to biomass reactions was between 6-14 in glucose and 7-10 in glycerol. Nearly 50% of the reactions are not coupled to biomass under these conditions even if BDO production is optimized. For example, BDO production is partially coupled to biomass in mutant models under microaerobic and anaerobic conditions if glucose is used as substrate and under microaerobic on glycerol, respectively.

Conclusions
Using curated GEMs we are able to predict the behavior of complex biological systems and analyze, design industrially important strains with improved characteristics. In this study we have used the GEM of E. coli with the BDO biosynthetic pathway to study and characterize the mutant models compared to wild-type. The biosynthetic pathways were introduced into the complex and updated genomescale metabolic model of E. coli. To optimize the production potential the competing pathways were eliminated and the impact on cell metabolism was analysed for wild-type and mutant models. Metabolism and metabolic changes were analysed using well defined in silico methods: FVA, sampling flux distributions, FCA.
The maximum production potential and changes in metabolic fluxes were simulated by different objective functions (biomass and BDO) and the reactions with highest differences were identified under different environmental conditions. As presented earlier 80% of the reactions with significant flux change were the same for all conditions simulated. FVA was carried out to decipher the variation of fluxes under various genetic and environmental conditions. Flux span changes were calculated and were further analysed reactions with SC over 1 mmol gDW -1 h -1 and to calculate the correlation coefficients for WT and mutant strains uniform random sampling was carried out. Most important variations in correlation patterns were observed for reactions in the mutant model. As stated out by [31] the addition and optimization of new metabolic pathways to the model significantly affected the entire solution space of the network as well as the reaction correlation patterns. Furthermore, minimal changes were identified between glucose and glycerol. The majority of affected reactions over 70% for glucose and 60% for glycerol lie between 0.01 and 1, and the most affected by genetic modifications are involved in the central carbon metabolism. Analyzing the scatter plots, it is obvious that the production strain (model) is more scattered compared to the original model. The BDO production optimization during anaerobic conditions leads to higher correlation between selected reactions, hence reactions flexibility is influenced by genetic modifications. We can also identify that in the engineered model, the probability distribution of BDO fluxes show a significant increase. In case of glycerol For example, a good correlation was observed between BDO production and P (cf 0.75) in engineered model, and no correlation was predicted in wild-type model (cf -0.02).
The in silico approach enable us to better understand BDO production potential and the complex interactions faced during metabolic engineering. Industrially competent strains can be designed for native, non-native and nonnatural value added products in a rational and systematic way.