The Factorial Analysis of the Activated Sludge Treatment Process Parameters

Environmental problems concern two main domains: air and water. A water treatment plant is critical infrastructure, especially in large cities. The activated sludge process is a typical example of a highly nonlinear system. The associated models found in the literature are mainly analytical and complex. This paper proposes data-driven models obtained from plant operation data. A factor reduction procedure, namely principal component analysis, is used to find meaningful correlations between process measurements. The selected correlations are modeled via simple and multiple regression. The resulting models are specific to the studied plant, simpler than the analytical ones, and sufficiently accurate for plant monitoring and operation. Using data-driven models for inferential measurement decreases the analysis costs (in some cases even eliminating the need for measuring equipment). If operating experience is used to predict parameter trends, this procedure can provide a useful tool for developing a decision support system for the plant operator. A real-time prediction module associated with a warning system can be applied to any activated sludge process, provided that plant operating data are available.


Introduction
The activated sludge process is part of the water treatment plant. The wastewater treatment plant (WWTP) is one of the most important parts of the solution to the environmental problems associated with large cities. Even the media have recently emphasized the problems within the Romanian water management system.
Since Henze et al. provided the first general model for single sludge wastewater systems [1], researchers have tried to improve or supplement this research direction. The first model, denoted ASM1, captures the highly nonlinear character of the process and was first improved by Henze and his team. Consequently, the ASM2 and ASM3 models were developed [2][3][4]. Other researchers focused on the performance analysis of the proposed models [5,6]. The models were also studied for particular cases of the activated sludge process, such as membrane bioreactors for wastewater treatment [7]. Other researchers tried to improve the models by including the influence of hydrodynamic parameters [8].
The main control system associated with the activated sludge process is dissolved oxygen control. The dissolved oxygen control system was implemented in real time, and a model predictive control based on ASM models was developed [9]. A thorough review of practical applications of the activated sludge models and their development, applied to plant optimization and to the extension, upgrading, retrofitting and troubleshooting of wastewater treatment plants, is presented in [10] and [11]. The technology used to develop the support for the membranes involved in the activated sludge process is also very important for the process efficiency (e.g. ultrafiltration polysulfone membranes with ZnO/TiO2 nanohybrid blending [12], biological filters [13], ultrafiltration pilot with polymeric membrane [14]). The characteristics of the aerobic sludge granules also influence the WWTP performance [15,16].
While the above models are analytical, the models proposed in this paper are extracted from operating data (data-driven models). The paper aims to identify the representative parameters of the activated sludge treatment process using the principal component analysis (PCA) technique. The next step is to use the correlations found to determine a number of models using simple and multiple regression. Selected models can further be used to predict process parameters that are very important in the operation of the activated sludge process. The paper is structured as follows. The second part briefly presents the studied activated sludge process from a Romanian wastewater treatment plant. The third part provides an experimental data analysis with a discussion of the results, while the last part presents the concluding remarks.

Materials and methods
The activated sludge process from an industrial wastewater treatment plant
The studied wastewater treatment plant has three parts: mechanical, chemical and biological. The biological part takes place in a cuboid basin (approx. 4400 m³) filled with wastewater and divided into two equal cuboid reservoirs, namely the aerotank and the secondary decanter (Figure 1). The biological treatment is done using the activated sludge process. The aerotank is where the activated sludge is aerated by means of Messner panels. The panels have a special construction that improves the efficiency of oxygen mass transfer and reduces energy consumption. The dissolved oxygen control is carried out with a group of three blowers (the main energy consumers in the WWTP). An air layer forms under each Messner panel membrane and opens the membrane orifices as the blowers increase their speed. Dispersed fine air bubbles rise to the aerotank surface and cause the activated sludge to separate by gravity and maturity. The 148 Messner panels are evenly distributed at the bottom of the two reservoirs. The eight vertical wooden panels in the aerotank guide the wastewater flow and lengthen or shorten the treatment time (ca. 1 to 3 h, with a maximum speed of 1 m/s, depending on the wastewater volume). For the studied plant, a higher degree of water purification is obtained when the activated sludge volume is kept constant (300-400 L).
The artificial biotope conditions must be fulfilled. Besides oxygen, the nutrients for the activated sludge bacteria must be properly dosed. The main nutrients, nitrogen (N) and phosphorus (P), have a recommended ratio of 5:1. The minimum value for P is 0.6 mg/L.
https://doi.org/10.37358/RC.20.5.8115
Another biotope parameter is temperature. Mesophilic microorganisms form the biocenosis. Temperature modifies the microorganisms' metabolism, their oxygen demand and even their lifespan. The recommended temperature for their development is 28-29 ºC [17]. Above 35 ºC the oxygen demand increases, but the microorganisms die. At low temperatures the microorganisms' metabolism slows down.
The third important biotope parameter is the wastewater pH. The pH is usually set in the chemical part and is around 7 (±1.5); otherwise, the microorganisms do not survive.
Inhibitors (e.g. detergents, pesticides and phenols), which can poison the activated sludge and prevent the microorganisms' development, represent the fourth biotope parameter (one that must be limited).
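The numerical biotope conditions listed above can be collected in a simple check. The following is a minimal sketch, not the plant's actual procedure: the function name, the argument names and the tolerance on the N:P ratio (`ratio_tolerance`) are assumptions introduced for illustration, while the thresholds (N:P of 5:1, P of at least 0.6 mg/L, 35 ºC, pH 7 ± 1.5) are the ones stated in the text. Inhibitors are left out because no limits are given for them.

```python
def check_biotope(nitrogen, phosphorus, temperature_c, ph, ratio_tolerance=0.2):
    """Return a list of warnings; an empty list means all checked conditions hold.

    nitrogen and phosphorus are concentrations in mg/L.
    ratio_tolerance is a hypothetical relative tolerance on the 5:1 N:P ratio.
    """
    warnings = []
    if phosphorus < 0.6:
        warnings.append("phosphorus below the 0.6 mg/L minimum")
    if phosphorus > 0 and abs(nitrogen / phosphorus - 5.0) / 5.0 > ratio_tolerance:
        warnings.append("N:P ratio far from the recommended 5:1")
    if temperature_c >= 35.0:
        warnings.append("temperature at or above 35 C")
    if not (7.0 - 1.5 <= ph <= 7.0 + 1.5):
        warnings.append("pH outside 7 +/- 1.5")
    return warnings

# Hypothetical readings, for illustration only.
ok_case = check_biotope(nitrogen=3.0, phosphorus=0.6, temperature_c=28.5, ph=7.0)
bad_case = check_biotope(nitrogen=4.0, phosphorus=0.5, temperature_c=36.0, ph=9.0)
```

Returning a list of messages rather than a single flag keeps each violated condition visible to the operator.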
If all the biotope conditions are fulfilled, the microorganisms grow and their excess must be removed with special pumps. The wastewater flow and a good part of the activated sludge go from the aerotank to the secondary decanter, which works as a gravity separator. The separated activated sludge is then recycled to the aerotank (Figure 2). The main analysis points are denoted A1 to A4. Although the measuring points are many and dispersed throughout the plant, there is a single control loop, for dissolved oxygen control. Its measuring point uses an oxymeter transducer (Figure 1) and its manipulated point comprises three blowers (one of them with variable speed). The data-driven models are correlations and do not necessarily express causality. The aim is to provide a full analysis panel for the studied plant, even when some transducers or offline analyzers are not available.
The next section presents the main results obtained using PCA, a factorial method applied to the available operating data, taken from the plant's centralized records of the analysis of wastewater samples (at the biological step output). PCA is a data reduction method used to reduce a large number of variables to a smaller set (called principal components or factors) that retains most of the information in the initial variables [18,19]. PCA also highlights the correlations between the analyzed variables and reduces the dimensionality of the variable space by representing it through a number of factors that capture most of the variance, and thus the behavior, of the initial data [20]. PCA was implemented using the IBM SPSS Statistics software (on a 2.13 GHz CPU with 4 GB RAM), which offers the user, among many other facilities, advanced statistical analysis tools, machine learning algorithms and, more importantly, big data integration [21,22].
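For readers without access to SPSS, the core of PCA on a correlation matrix can be sketched in a few lines. This is a minimal illustration using NumPy, not the procedure run in the paper: the function name and the synthetic data are assumptions, and the random numbers merely stand in for plant records.

```python
import numpy as np

def pca_from_correlation(data):
    """Return eigenvalues (descending), loadings and explained-variance ratios."""
    # Correlation matrix of the columns (variables); rows are samples.
    corr = np.corrcoef(data, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Component loadings: eigenvectors scaled by the square root of the eigenvalue.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    explained = eigvals / eigvals.sum()
    return eigvals, loadings, explained

# Synthetic stand-in for plant data: 200 samples of 4 variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = np.hstack([
    latent + 0.3 * rng.normal(size=(200, 2)),  # two correlated variables
    rng.normal(size=(200, 2)),                 # two independent variables
])
eigvals, loadings, explained = pca_from_correlation(data)
```

Because the analysis starts from the correlation matrix, the eigenvalues sum to the number of variables, which is what makes an eigenvalue of 1 a natural reference level for one variable's worth of variance.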

Results and discussions
Experimental data analysis
The purpose of applying PCA was to identify the parameters of the activated sludge treatment process that have the greatest influence on the operation of the studied industrial wastewater treatment plant (WWTP). The PCA method described in [18,19,20,23,24,25] was applied to the input and output data corresponding to the aerotank and to the secondary decanter levels.
As mentioned, the IBM SPSS Statistics software [21,22] was used to analyze the available operating data (January to December 2019) supplied by the centralized records of the analysis of wastewater samples (at the biological step output) [26]. As the biological process takes place in a cuboid composed of an aerotank and a secondary decanter, the centralizing tables present the measurements of the monitored parameters for each of the two components. According to the records [26], the parameters monitored at aerotank level are: pH, extractable, detergents, chemical oxygen demand (COD-Cr), total suspended solids (TSS), sludge pH, and sludge volume. The parameters monitored according to the same records for the secondary decanter are: sludge pH, sludge volume, pH, extractable, detergents, TSS, total residue, ammonium (N), phosphorus (P), COD-Cr, phenols, and biological oxygen demand (BOD5). Using the available aerotank data, the data_aerotanc.sav database was developed, and using the available secondary decanter data, the data_decantor.sav database (Figure 3) was developed; both were subjected to factorial analysis, namely the PCA method. The analysis of these parameters is essential because the output of the biological step, that is, the plant effluent (which needs to comply with the NTPA 001 and 002/2005 norms [27,28]), is discharged into the plant's receiving water body, the Dâmbu stream, affecting its quality. Applying PCA through the Analyze-Dimension-Reduction procedure to the aerotank and secondary decanter parameters, a set of statistical results was obtained, such as:
1. The correlation matrix for the data from the date_aerotanc.sav database, presented in Table 1;
2. The correlation matrix for the data from the date_decantor_sec.sav database, presented in Table 2.
The value of the KMO index shown in Table 1 and Table 2 indicates the existence of one or more common factors (principal components) in both the aerotank and the secondary decanter parameters, which justifies the application of a factor reduction procedure, namely PCA.
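The KMO (Kaiser-Meyer-Olkin) index compares the observed correlations with the partial correlations between the variables; values close to 1 suggest that common factors exist. The sketch below computes the overall index with NumPy using the standard definition. It is an illustration on synthetic data, not a reproduction of the values in Table 1 and Table 2; the function name and the data are assumptions.

```python
import numpy as np

def kmo_index(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations are obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Only off-diagonal elements enter the index.
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

# Synthetic data: four variables sharing one common factor.
rng = np.random.default_rng(2)
common = rng.normal(size=(500, 1))
data = common + 0.5 * rng.normal(size=(500, 4))
kmo = kmo_index(data)
```

When the variables share little common variance, the partial correlations stay as large as the plain correlations and the index drops towards 0.5 or below, in which case a factor reduction is hard to justify.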
As mentioned, PCA is used to reduce the number of analyzed variables (keeping as much as possible of the variance of the initial data), with the goal of determining those factors (named principal components) that describe the behavior and variance of the analyzed variables. According to [23], of the obtained factors only those that fulfill the selection criterion (eigenvalue >= 1) are selected, as these supply the most useful information about the initial variables. Each eigenvalue is the part of the variance explained, that is, captured, by the corresponding factor (principal component). The PCA method supplies both the initial and the final factor solution, that is, the solution before and after the rotation procedure, in this case the Varimax rotation method [23,24,25].
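The selection criterion and the rotation step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the `varimax` function is a common textbook SVD-based implementation, not the SPSS routine (which, for example, may apply Kaiser normalization), and the data are synthetic, built from two hypothetical latent factors.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion; the SVD maps it back to a rotation.
        grad = loadings.T @ (rotated ** 3 - rotated * (np.sum(rotated ** 2, axis=0) / p))
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion <= criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Synthetic data: six variables driven by two independent latent factors.
rng = np.random.default_rng(1)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
data = np.hstack([
    f1 + 0.4 * rng.normal(size=(300, 3)),
    f2 + 0.4 * rng.normal(size=(300, 3)),
])

corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Selection criterion from the text: keep factors with eigenvalue >= 1.
n_keep = int(np.sum(eigvals >= 1.0))
loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
rotated, rotation = varimax(loadings)
```

The rotation is orthogonal, so it redistributes the explained variance among the retained factors without changing how much of each variable they jointly explain; it only makes the loading pattern easier to read.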
The initial factor solution (before the rotation procedure) obtained by applying PCA to the aerotank and secondary decanter data supplied the following results:
1. For the aerotank data, seven factors were supplied, of which only the first two meet the selection criterion (eigenvalue >= 1) (Figure 4a);
2. For the secondary decanter data, twelve factors were supplied, of which only the first five meet the selection criterion (eigenvalue >= 1) (Figure 4b).
Figure 4. Scree plots: a. aerotank data, b. secondary decanter
The eigenvalue charts, that is, the scree plots presented in Figure 4a and Figure 4b, are very useful in establishing the number of principal components (factors). For the data analyzed at aerotank level, only the first two factors provide the most relevant information about the behavior and variance of the initial parameters, that is, only these two capture most of the initial data variance. At the secondary decanter level, only the first five factors are relevant. The obtained components are easier to interpret after the rotation of the factors. Table 3 presents the main components (factors obtained after the rotation procedure) obtained through PCA, that is, the factorial structure of the analyzed variables at the aerotank and secondary decanter level. According to Table 3, the proper operation of the activated sludge treatment process at the aerotank and secondary decanter level is influenced by the following factors:
1. At aerotank level:
• Factor 2 is given by: extractable (0.884), detergents (0.601) and TSS (0.720);
• Factor 3 is given by: pH (0.857);
• Factor 4 is given by: total residue (0.935);
• Factor 5 is given by: COD5 (0.560).
Next, using simple and multiple linear regression [29,30], numerical prediction models were determined at aerotank and secondary decanter level. Table 4 shows the coefficients R (multiple correlation coefficient) and R² (coefficient of determination) and the standard error of the estimate, obtained by applying the SPSS Analyze-Regression-Linear procedure to the aerotank parameters (variables). The obtained model can be used to predict the extractable evolution knowing the Detergents and TSS parameters. Figure 5 plots the dependent variable (extractable) against the regression standardized predicted values.
Table 5 shows the coefficients R (multiple correlation coefficient) and R² (coefficient of determination) and the standard error of the estimate, obtained by applying the SPSS Analyze-Regression-Linear procedure to the secondary decanter parameters (variables). As can be observed in Table 5, three viable models can be extracted, namely regressions no. 1, no. 3 and no. 4, which are presented next.
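The three indicators reported in Table 4 and Table 5 can be reproduced for any data set with ordinary least squares. The sketch below uses NumPy on synthetic data; the function name, the variable names and the generating coefficients are hypothetical and unrelated to the plant models.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns coefficients, R, R^2 and
    the standard error of the estimate."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # first column is the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    r = np.sqrt(r2)
    # Residual standard deviation with n - k - 1 degrees of freedom.
    see = np.sqrt(ss_res / (n - k - 1))
    return coef, r, r2, see

# Synthetic example: y depends linearly on two predictors plus noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=100)
coef, r, r2, see = fit_linear(X, y)
```

R² is the share of the variance of the dependent variable explained by the model, R is its square root, and the standard error of the estimate expresses the typical prediction error in the units of the dependent variable.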
Regression no. 1: 67.9% of the variation of the dependent variable (TSS) is explained by the variation of the independent variables (Extractable, Phenols), that is, by the regression (model) equation (2). Model (2) can be used to predict the TSS evolution knowing Extractable and Phenols. Figure 6 plots the dependent variable (TSS) against the regression standardized predicted values.
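As a consistency check, the explained variation quoted for regression no. 1 follows from the value R = 0.824 reported for this model in the Conclusions, since the coefficient of determination is the square of R:

```latex
R^2 = 0.824^2 = 0.678976 \approx 0.679 \quad (67.9\%)
```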
Model (3) can be used to predict or compute the Phenols evolution knowing TSS, Ammonium, Phosphorus and Chemical oxygen demand. Figure 7 plots the dependent variable (Phenols) against the regression standardized predicted values.
The third viable model is given by equation (4):
COD-Cr = 28.880 + 7.487*Ammonium + 124.676*Phenols (4)
Model (4) can be used to predict the COD-Cr evolution knowing Ammonium and Phenols. Figure 8 plots the dependent variable (COD-Cr) against the regression standardized predicted values.
It should be mentioned that the developed models are not general: they were developed using the available operating data from the studied Romanian industrial plant. Many more models can be extracted from the data than are presented here; only the best models in terms of statistical indicators were included in this paper. The results presented above are useful for identifying the parameters that have the greatest influence on the processes in the biological part of the plant (that is, on the activated sludge treatment process), as well as the correlations between them (models used to predict the trends of the analyzed parameters).
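Equation (4) can be evaluated directly for inferential measurement. The sketch below is only an illustration: the function names, the input values and the warning limit are hypothetical, the inputs are assumed to be in the same units as the plant records, and the coefficients are valid only for the studied plant.

```python
def predict_cod_cr(ammonium, phenols):
    """Evaluate equation (4) for the secondary decanter."""
    return 28.880 + 7.487 * ammonium + 124.676 * phenols

def needs_warning(predicted, limit):
    """Flag a predicted value that exceeds an operator-defined limit."""
    return predicted > limit

# Hypothetical inputs and limit, chosen only to show the call pattern.
estimate = predict_cod_cr(ammonium=2.0, phenols=0.1)
alarm = needs_warning(estimate, limit=125.0)
```

Keeping the limit as a parameter lets the operator set it from the applicable norm or from plant experience rather than hard-coding a value.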

Conclusions
The proposed models can reduce the number of analyzers or transducers used to monitor the studied plant. At aerotank level, the PCA method combined with regression generates a model that can provide an inferential measurement or a prediction of Extractable based on Detergents and Total suspended solids, with a correlation coefficient R = 0.841.
At decanter level, three viable models were found. The first model can be used to predict the Total suspended solids evolution knowing Extractable and Phenols, with a correlation coefficient R = 0.824. The second model is useful to predict or compute the Phenols trend knowing Total suspended solids, Ammonium, Phosphorus and Chemical oxygen demand, with the best correlation coefficient, R = 0.928. The third model can be used to predict the Chemical oxygen demand trend knowing Ammonium and Phenols, with a correlation coefficient R = 0.836.
The resulting models are specific to the studied plant. They are simpler than the analytical ones, and the obtained values of R suggest sufficient accuracy for plant monitoring and operation. The proposed data-driven models for inferential measurement decrease the analysis costs (in some cases even eliminating the need for measuring equipment). The proposed procedure, however, is general, and previous experience in operating a plant can be used to predict parameter trends. This makes it an important tool for developing a decision support system for the plant operator. A real-time prediction module associated with a warning system can be applied to any activated sludge process, provided that previous plant operating data are available.