Article

Hybrid Machine Learning Models for Soil Saturated Conductivity Prediction

Department of Civil and Mechanical Engineering, University of Cassino and Southern Lazio, 03043 Cassino, Italy
* Author to whom correspondence should be addressed.
Water 2022, 14(11), 1729; https://doi.org/10.3390/w14111729
Submission received: 4 May 2022 / Revised: 24 May 2022 / Accepted: 26 May 2022 / Published: 27 May 2022

Abstract

The hydraulic conductivity of saturated soil is a crucial parameter in the study of any engineering problem concerning groundwater. Hydraulic conductivity depends mainly on particle size distribution, soil compaction, and properties that influence aggregation and water retention. Finding simple and accurate analytical relationships between the hydraulic conductivity of a soil and the characteristics on which it depends is generally very difficult. Machine learning algorithms can provide excellent tools for tackling highly nonlinear regression problems, and hybrid models that combine multiple machine learning algorithms can further improve the accuracy of predictions. Five different models were built to predict saturated hydraulic conductivity using a dataset extracted from the Soil Water Infiltration Global database; the models differed in their predictors. Seven variants of each model were compared by changing the implemented algorithm: three variants were based on individual algorithms and four on hybrid combinations. The individual machine learning algorithms employed were Multilayer Perceptron, Random Forest, and Support Vector Regression. The model based on the largest number of predictors led to the most accurate predictions. In addition, across all models, the hybrid variants combining all three algorithms and those combining Random Forest and Support Vector Regression proved to be the most accurate (R2 values up to 0.829). However, all variants tended to overestimate conductivity in soils where it is very low.

1. Introduction

The hydraulic conductivity of soil in saturated or unsaturated conditions is of great importance for several issues in hydrology and hydraulics, and it also plays a paramount role in many geotechnical and geo-environmental problems. It affects various processes that contribute to the phases of the hydrological cycle: infiltration, runoff, groundwater seepage, etc. [1,2]. Its quantification is essential for addressing design problems connected with the withdrawal of groundwater resources and their consequences on the natural and anthropic environment [3]. Hydraulic conductivity governs the consolidation process, so its determination is fundamental for quantifying the time evolution of settlements after the construction of structures and infrastructures [4]. Seepage below water-retaining structures (dams, weirs, levees) and leakage from contaminated sites are other important applications that depend significantly on soil conductivity. The effectiveness of permeation grouting as a ground improvement technique also relies on the permeability of the treated soil to the injected fluid.
In saturated conditions, groundwater seepage is well described by Darcy’s law, which is valid for laminar flow, i.e., for relatively small gradients through fine-grained or granular sediments with relatively small pores. In these cases, hydraulic conductivity is characterised by the permeability coefficient Ksat, one of the most widely variable properties in nature, with values ranging from 10⁻¹¹ cm/s to 10² cm/s [5]. Ksat quantifies the ease with which water seeps through a porous medium under a given hydraulic gradient, and its value depends mainly on the size, distribution, and interconnection of the soil pores. These characteristics depend primarily on soil grading, but also on particle shape, compaction level, and other factors that affect aggregation and water retention [6]. The latter include the organic matter content, which affects soil aggregation and aggregate stability. The influence of different soil characteristics on hydraulic conductivity has been investigated in several past studies (e.g., [7,8,9,10,11,12,13,14]).
Hydraulic conductivity in saturated zones can be determined, directly or indirectly, by a variety of methods, including empirical formulas, laboratory tests under steady or transient conditions on representative samples, tracer tests, auger hole tests, and pumping tests in wells [15]. A comprehensive review of predictive methods for saturated soils is provided in [16]. However, because of the complexity of the phenomenon at the particle scale, it is difficult to build analytical relationships between the hydraulic conductivity of a given soil and all the governing characteristics that are simultaneously simple, robust, and accurate.
Procedures deriving from Artificial Intelligence studies have proved to be excellent tools for identifying highly nonlinear relationships between natural quantities in many areas [17,18]. Machine Learning algorithms have made it possible to develop highly accurate forecasting models in earth sciences applications [19,20,21,22,23,24,25,26,27,28]. In recent years these algorithms have been widely used to deal with problems of a quantitative and qualitative nature related to groundwater [29,30,31] as well as to model infiltration phenomena [32,33].
Regarding the prediction of Ksat by means of Machine Learning algorithms, several valuable papers have been published in recent years. Jorda et al. [34] investigated the key factors that affect saturated and near-saturated hydraulic conductivities in undisturbed soils, applying boosted regression trees to a database of tension infiltrometer measurements. Their model predicted the hydraulic conductivity at a tension of 10 cm (K10) and the saturated hydraulic conductivity (Ksat) with low coefficients of determination. Araya & Ghezzehei [35] compared the results of four well-known machine learning algorithms under different input scenarios. The 10th percentile particle diameter turned out to be the most influential predictor, followed by clay content, bulk density, and organic carbon content. The authors also evaluated the effects of structural perturbations on Ksat. Kotlar et al. [36] used parametric and non-parametric machine learning techniques to estimate saturated (Ks) and near-saturated (K10) hydraulic conductivities from easily quantifiable soil properties, including soil fabric, organic matter, bulk density, and water content. The non-parametric supervised methods they applied, namely Gaussian process regression, support vector machine, and an ensemble method, were significantly more accurate than the parametric methods, namely the stepwise linear model and Lasso regression.
Sihag et al. [37] focused on unsaturated hydraulic conductivity and developed prediction models based on the M5 tree model and Random Forest, together with a multivariate nonlinear regression relationship. In their study, the Random Forest-based model outperformed both the M5-based model and the multivariate nonlinear regression relationship.
The goal of this study is to assess the effectiveness of some hybrid algorithms and to demonstrate that they can outperform some of the more commonly used individual machine learning algorithms, enabling more accurate and reliable Ksat forecasting models to be developed. To the best of the authors’ knowledge, no such study exists in the technical literature. The Multilayer Perceptron, Random Forest, and Support Vector Regression algorithms were considered as base algorithms to be hybridized, and they were subsequently compared with the resulting hybrid models. These algorithms were chosen because they have already proved reliable in solving the problem under study and because their characteristics differ significantly, which makes them suitable for a hybridization approach, as discussed below. Five different combinations of input variables were considered, in order to highlight which predictors have the greatest influence on the performance of the prediction models.

2. Methodology

2.1. Base Models

2.1.1. Multilayer Perceptron

A Multilayer Perceptron (MLP) is a simple feedforward Artificial Neural Network [38]. An MLP (Figure 1) includes three types of layers: an input layer, one or more hidden layers, and an output layer. The input layer comprises a set of nodes corresponding to the input features. Each neuron in the hidden layers processes the values of the previous layer with a weighted linear summation, followed by a non-linear activation function. The output layer receives the values from the last hidden layer and provides the output values. The MLP is trained with a supervised technique, the backpropagation learning algorithm. Given a set of features and a target, an MLP can learn a non-linear function to perform regression.
In this research, the optimal network structure had a single hidden layer with (number of input variables + 1)/2 neurons. The sigmoid was chosen as the activation function. The adopted learning rate was 0.3, and the momentum rate for the backpropagation algorithm was 0.2. A preliminary sensitivity analysis showed that the model is not very sensitive to variations of these parameters.
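The training scheme described above can be sketched in plain Python as follows: one hidden layer of sigmoid units sized with the (inputs + 1)/2 rule, and online backpropagation with learning rate 0.3 and momentum 0.2. Several details are assumptions, not reported by the authors: the hidden-layer size is truncated to an integer, the output unit is linear, the initial weights are small uniform values, and the class name TinyMLP and the synthetic data in the usage lines are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a linear output unit (assumed)."""

    def __init__(self, n_inputs, lr=0.3, momentum=0.2, seed=0):
        rng = random.Random(seed)
        # Hidden-layer size from the (inputs + 1)/2 rule, truncated to an integer.
        self.n_hidden = max(1, (n_inputs + 1) // 2)
        self.lr, self.momentum = lr, momentum
        # Each weight row stores one weight per input plus a trailing bias.
        self.w_h = [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
                    for _ in range(self.n_hidden)]
        self.w_o = [rng.uniform(-0.05, 0.05) for _ in range(self.n_hidden + 1)]
        # Previous updates, reused by the momentum term.
        self.dw_h = [[0.0] * (n_inputs + 1) for _ in range(self.n_hidden)]
        self.dw_o = [0.0] * (self.n_hidden + 1)

    def _hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
                for row in self.w_h]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w_o, h)) + self.w_o[-1]

    def train_step(self, x, y):
        """One online backpropagation update with momentum on a single sample."""
        h = self._hidden(x)
        err = y - (sum(w * hj for w, hj in zip(self.w_o, h)) + self.w_o[-1])
        # Hidden deltas use the output weights before they are updated.
        deltas = [err * self.w_o[j] * h[j] * (1.0 - h[j])
                  for j in range(self.n_hidden)]
        for j, hj in enumerate(h + [1.0]):  # trailing 1.0 drives the output bias
            self.dw_o[j] = self.lr * err * hj + self.momentum * self.dw_o[j]
            self.w_o[j] += self.dw_o[j]
        for j, d in enumerate(deltas):
            for i, xi in enumerate(x + [1.0]):  # trailing 1.0 drives the hidden bias
                self.dw_h[j][i] = self.lr * d * xi + self.momentum * self.dw_h[j][i]
                self.w_h[j][i] += self.dw_h[j][i]


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical usage: 8 min-max scaled inputs (as in M1) and a synthetic target.
rng = random.Random(1)
data = []
for _ in range(200):
    x = [rng.random() for _ in range(8)]
    data.append((x, 0.5 * x[0] + 0.3 * x[1] ** 2))
model = TinyMLP(n_inputs=8)
before = mse(model, data)
for _ in range(50):
    for x, y in data:
        model.train_step(x, y)
after = mse(model, data)
```

With eight inputs, the rule gives four hidden neurons; the momentum term reuses a fraction of the previous update to smooth the online gradient steps.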

2.1.2. Random Forest

A Random Forest (Figure 2) is an ensemble model consisting of many weakly correlated, simple regression trees [39]. Regression trees derive from decision trees adapted to become forecasting models [40]. The internal nodes progressively define conditions on the input variables, while the leaves hold the predicted values of the target variable. Developing a regression tree model involves recursively splitting the input domain into subdomains. A multivariable linear regression model is used to obtain predictions in each subdomain.
Tree growth is an iterative procedure that progresses by splitting each subset into smaller branches, evaluating all possible splits on every variable and selecting, at each step, the subdivision into two separate partitions that minimizes the squared deviation:
$$R(t) = \frac{1}{N(t)} \sum_{i \in t} \left( y_i - y_m(t) \right)^2$$
where N(t) is the sample size in node t, yi is the value of the target variable for the i-th unit, and ym(t) is the average value of the target variable in node t. R(t) is a measure of the “impurity” of each node. The algorithm stops when a stopping condition occurs; reaching the lowest level of impurity is the most commonly used stopping rule.
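The exhaustive split search can be illustrated with a short sketch: impurity() implements R(t), and best_split() tries every threshold on every input variable, keeping the one that minimizes the size-weighted impurity of the two children (equivalently, their total squared deviation). Both function names and the toy data are hypothetical, and no stopping rule or pruning is included.

```python
def impurity(ys):
    """R(t): mean squared deviation of the targets in a node from the node mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def best_split(X, y):
    """Search every feature and threshold for the best binary split.

    Returns (feature_index, threshold, score), where score is the
    size-weighted impurity of the two children (lower is better).
    """
    n = len(y)
    best = (None, None, impurity(y))  # no split: impurity of the parent node
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= threshold]
            right = [y[i] for i in range(n) if X[i][j] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty partitions
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if score < best[2]:
                best = (j, threshold, score)
    return best


# Toy example: the target jumps when the first variable exceeds 0.2.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 6.0]]
y = [1.0, 1.2, 3.0, 3.1]
feature, threshold, score = best_split(X, y)
```

Here the search selects the first variable with threshold 0.2, giving children {1.0, 1.2} and {3.0, 3.1}; repeating the search inside each child grows the tree recursively.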
The risk of overfitting is reduced by a pruning process, which decreases the size of the tree by removing splits that do not significantly improve its forecasting ability.
Given a training dataset, each tree of the forest is built from a different bootstrap sample of the data. Furthermore, in Random Forests the growth of a single tree differs from that of a standard regression tree: each node is split using the best split among a randomly chosen subset of the input variables, rather than the best split among all of them. The number of variables in this subset does not change during the growth of the forest. Each tree grows as much as possible, limited only by the assigned minimum number of elements per leaf, without pruning. The random forests used in this research were made of 600 trees.
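The two sources of randomness described above, bootstrap sampling and a random variable subset at each node, can be sketched in a compact, unoptimized form. This is an illustration under stated assumptions, not the implementation used in the study: leaves predict the node mean, the subset size defaults to one third of the variables (a common regression default, assumed here), and all function names are hypothetical.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _sse(v):
    m = _mean(v)
    return sum((a - m) ** 2 for a in v)


def grow_tree(X, y, n_features, min_leaf, rng):
    """Unpruned regression tree; each node searches only a random variable subset."""
    if len(y) < 2 * min_leaf or _sse(y) == 0.0:
        return _mean(y)  # leaf: predict the node mean
    best = None
    for j in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no admissible split among the sampled variables
        return _mean(y)
    _, j, t, left, right = best
    children = [grow_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_features, min_leaf, rng) for idx in (left, right)]
    return (j, t, children[0], children[1])


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (j, t, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=600, n_features=None, min_leaf=1, seed=0):
    rng = random.Random(seed)
    m = n_features or max(1, len(X[0]) // 3)  # assumed default subset size
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                m, min_leaf, rng))
    return forest


def predict_forest(forest, x):
    """The forest prediction is the average of the tree predictions."""
    return _mean([predict_tree(tree, x) for tree in forest])
```

A hypothetical call such as fit_forest(X, y, n_trees=50) followed by predict_forest(forest, x) trains and queries a small forest; the default of 600 trees mirrors the value reported in the text.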

2.1.3. Support Vector Regression

The idea behind the Support Vector Regression (SVR) algorithm is to identify a function f(x) that deviates by at most ε from the experimental target values yi and is as flat as possible (Figure 3). Consider a training dataset {(x1, y1), (x2, y2), …, (xl, yl)} ⊂ X × ℝ, where X is the space of the input arrays (e.g., X = ℝⁿ), and a linear function:
$$f(x) = \langle w, x \rangle + b$$
where w ∈ X and b ∈ ℝ. Flatness is obtained by minimizing the squared Euclidean norm ‖w‖², which involves solving a constrained convex optimization problem.
In many cases, errors larger than ε must be tolerated, so slack variables ξi, ξi* are introduced in the constraints. Consequently, the optimization problem can be written as follows:
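$$\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{l} \left( \xi_i + \xi_i^* \right) \\
\text{subject to} \quad & y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\
& \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\
& \xi_i, \, \xi_i^* \ge 0
\end{aligned}$$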
where the constant C > 0 determines the trade-off between the flatness of f and the extent to which deviations larger than ε are tolerated.
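For a fixed candidate pair (w, b), the smallest slacks that satisfy the constraints are ξi = max(0, yi − f(xi) − ε) and ξi* = max(0, f(xi) − yi − ε), so the primal objective can be evaluated directly. The hypothetical helper below does only that; it does not solve the optimization problem, which SVR solvers typically handle through the dual formulation.

```python
def svr_primal_objective(w, b, X, y, C, eps):
    """Evaluate the linear epsilon-SVR primal objective for a fixed (w, b).

    Returns (objective, slack_up, slack_down), where the slacks are the
    smallest values satisfying the two tube constraints for each sample.
    """
    def f(x):
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    # Points above the epsilon-tube need xi_i > 0 ...
    slack_up = [max(0.0, yi - f(x) - eps) for x, yi in zip(X, y)]
    # ... and points below it need xi_i* > 0.
    slack_down = [max(0.0, f(x) - yi - eps) for x, yi in zip(X, y)]
    flatness = 0.5 * sum(wj * wj for wj in w)
    penalty = C * sum(u + d for u, d in zip(slack_up, slack_down))
    return flatness + penalty, slack_up, slack_down


# Hypothetical 1-D example: f(x) = x, tube half-width 0.1, C = 10.
objective, up, down = svr_primal_objective(
    w=[1.0], b=0.0, X=[[0.0], [1.0], [2.0]], y=[0.05, 1.5, 1.0], C=10.0, eps=0.1)
```

The first point lies inside the tube and costs nothing; the second lies 0.4 above it and the third 0.9 below it, so a larger C pushes the solution towards fitting them at the expense of flatness.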
To make the SVR algorithm non-linear, the training instances xi are pre-processed by a function Φ: X → F, where F is some feature space. Since SVR depends only on the dot products between instances, a kernel k(xi, xj) = ⟨Φ(xi), Φ(xj)⟩ is used instead of explicitly computing the function Φ(·).
In this study, the Pearson VII universal kernel (PUK) was chosen:
$$k(x_i, x_j) = \frac{1}{\left[ 1 + \left( \dfrac{2 \sqrt{\lVert x_i - x_j \rVert^2} \, \sqrt{2^{1/\omega} - 1}}{\sigma} \right)^2 \right]^{\omega}}$$
where the parameters σ and ω control the half-width and the tailing factor of the peak, respectively. The best results were obtained for σ = 0.5 and ω = 0.5. Preliminary analyses showed that the PUK led to more accurate predictions than alternatives such as the Radial Basis Function, Polynomial, or Sigmoid kernels.
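A direct transcription of the PUK, sketched with the σ = 0.5 and ω = 0.5 values reported above, is shown below; the function name is hypothetical. A convenient check follows from the formula itself: when ‖xi − xj‖ = σ/2, the bracket equals 2^(1/ω), so the kernel drops to exactly 1/2 for any ω, which is why σ sets the half-width of the peak.

```python
import math


def puk_kernel(xi, xj, sigma=0.5, omega=0.5):
    """Pearson VII universal kernel between two input vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    core = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + core ** 2) ** omega


same = puk_kernel([0.3, 0.7], [0.3, 0.7])   # identical inputs
half = puk_kernel([0.0], [0.25])            # distance sigma / 2
far = puk_kernel([0.0], [0.5])              # distance sigma
```

Identical inputs give the maximum value 1, and the value decays with distance; ω then shapes how heavy the tails of the peak are.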

2.2. Hybrid Models and Evaluation Metrics

Based on the predictions obtained with the different algorithms, hybrid models can be developed by combining conceptually different machine learning regressors to improve modelling performance. Kittler et al. [41] provided a framework for the different rules for combining classifiers.
In this research, the different regressors were combined with the average-of-probabilities rule to obtain the final prediction; for a continuous target such as Ksat, this amounts to averaging the predictions of the individual regressors. This approach, also known as soft voting, can be useful for a set of similarly performing models, as it balances out their individual weaknesses.
The hyperparameters of the individual models were optimized using a random search procedure. Within the hybrid models, the individual algorithms (MLP, Random Forest, and SVR) used the same parameter values reported in the previous sections.
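Averaging as a combination rule can be sketched in a few lines. The function name and the member predictions in the usage line are invented for illustration, and the text does not state whether the averaging was carried out on Ksat or on log10(Ksat).

```python
def hybrid_predict(predictions):
    """Average, sample by sample, the predictions of the member regressors.

    predictions: one list per member model, all of the same length.
    """
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]


# Hypothetical predictions of three members for two soil samples.
mlp = [1.0, 2.0]
rf = [2.0, 2.0]
svr = [3.0, 5.0]
hyb_mlp_rf_svr = hybrid_predict([mlp, rf, svr])
hyb_rf_svr = hybrid_predict([rf, svr])
```

Because every member carries the same weight, a two-member hybrid such as Hyb_RF-SVR is simply the midpoint of its members, while the three-member hybrid dilutes any single model's error by one third.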
Four different metrics were used to assess the effectiveness of the prediction models: the Coefficient of Determination R2, the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE), and the Relative Absolute Error (RAE).
R2 indicates the proportion of the variation in the response variable that is explained by the model. It assesses how well the model fits the observed results and, by extension, how well it is likely to forecast future outcomes.
MAE evaluates the average magnitude of the errors in a set of predictions, without considering their direction.
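RMSE is the square root of the mean of the squared residuals; when the residuals have zero mean, it coincides with their standard deviation. It measures how concentrated the data are around the model predictions.
RAE is the total absolute error normalized by the total absolute deviation of the observations from their average. These performance metrics are defined as follows: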
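$$R^2 = 1 - \frac{\sum_{i=1}^{m} \left( \log_{10} f_i - \log_{10} y_i \right)^2}{\sum_{i=1}^{m} \left( \log_{10} y_a - \log_{10} y_i \right)^2}$$
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| \log_{10} f_i - \log_{10} y_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( \log_{10} f_i - \log_{10} y_i \right)^2}$$
$$\mathrm{RAE} = \frac{\sum_{i=1}^{m} \left| \log_{10} f_i - \log_{10} y_i \right|}{\sum_{i=1}^{m} \left| \log_{10} y_a - \log_{10} y_i \right|}$$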
where m is the total number of observed data, fi is the predicted value for data point i, yi is the measured value for data point i, and ya is the average of the observed data; all metrics are computed on log10-transformed conductivities. Together, the four metrics characterize the accuracy of the forecast models from complementary angles, as they measure goodness of fit as well as absolute and relative errors.
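The four metrics can be computed as in the following sketch. One interpretation is an assumption: log10(ya) is taken as the mean of the log10(yi), i.e., ya is treated as the geometric mean of the observations, which makes R2 the ordinary coefficient of determination in log space. The function name is hypothetical.

```python
import math


def log_metrics(predicted, observed):
    """R2, MAE, RMSE and RAE on log10-transformed conductivities."""
    lp = [math.log10(v) for v in predicted]
    lo = [math.log10(v) for v in observed]
    m = len(lo)
    log_ya = sum(lo) / m  # assumption: log10(ya) = mean of log10(yi)
    abs_err = [abs(p - o) for p, o in zip(lp, lo)]
    sq_err = [(p - o) ** 2 for p, o in zip(lp, lo)]
    abs_dev = [abs(log_ya - o) for o in lo]
    sq_dev = [(log_ya - o) ** 2 for o in lo]
    return {
        "R2": 1.0 - sum(sq_err) / sum(sq_dev),
        "MAE": sum(abs_err) / m,                # in log10(cm/h) if inputs are in cm/h
        "RMSE": math.sqrt(sum(sq_err) / m),
        "RAE": sum(abs_err) / sum(abs_dev),     # multiply by 100 for a percentage
    }


# Hypothetical check: a constant prediction of 10 cm/h for observations
# spanning three orders of magnitude is no better than the mean in log space.
scores = log_metrics([10.0, 10.0, 10.0], [1.0, 10.0, 100.0])
```

In this check R2 = 0 and RAE = 1 (100%), the values of a predictor that always returns the log-space average; the study's best variants reach RAE values of about 55%.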

2.3. Training Dataset

The data used for the modelling were extracted from the Soil Water Infiltration Global (SWIG) database [42], a global database of soil infiltration measurements that also provides some Ksat values. The SWIG database includes data from 54 countries, with major contributions from China, Iran, and the USA, collected from 1976 to 2017. Only the records that included all the variables of interest for this study were extracted: the fractions of clay, silt, and sand, the mean and standard deviation of soil particle diameter, the soil organic carbon content, the soil bulk density, and the saturated soil water content. As a consequence, a large part of the data in the entire database was discarded. A complete statistical description of the resulting dataset, divided by texture class, is reported in Table 1 and Table 2; the two tables are separated only for layout reasons. For each texture and each characteristic of interest, the tables show the minimum, maximum, and median values, the first and third quartiles, and the mean, standard deviation, and skewness of the distribution. In the tables, data are grouped according to the main soil component.
The characterization of the training dataset is completed by Figure 4 and Figure 5. Figure 4 shows the composition of the training dataset in terms of soil texture: sandy loams are by far the most prevalent soil type, constituting almost 50% of the soils in the dataset. Figure 5 shows hydraulic conductivity box plots for the different soil types: all types have a rather limited variability of conductivity, except for sandy loams. Moreover, few data records fall in the conductivity range 10⁻³ < Ksat < 10⁻² cm/h, and these belong almost exclusively to sandy clay loams.

3. Results

Five models for the prediction of Ksat were built on different combinations of input variables. Seven variants of each model were developed by changing the implemented machine learning algorithm. Model M1 uses the following input variables: the clay percentage, the silt percentage, the sand percentage, the geometric mean diameter dg (mm), the standard deviation of soil particle diameter Sg, the soil organic carbon content OC (%), the soil bulk density Db (g/cm³), and the saturated soil water content WCs (g/g).
Model M2 needs the following input variables: dg, Sg, OC, Db, and WCs. Model M3 requires as input the following quantities: dg, Sg, Db, and WCs. The M4 model is based on dg, Sg, OC, and Db. Finally, the simplest model, M5, requires only dg, Sg, and Db as input variables.
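The experimental design can be summarized as a small configuration table: five predictor sets crossed with seven algorithm variants. The dictionary keys below are hypothetical column names for a tabular record; the predictor lists and variant labels follow the text.

```python
# Predictor sets of models M1 to M5, as listed in the text.
# The short keys are hypothetical column names for a tabular record.
PREDICTOR_SETS = {
    "M1": ["clay", "silt", "sand", "dg", "Sg", "OC", "Db", "WCs"],
    "M2": ["dg", "Sg", "OC", "Db", "WCs"],
    "M3": ["dg", "Sg", "Db", "WCs"],
    "M4": ["dg", "Sg", "OC", "Db"],
    "M5": ["dg", "Sg", "Db"],
}

# The seven algorithm variants compared for every model.
VARIANTS = ["MLP", "RF", "SVR",
            "Hyb_MLP-RF", "Hyb_MLP-SVR", "Hyb_RF-SVR", "Hyb_MLP-RF-SVR"]


def feature_vector(record, model):
    """Pick, in a fixed order, the inputs required by the given model."""
    return [record[name] for name in PREDICTOR_SETS[model]]


# 5 models x 7 variants = 35 configurations in total.
CONFIGURATIONS = [(m, v) for m in PREDICTOR_SETS for v in VARIANTS]
```

Every reduced model uses a subset of the M1 predictors, so the comparison isolates the effect of dropping the grain size fractions (M2), the organic carbon content (M3), the water content (M4), or both (M5).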
Each model was built through a k-fold cross-validation procedure [43], using a set of 640 vectors. In k-fold cross-validation, the initial dataset is randomly partitioned into k subsets; k − 1 subsets are used as training data, while the remaining subset is used as validation data. The process is repeated k times, so that every subset is used exactly once for validation, and the k results are averaged to provide a single outcome. In this study, k = 20 led to the best results. To improve model training, the input data were normalized by min-max feature scaling, which brings all values into the range [0, 1].
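The validation protocol can be sketched as follows, with k = 20 and 640 vectors as in the text. The fold assignment (striding a shuffled index list) and all function names are assumptions for illustration. The sketch scales the whole input set, as the text describes; a stricter protocol would compute the minimum and maximum on the training folds only, to keep validation data out of the preprocessing.

```python
import random


def min_max_scale(rows):
    """Rescale every column of a list of rows to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def k_fold_splits(n, k=20, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint subsets
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate(score_fold, n, k=20, seed=0):
    """Average the score returned by score_fold(train, validation) over k folds."""
    scores = [score_fold(tr, va) for tr, va in k_fold_splits(n, k, seed)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(640, k=20))
scaled = min_max_scale([[1.0, 10.0], [3.0, 20.0], [2.0, 15.0]])
```

With 640 vectors and k = 20, each model is trained 20 times on 608 vectors and validated on the remaining 32.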
Table 3 and Figure 6 show a general summary of the results, in terms of the evaluation metrics.
Model M1 showed the best predictive capabilities. The hybrid models Hyb_MLP-RF-SVR (R2 = 0.829, MAE = 0.582 log10 (cm/h), RMSE = 0.802 log10 (cm/h), RAE = 57.19%) and Hyb_RF-SVR (R2 = 0.826, MAE = 0.562 log10 (cm/h), RMSE = 0.796 log10 (cm/h), RAE = 55.16%) led to the best outcomes. The two hybrid models Hyb_MLP-RF and Hyb_MLP-SVR showed forecasting capabilities comparable to those of the two models based on RF and SVR. The MLP-based model was by far the least accurate (R2 = 0.632, MAE = 0.821 log10 (cm/h), RMSE = 1.079 log10 (cm/h), RAE = 80.63%).
Model M2 underperformed M1 in all its variants. In this case, Hyb_RF-SVR clearly outperformed the other variants. The Hyb_MLP-RF-SVR variant was more accurate than the other two hybrid variants, Hyb_MLP-SVR and Hyb_MLP-RF, while these in turn outperformed RF, SVR, and MLP.
The M3 model showed a further reduction in prediction accuracy in all variants. The Hyb_RF-SVR variant again proved to be the best performing model (R2 = 0.759, MAE = 0.622 log10 (cm/h), RMSE = 0.910 log10 (cm/h), RAE = 61.04%). The hybrid models once again proved more accurate than the base models, except for the Hyb_MLP-RF model (R2 = 0.687, MAE = 0.749 log10 (cm/h), RMSE = 1.026 log10 (cm/h), RAE = 73.65%), whose results were slightly less accurate than those provided by RF.
The accuracy of the M4 model was unsatisfactory. In this case too, the Hyb_RF-SVR variant led to the best predictions, but the superiority of the hybrid models was not as clear as for the M1 and M2 models; indeed, RF outperformed both Hyb_MLP-RF and Hyb_MLP-SVR.
The M5 model led to poor results. Even its most accurate variant, again Hyb_RF-SVR, yielded unsatisfactory values of the evaluation metrics (R2 = 0.595, MAE = 0.848 log10 (cm/h), RMSE = 1.164 log10 (cm/h), RAE = 83.37%).
Figure 7, which reports predicted versus observed values for the M1 model, shows that all variants were more accurate in the range 10⁰ < Ksat < 10² cm/h. Conversely, all variants tended to overestimate Ksat in the range 10⁻³ < Ksat < 10⁻² cm/h. This unsatisfactory result is due both to the limited number of training data falling within this interval and to the greater heterogeneity of these data in terms of predictor values. The same trend also characterized the other, less accurate models; the corresponding diagrams are not reported for the sake of brevity.
The box plots in Figure 8 show the absolute errors, defined as predicted minus actual values, from which the following can be deduced:
- All variants of the M1 and M2 models have a negligible bias. A more appreciable, albeit slight, bias is observed in the SVR- and MLP-based variants of the M4 and M5 models.
- The Hyb_MLP-RF-SVR and Hyb_RF-SVR variants have the lowest variance of the absolute error across all the considered models, in particular within the M1 and M2 models.
- Model M1 shows the lowest number of outliers.
- The error distribution of all variants of the M3, M4, and especially M5 models is clearly asymmetrical.
These observations complement the metric-based analysis above.
To further highlight the effectiveness of the approach based on machine learning algorithms, a comparison with a classic formulation for the estimation of Ksat is proposed below. Predicting the saturated conductivity of porous materials with theoretical, empirical, or semi-empirical equations that relate it to physical properties of the seeping fluid and of the soil assembly is a classical research goal. Starting from the Hagen–Poiseuille equation, which describes the flow of a fluid in capillary tubes, the Kozeny–Carman equation [7,44] was one of the first to be formulated:
$$K_{sat} = \frac{\rho_w g}{\mu_w} \, \frac{n^3}{C \, S_o^2 \, (1 - n)^2}$$
where ρw and μw are the density and viscosity of water, respectively (set to ρw = 1000 kg/m³ and μw = 0.001 Pa·s), g is the gravitational acceleration, n is the soil porosity, and So is the surface area of soil particles per unit volume. Alternatively, the equation can be expressed as:
$$K_{sat} = \frac{\rho_w g}{\mu_w} \, \frac{e^3}{C \, (1 + e)} \, D_g^2$$
where e = n/(1 − n) is the void ratio and Dg is the geometric mean particle size, obtained by subdividing the grain size distribution into l classes and computing:
$$D_g = \exp\left( \sum_{i=1}^{l} f_i \ln d_i \right)$$
where fi and di are, respectively, the fraction of material contained in each class and its representative diameter. The coefficient C (generally equal to 180) can be adjusted to account for particle shape through a sphericity factor. Carman [8] and other researchers showed that this equation is quite effective in estimating the permeability of coarse-grained soils.
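A worked version of the Kozeny–Carman estimate is sketched below, assuming the void-ratio form Ksat = (ρw g/μw) · e³/[C(1 + e)] · Dg², which follows from the porosity form with n = e/(1 + e) and So = 6/Dg for spheres (the factor 36 being absorbed into C = 180). The value g = 9.81 m/s², the function names, and the sample inputs are assumptions; ρw and μw are the values given in the text.

```python
import math

RHO_W = 1000.0   # water density, kg/m^3 (value from the text)
MU_W = 0.001     # water viscosity, Pa*s (value from the text)
G = 9.81         # gravitational acceleration, m/s^2 (assumed)


def geometric_mean_diameter(fractions, diameters):
    """Dg = exp(sum f_i ln d_i); fractions should sum to 1."""
    return math.exp(sum(f * math.log(d) for f, d in zip(fractions, diameters)))


def kozeny_carman(e, dg, C=180.0):
    """Ksat in m/s from the void ratio e and the diameter dg in metres."""
    return RHO_W * G / MU_W * e ** 3 / (C * (1.0 + e)) * dg ** 2


def m_per_s_to_cm_per_h(k):
    return k * 100.0 * 3600.0


# Hypothetical sample: e = 0.5 and Dg = 0.1 mm.
dg_demo = geometric_mean_diameter([0.5, 0.5], [1.0, 4.0])  # equal halves of 1 and 4
k_sand = kozeny_carman(0.5, 1e-4)                           # about 4.5e-5 m/s
k_sand_cm_h = m_per_s_to_cm_per_h(k_sand)                   # about 16 cm/h
```

The quadratic dependence on Dg explains the equation's success in coarse-grained soils and its failure in clays, where the effective pore space is much smaller than the total void ratio suggests.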
On the other hand, experimental evidence does not confirm the validity of this relationship for clay soils. Taylor [45] ascribed this discrepancy to the reduction in the effective pore space available for free flow, caused by the film of water attached to the surfaces of clay particles. Olsen [46] attributed the difference between the water conductivity measured in saturated clay and the values predicted with the Kozeny–Carman relation to the heterogeneous pore size distribution of clay materials. Chapuis and Aubertin [47] used the specific surface area to predict the vertical permeability coefficient of a homogeneous soil. Ren et al. [48] introduced the concept of effective void ratio, subdividing the total volume of voids into two parts: an effective part, ee, occupied by flowing water, and an ineffective part, ei, occupied by immobile water, i.e., water attached to the soil particles or located in closed pores. These authors proposed the following relation between the effective and total void ratios:
$$e_e = e \left( \frac{e}{1 + e} \right)^{m}$$
where m is a non-negative constant ranging between 0 and 2 (m = 0.05 ± 0.05 for sandy soil, m = 1 ± 0.2 for silty soil, m = 1.5 ± 0.5 for clay).
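The partition of the void ratio can be computed directly from the relation of Ren et al. as written in the text, with ei = e − ee as used later in the comparison. The function names and sample values are hypothetical. Because e/(1 + e) < 1, increasing m always lowers the effective void ratio, which is how finer soils are assigned more immobile water.

```python
def effective_void_ratio(e, m):
    """e_e = e * (e / (1 + e)) ** m; m = 0 means all voids are effective."""
    return e * (e / (1.0 + e)) ** m


def split_void_ratio(e, m):
    """Return (effective, ineffective) void ratios, with e_i = e - e_e."""
    ee = effective_void_ratio(e, m)
    return ee, e - ee


# Hypothetical soil with e = 1.0, using the exponents quoted for each soil type.
sand = split_void_ratio(1.0, 0.05)
silt = split_void_ratio(1.0, 1.0)
clay = split_void_ratio(1.0, 1.5)
```

With e = 1.0, about 97% of the voids remain effective for m = 0.05, but only half for m = 1 and roughly 35% for m = 1.5.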
In the present work, the permeability coefficient was computed with the following formula, taken from Hong et al. [49]:
$$K_{sat} = \frac{1}{C} \, \frac{\rho_w g}{\mu_w} \, \frac{1}{S_o^2 \rho_s^2} \, \frac{e_e^2}{(1 + e_i)^{2/3}}$$
Considering the available database, ρs was set to 2650 kg/m³, and the effective void ratio ee was computed with the relation of Ren et al. [48], setting the exponent m equal to 1.5 to account for the significant presence of silt and clay in the records; the ineffective void ratio was computed as ei = e − ee. The specific surface So was evaluated as a function of the clay fraction, adopting the mean curve of the data collected by Hong et al. [49].
The comparison between the results obtained with the Hyb_MLP-RF-SVR variant of the M1 model and those obtained with the Kozeny–Carman formulation is shown in Figure 9. For the predictions obtained with the Kozeny–Carman equation, the metrics considered above took the following values: R2 = 0.187, MAE = 3.52 log10 (cm/h), RMSE = 6.23 log10 (cm/h), RAE = 346%. An approach based on machine learning algorithms is therefore clearly more effective than a classic approach based on formulations deriving from the studies of Kozeny, Carman, and subsequent authors, and this better performance justifies the greater complexity of the forecasting tool.

4. Discussion

The combination of predictors is of considerable importance for the accuracy of Ksat estimates based on the physical characteristics of soil, and the set of input variables must be sufficiently representative of the soil characteristics. Detailed knowledge of the soil grain size distribution, and particularly of the fractions of clay, silt, and sand, together with the mean and standard deviation of soil particle diameter, is a fundamental starting point for Ksat prediction with machine learning algorithms. However, preliminary analyses showed that the parameters obtainable from the grain size distribution curve alone do not yield a sufficiently accurate model, so the corresponding results are not shown here. Knowledge of additional parameters such as the soil organic carbon content, bulk density, and saturated soil water content is essential to improve the accuracy of predictive models. Similarly, models based only on global geometric parameters such as dg, Sg, and Db fail to provide acceptable results.
Hybrid models appreciably outperformed the individual base models in predicting Ksat in the more complex cases of models with a greater number of predictors (e.g., M1 and M2). This result agrees with findings reported by other scholars in other fields. Pham and Prakash [50] proposed a novel use of bagging-based naïve Bayes trees for the assessment of landslide susceptibility. The developed hybrid model was compared with individual models including Rotation Forest-based Naïve Bayes Trees, Naïve Bayes Trees, and SVM, and proved to be the most accurate, improving on the accuracy of the standalone models. Wu et al. [51] proposed a hybrid model to forecast electricity load in five states of Australia, integrating an Extreme Learning Machine, ensemble empirical mode decomposition, and the grasshopper optimization algorithm. Compared with some base models in terms of RMSE, MAE, and Mean Absolute Percentage Error (MAPE), the hybrid model showed higher performance and accuracy. Bui et al. [52] used four individual (random forest, M5P, random tree, and reduced error pruning tree) and 12 hybrid ML algorithms to predict water quality indices in a humid catchment of northern Iran. The hybrid models generally improved prediction accuracy compared with the individual algorithms, but not in all cases.
The literature mentioned above shows that hybrid methods are becoming increasingly popular because of their ability to improve prediction performance. Hybrid machine learning tends to perform best when the underlying models are weakly correlated. For instance, different models such as regression trees, neural networks, and support vector machines can be trained on different datasets or features. The less correlated the base models are, the better the forecasting performance tends to be. The idea behind using uncorrelated models is that each can compensate for the weaknesses of the others, while their different strengths, when combined, result in a well-performing estimator.
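A standard variance argument, given here as a sketch rather than as a result of the study, makes the role of correlation explicit. Assume k base models whose prediction errors εj have equal variance σ² and a common pairwise correlation ρ; averaging them, as in the soft-voting rule, gives

```latex
\operatorname{Var}\left(\frac{1}{k}\sum_{j=1}^{k}\varepsilon_j\right)
  = \frac{1}{k^2}\left[k\sigma^2 + k(k-1)\rho\sigma^2\right]
  = \frac{\sigma^2}{k} + \frac{k-1}{k}\,\rho\,\sigma^2
```

With k = 3, uncorrelated errors (ρ = 0) reduce the variance to σ²/3, whereas perfectly correlated errors (ρ = 1) leave it at σ², so averaging brings no benefit. The argument concerns only the spread of the errors; a bias shared by all members is not reduced by averaging.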
Despite the much smaller size of the training dataset and the smaller number of predictors considered, the predictive ability of the hybrid models developed in this study is close to that of the best models developed by Araya and Ghezzehei [35]. A further comparison with Jorda et al. [34] and Kotlar et al. [36] reinforces the need to train the models with large and varied datasets; otherwise, prediction accuracy may become unsatisfactory. This may be seen as the main weakness of this study: an overly broad classification of soil types, especially for those with very low conductivity, had a significant negative impact on the overall performance of the model. The insufficient size of the training dataset for some soil categories may also play a negative role, as may the predominance of sandy loam samples in the initial dataset. Another factor that negatively affects prediction performance is the heterogeneity of the training dataset. Permeability coefficients were obtained under very different experimental conditions, generally aimed at evaluating the infiltration rate. This variety introduces considerable noise into the estimate of Ksat, as shown by the large variability of results within each soil category. These factors, which were beyond the control of the present analysis, negatively impact the training and performance of any predictive model. In the authors’ opinion, interpreting the dependencies inherent in the proposed model might also help to create new databases with a more coherent categorization of soil types and a more appropriate definition of the relevant variables. In addition, the future availability of a dataset that is as homogeneous as possible with regard to the Ksat estimation method is a necessary condition for significantly improving the forecasting capabilities of models based on Machine Learning algorithms.
Future developments of this research will aim to further improve the accuracy of the forecasting models, especially for soils with low hydraulic conductivity, by considering larger and more varied training datasets, a greater number of predictors, and hybrids of different base algorithms. In addition, it could be useful to develop separate predictive models for coarse-grained and fine-grained soils, given the considerable differences between the seepage processes observed in them.

5. Conclusions

Accurate prediction of the hydraulic conductivity of saturated soil is essential for addressing groundwater problems. If reliable data are available, machine learning algorithms are powerful tools for obtaining good predictions. In addition, hybrid models resulting from the combination of multiple machine learning algorithms can further improve on the performance of individual models.
In this study, five different models were developed to predict saturated hydraulic conductivity using a dataset extracted from the Soil Water Infiltration Global database. The models differed in their input variables. Seven variants of each model were compared by changing the algorithm employed: three variants were based on individual algorithms, while four were based on hybrid models. The individual machine learning algorithms selected were Multilayer Perceptron (MLP), Random Forest (RF), and Support Vector Regression (SVR).
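The count of seven variants follows from taking every non-empty combination of the three base algorithms: three individual algorithms plus 2³ − 1 − 3 = 4 hybrids. The short enumeration below reproduces the variant labels used in Table 3; the helper name is hypothetical, and the code only lists the configurations without saying anything about how the hybrids combine their members.

```python
from itertools import combinations

BASE_ALGORITHMS = ("MLP", "RF", "SVR")


def model_variants(algorithms=BASE_ALGORITHMS):
    """Every non-empty subset of the base algorithms: singletons are the
    individual variants, larger subsets are the hybrid variants."""
    variants = []
    for size in range(1, len(algorithms) + 1):
        for combo in combinations(algorithms, size):
            name = combo[0] if size == 1 else "Hyb_" + "-".join(combo)
            variants.append(name)
    return variants


variants = model_variants()
print(variants)
print(len(variants), "variants x 5 models =", 5 * len(variants), "configurations")
```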
Model M1, whose input variables are the clay, silt, and sand percentages, the geometric mean particle diameter (dg), the standard deviation of the soil particle diameter (Sg), the soil organic carbon content (OC), the soil bulk density (Db), and the saturated soil water content (WC_s), led to the most accurate results. Models M4 and M5, based on a limited number of soil characteristics, gave unsatisfactory results, with R² values not exceeding 0.638 and 0.595, respectively.
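This section does not state how dg and Sg were derived. One widely used approach (Shirazi and Boersma, 1984) computes them from the clay, silt, and sand fractions by representing each class with the arithmetic mean of its diameter limits: dg = exp(a) and Sg = exp(b), with a = Σ fᵢ ln Mᵢ and b² = Σ fᵢ (ln Mᵢ)² − a². The sketch below assumes the USDA class limits; whether the authors used exactly this formulation is an assumption, and the function name is hypothetical.

```python
import math

# Assumed USDA class limits (mm): clay < 0.002, silt 0.002-0.05, sand 0.05-2.0.
# Each class is represented by the arithmetic mean of its limits.
CLASS_MEAN_DIAMETER_MM = {"clay": 0.001, "silt": 0.026, "sand": 1.025}


def geometric_size_stats(clay_pct: float, silt_pct: float, sand_pct: float):
    """Return (dg in mm, Sg dimensionless) from texture fractions given in %."""
    total = clay_pct + silt_pct + sand_pct
    fractions = {
        "clay": clay_pct / total,
        "silt": silt_pct / total,
        "sand": sand_pct / total,
    }
    a = sum(f * math.log(CLASS_MEAN_DIAMETER_MM[c]) for c, f in fractions.items())
    b2 = sum(f * math.log(CLASS_MEAN_DIAMETER_MM[c]) ** 2
             for c, f in fractions.items()) - a ** 2
    # Guard against tiny negative values from rounding for single-class soils.
    return math.exp(a), math.exp(math.sqrt(max(b2, 0.0)))


dg, sg = geometric_size_stats(1.0, 2.0, 97.0)   # hypothetical sand sample
print(f"dg = {dg:.3f} mm, Sg = {sg:.2f}")
```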
Across all models, the hybrid variants based on all three algorithms and the hybrid variants combining Random Forest and Support Vector Regression provided the most accurate predictions, with R² values up to 0.829 (M1, Hyb_MLP-RF-SVR). However, all variants tended to overestimate Ksat in the range 10⁻³ < Ksat < 10⁻² cm/h, owing to the small number of training samples falling within this interval and to the high heterogeneity of those samples with respect to the predictor values.
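A simple way to check such a bias is to compute the mean signed error of log10 Ksat only over the observations falling in the interval of interest; 10⁻³ < Ksat < 10⁻² cm/h corresponds to −3 < log10 Ksat < −2. The sketch below uses made-up values, and the function name is hypothetical; it is not the procedure used in the study.

```python
def interval_bias(observed_log, predicted_log, low=-3.0, high=-2.0):
    """Mean signed error (predicted - observed), in log10 units, over samples
    whose observed log10 Ksat satisfies low < value < high.
    A positive result indicates overestimation. Returns (bias, count)."""
    errors = [p - o for o, p in zip(observed_log, predicted_log) if low < o < high]
    if not errors:
        return None, 0
    return sum(errors) / len(errors), len(errors)


# Hypothetical log10 Ksat values (cm/h), for illustration only.
observed = [-2.8, -2.5, -2.1, 0.3, 1.2, -0.4]
predicted = [-1.9, -1.7, -1.6, 0.2, 1.1, -0.3]
bias, count = interval_bias(observed, predicted)
print(f"{count} samples in range, mean bias = {bias:+.2f} log10 units")
```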
A comparison with the classic Kozeny-Carman formulation further demonstrated the advantage of an approach based on machine learning algorithms, which performed significantly better.
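This section does not reproduce the Kozeny-Carman expression used for the comparison in Figure 9. For orientation only, a common textbook form for a packing of uniform spheres gives the intrinsic permeability as k = d²n³/[180(1 − n)²] and the hydraulic conductivity as K = kρw g/μ, where d is the grain diameter and n the porosity. The sketch below implements that form; using the geometric mean diameter as d, the water properties, and the example inputs are assumptions rather than the calibration used by the authors.

```python
RHO_WATER = 1000.0      # kg/m^3
GRAVITY = 9.81          # m/s^2
MU_WATER = 1.0e-3       # Pa*s, roughly water at 20 degrees C (assumed)
M_PER_S_TO_CM_PER_H = 100.0 * 3600.0


def kozeny_carman_ksat(d_mm: float, porosity: float) -> float:
    """Saturated hydraulic conductivity (cm/h) from a representative grain
    diameter (mm) and porosity, using the sphere-pack Kozeny-Carman form."""
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must be between 0 and 1")
    d_m = d_mm / 1000.0
    permeability = d_m ** 2 * porosity ** 3 / (180.0 * (1.0 - porosity) ** 2)  # m^2
    k_m_per_s = permeability * RHO_WATER * GRAVITY / MU_WATER
    return k_m_per_s * M_PER_S_TO_CM_PER_H


# Hypothetical inputs loosely resembling a sandy loam: d = 0.2 mm, n = 0.45.
ksat = kozeny_carman_ksat(0.2, 0.45)
print(f"Ksat ~ {ksat:.1f} cm/h")
```

Because K scales with d², doubling the representative diameter quadruples the estimate, so the result is highly sensitive to how d is chosen.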

Author Contributions

F.G.: Conceptualization, data curation, formal analysis, methodology, software, supervision, writing—original draft, writing—review and editing; F.D.N.: conceptualization, methodology, writing—original draft, writing—review and editing; G.M.: conceptualization, methodology, writing—original draft, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and analysed during the current study are available in the repository: https://doi.pangaea.de/10.1594/PANGAEA.885492 (last accessed on 31 March 2022).

Conflicts of Interest

The authors declare that they have no competing interests.

References

1. Adamowski, J.; Chan, H.F. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. 2011, 407, 28–40.
2. Alamanis, N.; Papageorgiou, G.; Chantzopoulou, P.; Chouliaras, I. Investigation on the influence of permeability coefficient k of the soil mass on construction settlements. Cases of infrastructure settlements in Greece. Wseas Trans. Environ. Dev. 2019, 15, 95–105.
3. Alyamani, M.S.; Şen, Z. Determination of hydraulic conductivity from complete grain-size distribution curves. Groundwater 1993, 31, 551–555.
4. Angelaki, A.; Singh Nain, S.; Singh, V.; Sihag, P. Estimation of models for cumulative infiltration of soil using machine learning methods. ISH J. Hydraul. Eng. 2021, 27, 162–169.
5. Araya, S.N.; Ghezzehei, T.A. Using machine learning for prediction of saturated hydraulic conductivity and its sensitivity to soil structural perturbations. Water Resour. Res. 2019, 55, 5715–5737.
6. Azamathulla, H.M.; Wu, F.C. Support vector machine approach for longitudinal dispersion coefficients in natural streams. Appl. Soft Comput. 2011, 11, 2902–2905.
7. Boadu, F.K. Hydraulic conductivity of soils from grain-size distribution: New models. J. Geotech. Geoenviron. Eng. 2000, 126, 739–746.
8. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
9. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Abingdon, UK, 2017.
10. Bui, D.T.; Khosravi, K.; Tiefenbacher, J.; Nguyen, H.; Kazakis, N. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 2020, 721, 137612.
11. Carman, P.C. Permeability of saturated sands, soils and clays. J. Agric. Sci. 1939, 29, 263–273.
12. Carman, P.C. Flow of Gas through Porous Media; Butterworths Scientific Publications: London, UK, 1956.
13. Chapuis, R.P. Predicting the saturated hydraulic conductivity of soils: A review. Bull. Eng. Geol. Environ. 2012, 71, 401–434.
14. Chapuis, R.P.; Aubertin, M. On the use of the Kozeny Carman equation to predict the hydraulic conductivity of soils. Can. Geotech. J. 2003, 40, 616–628.
15. Crawford, J.W. The relationship between structure and the hydraulic conductivity of soil. Eur. J. Soil Sci. 1994, 45, 493–502.
16. Di Nunno, F.; Granata, F. Groundwater level prediction in Apulia region (Southern Italy) using NARX neural network. Environ. Res. 2020, 190, 110062.
17. Freeze, R.A.; Cherry, J.A. Groundwater; Prentice Hall Inc.: Hoboken, NJ, USA, 1979.
18. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146.
19. Granata, F. Evapotranspiration evaluation models based on machine learning algorithms—A comparative study. Agric. Water Manag. 2019, 217, 303–315.
20. Granata, F.; Di Nunno, F. Artificial Intelligence models for prediction of the tide level in Venice. Stoch. Environ. Res. Risk Assess. 2021, 35, 2537–2548.
21. Granata, F.; Di Nunno, F. Forecasting evapotranspiration in different climates using ensembles of recurrent neural networks. Agric. Water Manag. 2021, 255, 107040.
22. Han, H.; Giménez, D.; Lilly, A. Textural averages of saturated soil hydraulic conductivity predicted from water retention data. Geoderma 2008, 146, 121–128.
23. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall Inc.: Hoboken, NJ, USA, 1994.
24. Hu, W.; She, D.; Shao, M.; Chun, K.P.; Si, B. Effects of initial soil water content and saturated hydraulic conductivity variability on small watershed runoff simulation using LISEM. Hydrol. Sci. J. 2015, 60, 1137–1154.
25. Hong, B.; Li, X.A.; Wang, L.; Li, L.; Xue, Q.; Meng, J. Using the effective void ratio and specific surface area in the Kozeny–Carman equation to predict the hydraulic conductivity of loess. Water 2020, 12, 24.
26. Jabro, J.D. Estimation of saturated hydraulic conductivity of soils from particle size distribution and bulk density data. Trans. ASAE 1992, 35, 557–560.
27. Jorda, H.; Bechtold, M.; Jarvis, N.; Koestel, J. Using boosted regression trees to explore key factors controlling saturated and near-saturated hydraulic conductivity. Eur. J. Soil Sci. 2015, 66, 744–756.
28. Kişi, Ö. Streamflow forecasting using different artificial neural network algorithms. J. Hydrol. Eng. 2007, 12, 532–539.
29. Kittler, J.; Hatef, M.; Duin, R.P.W.; Matas, J. On Combining Classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 226–239.
30. Knoll, L.; Breuer, L.; Bach, M. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. 2019, 668, 1317–1327.
31. Kotlar, A.M.; Iversen, B.V.; de Jong van Lier, Q. Evaluation of parametric and nonparametric machine-learning techniques for prediction of saturated and near-saturated hydraulic conductivity. Vadose Zone J. 2019, 18, 1–13.
32. Kozeny, J. Ueber kapillare Leitung des Wassers im Boden. Sitzungsberichte Wiener Akademie 1927, 136, 271–306.
33. Kumar, M.; Sihag, P. Assessment of infiltration rate of soil using empirical and machine learning-based models. Irrig. Drain. 2019, 68, 588–601.
34. Modoni, G.; Darini, G.; Spacagna, R.L.; Saroli, M.; Russo, G.; Croce, P. Spatial analysis of subsidence induced by groundwater withdrawal. Eng. Geol. 2013, 167, 59–71.
35. Montzka, C.; Herbst, M.; Weihermüller, L.; Verhoef, A.; Vereecken, H. A global data set of soil hydraulic properties and sub-grid variability of soil water retention and hydraulic conductivity curves. Earth Syst. Sci. Data 2017, 9, 529–543.
36. Najafzadeh, M.; Etemad-Shahidi, A.; Lim, S.Y. Scour prediction in long contractions using ANFIS and SVM. Ocean Eng. 2016, 111, 128–135.
37. Najafzadeh, M.; Oliveto, G. Riprap incipient motion for overtopping flows with machine learning models. J. Hydroinform. 2020, 22, 749–767.
38. Odong, J. Evaluation of empirical formulae for determination of hydraulic conductivity based on grain-size analysis. J. Am. Sci. 2007, 3, 54–60.
39. Olsen, H.W. Hydraulic flow through saturated clays. In Clays Clay Miner; Ingerson, E., Ed.; Elsevier: Amsterdam, The Netherlands, 1962; pp. 131–161.
40. Pham, B.T.; Prakash, I. A novel hybrid model of bagging-based naïve bayes trees for landslide susceptibility assessment. Bull. Eng. Geol. Environ. 2019, 78, 1911–1925.
41. Rahmati, M.; Weihermüller, L.; Vanderborght, J.; Pachepsky, Y.A.; Mao, L.; Sadeghi, S.H.; Moosavi, N.; Kheirfam, H.; Montzka, C.; Van Looy, K.; et al. Development and analysis of the Soil Water Infiltration Global database. Earth Syst. Sci. Data 2018, 10, 1237–1263.
42. Ren, X.; Zhao, Y.; Deng, Q.; Kang, J.; Li, D.; Wang, D. A relation of hydraulic conductivity—Void ratio for soils based on Kozeny-Carman equation. Eng. Geol. 2016, 213, 89–97.
43. Saberi-Movahed, F.; Najafzadeh, M.; Mehrpooya, A. Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: Training group method of data handling using extreme learning machine conceptions. Water Resour. Manag. 2020, 34, 529–561.
44. Sammen, S.S.; Ghorbani, M.A.; Malik, A.; Tikhamarine, Y.; AmirRahmani, M.; Al-Ansari, N.; Chau, K.W. Enhanced artificial neural network with Harris hawks optimization for predicting scour depth downstream of ski-jump spillway. Appl. Sci. 2020, 10, 5160.
45. Sihag, P.; Karimi, S.M.; Angelaki, A. Random forest, M5P and regression analysis to estimate the field unsaturated hydraulic conductivity. Appl. Water Sci. 2019, 9, 129.
46. Sihag, P.; Dursun, O.F.; Sammen, S.S.; Malik, A.; Chauhan, A. Prediction of aeration efficiency of parshall and modified venturi flumes: Application of soft computing versus regression models. Water Supply 2021, 21, 4068–4085.
47. Singh, U.K.; Jamei, M.; Karbasi, M.; Malik, A.; Pandey, M. Application of a modern multi-level ensemble approach for the estimation of critical shear stress in cohesive sediment mixture. J. Hydrol. 2022, 607, 127549.
48. Taylor, D.W. Fundamentals of Soil Mechanics; Wiley: New York, NY, USA, 1948; p. 12.
49. Todd, D.K.; Mays, L.W. Groundwater Hydrology; Wiley: New York, NY, USA, 2004; p. 659.
50. Wang, W.C.; Chau, K.W.; Cheng, C.T.; Qiu, L. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol. 2009, 374, 294–306.
51. Woolhiser, D.A.; Smith, R.E.; Giraldez, J.V. Effects of spatial variability of saturated hydraulic conductivity on Hortonian overland flow. Water Resour. Res. 1996, 32, 671–678.
52. Wu, J.; Cui, Z.; Chen, Y.; Kong, D.; Wang, Y.G. A new hybrid model to predict the electrical load in five states of Australia. Energy 2019, 166, 598–609.
Figure 1. Typical structure of a Multilayer Perceptron.
Figure 2. Typical architecture of the Random Forest algorithm.
Figure 3. Example of Support Vector Regression. Errors can be neglected if they are less than ε, while larger deviations are penalized.
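The behaviour described in the Figure 3 caption corresponds to the ε-insensitive loss used in standard Support Vector Regression formulations: deviations smaller than ε cost nothing, while larger ones are penalized linearly by the amount by which they exceed ε. The minimal sketch below uses an arbitrary ε of 0.1 and hypothetical residuals; it does not reflect the hyperparameters tuned in this study.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Vapnik's epsilon-insensitive loss: zero inside the tube of half-width
    epsilon around the target, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical residuals (target minus prediction) in log10 units.
residuals = [0.05, -0.08, 0.3, -0.5]
losses = [epsilon_insensitive_loss(r, 0.0, epsilon=0.1) for r in residuals]
print(losses)
```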
Figure 4. Training dataset composition with reference to soil texture.
Figure 5. Hydraulic conductivity box plots for the different types of soil.
Figure 6. Radar charts of the error metrics (left column) and histograms of the coefficients of determination (right column).
Figure 7. Hydraulic conductivities predicted versus observed for the different variants of the M1 model.
Figure 8. Box plots of the absolute errors in all models and variants.
Figure 9. Comparison between the results obtained with the M1 model, Hyb_MLP-RF-SVR variant, and those obtained with the Kozeny-Carman formulation.
Table 1. Characteristics of the training dataset (1/2).
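| Soil texture | Statistic | Clay [%] | Silt [%] | Sand [%] | dg [mm] | Sg [-] | OC [%] | Db [g/cm³] | WC_s [cm³/cm³] | Log(Ksat) [Log10 cm/h] |
|---|---|---|---|---|---|---|---|---|---|---|
| Clay | Minimum value | 40.40 | 9.0 | 4.60 | 0.002 | 6.147 | 0.650 | 0.461 | 0.217 | 0.014 |
| | 1st Quartile | 48.500 | 35.0 | 9.525 | 0.005 | 9.339 | 3.413 | 0.754 | 0.326 | 0.423 |
| | Median | 51.000 | 37.3 | 11.7 | 0.007 | 10.383 | 4.350 | 0.977 | 0.397 | 0.777 |
| | 3rd Quartile | 55.800 | 38.8 | 15.375 | 0.009 | 11.914 | 6.230 | 1.101 | 0.481 | 0.892 |
| | Maximum value | 80.000 | 39.8 | 36.0 | 0.024 | 21.520 | 11.572 | 1.468 | 0.590 | 1.174 |
| | Mean | 53.557 | 33.661 | 12.777 | 0.008 | 10.846 | 5.212 | 0.963 | 0.402 | 0.668 |
| | Standard Deviation | 9.071 | 8.818 | 6.660 | 0.004 | 3.114 | 2.661 | 0.242 | 0.102 | 0.322 |
| | Skewness | 1.359 | −1.812 | 1.656 | 2.028 | 1.603 | 1.044 | 0.127 | 0.178 | −0.887 |
| Silty Clay | Minimum value | 44.900 | 40.5 | 1.0 | 0.005 | 5.490 | 2.230 | 0.687 | 0.232 | −0.095 |
| | 1st Quartile | 45.200 | 43.5 | 8.525 | 0.008 | 8.545 | 2.230 | 0.861 | 0.286 | 0.197 |
| | Median | 45.400 | 43.5 | 10.0 | 0.009 | 9.206 | 4.630 | 0.973 | 0.348 | 0.777 |
| | 3rd Quartile | 45.550 | 46.2 | 11.1 | 0.009 | 9.716 | 4.910 | 1.283 | 0.408 | 1.457 |
| | Maximum value | 55.800 | 46.4 | 13.5 | 0.010 | 10.849 | 8.680 | 1.580 | 0.471 | 2.718 |
| | Mean | 46.922 | 43.883 | 9.194 | 0.008 | 8.905 | 4.240 | 1.064 | 0.348 | 0.956 |
| | Standard Deviation | 3.833 | 2.130 | 3.357 | 0.002 | 1.424 | 2.053 | 0.268 | 0.074 | 0.859 |
| | Skewness | 1.976 | −0.329 | −1.180 | −1.618 | −1.016 | 0.927 | 0.348 | −0.073 | 0.638 |
| Silty Clay Loam | Minimum value | 27.404 | 42.969 | 3.872 | 0.008 | 6.339 | 0.690 | 0.758 | 0.015 | 0.626 |
| | 1st Quartile | 29.319 | 47.2 | 13.39 | 0.012 | 9.032 | 1.600 | 1.062 | 0.161 | 0.946 |
| | Median | 35.400 | 49.0 | 15.09 | 0.015 | 10.017 | 2.371 | 1.202 | 0.393 | 1.172 |
| | 3rd Quartile | 38.100 | 55.723 | 17.145 | 0.018 | 10.730 | 4.110 | 1.310 | 0.485 | 1.502 |
| | Maximum value | 39.651 | 63.734 | 19.70 | 0.020 | 12.421 | 6.780 | 1.476 | 0.549 | 2.787 |
| | Mean | 34.068 | 51.633 | 14.300 | 0.015 | 9.786 | 2.786 | 1.184 | 0.319 | 1.349 |
| | Standard Deviation | 4.398 | 6.164 | 4.446 | 0.003 | 1.800 | 1.590 | 0.190 | 0.187 | 0.592 |
| | Skewness | −0.387 | 0.326 | −1.147 | 0.065 | −0.623 | 0.763 | −0.353 | −0.421 | 1.199 |
| Clay Loam | Minimum value | 27.003 | 21.926 | 20.70 | 0.016 | 11.302 | 0.577 | 0.617 | 0.030 | 0.134 |
| | 1st Quartile | 29.000 | 39.575 | 23.0 | 0.019 | 13.122 | 2.165 | 0.982 | 0.356 | 0.572 |
| | Median | 30.600 | 40.5 | 25.95 | 0.024 | 13.728 | 3.045 | 1.173 | 0.433 | 0.759 |
| | 3rd Quartile | 34.830 | 44.029 | 30.800 | 0.031 | 14.638 | 3.570 | 1.406 | 0.538 | 0.969 |
| | Maximum value | 38.300 | 50.647 | 43.403 | 0.050 | 21.252 | 5.200 | 1.531 | 0.648 | 1.265 |
| | Mean | 31.971 | 40.888 | 27.141 | 0.026 | 14.110 | 2.938 | 1.190 | 0.433 | 0.726 |
| | Standard Deviation | 3.347 | 5.747 | 6.139 | 0.008 | 1.980 | 1.198 | 0.232 | 0.135 | 0.344 |
| | Skewness | 0.308 | −1.097 | 1.309 | 1.442 | 1.477 | −0.094 | −0.452 | −0.832 | −0.318 |
| Sandy Clay Loam | Minimum value | 20.000 | 10.555 | 45.379 | 0.055 | 15.840 | 0.293 | 1.031 | 0.336 | −2.870 |
| | 1st Quartile | 20.969 | 18.671 | 51.945 | 0.076 | 16.435 | 0.741 | 1.341 | 0.449 | −2.588 |
| | Median | 22.758 | 21.910 | 53.375 | 0.092 | 17.368 | 1.389 | 1.399 | 0.467 | −2.448 |
| | 3rd Quartile | 27.076 | 25.784 | 56.926 | 0.103 | 19.436 | 6.506 | 1.449 | 0.496 | −2.125 |
| | Maximum value | 32.275 | 27.318 | 68.335 | 0.161 | 22.447 | 9.614 | 1.570 | 0.577 | 0.915 |
| | Mean | 24.057 | 21.917 | 54.026 | 0.089 | 18.027 | 3.333 | 1.380 | 0.469 | −2.046 |
| | Standard Deviation | 3.968 | 4.153 | 4.422 | 0.023 | 1.970 | 3.358 | 0.113 | 0.050 | 0.969 |
| | Skewness | 0.915 | −0.631 | 0.875 | 0.767 | 0.934 | 0.813 | −1.091 | −0.247 | 1.917 |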
Table 2. Characteristics of the training dataset (2/2).
ClaySiltSanddgSgOCDbWC_sLog(Ksat)
[%][%][%][mm] [%][g/cm3][cm3/cm3]Log [cm/hr]
LoamMinimum value8.87028.99326.810.0309.9900.0980.8750.006−1.699
1st Quartile15.60335.76535.4400.05011.9061.0151.3040.2820.156
Median18.63141.00841.9850.06512.5051.6581.3700.4610.585
3rd Quartile22.50245.49645.5630.08614.0772.5211.4480.5090.916
Maximum value25.53549.48851.9590.12317.1765.9681.6530.6791.687
Mean18.63740.49740.8660.06713.0871.9021.3610.3890.521
Standard Deviation4.1375.9136.6010.0211.6511.1810.1640.1890.482
Skewness−0.042−0.353−0.2640.2860.6461.013−0.979−0.963−1.055
Silty LoamMinimum value2.02950.0112.300.0173.8621.0200.3420.0120.057
1st Quartile18.17652.02021.9150.0269.3341.9231.2890.2500.681
Median21.50453.94024.8400.03210.3792.1901.4140.3720.891
3rd Quartile22.73257.44527.6500.04010.9852.4971.4870.4791.086
Maximum value26.78681.60034.3200.07411.61087.9001.6580.8712.153
Mean19.76256.09224.1460.0359.8468.2861.3340.3520.930
Standard Deviation5.8286.5726.0010.0131.75222.3530.3140.2230.432
Skewness−1.9262.102−1.1831.278−1.8223.450−2.3140.5110.631
Sandy LoamMinimum value3.0946.98452.200.0956.8740.1950.4720.032−3.481
1st Quartile10.3018.00659.760.14610.5550.7521.2130.378−0.275
Median11.66721.90066.900.20711.2061.2931.3600.4760.564
3rd Quartile15.27125.85669.600.24013.3973.4901.5030.5281.637
Maximum value19.95438.39779.5370.34916.1959.8971.8520.7403.478
Mean12.52622.22565.2490.20011.8522.3831.3140.4600.375
Standard Deviation3.3675.6966.9150.0641.9682.2520.2630.1011.712
Skewness0.2980.055−0.0740.3440.2641.503−0.937−0.670−0.550
Loamy SandMinimum value0.6849.27974.8700.3594.2770.4801.0100.211−0.614
1st Quartile1.02314.60080.3460.3994.3572.4391.4080.344−0.166
Median1.02315.40783.5700.5424.3575.0001.7240.3880.007
3rd Quartile5.55915.40783.5700.5427.0975.0001.9140.4190.111
Maximum value9.37822.28386.3290.5558.9609.9701.9580.5250.976
Mean3.12014.85582.0250.4855.5714.4281.6370.3880.040
Standard Deviation2.7602.3962.7350.0781.5982.4060.2720.0650.319
Skewness0.866−0.067−0.971−0.6770.8060.431−0.513−0.1191.092
SandMinimum value0.1590.0096.0640.8712.0150.0900.8430.400−0.706
1st Quartile0.1930.59196.6530.8812.0548.0030.8430.481−0.155
Median0.6532.17097.0860.8922.2088.5001.0420.6070.123
3rd Quartile1.7433.18197.6170.9012.5778.5001.3750.6820.281
Maximum value2.3443.73197.6560.9092.8558.7661.6100.6820.915
Mean0.9921.94297.0320.8912.3327.0321.1330.5740.090
Standard Deviation0.9731.6090.6650.0150.3543.4150.3390.1260.545
Skewness0.582−0.150−0.434−0.1510.791−2.4060.463−0.4250.084
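The statistics reported for each texture class in Tables 1 and 2 can be computed in outline with the Python standard library. The quartile interpolation method, the use of the sample (n − 1) standard deviation, and the adjusted Fisher-Pearson skewness are assumptions: the tables do not state which conventions were used, and other conventions shift the quartiles and skewness slightly. The function name and the example values are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics laid out like Tables 1 and 2.
    Requires at least three values that are not all equal."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)              # sample standard deviation (assumed)
    n = len(values)
    # Adjusted Fisher-Pearson sample skewness (assumed convention).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    skew = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    return {
        "Minimum value": min(values),
        "1st Quartile": q1,
        "Median": median,
        "3rd Quartile": q3,
        "Maximum value": max(values),
        "Mean": mean,
        "Standard Deviation": sd,
        "Skewness": skew,
    }


log_ksat = [0.2, 0.5, 0.7, 0.9, 1.4]           # hypothetical log10 Ksat values
for name, value in describe(log_ksat).items():
    print(f"{name}: {value:.3f}")
```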
Table 3. Summary of the results.
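| Model | Input variables | Algorithm | R² | MAE [Log10 cm/h] | RMSE [Log10 cm/h] | RAE |
|---|---|---|---|---|---|---|
| M1 | Clay, Silt, Sand, dg, Sg, OC, Db, WCs | Hyb_MLP-RF-SVR | 0.829 | 0.582 | 0.802 | 57.19% |
| | | Hyb_RF-SVR | 0.826 | 0.562 | 0.796 | 55.16% |
| | | Hyb_MLP-SVR | 0.755 | 0.683 | 0.921 | 67.02% |
| | | Hyb_MLP-RF | 0.803 | 0.642 | 0.861 | 63.05% |
| | | SVR | 0.766 | 0.637 | 0.898 | 62.51% |
| | | RF | 0.773 | 0.677 | 0.929 | 66.46% |
| | | MLP | 0.632 | 0.821 | 1.079 | 80.63% |
| M2 | dg, Sg, OC, Db, WCs | Hyb_MLP-RF-SVR | 0.786 | 0.634 | 0.884 | 62.29% |
| | | Hyb_RF-SVR | 0.802 | 0.572 | 0.838 | 56.19% |
| | | Hyb_MLP-SVR | 0.747 | 0.684 | 0.937 | 67.15% |
| | | Hyb_MLP-RF | 0.744 | 0.699 | 0.955 | 68.76% |
| | | SVR | 0.685 | 0.721 | 1.019 | 70.82% |
| | | RF | 0.735 | 0.689 | 0.979 | 67.72% |
| | | MLP | 0.551 | 0.882 | 1.164 | 85.58% |
| M3 | dg, Sg, Db, WCs | Hyb_MLP-RF-SVR | 0.737 | 0.681 | 0.956 | 66.96% |
| | | Hyb_RF-SVR | 0.759 | 0.622 | 0.910 | 61.04% |
| | | Hyb_MLP-SVR | 0.703 | 0.724 | 0.999 | 71.07% |
| | | Hyb_MLP-RF | 0.687 | 0.749 | 1.026 | 73.65% |
| | | SVR | 0.647 | 0.748 | 1.069 | 73.51% |
| | | RF | 0.688 | 0.737 | 1.035 | 72.40% |
| | | MLP | 0.484 | 0.918 | 1.221 | 90.19% |
| M4 | dg, Sg, OC, Db | Hyb_MLP-RF-SVR | 0.631 | 0.793 | 1.084 | 77.89% |
| | | Hyb_RF-SVR | 0.638 | 0.762 | 1.075 | 74.79% |
| | | Hyb_MLP-SVR | 0.59 | 0.829 | 1.126 | 81.49% |
| | | Hyb_MLP-RF | 0.606 | 0.831 | 1.111 | 81.61% |
| | | SVR | 0.554 | 0.827 | 1.188 | 81.26% |
| | | RF | 0.619 | 0.775 | 1.101 | 76.14% |
| | | MLP | 0.443 | 0.957 | 1.252 | 93.95% |
| M5 | dg, Sg, Db | Hyb_MLP-RF-SVR | 0.574 | 0.856 | 1.142 | 84.07% |
| | | Hyb_RF-SVR | 0.595 | 0.848 | 1.164 | 83.37% |
| | | Hyb_MLP-SVR | 0.561 | 0.861 | 1.155 | 84.46% |
| | | Hyb_MLP-RF | 0.562 | 0.884 | 1.152 | 86.79% |
| | | SVR | 0.506 | 0.851 | 1.235 | 83.47% |
| | | RF | 0.535 | 0.889 | 1.197 | 87.26% |
| | | MLP | 0.497 | 0.941 | 1.208 | 92.32% |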
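Table 3 reports R², MAE, RMSE, and RAE computed on log10 Ksat. The sketch below uses the usual definitions; that the study computed R² as 1 − SS_res/SS_tot (rather than as the squared Pearson correlation) and RAE as the total absolute error relative to that of a predictor always returning the observed mean is an assumption, not something stated in this section. The function name and data are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """R^2, MAE, RMSE and RAE (%) for paired observed/predicted values,
    e.g. log10 Ksat in cm/h."""
    n = len(observed)
    mean_obs = sum(observed) / n
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    sq_err = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - sum(sq_err) / ss_tot,
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        # Relative absolute error: total absolute error relative to that of
        # a predictor that always returns the observed mean.
        "RAE%": 100.0 * sum(abs_err) / sum(abs(o - mean_obs) for o in observed),
    }


obs = [-1.2, 0.1, 0.6, 1.0, 1.9]     # hypothetical observed log10 Ksat
pred = [-0.8, 0.0, 0.9, 0.8, 1.6]    # hypothetical predictions
print(regression_metrics(obs, pred))
```

An RAE of 100% means the model is no better than always predicting the mean, which puts the values of roughly 55% to 94% in Table 3 into perspective.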
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Granata, F.; Di Nunno, F.; Modoni, G. Hybrid Machine Learning Models for Soil Saturated Conductivity Prediction. Water 2022, 14, 1729. https://doi.org/10.3390/w14111729

AMA Style

Granata F, Di Nunno F, Modoni G. Hybrid Machine Learning Models for Soil Saturated Conductivity Prediction. Water. 2022; 14(11):1729. https://doi.org/10.3390/w14111729

Chicago/Turabian Style

Granata, Francesco, Fabio Di Nunno, and Giuseppe Modoni. 2022. "Hybrid Machine Learning Models for Soil Saturated Conductivity Prediction" Water 14, no. 11: 1729. https://doi.org/10.3390/w14111729
