Article

Performance Assessment of Event-Based Ensemble Landslide Susceptibility Models in Shihmen Watershed, Taiwan

Department of Soil and Water Conservation, National Chung Hsing University, Taichung 402, Taiwan
* Author to whom correspondence should be addressed.
Water 2022, 14(5), 717; https://doi.org/10.3390/w14050717
Submission received: 29 January 2022 / Revised: 20 February 2022 / Accepted: 21 February 2022 / Published: 24 February 2022

Abstract

While multi-year and event-based landslide inventories are both commonly used in landslide susceptibility analysis, most areas lack multi-year landslide inventories, and the analysis results obtained from the use of event-based landslide inventories are very sensitive to the choice of event. Based on 24 event-based landslide inventories for the Shihmen watershed from 1996 to 2015, this study established five event-based single landslide susceptibility models employing logistic regression, random forest, support vector machine, kernel logistic regression, and gradient boosting decision tree methods. The ensemble methods, involving calculating the mean of the susceptibility indexes (PM), median of the susceptibility indexes (PME), weighted mean of the susceptibility indexes (PMW), and committee average of binary susceptibility values (CA) of the five single models were then used to establish four event-based ensemble landslide susceptibility models. After establishing nine landslide susceptibility models, using each inventory from the 24 event-based landslide inventories or a multi-year landslide inventory, we identified the differences in landslide susceptibility maps attributable to the different landslide inventories and modeling methods, and used the area under the receiver operating characteristic curve to assess the accuracy of the models. The results indicated that an ensemble model based on a multi-year inventory can obtain excellent predictive accuracy. The predictive accuracy of multi-year landslide susceptibility models is found to be superior to that of event-based models. In addition, the higher predictive accuracy of ensemble landslide susceptibility models than that of single models implied that these ensemble methods were robust for enhancing the model’s predictive performance in the study area. When employing event-based landslide inventories in modeling, PM ensemble models offer the best predictive ability, according to the Kruskal–Wallis test results. Areas with a high mean susceptibility index and low standard deviation, identified using the 24 PM ensemble models based on different event-based landslide inventories, constitute places where landslide mitigation measures should be prioritized.

1. Introduction

Under the impact of climate change, extreme rainfall events have caused frequent landslides and debris flows in Taiwan’s mountainous areas. In order to effectively reduce the losses caused by landslides and debris flows, it is necessary to employ landslide susceptibility analysis to delineate those areas in watersheds that are susceptible to landslides and to use this information as a reference for overall watershed management plans. The chief methods for landslide susceptibility analysis comprise heuristic, statistical, probabilistic, and deterministic methods [1]. Many types of machine learning methods have been broadly applied to landslide susceptibility analysis in recent years and have yielded excellent results; machine learning algorithms can be classified as either parametric or nonparametric [2].
Parametric machine learning algorithms first select a functional form and then learn the function’s coefficients through a training process. Their advantages are that the methodology is easy to explain and understand and that the training process is short and does not require vast amounts of data; their limitations are that the prior selection of a function often constrains the learning process, that they are only suitable for relatively simple problems, and that the fit is often comparatively poor. Representative parametric algorithms include logistic regression and linear discriminant analysis; logistic regression, in particular, is often applied in landslide susceptibility analysis [3,4,5,6,7]. Nonparametric machine learning algorithms, for their part, do not require prior selection of the functional form and can fit a function of any form through the training process. The advantages of this approach are its versatility and its good performance on training sample data; its limitations are its need for vast amounts of data, its slow training process, and its higher risk of overfitting [2]. When the training sample is too small, a nonparametric algorithm inevitably suffers from inadequate training, which reduces its accuracy [8]. The nonparametric machine learning algorithms most commonly used in landslide susceptibility analysis include the support vector machine [9,10,11,12,13,14,15], random forest [8,9,12,15,16,17,18], kernel logistic regression [10,11,19], and boosted regression tree [15,16,17].
When performing landslide susceptibility analysis, landslide inventories can be classified as either multi-year or event-based, depending on the length of data collection time in the inventory. Apart from establishing a landslide susceptibility model based on a multi-year landslide inventory [20,21,22], when the research area lacks a multi-year landslide inventory, an event-based landslide inventory and triggering factors can be used to perform susceptibility analysis [3]. When establishing an event-based landslide susceptibility model, a landslide inventory for the event and data concerning the spatial distribution of triggering factors must be available; triggering factors, such as rainfall or earthquake intensity, are taken as independent variables in the model [6,23,24,25].
When establishing a landslide susceptibility model, the input sample data set is usually divided into a training set and a testing set. After using the training set to establish a model, the testing set is used to assess the performance of the model. The sample data set commonly contains a 50:50 ratio of landslide and no-landslide samples [9,13,26,27], and the ratio of the training set sample to the testing set sample is typically 70:30 [8,12,13,26,28,29]. Furthermore, when establishing a nonparametric machine learning model, the training set is also used to perform hyperparameter optimization. During the optimization process, fivefold cross-validation [9] and tenfold cross-validation [12,30] are often used to tune the hyperparameters.
Because each modeling method has its own advantages and limitations, different models can be used to perform landslide susceptibility analysis for the same research area, but uncertainty associated with the results of these models may exist. The ensemble method can then be used to aggregate the results of different models and can delineate areas with high susceptibility and low uncertainty [16,31]. The five most commonly used ensemble methods [32] are the mean of landslide probabilities (PM), the confidence interval of the mean of landslide probabilities (CI), the median of landslide probabilities (PME), the weighted mean of landslide probabilities (PMW), and committee averaging (CA). The stacking ensemble method, which uses a meta-learning algorithm to combine different single models [33], has also been employed to establish ensemble models [34,35]. In this study, the landslide susceptibility models were constructed by adopting four ensemble methods, namely, PM, PME, PMW, and CA.
In order to assess the performance of different event-based ensemble landslide susceptibility models, this study used event-based and multi-year landslide inventories for the Shihmen watershed to establish single and ensemble landslide susceptibility models. To assess the robustness of the ensemble methods, we used numerous landslide inventories, rather than a single one, and compared the predictive accuracy of the single models and ensemble models established using the same inventory. Additionally, the rainfall-triggering factors were incorporated as independent variables into the landslide susceptibility models, which contributes to the development and improvement of landslide early-warning systems. Apart from comparing the landslide susceptibilities in the different models, this study also located those areas with high landslide susceptibility within the research area, which can provide a reference for decision-making when planning landslide mitigation measures.

2. Methods

This study first employed a logistic regression model and 4 nonparametric machine learning models to establish single landslide susceptibility models; then, it used 4 ensemble methods to establish ensemble landslide susceptibility models. In addition to establishing an event-based landslide susceptibility model, based on an event-based landslide inventory, we also combined 24 event-based landslide inventories, i.e., a multi-year landslide inventory, to establish a multi-year landslide susceptibility model. We then used the receiver operating characteristic (ROC) curve, Spearman’s rank correlation coefficient, the Mann–Whitney test, and the Kruskal–Wallis test to assess the performance of the different models.

2.1. Single Landslide Susceptibility Model

2.1.1. Logistic Regression (LR) Model

Because the goal of landslide susceptibility analysis is to predict whether landslides will occur in individual slope units, the dependent variables in this model consisted of the binary response variables of “landslide” and “no-landslide”, and the logistic regression developed by Menard [36] was used to establish a parametric machine learning model, which took the form shown in Equation (1):
$$\ln\left(\frac{p_i}{1-p_i}\right)=\alpha_i+\sum_{j=1}^{k}\beta_{ij}x_{ij}$$
Here, $p_i$ is the probability of landslide occurrence, $\alpha_i$ and $\beta_{ij}$ are the coefficients, $x_{ij}$ is the value of the susceptibility factor, $i$ indexes the events, and $j$ indexes the susceptibility factors.
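As a worked example of Equation (1), the sketch below converts a linear predictor (logit) back into a landslide probability; the coefficient and factor values are made up purely for illustration.

```python
# Worked example of Equation (1): converting a logit into a landslide
# probability. alpha and beta values are illustrative assumptions.
import math

def landslide_probability(alpha, betas, x):
    """p = 1 / (1 + exp(-(alpha + sum_j beta_j * x_j)))."""
    logit = alpha + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-logit))

p = landslide_probability(alpha=-2.0, betas=[0.8, 1.5], x=[1.0, 2.0])
print(f"p = {p:.3f}")  # logit = -2.0 + 0.8 + 3.0 = 1.8, so p ≈ 0.858
```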

2.1.2. Random Forest (RF) Model

The random forest model proposed by Breiman [37] is a decision tree-based ensemble method and establishes multiple decision trees via the random selection of variable subsets. Because random forest models do not require any prior assumptions concerning the relationship between the independent variables and the target variable, this type of model is suitable for the analysis of large datasets with nonlinear correlations [38]. In the process of establishing different decision trees, the re-sampling of the data and the random selection of variable subsets increase the diversity of the decision trees [39]. According to Chang et al. [12], there are three reasons for random forest models’ high performance: (1) their nonparametric nature; (2) their ability to determine the importance of the variables used; and (3) their provision of an algorithm for estimating missing values. This method has been extensively used in landslide susceptibility analysis in recent years, and has yielded excellent results [8,9,12,16].

2.1.3. Support Vector Machine (SVM) and Kernel Logistic Regression (KLR) Models

Support vector machines, as proposed by Vapnik [40], constitute a supervised classification method. Their special property is their ability to simultaneously maximize the geometric margin and minimize the empirical classification error, which is why they are also referred to as maximum margin classifiers [41]. SVMs perform classification by finding the hyperplane with the largest margin between two types of training data in a higher dimensional space. A non-linear kernel function can be used to map the input data onto a higher dimensional space, where a hyperplane classifying the data can be established. Kernel logistic regression is a kernelized version of linear logistic regression [42]. This method uses a kernel function to project the input data onto a higher dimensional feature space, with the goal of finding a discriminant function capable of distinguishing the two categories of landslide and no-landslide.
In the two previous models, the most commonly utilized kernel functions consist of linear kernel functions, polynomial kernel functions, radial basis kernel functions (RBF), and sigmoid kernel functions. Of these types, radial basis kernel functions are the most widely used [11] and offer the best predictive ability in most situations, especially in the case of nonlinear data [14]. Radial basis kernel functions are also a very popular choice for the establishment of landslide susceptibility models [43].
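To make the kernel concrete, a minimal sketch of the radial basis kernel function used by the SVM and KLR models is given below: K(x, x′) = exp(−γ‖x − x′‖²). The γ value and the sample points are arbitrary examples.

```python
# Minimal sketch of the RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2).
# gamma = 0.1 is an arbitrary example value.
import numpy as np

def rbf_kernel(x1, x2, gamma=0.1):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

a = np.array([1.0, 2.0])
b = np.array([1.0, 2.0])
c = np.array([4.0, 6.0])
print(rbf_kernel(a, b))  # identical points yield the maximum value, 1.0
print(rbf_kernel(a, c))  # distant points yield a value near 0
```

The kernel thus measures similarity: it decays smoothly from 1 toward 0 as the distance between the two points grows, with γ controlling how fast.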

2.1.4. Gradient-Boosting Decision Tree (GBDT) Model

The gradient-boosting decision tree (GBDT) model proposed by Friedman et al. [44] is similar to the gradient-boost regression tree (GBRT) and multiple additive regression tree (MART) algorithms. GBDT models combine boosting and regression trees in a single algorithm [41]. Boosting relies on the minimization of the loss function at each tree split to improve the decision trees [45] and represents one of the learning methods offering the greatest improvement of model accuracy [17]. Rather than being fitted independently of one another, GBDT trees are fitted sequentially, each built on top of the previous trees.

2.2. Ensemble Landslide Susceptibility Model

Referring to Thuiller et al. [32], this study selected PM, PME, PMW, and CA as the ensemble methods used to aggregate the results of the 5 single models, as shown in Table 1. Among these methods, the PM ensemble model calculates the mean of the susceptibility indexes of the single models; the PME ensemble model calculates the median of the susceptibility indexes of the single models; and the PMW ensemble model calculates the weighted mean of the susceptibility indexes of the single models, with the weight of each single model assigned according to the accuracy calibrated with the training-event data. Additionally, the CA ensemble model first identifies the threshold value of each single model, converts the landslide susceptibility index to a binary value (landslide or no-landslide), and then calculates the committee average of the binary values of the 5 single models.
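The four ensemble rules can be sketched as follows for a handful of slope units; the susceptibility indexes, the weights, and the uniform 0.5 binary threshold are illustrative assumptions, not values from this study.

```python
# Sketch of the four ensemble rules (PM, PME, PMW, CA) applied to the
# susceptibility indexes of five single models. All values are synthetic.
import numpy as np

# rows: 5 single models; columns: 3 slope units
s = np.array([[0.9, 0.2, 0.6],
              [0.8, 0.3, 0.5],
              [0.7, 0.1, 0.7],
              [0.9, 0.4, 0.4],
              [0.6, 0.2, 0.8]])
w = np.array([0.85, 0.84, 0.75, 0.75, 0.82])  # e.g., training accuracies
threshold = 0.5                               # per-model cutoff (assumed equal)

pm  = s.mean(axis=0)                          # PM: mean of indexes
pme = np.median(s, axis=0)                    # PME: median of indexes
pmw = (w[:, None] * s).sum(axis=0) / w.sum()  # PMW: weighted mean
ca  = (s >= threshold).mean(axis=0)           # CA: committee average of binaries

print(pm)   # [0.78 0.24 0.6 ]
print(pme)  # [0.8 0.2 0.6]
print(ca)   # [1.  0.  0.8]
```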

2.3. Single Model Establishment Process

2.3.1. Logistic Regression (LR) Model

All slope units with landslides in each landslide inventory are included in the landslide sample, and an equal number of no-landslide samples is also selected. Tenfold cross-validation is then used to perform model validation. The cross-validation process is repeated 5 times to reduce the error arising from the split subsets, which yields the mean test accuracy for the models established from that sample dataset. The foregoing sampling process is repeated 10 times in order to reduce sampling error, and the model with the best mean test accuracy is selected for use in the subsequent analysis.
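The sampling and validation scheme above can be sketched with scikit-learn; the factor values below are synthetic placeholders, since the real models use the slope-unit factor data.

```python
# Sketch of the LR modeling procedure: a balanced landslide / no-landslide
# sample with 10-fold cross-validation repeated 5 times. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
landslide    = rng.normal(1.0, 1.0, size=(150, 5))  # hypothetical factor values
no_landslide = rng.normal(0.0, 1.0, size=(150, 5))  # equal sample number (50:50)

X = np.vstack([landslide, no_landslide])
y = np.concatenate([np.ones(150), np.zeros(150)])

# 10-fold cross-validation, repeated 5 times -> 50 accuracy scores
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(f"mean test accuracy over 5 x 10 folds: {scores.mean():.3f}")
```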

2.3.2. Nonparametric Models (RF, SVM, KLR, GBDT)

In nonparametric machine learning algorithms, hyperparameters must be set manually before training. For example, two hyperparameters must be set in the RF model used in this study: the number of trees to fit (numtree) and the number of variables for each tree (mtry). In an SVM or KLR model employing an RBF kernel function, two hyperparameters must be set: a penalty parameter (C) and an RBF parameter (γ). In a GBDT model, three hyperparameters must be set: numtree, mtry, and the learning rate. The grid search method used to tune the hyperparameters in this study is a conventional optimization method that performs an exhaustive search over a preset hyperparameter grid. The nonparametric models used in this study and the ranges of their hyperparameters are shown in Table 2.
The modeling process involved the selection of all slope units with landslides in each landslide inventory, to serve as the landslide sample, and the selection of a no-landslide sample with the same sample number. All the sample data were then split into a training set and testing set in a 70:30 ratio. The model training process began with hyperparameter tuning, which involved the use of the training set data and 10-fold cross-validation to perform an analysis of each hyperparameter subset, which yielded the mean training accuracy of each hyperparameter subset. The next step consisted of establishing a model using the tuned hyperparameter subset and training set data, and the testing set data were then used to perform model validation, which yielded the test accuracy. The sampling process was repeated 10 times, which yielded 10 tuned hyperparameter subsets and their corresponding models, and the model with the best test accuracy was selected for use in the subsequent analysis.
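A minimal grid-search sketch of this tuning process is shown below, using an RF model as the example; the grid values are placeholders rather than the ranges in Table 2, and the data are synthetic.

```python
# Sketch of grid-search hyperparameter tuning: an exhaustive search over a
# preset grid with 10-fold cross-validation on the training set.
# Grid values and data are illustrative assumptions, not Table 2's ranges.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=14, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

param_grid = {
    "n_estimators": [100, 300, 500],  # numtree
    "max_features": [4, 7, 14],       # mtry
}
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      param_grid, cv=10, scoring="roc_auc")
search.fit(X_tr, y_tr)

print("best hyperparameters:", search.best_params_)
print(f"test AUROC of the tuned model: {search.score(X_te, y_te):.3f}")
```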

2.4. Model Performance Assessment

2.4.1. Receiver Operating Characteristic (ROC) Curve

The receiver operating characteristic (ROC) curve [46] method employs threshold values to classify prediction results into 4 types: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). After calculating the true positive rate (TPR) and false positive rate (FPR) for each threshold value, the resulting data points are connected to plot an ROC curve, where the area under the curve (AUROC) represents the model’s performance and predictive accuracy. The closer the AUROC value is to 1, the better the performance of the model.
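This calculation can be reproduced in a few lines with scikit-learn; the labels and susceptibility indexes below are synthetic.

```python
# Minimal example of the ROC curve and AUROC with scikit-learn,
# using synthetic labels (1 = landslide) and susceptibility indexes.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.55, 0.3, 0.2]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) per threshold
auroc = roc_auc_score(y_true, y_score)
print(f"AUROC = {auroc:.4f}")  # 0.9375: 15 of 16 landslide/no-landslide pairs ranked correctly
```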

2.4.2. Inferential Statistics

This study used the Mann–Whitney test and Kruskal–Wallis test to analyze the effect of different model methods and landslide inventories on the predictive ability of the established models.
The Mann–Whitney test, which is also known as the Wilcoxon rank sum test [47,48], is a nonparametric test used to determine whether there is a difference in the dependent variables between two independent populations. The test statistic, U, is calculated using Equation (2):
$$U = N_1 N_2 + \frac{N_1(N_1+1)}{2} - R_1$$
Here, $N_1$ and $N_2$ are the sizes of sample 1 and sample 2, where the sample with the larger rank sum is taken as the first sample and has rank sum $R_1$.
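The test is available in SciPy; the two hypothetical samples of model accuracies below are illustrative only.

```python
# Mann-Whitney (Wilcoxon rank-sum) test via SciPy, comparing two
# hypothetical samples of model accuracies.
from scipy.stats import mannwhitneyu

acc_single   = [0.71, 0.69, 0.68, 0.67, 0.73]
acc_ensemble = [0.75, 0.74, 0.75, 0.74, 0.75]

u_stat, p_value = mannwhitneyu(acc_single, acc_ensemble, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")  # small p: the two groups differ
```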
The Kruskal–Wallis test was first proposed by Kruskal and Wallis [49] and is a nonparametric test that extends the two-sample Wilcoxon test to situations with more than two groups. The Kruskal–Wallis test does not assume a normal distribution of the underlying data. It ranks the data from smallest to largest and uses the assigned ranks to calculate the test statistic H, as shown in Equation (3). This test is used to determine whether there is a difference between the medians of k independent populations.
$$H = \frac{12}{n(n+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(n+1)$$
Here, $n = n_1 + n_2 + \cdots + n_k$, $n_i$ is the size of each sample, and $R_i$ is the rank sum of each sample.
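SciPy also implements this test; the three synthetic groups of accuracies below stand in for the accuracy sets of different modeling methods.

```python
# Kruskal-Wallis test via SciPy: do the medians of three (synthetic)
# groups of model accuracies differ?
from scipy.stats import kruskal

group_a = [0.68, 0.70, 0.69, 0.71]
group_b = [0.72, 0.74, 0.73, 0.75]
group_c = [0.76, 0.78, 0.77, 0.79]

h_stat, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")  # small p: at least one median differs
```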

2.4.3. Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient, as proposed by Spearman [50], is a nonparametric measure used to assess the strength and direction of the association between two ranked variables, X and Y. Depending on the values of variables X and Y, this measure ranks the data and establishes paired ranks, then calculates the difference in rank for each pair, as shown in Equation (4); the value of this coefficient is between −1 and 1 [51]:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
where $d_i$ is the difference in rank between the susceptibility indexes of a slope unit in the two models, $\rho$ is Spearman’s rank correlation coefficient, and $n$ is the sample size.
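The coefficient can be computed directly with SciPy; the two short index lists below are synthetic stand-ins for the susceptibility indexes of the same slope units under two models.

```python
# Spearman's rank correlation via SciPy, comparing the susceptibility
# indexes assigned to the same slope units by two (synthetic) models.
from scipy.stats import spearmanr

model_a = [0.9, 0.7, 0.4, 0.2, 0.1]
model_b = [0.8, 0.75, 0.5, 0.3, 0.15]

rho, p_value = spearmanr(model_a, model_b)
print(f"rho = {rho:.3f}")  # identical rankings give rho = 1.0
```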
The model performance assessment methods employed in this study are summarized in Table 3.

3. Research Area and Materials

3.1. Research Area and Topographic Factor

The Shihmen watershed, with an area of 75,243 ha, is located in the northern part of Taiwan and is largely characterized by mountainous topography. Elevations in the area range from 236 m to 3526 m, and slope gradients between 20° and 50° account for 77% of the whole area (Figure 1). Because slope units in the Shihmen watershed have relatively well-defined topographic boundaries and physical meaning, they were employed as the analytical units for landslide susceptibility analysis. According to the subdivision method suggested by Xie et al. [52], the watershed was divided into 9181 slope units (Figure 2).
Twelve topographic factors, namely, maximum slope, average slope, slope roughness, highest elevation, total slope height, terrain roughness, average elevation, distance from the road, distance from the fault, distance from the river, average aspect, and lithology, were selected as intrinsic susceptibility variables following a previous study [6]. The values of highest elevation, total slope height, terrain roughness, average elevation, maximum slope, average slope, slope roughness, and average aspect of each slope unit were calculated by employing ArcGIS programs and the 5 m digital elevation model produced by the Ministry of the Interior. After obtaining the 1:5000 orthophoto base maps issued by the Aerial Survey Office of the Forestry Bureau, the 1:50,000 geologic maps issued by the Central Geological Survey, and a road map overlay from the Soil and Water Conservation Bureau, we calculated the horizontal distances of each slope unit from the river, fault, and road, respectively. The lithologic types of each slope unit, such as argillite, quartzitic sandstone, hard sandstone and shale, sandstone and shale, and terrace deposit and alluvium, were analyzed utilizing the 1:50,000 geological maps. The distribution maps of the 12 topographic factors are presented in Appendix A.

3.2. Landslide Inventory and Rainfall Factor

After collecting 24 sets of satellite images of the Shihmen watershed during the period from 1996 to 2015, landslide inventories triggered by 24 typhoon events were mapped according to the interpretation procedures proposed by Liu et al. [53]. The landslides recorded in the 24 landslide inventories in each slope unit are shown in Figure 2. The number of landslides for each landslide inventory ranged from 59 to 1350 and the total landslide area ranged from 10.19 ha to 577.04 ha (Figure 3).
Two rainfall factors, namely, maximum 1-h rainfall and maximum 24-h rainfall, were selected as extrinsic triggering variables, following previous research [6]. These short-duration and long-duration rainfall values reflect the rainfall pattern during each typhoon event. After collecting rainfall data from 31 rain-gauge stations (Figure 1), the maximum 1-h rainfall and maximum 24-h rainfall of each station during each typhoon event were analyzed. The rainfall values of each slope unit were then calculated after the Kriging method was used to estimate the spatial distribution of rainfall. The average maximum 1-h rainfall and maximum 24-h rainfall for each typhoon event are shown in Figure 3.

4. Results of Analysis

4.1. Results of Single Models

4.1.1. Logistic Regression (LR) Model

This study used LR to establish a parametric landslide susceptibility model. In the modeling process, 10-fold cross-validation was repeatedly used to assess model performance. The repeated application of this process reduced the sampling error and enabled the selection of the model with the best mean test accuracy for subsequent analysis. The test accuracy of 24 event-based logistic regression models (i.e., the AUROC value of the test stage) ranged from 0.740 to 0.862, and the mean accuracy was 0.819 (Table 4). Additionally, the test accuracy of the multi-year logistic regression model was 0.798.
The 24 event-based logistic regression models established in this study enabled the spatial variation in each event’s landslide susceptibility index to be determined. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 4). Similarly, the mean landslide susceptibility indices and standard deviations were calculated for each slope unit in the multi-year logistic regression model.

4.1.2. Random Forest (RF) Model

The hyperparameter tuning results for each event-based model, established using the RF algorithm, are shown in Table 5; it can be seen that the number of trees (numtree) ranged from 100 to 1000 and the number of variables (mtry) ranged from 7 to 14. The test accuracy of the 24 event-based models ranged from 0.772 to 0.944, and the mean was 0.842. Hyperparameter tuning for the multi-year RF model yielded a numtree of 400 and an mtry of 14, and the model’s test accuracy was 0.789.
The spatial variation in each event’s landslide susceptibility index could be obtained from the 24 event-based RF models. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 5). Similarly, the mean landslide susceptibility indices and standard deviations were calculated for each slope unit in the multi-year RF model.

4.1.3. Support Vector Machine (SVM) Model

The hyperparameter tuning results for each event-based model, established using the SVM algorithm, are shown in Table 6; it can be seen that the penalty parameter (C) ranged from 0.029 to 754.312 and the RBF parameter (γ) ranged from 0.001 to 0.091. The test accuracy of all event-based models ranged from 0.674 to 0.861, and the mean was 0.754. Hyperparameter tuning for the multi-year SVM model yielded a penalty parameter (C) of 0.1 and an RBF parameter (γ) of 0.774, and the model’s test accuracy was 0.806.
The spatial variation in each event’s landslide susceptibility index could be obtained from the 24 event-based SVM models. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 6). Similarly, the mean landslide susceptibility indices and standard deviations were calculated for each slope unit in the multi-year SVM model.

4.1.4. Kernel Logistic Regression (KLR) Model

The hyperparameter tuning results for each event-based model established using the KLR algorithm are shown in Table 7; it can be seen that the penalty parameter (C) ranged from 0.017 to 244.205 and the RBF parameter (γ) ranged from 0.002 to 0.281. The test accuracy of every event-based model ranged from 0.712 to 0.833 and the mean was 0.754. Hyperparameter tuning for the multi-year KLR model yielded a penalty parameter (C) of 1.0 and an RBF parameter (γ) of 0.1; the model’s test accuracy was 0.812.
The spatial variation in each event’s landslide susceptibility index could be obtained from the 24 event-based KLR models. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 7). Similarly, the mean landslide susceptibility indices and standard deviations were calculated for each slope unit in the multi-year KLR model.

4.1.5. Gradient-Boosting Decision Tree (GBDT) Model

The hyperparameter tuning results for each event-based model established using the GBDT algorithm are shown in Table 8, and it can be seen that the number of trees (numtree) ranged from 100 to 1000, the number of variables (mtry) ranged from 6 to 14, and the learning rate ranged from 0.1 to 1.0. The test accuracy of the 24 event-based models ranged from 0.772 to 0.861 and the mean was 0.820. Hyperparameter tuning for the multi-year GBDT model yielded a numtree of 900, an mtry of 7, and a learning rate of 0.1; the model’s test accuracy was 0.804.
The spatial variation in each event’s landslide susceptibility index could be obtained from the 24 event-based GBDT models. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 8). Similarly, the mean landslide susceptibility indices and standard deviations were calculated for each slope unit in the multi-year GBDT model.

4.2. Results of Ensemble Models

After establishing the single models, this study used the PM, PME, PMW, and CA ensemble methods to aggregate the landslide susceptibility indices of each single model for each event, yielding the landslide susceptibility indices of the 4 ensemble models.
The spatial variation in each event’s landslide susceptibility index could be obtained from the 24 event-based PM ensemble models. The mean values and standard deviations of the 24 landslide susceptibility indices for each slope unit were then calculated (Figure 9). The mean landslide susceptibility indices and standard deviations that were calculated for each slope unit in the multi-year PM ensemble model are shown in Figure 9. Similarly, the mean susceptibility index and the standard deviation of the 24 event-based models, obtained using the PME, PMW, and CA ensemble methods, are shown in Figure 10, Figure 11 and Figure 12.

4.3. Assessment of Model Accuracy

After establishing all the above-mentioned models, this study assessed the predictive ability of each model established using a specific landslide inventory by examining that model’s ability to predict the remaining landslide events. For the sake of clarity, the section below will employ the terminology a i , j , k to indicate the accuracy of the various models established using different landslide inventories and different modeling methods. Here, i = 1–25 indicate the individual landslide inventories used in modeling, where 1–24 are event-based landslide inventories, and 25 is the multi-year landslide inventory; j = 1–24 indicate the predicted events; and k = 1–9 indicate the different modeling methods.
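The bookkeeping behind this notation can be sketched as a 25 × 24 × 9 array; the values below are random placeholders, not the accuracies reported in this study.

```python
# Sketch of the accuracy array a[i, j, k]: i = modeling inventory
# (24 events + 1 multi-year), j = predicted event, k = modeling method.
# Values are random placeholders, not the paper's results.
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 0.95, size=(25, 24, 9))

PM = 5  # zero-based index of the PM ensemble method (k = 6 in the text)

# average predictive accuracy of each event-based PM model over the
# *other* 23 events (i.e., i != j), and of the multi-year PM model
event_based = np.array([np.delete(a[i, :, PM], i).mean() for i in range(24)])
multi_year = a[24, :, PM].mean()
print(event_based.shape, round(float(multi_year), 3))
```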
The accuracy of each PM ensemble model is shown in Table 9; AUROC > 75% is indicated in green, AUROC 75–50% is indicated in yellow, and AUROC < 50% is indicated in red. The average predictive accuracy of event-based models (i = 1–24, j = 1–24, ij, k = 6) ranged from 70.9% to 77.9% and the mean was 74.8%; the average predictive accuracy of the multi-year model (i = 25, j = 1–24, k = 6) was 91.1%. The average predictive accuracy of the LR models (k = 1), RF models (k = 2), SVM models (k = 3), KLR models (k = 4), GBDT models (k = 5), PME ensemble models (k = 7), PMW ensemble models (k = 8), and CA ensemble models (k = 9) were also obtained.
The average predictive accuracy of the 5 event-based single landslide susceptibility models (k = 1–5) with regard to the other landslide events is shown in Figure 13. In Figure 13, from top down, the various symbols represent the maximum, third quartile, median, first quartile, and minimum of a box plot of average predictive accuracy. This figure also shows the predictive accuracy distribution of the multi-year single landslide susceptibility models. In particular, the average predictive accuracy of the event-based LR models (i = 1–24, j = 1–24, ij, k = 1) ranged from 48.8% to 76.1%, and the mean was 71.2%; the multi-year LR model had a mean predictive accuracy of 78.8%. Similarly, the average predictive accuracy of event-based RF models ranged from 57.9% to 74.7%, and the mean was 69.5%; the multi-year RF model had a mean predictive accuracy of 79.5%. The average predictive accuracy of event-based SVM models ranged from 50.0% to 76.1%, and the mean was 68.1%; the multi-year SVM model had a mean predictive accuracy of 88.0%. The average predictive accuracy of event-based KLR models ranged from 50.0% to 76.4% and the mean was 67.4%; the multi-year KLR model had a mean predictive accuracy of 88.6%. The average predictive accuracy of event-based GBDT models ranged from 69.2% to 76.3%, while the mean was 72.8%; the multi-year GBDT model had a mean predictive accuracy of 94.3%.
The average predictive accuracy of the 4 event-based ensemble landslide susceptibility models (k = 6–9) with regard to the other landslide events is also shown in Figure 13, together with the predictive accuracy distribution of the multi-year ensemble landslide susceptibility models. The average predictive accuracy of the event-based PME models (i = 1–24, j = 1–24, i ≠ j, k = 7) ranged from 67.0% to 77.5%, with a mean of 73.8%; the multi-year PME model had a mean predictive accuracy of 89.2%. The average predictive accuracy of the event-based PMW models ranged from 70.9% to 77.8%, with a mean of 74.7%; the multi-year PMW model had a mean predictive accuracy of 91.6%. The average predictive accuracy of the event-based CA models ranged from 72.7% to 77.1%, with a mean of 74.8%; the multi-year CA model had a mean predictive accuracy of 89.0%. These results indicate that the event-based ensemble models all had an AUROC > 50% with regard to other landslide events (i = 1–24, j = 1–24, i ≠ j, k = 6–9).
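As a point of reference, the AUROC used throughout these comparisons can be computed for any fitted model with scikit-learn's `roc_auc_score`. The sketch below is illustrative only: the slope-unit labels and susceptibility indexes are synthetic stand-ins, not the study's data.

```python
# Hedged sketch of the AUROC accuracy measure (synthetic data, not the
# study's slope-unit inventory).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 1 = slope unit failed in the predicted event, 0 = stable (synthetic labels)
y_true = rng.integers(0, 2, size=200)
# Susceptibility index in [0, 1] from a hypothetical fitted model: here the
# score is weakly tied to the label so the model is better than random
y_score = np.clip(y_true * 0.3 + rng.random(200) * 0.7, 0, 1)

auroc = roc_auc_score(y_true, y_score)
print(f"AUROC = {auroc:.3f}")  # > 0.5 means better than a random guess
```

An AUROC of 0.5 corresponds to random guessing, which is why the paper flags event-based models by whether they clear that threshold.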

5. Discussion

5.1. Comparison of the Performance of Single and Ensemble Models

It can be seen from Figure 13 that the predictive accuracy of the ensemble models is superior to that of the single models under most circumstances. In particular, the four ensemble models established on the basis of event-based landslide inventories 1, 3, 4, 9, 10, 17, and 21, as well as the multi-year landslide inventory, all had greater predictive accuracy than any single model. In addition, the four ensemble models established on the basis of event-based landslide inventories 2, 5, 7, 11, 12, 13, 15, 16, 18, 20, and 22 had greater predictive accuracy than at least four of the five single models. In other words, when a model is established using a given landslide inventory, most ensemble models will offer superior predictive accuracy.
The mean predictive accuracies of the different modeling methods (k = 1–9) are compared in Table 10. The mean predictive accuracy of the ensemble models (k = 6–9) ranged from 0.738 to 0.748, higher than the range of 0.674–0.728 for the single models (k = 1–5). Since the Kolmogorov–Smirnov test indicated that not all datasets were normally distributed, the Kruskal–Wallis test was used to compare the predictive accuracy of the different modeling methods (Table 11). The post hoc test indicated that the predictive accuracy of the ensemble models is consistently superior to that of the single models. Furthermore, the coefficient of variation (CV) of the predictive accuracy of the ensemble models ranged from 0.047 to 0.063, lower than the CV range of the single models. In summary, our results show that ensemble landslide susceptibility models offer superior predictive ability and relatively low uncertainty.
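The testing sequence described above, a Kolmogorov–Smirnov normality check followed by a Kruskal–Wallis comparison, can be sketched with SciPy. The accuracy samples below are synthetic placeholders centered near the reported means, not the study's AUROC values, and only three of the nine methods are shown.

```python
# Illustrative sketch of the normality check and non-parametric comparison
# (synthetic predictive-accuracy samples; not the study's data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical AUROC samples for three modeling methods, 24 events each
lr   = rng.normal(0.712, 0.05, 24)
gbdt = rng.normal(0.728, 0.02, 24)
pm   = rng.normal(0.748, 0.02, 24)

# Kolmogorov-Smirnov test of each standardized sample against N(0, 1)
for name, x in [("LR", lr), ("GBDT", gbdt), ("PM", pm)]:
    stat, p = stats.kstest((x - x.mean()) / x.std(ddof=1), "norm")
    print(f"{name}: KS p = {p:.3f}")  # small p => reject normality

# Kruskal-Wallis H-test: do the methods share the same distribution?
h, p = stats.kruskal(lr, gbdt, pm)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
```

A significant Kruskal–Wallis result only says that at least one method differs, which is why the paper follows it with a post hoc test to rank the methods pairwise.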
Prior studies have demonstrated that the predictive ability of the landslide susceptibility models established by different ensemble methods was superior to that of single landslide susceptibility models [16,31,34,35]. In accordance with the previous study results, we found that most ensemble models were superior in terms of predictive accuracy to the single models developed with the same inventory. Moreover, this study used 24 inventories to establish the corresponding ensemble models. The higher predictive ability of the ensemble models for each inventory implied that the PM, PME, PMW, and CA ensemble methods were robust for enhancing the predictive performance of landslide susceptibility models in the study area.
Among the single models, while the LR models had the lowest mean training accuracy (i = 1–24, j = 1–24, i = j, k = 1), their mean predictive accuracy of 0.712 (i = 1–24, j = 1–24, i ≠ j, k = 1) was higher than that of the RF, SVM, and KLR models. Although the RF, SVM, and KLR models had very good mean training accuracy, their mean predictive accuracy was poor; this may be because these nonparametric models require a greater quantity of training data and are prone to overfitting [2,8].

5.2. Comparison of the Performance of Event-Based and Multi-Year Models

It can be seen from Figure 13 that among the nine modeling methods, the predictive accuracy of the multi-year models is consistently superior to that of the 24 event-based models. Table 10 also reveals that the mean predictive accuracy of the multi-year models (i = 25, j = 1–24) ranged from 0.788 to 0.943, higher than the range of 0.674–0.748 for the event-based models (i = 1–24, j = 1–24, i ≠ j). The results of the Mann–Whitney U test (Table 12) indicate that among the nine modeling methods, the predictive performance of models established based on multi-year landslide inventories is uniformly superior to that of event-based models. Furthermore, the CV of the multi-year models' predictive accuracy (0.014–0.040) was lower than that of the event-based models. In summary, multi-year landslide susceptibility models offer excellent predictive performance and low uncertainty.
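The Mann–Whitney U comparison and the coefficient-of-variation calculation used here can be sketched in a few lines of SciPy; the AUROC samples below are again synthetic values centered near the reported means, not the study's results.

```python
# Hedged sketch: multi-year vs. event-based accuracy comparison
# (synthetic AUROC samples for illustration only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
event_based = rng.normal(0.748, 0.04, 24)  # 24 event-based models
multi_year  = rng.normal(0.911, 0.02, 24)  # multi-year model, per event

# One-sided test: is the multi-year accuracy stochastically larger?
u, p = stats.mannwhitneyu(multi_year, event_based, alternative="greater")

# Coefficient of variation = sample std / mean, as a relative uncertainty
cv_event = event_based.std(ddof=1) / event_based.mean()
cv_multi = multi_year.std(ddof=1) / multi_year.mean()

print(f"Mann-Whitney U = {u:.0f}, p = {p:.2e}")
print(f"CV event-based = {cv_event:.3f}, CV multi-year = {cv_multi:.3f}")
```

The rank-based test is chosen for the same reason as the Kruskal–Wallis test above: the accuracy samples are not assumed to be normally distributed.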
The finding of the current study that landslide susceptibility models established using multi-year landslide inventories offer relatively excellent predictive performance and low uncertainty verifies the advantage of using a combination of event-based inventories and confirms previous study results. The relatively high predictive ability of landslide susceptibility models built from a combination of different event-based landslide inventories has been attributed to their larger landslide sample size and the wider numerical range of rainfall parameters in the training sample [23,54,55], or to their lower concentration of landslides in areas with the same lithology and a lower collinearity between rainfall parameters and lithology [56,57].
It can also be seen from Figure 13 that, for a given modeling method, the predictive accuracy of the event-based models depends on the choice of event. For example, among the average predictive accuracies of the 24 event-based LR models, the maximum of 76.1% was obtained with the Jangmi event and the minimum of 48.8% with the Nuri event, giving a range of 27.3%. Similarly, the ranges of the average predictive accuracy of the 24 event-based RF, SVM, KLR, GBDT, PM, PME, PMW, and CA models are 16.8%, 26.1%, 26.4%, 7.1%, 7.0%, 10.5%, 6.9%, and 4.4%, respectively. These results confirm the findings of previous studies that the choice of event influences the predictive ability of the established event-based landslide susceptibility model [22,23,24], which may correlate with the event's rainfall intensity range [3] and the degree of spatial concentration of the event's landslides [56,57].

5.3. Correlations between the Susceptibility Maps of the Optimal Model and Other Models

It can be seen from Figures 4–12 that, when a specific modeling method is used, the high-susceptibility areas in the landslide susceptibility maps based on multi-year models are similar to those created using the mean susceptibility indexes of the 24 event-based models. Nevertheless, there are significant differences in the landslide susceptibility index range and standard deviation among the different modeling methods. Because the PM ensemble models have the optimal predictive accuracy (Table 11) and the multi-year models are superior to the event-based models (Table 12), this study considered the multi-year PM ensemble model to be the optimal landslide susceptibility model. This model's landslide susceptibility map is the most representative and can best reflect the probabilities of landslides in the different slope units of the research area.
We also compared these landslide susceptibility maps to analyze the correlations in the spatial distribution of the susceptibility index between the multi-year PM ensemble model and the other models. Rather than performing the mutual subtraction algorithm [6,58,59,60] or the histogram matching method [55,56], we calculated Spearman's rank correlation coefficient to assess the degree of difference between the susceptibility maps of the optimal model and the other models. As shown in Table 13, the correlation coefficients of the single models ranged from 0.811 to 0.946, with an average of 0.912, which was lower than the 0.940–1.000 range of the ensemble models. This indicates that there are only relatively small differences between the susceptibility maps of the optimal model and the other ensemble models. In addition, when a multi-year landslide inventory is not available, the lower average correlation coefficient of the single models indicates that ensemble models can effectively reduce the discrepancies between the susceptibility maps of the established models and that of the optimal model.
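The map comparison reduces to ranking the slope units by susceptibility index under each model and correlating the ranks. A sketch with `scipy.stats.spearmanr`, using hypothetical per-slope-unit indexes rather than the study's maps:

```python
# Hedged sketch: rank correlation between two susceptibility maps
# (hypothetical slope-unit indexes, not the study's maps).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_units = 1000  # hypothetical number of slope units

# Optimal model's susceptibility indexes, and a second model's map that
# ranks the units similarly but with some noise
optimal = rng.random(n_units)
other = np.clip(optimal + rng.normal(0, 0.1, n_units), 0, 1)

rho, p = stats.spearmanr(optimal, other)
print(f"Spearman rho = {rho:.3f}")  # near 1 => similar spatial ranking
```

Because Spearman's coefficient depends only on ranks, it compares where each model places its highest-susceptibility units without being affected by the differing index ranges noted above.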

6. Conclusions

This study collected 24 event-based landslide inventories for the Shihmen watershed and employed logistic regression, random forest, support vector machine, kernel logistic regression, and gradient boosting decision tree methods to establish event-based single landslide susceptibility models. We also used four ensemble methods to aggregate the results of single models, to establish event-based ensemble models. In addition, the 24 event-based landslide inventories were combined to form a multi-year landslide inventory, which was used to establish multi-year single landslide susceptibility models and multi-year ensemble models.
As shown in Table 10, Table 11 and Table 12, the current study found that an ensemble model based on a multi-year inventory can achieve excellent predictive accuracy. Compared with event-based models, multi-year landslide susceptibility models offer superior predictive ability and lower uncertainty; compared with single models, ensemble landslide susceptibility models have higher predictive ability and lower uncertainty for each inventory, implying that the four ensemble methods are robust for enhancing the model’s predictive performance in the study area.
When relying on an event-based landslide inventory instead of a multi-year inventory to establish a model, the predictive accuracy of single models has considerable uncertainty due to differences in the predicted landslide events. The ensemble models can both reduce uncertainty and achieve better predictive accuracy, while the established PM ensemble models are the most effective of all. The susceptibility map created using the 24 PM ensemble models, based on different event-based landslide inventories, revealed areas where landslides are likely to occur. High-priority landslide mitigation measures should be implemented in places with a high mean susceptibility index and a low variation in susceptibility index to effectively reduce the losses caused by the landslides.
We recommend that other modeling methods, such as neural networks and deep learning, be further employed to establish landslide susceptibility models. When there are large numbers of single models, other ensemble methods, such as the confidence interval of the mean susceptibility index, may be used to establish even more effective ensemble landslide susceptibility models. Finally, given the influence of the choice of event on the predictive ability of an event-based model and the better predictive ability of models built from a combination of different event-based landslide inventories, future research could investigate whether combining two different event-based inventories to create an ensemble model improves predictive ability when a multi-year inventory is unavailable.

Author Contributions

Conceptualization, C.-Y.W.; data curation, S.-Y.L.; formal analysis, S.-Y.L.; methodology, C.-Y.W. and S.-Y.L.; supervision, C.-Y.W.; visualization, S.-Y.L.; writing—original draft, C.-Y.W. and S.-Y.L.; writing—review and editing, C.-Y.W.; funding acquisition, C.-Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology, Taiwan (MOST 108-2625-M-005-003).

Acknowledgments

The authors would like to thank the Soil and Water Conservation Bureau, COA, and Water Resources Agency, MOEA, Taiwan for providing data, and C.L. Hsueh for her work on the event-based landslide inventories.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Maps of 12 topographic factors.

References

1. Van Westen, C.; Van Asch, T.W.; Soeters, R. Landslide hazard and risk zonation—Why is it still so difficult? Bull. Eng. Geol. Environ. 2006, 65, 167–184.
2. Brownlee, J. Parametric and Nonparametric Machine Learning Algorithms. Machine Learning Mastery. Available online: https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms (accessed on 12 January 2021).
3. Lee, C.T.; Huang, C.C.; Lee, J.F.; Pan, K.L.; Lin, M.L.; Dong, J.J. Statistical approach to storm event-induced landslides susceptibility. Nat. Hazards Earth Syst. Sci. 2008, 8, 941–960.
4. Lu, A.; Haung, W.K.; Lee, C.F.; Wei, L.W.; Lin, H.H.; Chi, C. Combination of Rainfall Thresholds and Susceptibility Maps for Early Warning Purposes for Shallow Landslides at Regional Scale in Taiwan. In Workshop on World Landslide Forum; Casagli, N., Tofani, V., Sassa, K., Bobrowsky, P.T., Takara, K., Eds.; Springer: Cham, Germany, 2020; pp. 217–225.
5. Rossi, M.; Guzzetti, F.; Reichenbach, P.; Mondini, A.C.; Peruccacci, S. Optimal landslide susceptibility zonation based on multiple forecasts. Geomorphology 2010, 114, 129–142.
6. Wu, C.Y.; Chen, S.C. Integrating spatial, temporal, and size probabilities for the annual landslide hazard maps in the Shihmen watershed, Taiwan. Nat. Hazards Earth Syst. Sci. 2013, 13, 2353–2367.
7. Wu, C. Landslide Susceptibility Based on Extreme Rainfall-Induced Landslide Inventories and the Following Landslide Evolution. Water 2019, 11, 2609.
8. Zhao, L.; Wu, X.; Niu, R.; Wang, Y.; Zhang, K. Using the rotation and random forest models of ensemble learning to predict landslide susceptibility. Geomat. Nat. Hazards Risk 2020, 11, 1542–1564.
9. Brock, J.; Schratz, P.; Petschko, H.; Muenchow, J.; Micu, M.; Brenning, A. The performance of landslide susceptibility models critically depends on the quality of digital elevation models. Geomat. Nat. Hazards Risk 2020, 11, 1075–1092.
10. Bui, D.T.; Le, K.T.T.; Nguyen, V.C.; Le, H.D.; Revhaug, I. Tropical forest fire susceptibility mapping at the Cat Ba National Park Area, Hai Phong City, Vietnam, using GIS-based kernel logistic regression. Remote Sens. 2016, 8, 347.
11. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
12. Chang, K.T.; Merghadi, A.; Yunus, A.P.; Pham, B.T.; Dou, J. Evaluating scale effects of topographic variables in landslide susceptibility models using GIS-based machine learning techniques. Sci. Rep. 2019, 9, 12296.
13. Mokhtari, M.; Abedian, S. Spatial prediction of landslide susceptibility in Taleghan basin, Iran. Stoch. Environ. Res. Risk Assess. 2019, 33, 1297–1325.
14. Oh, H.J.; Kadavi, P.R.; Lee, C.W.; Lee, S. Evaluation of landslide susceptibility mapping by evidential belief function, logistic regression and support vector machine models. Geomat. Nat. Hazards Risk 2018, 9, 1053–1070.
15. Schratz, P.; Muenchow, J.; Iturritxa, E.; Richter, J.; Brenning, A. Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data. Ecol. Model. 2019, 406, 109–120.
16. Kim, H.G.; Lee, D.K.; Park, C.; Ahn, Y.; Kil, S.H.; Sung, S.; Biging, G.S. Estimating landslide susceptibility areas considering the uncertainty inherent in modeling methods. Stoch. Environ. Res. Risk Assess. 2018, 32, 2987–3019.
17. Park, S.; Kim, J. Landslide susceptibility mapping based on random forest and boosted regression tree models, and a comparison of their performance. Appl. Sci. 2019, 9, 942.
18. Nsengiyumva, J.B.; Valentino, R. Predicting landslide susceptibility and risks using GIS-based machine learning simulations, case of upper Nyabarongo catchment. Geomat. Nat. Hazards Risk 2020, 11, 1250–1277.
19. Chen, X.; Chen, W. GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 2021, 196, 104833.
20. Gassner, C.; Promper, C.; Beguería, S.; Glade, T. Climate change impact for spatial landslide susceptibility. In Engineering Geology for Society and Territory; Christine, G., Catrin, P., Santiago, B., Thomas, G., Eds.; Springer: Cham, Germany, 2015; Volume 1, pp. 429–433.
21. Lucà, F.; D'Ambrosio, D.; Robustelli, G.; Rongo, R.; Spataro, W. Integrating geomorphology, statistic and numerical simulations for landslide invasion hazard scenarios mapping: An example in the Sorrento Peninsula (Italy). Comput. Geosci. 2014, 67, 163–172.
22. Ozturk, U.; Pittore, M.; Behling, R.; Roessner, S.; Andreani, L.; Korup, O. How robust are landslide susceptibility estimates? Landslides 2021, 18, 681–695.
23. Knevels, R.; Petschko, H.; Proske, H.; Leopold, P.; Maraun, D.; Brenning, A. Event-Based Landslide Modeling in the Styrian Basin, Austria: Accounting for Time-Varying Rainfall and Land Cover. Geosciences 2020, 10, 217.
24. Shou, K.J.; Yang, C.M. Predictive analysis of landslide susceptibility under climate change conditions—A study on the Chingshui River Watershed of Taiwan. Eng. Geol. 2015, 192, 46–62.
25. Tanyas, H.; Rossi, M.; Alvioli, M.; van Westen, C.J.; Marchesini, I. A global slope unit-based method for the near real-time prediction of earthquake-induced landslides. Geomorphology 2019, 327, 126–146.
26. Moayedi, H.; Khari, M.; Bahiraei, M.; Foong, L.K.; Bui, D.T. Spatial assessment of landslide risk using two novel integrations of neuro-fuzzy system and metaheuristic approaches; Ardabil Province, Iran. Geomat. Nat. Hazards Risk 2020, 11, 230–258.
27. Zhang, S.; Li, C.; Peng, J.; Peng, D.; Xu, Q.; Zhang, Q.; Bate, B. GIS-based soil planar slide susceptibility mapping using logistic regression and neural networks: A typical red mudstone area in southwest China. Geomat. Nat. Hazards Risk 2021, 12, 852–879.
28. Nachappa, T.G.; Kienberger, S.; Meena, S.R.; Hölbling, D.; Blaschke, T. Comparison and validation of per-pixel and object-based approaches for landslide susceptibility mapping. Geomat. Nat. Hazards Risk 2020, 11, 572–600.
29. Sur, U.; Singh, P.; Meena, S.R. Landslide susceptibility assessment in a lesser Himalayan road corridor (India) applying fuzzy AHP technique and earth-observation data. Geomat. Nat. Hazards Risk 2020, 11, 2176–2209.
30. Luo, L.; Lombardo, L.; van Westen, C.; Pei, X.; Huang, R. From scenario-based seismic hazard to scenario-based landslide hazard: Rewinding to the past via statistical simulations. Stoch. Environ. Res. Risk Assess. 2021.
31. Di Napoli, M.; Carotenuto, F.; Cevasco, A.; Confuorto, P.; Di Martire, D.; Firpo, M.; Pepe, G.; Raso, E.; Calcaterra, D. Machine learning ensemble modelling as a tool to improve landslide susceptibility mapping reliability. Landslides 2020, 17, 1897–1914.
32. Thuiller, W.; Georges, D.; Engler, R. Biomod2 Package Manual. 2015. Available online: https://CRAN.R-project.org/package=biomod2 (accessed on 12 July 2021).
33. Wolpert, D.H. Stacked Generalization. Neural Netw. 1992, 5, 241–259.
34. Lee, M.J.; Choi, J.W.; Oh, H.J.; Won, J.S.; Park, I.; Lee, S. Ensemble-based landslide susceptibility maps in Jinbu area, Korea. Environ. Earth Sci. 2012, 67, 23–37.
35. Hu, X.; Zhang, H.; Mei, H.; Xiao, D.; Li, Y.; Li, M. Landslide Susceptibility Mapping Using the Stacking Ensemble Machine Learning Method in Lushui, Southwest China. Appl. Sci. 2020, 10, 4016.
36. Menard, S.W. Applied Logistic Regression Analysis, 2nd ed.; Sage University Paper Series on Quantitative Application in the Social Sciences, Series no. 106; Sage: Thousand Oaks, CA, USA, 1995.
37. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
38. Olden, J.D.; Kennard, M.J.; Pusey, B.J. Species invasions and the changing biogeography of Australian freshwater fishes. Glob. Ecol. Biogeogr. 2008, 17, 25–37.
39. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
40. Vapnik, V.N. The Nature of Statistical Learning Theory. IEEE Trans. Neural Netw. 1997, 8, 1564.
41. Foody, G.M.; Mathur, A. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1335–1343.
42. Cawley, G.C.; Talbot, N.L. Efficient approximate leave-one-out cross-validation for kernel logistic regression. Mach. Learn. 2008, 71, 243–264.
43. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg–Marquardt and Bayesian regularized neural networks. Geomorphology 2012, 171, 12–29.
44. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
45. Cheong, Y.L.; Leitão, P.J.; Lakes, T. Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees. Spat. Spatio-Temporal Epidemiol. 2014, 10, 75–84.
46. Green, D.M.; Swets, J.A. Signal Detection Theory and Psychophysics; Wiley: New York, NY, USA, 1966; Volume 1, pp. 1969–2012.
47. Mann, H.B.; Whitney, D.R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 1947, 18, 50–60.
48. Wilcoxon, F. Some uses of statistics in plant pathology. Biometrics Bull. 1945, 1, 41–45.
49. Kruskal, W.H.; Wallis, W.A. Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 1952, 47, 583–621.
50. Spearman, C. The Proof and Measurement of Association Between Two Things. Am. J. Psychol. 1904, 15, 72–101.
51. Myers, J.L.; Well, A.D. Research Design and Statistical Analysis; Routledge: London, UK, 2003.
52. Xie, M.; Esaki, T.; Zhou, G. GIS-based probabilistic mapping of landslide hazard using a three-dimensional deterministic model. Nat. Hazards 2004, 33, 265–282.
53. Liu, J.K.; Weng, T.C.; Hung, C.H.; Yang, M.T. Remote Sensing Analysis of Heavy Rainfall Induced Landslide. In Proceedings of the 21st Century Civil Engineering Technology and Management Conference, Hsinchu, Taiwan, 28 December 2001; Minghsin University of Science and Technology: Xinfeng Township, Taiwan, 2001; pp. C21–C31. (In Chinese)
54. Dai, F.; Lee, C. A spatiotemporal probabilistic modelling of storm-induced shallow landsliding using aerial photographs and logistic regression. Earth Surf. Processes Landf. 2003, 28, 527–545.
55. Lee, C.T.; Chung, C.C. Common patterns among different landslide susceptibility models of the same region. In Proceedings of the Workshop on World Landslide Forum, Ljubljana, Slovenia, 30 May–2 June 2017; Springer: Cham, Germany, 2017; pp. 937–942.
56. Chien, F.C. The Relationship among Probability of Failure, Landslide Susceptibility and Rainfall. Master's Thesis, National Central University, Taoyuan City, Taiwan, 2015.
57. Fu, C.C. Event-Based Landslide Susceptibility and Rainfall-Induced Landslide Probability Prediction Model in the Zengwen Reservoir Catchment. Master's Thesis, National Central University, Taoyuan City, Taiwan, 2017.
58. Lai, J.S.; Chiang, S.H.; Tsai, F. Exploring influence of sampling strategies on event-based landslide susceptibility modeling. ISPRS Int. J. Geo-Inf. 2019, 8, 397.
59. Xiao, T.; Segoni, S.; Chen, L.; Yin, K.; Casagli, N. A step beyond landslide susceptibility maps: A simple method to investigate and explain the different outcomes obtained by different approaches. Landslides 2020, 17, 627–640.
60. Lei, X.; Chen, W.; Pham, B.T. Performance evaluation of GIS-based artificial intelligence approaches for landslide susceptibility modeling and spatial patterns analysis. ISPRS Int. J. Geo-Inf. 2020, 9, 443.
Figure 1. Elevation, road, fault, river system, and rain gauge station in the Shihmen watershed.
Figure 2. The slope units and landslide inventories, triggered by 24 typhoon events in the Shihmen watershed.
Figure 3. Landslide inventory and rainfall statistics for 24 typhoon events.
Figure 4. LR models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 5. RF models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 6. SVM models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 7. KLR models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 8. GBDT models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 9. PM ensemble models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 10. PME ensemble models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 11. PMW ensemble models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 12. CA ensemble models: (a,b) the mean susceptibility index and standard deviation of 24 event-based models; (c,d) the mean susceptibility index and standard deviation of the multi-year model.
Figure 13. Box plot of the average predictive accuracy of single and ensemble models in the prediction of other landslide events: (a) blue, green, and red represent LR, RF, and SVM, respectively; (b) blue, green, and red represent KLR, GBDT, and PM, respectively; (c) blue, green, and red represent PME, PMW, and CA, respectively.
Table 1. The ensemble methods to aggregate the results of the selected models.
Ensemble Methods | Description
PM | Mean of susceptibility indexes. The PM ensemble model calculates the mean of the susceptibility indexes for the selected models.
PME | Median of susceptibility indexes. The PME ensemble model calculates the median of the susceptibility indexes for the selected models.
PMW | Weighted mean of susceptibility indexes. The PMW ensemble model calculates the relative importance of the weights based on the accuracies of the selected models, and then calculates the weighted mean of the susceptibility indexes for the models.
CA | Committee averaging. After identifying the threshold value of each selected model and converting the susceptibility index to a binary value, the CA ensemble model calculates the average of the binary values for the selected models.
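The four aggregation rules in Table 1 can be written in a few lines of NumPy. In this hedged sketch, the susceptibility indexes, model accuracies, and the CA threshold are invented for illustration; the paper derives the PMW weights and CA thresholds from the fitted models themselves.

```python
# Minimal sketch of the PM, PME, PMW, and CA ensemble rules in Table 1
# (all input values are synthetic placeholders).
import numpy as np

rng = np.random.default_rng(4)
# rows = 5 single models (LR, RF, SVM, KLR, GBDT), columns = slope units
indexes = rng.random((5, 8))
accuracies = np.array([0.712, 0.695, 0.681, 0.674, 0.728])  # assumed AUROCs
threshold = 0.5  # illustrative cut-off for the binary conversion in CA

pm  = indexes.mean(axis=0)                    # PM: mean of indexes
pme = np.median(indexes, axis=0)              # PME: median of indexes
w   = accuracies / accuracies.sum()           # accuracy-based weights
pmw = (w[:, None] * indexes).sum(axis=0)      # PMW: weighted mean
ca  = (indexes >= threshold).mean(axis=0)     # CA: committee average

print(np.round(pm, 3))
print(np.round(ca, 3))
```

Each rule maps the five per-model indexes for a slope unit to a single ensemble index in [0, 1], so the same AUROC assessment applies to ensemble and single models alike.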
Table 2. Hyperparameter types and the range of nonparametric models.
| Model | Hyperparameter | Range |
|---|---|---|
| RF | number of trees (numtree) | 100–1500 |
| | number of variables (mtry) | 3–14 |
| SVM | penalty parameter (C) | 0.001–1000 |
| | RBF parameter (γ) | 0.001–1000 |
| KLR | penalty parameter (C) | 0.001–1000 |
| | RBF parameter (γ) | 0.001–1000 |
| GBDT | number of trees (numtree) | 100–1000 |
| | number of variables (mtry) | 5–14 |
| | learning rate | 0.1–1 |
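As a sketch of how the search spaces in Table 2 might be explored, the snippet below runs a small randomized search for the RF model with scikit-learn, where `n_estimators` and `max_features` play the roles of numtree and mtry. The data are synthetic and the search settings are assumptions, not the tuning procedure actually used in the study:

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))    # 14 causative factors (synthetic)
y = rng.integers(0, 2, size=200)  # landslide / non-landslide labels

# RF ranges from Table 2: numtree 100-1500, mtry 3-14
# (SVM/KLR would use log-uniform draws over 0.001-1000 for C and gamma.)
rf_space = {"n_estimators": randint(100, 1501),
            "max_features": randint(3, 15)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            rf_space, n_iter=3, cv=3,
                            scoring="roc_auc", random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Scoring by `roc_auc` matches the AUROC-based assessment used throughout the paper.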
Table 3. Description and explanation of the model performance assessment methods.
| Assessment Method | Description | Explanation |
|---|---|---|
| Receiver operating characteristic (ROC) curve | The area under the ROC curve (AUROC) represents the model's performance and predictive accuracy. | AUROC ranges in value from 0 to 1. An excellent model has an AUROC near 1, and a poorly performing model has an AUROC near 0. |
| Mann–Whitney test | The test was used to compare the predictive accuracy of the multi-year model to that of the event-based models. | A p-value < 0.05 indicates that the null hypothesis is rejected and that a statistically significant difference exists between the predictive accuracy of the multi-year model and that of the event-based models. |
| Kruskal–Wallis test | The test was used to compare the predictive accuracy of the different modeling methods. | A p-value < 0.05 indicates that the null hypothesis is rejected and that a statistically significant difference exists among the predictive accuracies of the nine modeling methods. |
| Spearman's rank correlation coefficient | The coefficient was used for a quantitative comparison of landslide susceptibility maps. | The value ranges between −1 and 1. A coefficient close to 1 means small differences between the susceptibility map of the optimal model and those of the other models. |
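All four assessment methods above are available in SciPy and scikit-learn. A minimal sketch on synthetic accuracy values (not the study's data; the distributions are assumed for illustration):

```python
import numpy as np
from scipy.stats import mannwhitneyu, kruskal, spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Synthetic predictive accuracies for a multi-year model vs. event-based models
multi_year = rng.normal(0.90, 0.02, size=24)
event_based = rng.normal(0.75, 0.05, size=24)

u, p_mw = mannwhitneyu(multi_year, event_based)  # Mann-Whitney test
h, p_kw = kruskal(multi_year, event_based)       # Kruskal-Wallis test

# AUROC of a susceptibility index against observed landslide labels
y_true = rng.integers(0, 2, size=100)
y_score = y_true * 0.5 + rng.random(100) * 0.5   # score correlated with labels
auroc = roc_auc_score(y_true, y_score)

# Spearman correlation between two (cell-wise) susceptibility maps
rho, _ = spearmanr(y_score, y_score + rng.normal(0, 0.05, 100))
```

With clearly separated accuracy distributions like these, the Mann–Whitney p-value falls well below 0.05, mirroring the pattern reported in Table 12.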
Table 4. Performances of the LR models.
| Event | Test Accuracy | Event | Test Accuracy |
|---|---|---|---|
| 1-Herb | 0.825 | 14-Morakot | 0.832 |
| 2-Xangsane | 0.776 | 15-Parma | 0.802 |
| 3-Toraji | 0.740 | 16-Fanapi | 0.815 |
| 4-Nari | 0.760 | 17-Megi | 0.859 |
| 5-Aere | 0.824 | 18-Meari | 0.798 |
| 6-Haitang | 0.789 | 19-Nanmadol | 0.862 |
| 7-Matsa | 0.832 | 20-Talim | 0.812 |
| 8-Talim | 0.806 | 21-Saola | 0.815 |
| 9-Longwang | 0.859 | 22-Soulik | 0.835 |
| 10-Shanshan | 0.826 | 23-Matmo | 0.835 |
| 11-Krosa | 0.850 | 24-Soudelor | 0.823 |
| 12-Nuri | 0.842 | Multi-year | 0.798 |
| 13-Jangmi | 0.849 | | |
Table 5. The tuned hyperparameters and model performances for RF models.
| Event | Numtree | Mtry | Test Accuracy | Event | Numtree | Mtry | Test Accuracy |
|---|---|---|---|---|---|---|---|
| 1-Herb | 100 | 8 | 0.797 | 14-Morakot | 300 | 13 | 0.815 |
| 2-Xangsane | 500 | 14 | 0.772 | 15-Parma | 1000 | 14 | 0.820 |
| 3-Toraji | 700 | 11 | 0.851 | 16-Fanapi | 600 | 14 | 0.848 |
| 4-Nari | 400 | 13 | 0.821 | 17-Megi | 200 | 11 | 0.872 |
| 5-Aere | 1000 | 12 | 0.820 | 18-Meari | 500 | 7 | 0.849 |
| 6-Haitang | 700 | 11 | 0.792 | 19-Nanmadol | 600 | 14 | 0.913 |
| 7-Matsa | 700 | 12 | 0.845 | 20-Talim | 600 | 13 | 0.824 |
| 8-Talim | 300 | 11 | 0.808 | 21-Saola | 700 | 13 | 0.944 |
| 9-Longwang | 600 | 13 | 0.843 | 22-Soulik | 500 | 11 | 0.846 |
| 10-Shanshan | 500 | 14 | 0.819 | 23-Matmo | 900 | 13 | 0.883 |
| 11-Krosa | 700 | 12 | 0.855 | 24-Soudelor | 400 | 13 | 0.827 |
| 12-Nuri | 1000 | 14 | 0.881 | Multi-year | 400 | 14 | 0.789 |
| 13-Jangmi | 900 | 13 | 0.855 | | | | |
Table 6. The tuned hyperparameters and model performances for SVM models.
| Event | C | γ | Test Accuracy | Event | C | γ | Test Accuracy |
|---|---|---|---|---|---|---|---|
| 1-Herb | 2.683 | 0.017 | 0.721 | 14-Morakot | 0.029 | 0.029 | 0.768 |
| 2-Xangsane | 79.060 | 0.007 | 0.717 | 15-Parma | 2.024 | 0.052 | 0.766 |
| 3-Toraji | 0.494 | 0.091 | 0.759 | 16-Fanapi | 3.556 | 0.029 | 0.772 |
| 4-Nari | 0.655 | 0.029 | 0.755 | 17-Megi | 0.121 | 0.017 | 0.786 |
| 5-Aere | 4.715 | 0.013 | 0.732 | 18-Meari | 2.024 | 0.029 | 0.787 |
| 6-Haitang | 33.932 | 0.001 | 0.729 | 19-Nanmadol | 1.151 | 0.017 | 0.797 |
| 7-Matsa | 754.312 | 0.001 | 0.757 | 20-Talim | 4.715 | 0.002 | 0.674 |
| 8-Talim | 2.024 | 0.069 | 0.758 | 21-Saola | 10.985 | 0.091 | 0.861 |
| 9-Longwang | 14.563 | 0.005 | 0.754 | 22-Soulik | 19.307 | 0.002 | 0.712 |
| 10-Shanshan | 138.950 | 0.002 | 0.741 | 23-Matmo | 1.151 | 0.017 | 0.739 |
| 11-Krosa | 2.024 | 0.007 | 0.758 | 24-Soudelor | 0.373 | 0.007 | 0.728 |
| 12-Nuri | 3.556 | 0.022 | 0.754 | Multi-year | 0.1 | 0.774 | 0.806 |
| 13-Jangmi | 184.207 | 0.001 | 0.769 | | | | |
Table 7. The tuned hyperparameters and model performances for KLR models.
| Event | C | γ | Test Accuracy | Event | C | γ | Test Accuracy |
|---|---|---|---|---|---|---|---|
| 1-Herb | 244.205 | 0.005 | 0.712 | 14-Morakot | 0.017 | 0.039 | 0.729 |
| 2-Xangsane | 59.636 | 0.005 | 0.712 | 15-Parma | 2.683 | 0.017 | 0.771 |
| 3-Toraji | 1.151 | 0.052 | 0.781 | 16-Fanapi | 33.932 | 0.013 | 0.784 |
| 4-Nari | 10.985 | 0.017 | 0.775 | 17-Megi | 2.024 | 0.069 | 0.815 |
| 5-Aere | 1.151 | 0.022 | 0.760 | 18-Meari | 33.932 | 0.007 | 0.725 |
| 6-Haitang | 1.526 | 0.029 | 0.717 | 19-Nanmadol | 8.286 | 0.003 | 0.829 |
| 7-Matsa | 244.205 | 0.002 | 0.747 | 20-Talim | 1.151 | 0.069 | 0.712 |
| 8-Talim | 1.526 | 0.069 | 0.744 | 21-Saola | 14.563 | 0.281 | 0.833 |
| 9-Longwang | 3.556 | 0.039 | 0.765 | 22-Soulik | 244.205 | 0.001 | 0.730 |
| 10-Shanshan | 0.494 | 0.029 | 0.745 | 23-Matmo | 1.526 | 0.039 | 0.737 |
| 11-Krosa | 33.932 | 0.004 | 0.746 | 24-Soudelor | 8.286 | 0.005 | 0.737 |
| 12-Nuri | 59.636 | 0.029 | 0.723 | Multi-year | 1.0 | 0.1 | 0.812 |
| 13-Jangmi | 3.556 | 0.029 | 0.767 | | | | |
Table 8. The tuned hyperparameters and model performances for GBDT models.
| Event | Numtree | Mtry | Learning Rate | Test Accuracy | Event | Numtree | Mtry | Learning Rate | Test Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 1-Herb | 100 | 8 | 0.1 | 0.800 | 14-Morakot | 200 | 8 | 0.1 | 0.833 |
| 2-Xangsane | 700 | 13 | 0.9 | 0.772 | 15-Parma | 300 | 13 | 0.5 | 0.823 |
| 3-Toraji | 100 | 8 | 0.1 | 0.777 | 16-Fanapi | 100 | 13 | 0.5 | 0.815 |
| 4-Nari | 300 | 11 | 0.6 | 0.807 | 17-Megi | 1000 | 13 | 0.8 | 0.848 |
| 5-Aere | 600 | 14 | 0.3 | 0.821 | 18-Meari | 1000 | 14 | 0.4 | 0.788 |
| 6-Haitang | 200 | 11 | 0.7 | 0.772 | 19-Nanmadol | 1000 | 10 | 0.7 | 0.852 |
| 7-Matsa | 100 | 12 | 1.0 | 0.837 | 20-Talim | 100 | 13 | 0.7 | 0.807 |
| 8-Talim | 100 | 9 | 0.3 | 0.817 | 21-Saola | 100 | 7 | 0.3 | 0.832 |
| 9-Longwang | 200 | 6 | 0.1 | 0.845 | 22-Soulik | 200 | 6 | 0.2 | 0.839 |
| 10-Shanshan | 200 | 6 | 0.9 | 0.815 | 23-Matmo | 100 | 6 | 1.0 | 0.843 |
| 11-Krosa | 200 | 11 | 0.5 | 0.841 | 24-Soudelor | 100 | 6 | 0.1 | 0.811 |
| 12-Nuri | 1000 | 9 | 0.3 | 0.827 | Multi-year | 900 | 7 | 0.1 | 0.804 |
| 13-Jangmi | 100 | 12 | 0.1 | 0.861 | | | | | |
Table 9. AUROCs (%) of each PM ensemble model for the calibration or prediction of other landslide events. AUROC > 75% is indicated in green, AUROC 75–50% is indicated in yellow.
| Model Trained by Event | Event for Calibration or Prediction: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 90 | 71 | 70 | 67 | 79 | 67 | 75 | 74 | 76 | 77 | 75 | 77 | 78 | 75 | 76 | 75 | 78 | 76 | 78 | 73 | 74 | 69 | 76 | 67 |
| 2 | 74 | 90 | 75 | 78 | 71 | 74 | 70 | 76 | 81 | 79 | 78 | 78 | 75 | 73 | 79 | 79 | 83 | 76 | 81 | 75 | 80 | 73 | 78 | 77 |
| 3 | 70 | 75 | 92 | 75 | 71 | 74 | 73 | 74 | 78 | 78 | 76 | 74 | 79 | 75 | 77 | 77 | 81 | 74 | 81 | 72 | 78 | 76 | 78 | 74 |
| 4 | 71 | 75 | 72 | 92 | 65 | 71 | 67 | 70 | 77 | 76 | 78 | 75 | 77 | 69 | 77 | 76 | 79 | 68 | 79 | 70 | 80 | 69 | 73 | 72 |
| 5 | 80 | 77 | 74 | 74 | 91 | 74 | 81 | 78 | 81 | 82 | 78 | 79 | 83 | 79 | 77 | 78 | 83 | 78 | 82 | 74 | 73 | 75 | 80 | 72 |
| 6 | 67 | 73 | 69 | 74 | 70 | 90 | 76 | 82 | 75 | 77 | 78 | 78 | 78 | 76 | 76 | 72 | 79 | 76 | 77 | 74 | 79 | 76 | 75 | 72 |
| 7 | 72 | 73 | 68 | 69 | 79 | 76 | 91 | 78 | 77 | 76 | 79 | 77 | 78 | 77 | 76 | 72 | 76 | 72 | 77 | 70 | 76 | 72 | 72 | 70 |
| 8 | 64 | 70 | 65 | 65 | 73 | 74 | 72 | 91 | 75 | 74 | 76 | 78 | 70 | 73 | 72 | 67 | 68 | 75 | 77 | 72 | 65 | 72 | 72 | 68 |
| 9 | 76 | 75 | 72 | 75 | 78 | 75 | 76 | 78 | 93 | 79 | 82 | 80 | 78 | 74 | 78 | 77 | 78 | 78 | 84 | 75 | 79 | 78 | 80 | 79 |
| 10 | 76 | 75 | 74 | 75 | 80 | 72 | 76 | 75 | 82 | 90 | 80 | 80 | 81 | 77 | 79 | 77 | 84 | 77 | 79 | 73 | 77 | 76 | 79 | 73 |
| 11 | 74 | 74 | 67 | 74 | 74 | 74 | 76 | 77 | 81 | 79 | 92 | 74 | 79 | 77 | 78 | 71 | 79 | 73 | 69 | 68 | 75 | 67 | 69 | 71 |
| 12 | 71 | 70 | 70 | 71 | 72 | 70 | 72 | 72 | 75 | 73 | 73 | 90 | 74 | 71 | 74 | 77 | 79 | 73 | 76 | 70 | 77 | 72 | 76 | 72 |
| 13 | 77 | 77 | 74 | 76 | 78 | 75 | 80 | 79 | 78 | 79 | 82 | 75 | 92 | 79 | 77 | 75 | 75 | 74 | 76 | 71 | 82 | 79 | 77 | 64 |
| 14 | 72 | 70 | 66 | 68 | 78 | 72 | 79 | 78 | 76 | 78 | 78 | 78 | 77 | 90 | 74 | 69 | 76 | 77 | 70 | 71 | 69 | 68 | 73 | 65 |
| 15 | 69 | 76 | 72 | 77 | 69 | 69 | 68 | 68 | 80 | 81 | 78 | 82 | 74 | 69 | 91 | 77 | 84 | 73 | 82 | 75 | 77 | 67 | 77 | 77 |
| 16 | 67 | 71 | 71 | 69 | 74 | 66 | 72 | 63 | 71 | 74 | 74 | 72 | 70 | 69 | 73 | 93 | 81 | 69 | 69 | 67 | 77 | 63 | 76 | 74 |
| 17 | 72 | 73 | 73 | 74 | 76 | 63 | 71 | 69 | 79 | 79 | 72 | 79 | 71 | 75 | 78 | 78 | 93 | 74 | 81 | 73 | 70 | 73 | 76 | 78 |
| 18 | 72 | 71 | 69 | 69 | 68 | 72 | 70 | 74 | 79 | 77 | 71 | 78 | 73 | 71 | 76 | 72 | 74 | 89 | 80 | 75 | 79 | 77 | 79 | 78 |
| 19 | 72 | 72 | 70 | 74 | 74 | 74 | 76 | 74 | 78 | 77 | 72 | 79 | 74 | 72 | 78 | 74 | 81 | 73 | 93 | 74 | 83 | 79 | 80 | 77 |
| 20 | 69 | 70 | 63 | 68 | 71 | 73 | 72 | 73 | 76 | 73 | 71 | 79 | 72 | 70 | 70 | 72 | 75 | 75 | 79 | 89 | 80 | 74 | 76 | 74 |
| 21 | 66 | 74 | 70 | 72 | 69 | 74 | 73 | 75 | 76 | 77 | 74 | 79 | 80 | 73 | 76 | 74 | 80 | 73 | 81 | 73 | 91 | 79 | 77 | 75 |
| 22 | 72 | 72 | 67 | 72 | 75 | 73 | 74 | 76 | 77 | 77 | 71 | 80 | 80 | 76 | 74 | 72 | 78 | 80 | 85 | 75 | 83 | 92 | 81 | 79 |
| 23 | 72 | 73 | 70 | 71 | 76 | 71 | 72 | 73 | 79 | 78 | 73 | 79 | 76 | 75 | 75 | 76 | 80 | 77 | 82 | 74 | 75 | 80 | 92 | 81 |
| 24 | 74 | 74 | 69 | 72 | 75 | 75 | 75 | 77 | 82 | 80 | 77 | 81 | 77 | 77 | 78 | 76 | 83 | 80 | 82 | 76 | 80 | 81 | 82 | 89 |
| 25 | 90 | 88 | 89 | 90 | 92 | 89 | 92 | 90 | 92 | 91 | 91 | 94 | 93 | 90 | 91 | 89 | 94 | 91 | 95 | 91 | 92 | 93 | 92 | 89 |
Table 10. Performance assessment of the different modeling methods.
| Inventory Type | Metric | LR(1) | RF(2) | SVM(3) | KLR(4) | GBDT(5) | PM(6) | PME(7) | PMW(8) | CA(9) |
|---|---|---|---|---|---|---|---|---|---|---|
| Event-based | Mean training accuracy | 0.813 | 0.878 | 0.833 | 0.856 | 0.977 | 0.909 | 0.883 | 0.912 | 0.923 |
| | CV of training accuracy | 0.038 | 0.028 | 0.054 | 0.037 | 0.011 | 0.013 | 0.017 | 0.013 | 0.019 |
| | Mean predictive accuracy | 0.712 | 0.695 | 0.681 | 0.674 | 0.728 | 0.748 | 0.738 | 0.747 | 0.748 |
| | CV of predictive accuracy | 0.118 | 0.104 | 0.146 | 0.142 | 0.059 | 0.055 | 0.063 | 0.055 | 0.047 |
| Multi-year | Mean predictive accuracy | 0.788 | 0.795 | 0.880 | 0.886 | 0.943 | 0.911 | 0.892 | 0.916 | 0.890 |
| | CV of predictive accuracy | 0.040 | 0.029 | 0.026 | 0.021 | 0.014 | 0.019 | 0.022 | 0.018 | 0.022 |
Table 11. Kruskal–Wallis test of the predictive accuracy of different modeling methods.
| Modeling Method | N | Mean Rank | d.f. | H | p | Post Hoc Test |
|---|---|---|---|---|---|---|
| LR(1) | 576 | 2481.79 | 8 | 514.142 | 0.000 | >2–4 |
| RF(2) | 576 | 2013.05 | | | | - |
| SVM(3) | 576 | 2080.67 | | | | - |
| KLR(4) | 576 | 1924.60 | | | | - |
| GBDT(5) | 576 | 2571.61 | | | | >2–4 |
| PM(6) | 576 | 3134.95 | | | | >1–5 |
| PME(7) | 576 | 2873.52 | | | | >1–5 |
| PMW(8) | 576 | 3120.86 | | | | >1–5 |
| CA(9) | 576 | 3131.45 | | | | >1–5 |
Table 12. Mann–Whitney U test of the predictive accuracy of models, based on different types of landslide inventories.
| Modeling Method | Inventory Type | N | Mean Rank | Sum of Ranks | U | p |
|---|---|---|---|---|---|---|
| LR (1) | Event-based | 552 | 279.95 | 154,531.00 | 1903.000 | 0.000 |
| | Multi-year | 24 | 485.21 | 11,645.00 | | |
| RF (2) | Event-based | 552 | 277.78 | 153,332.00 | 704.000 | 0.000 |
| | Multi-year | 24 | 535.17 | 12,844.00 | | |
| SVM (3) | Event-based | 552 | 276.50 | 152,629.00 | 1.000 | 0.000 |
| | Multi-year | 24 | 564.46 | 13,547.00 | | |
| KLR (4) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
| GBDT (5) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
| PM (6) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
| PME (7) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
| PMW (8) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
| CA (9) | Event-based | 552 | 276.50 | 152,628.00 | 0.000 | 0.000 |
| | Multi-year | 24 | 564.50 | 13,548.00 | | |
Table 13. Degree of difference between the susceptibility maps of the multi-year PM ensemble model and other models.
| Susceptibility Map | Spearman's Rank Correlation Coefficient | Susceptibility Map | Spearman's Rank Correlation Coefficient |
|---|---|---|---|
| Event-based LR model | 0.924 | Multi-year LR model | 0.928 |
| Event-based RF model | 0.917 | Multi-year RF model | 0.913 |
| Event-based SVM model | 0.938 | Multi-year SVM model | 0.811 |
| Event-based KLR model | 0.946 | Multi-year KLR model | 0.864 |
| Event-based GBDT model | 0.938 | Multi-year GBDT model | 0.936 |
| Event-based PM ensemble model | 0.962 | Multi-year PM ensemble model | 1.000 |
| Event-based PME ensemble model | 0.954 | Multi-year PME ensemble model | 0.990 |
| Event-based PMW ensemble model | 0.963 | Multi-year PMW ensemble model | 1.000 |
| Event-based CA ensemble model | 0.940 | Multi-year CA ensemble model | 0.950 |
Wu, C.-Y.; Lin, S.-Y. Performance Assessment of Event-Based Ensemble Landslide Susceptibility Models in Shihmen Watershed, Taiwan. Water 2022, 14, 717. https://doi.org/10.3390/w14050717