Research Question
Over the duration of this course, my research question has taken on a form different from that presented in my opening blog post, but still equally valuable to my research. Instead of asking how I could bring statistics into a single landslide hazard analysis, I am now asking how statistics may be used to add consistency to the entire landslide mapping process (Figure 1).
Figure 1: Correspondence of landslide map types and selected inputs.
Mapping landslides can be a complicated task that, along the way, incorporates a sequence of three map types:
- Inventory – the mapped extents of past landslides. These extents can be a single polygon, or several polygons representing the features that compose a landslide, such as scarps and deposits.
- Hazard – mapped predictions of the probability of landslide occurrence or the amount of ground deformation associated with the advent of some natural hazard, such as heavy rainfall or seismic ground motions.
- Risk – a mapped combination of landslide hazard and the vulnerability and exposure of infrastructure. Typical risk maps attempt to express the costs incurred by a landslide’s occurrence in a particular location.
In addition to these three types, there is also susceptibility mapping. Susceptibility maps, which show where ground conditions may be conducive to landsliding, are useful for some applications, but they are not necessary in this context.
Inventory, hazard, and risk maps should be viewed as a progression, as each new map is dependent on the content of its predecessor. A lack of geotechnical design parameters (e.g. friction angle, depth to groundwater, soil stratigraphy) at the regional scale requires that landslide inventories be used to back-calculate conditions at failure. These conditions can then be interpolated across the region to improve the inputs to a hazard map. This approach has many imperfections, but it often represents the most informed analysis available.
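One common simplification for this kind of back-calculation is the infinite-slope model. Below is a minimal sketch under that assumption; the cohesionless soil, the unit weight values, and the function name are illustrative choices on my part, not the project's documented method. Setting the factor of safety to one at failure lets the friction angle be solved from slope geometry and groundwater depth.

```python
import math

def back_calculate_friction_angle(slope_deg, depth_m, water_table_depth_m,
                                  unit_weight=18.0, water_unit_weight=9.81):
    """Back-calculate the friction angle (degrees) of a cohesionless soil
    with the infinite-slope model, assuming failure occurred (FS = 1).

    slope_deg           -- slope angle of the failed terrain (degrees)
    depth_m             -- depth of the failure surface (m)
    water_table_depth_m -- depth from ground surface to groundwater (m)
    unit_weight         -- total soil unit weight (kN/m^3), assumed value
    water_unit_weight   -- unit weight of water (kN/m^3)
    """
    beta = math.radians(slope_deg)
    # Height of the water column above the failure surface
    h_w = max(depth_m - water_table_depth_m, 0.0)
    # Pore-pressure ratio: fraction of normal stress carried by water
    r_u = (water_unit_weight * h_w) / (unit_weight * depth_m)
    # Infinite slope with c' = 0:  FS = (1 - r_u) * tan(phi) / tan(beta)
    # Set FS = 1 and solve for phi
    tan_phi = math.tan(beta) / (1.0 - r_u)
    return math.degrees(math.atan(tan_phi))

# Example: a 30-degree slope that failed with the water table at the surface
print(back_calculate_friction_angle(30.0, depth_m=2.0, water_table_depth_m=0.0))
```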
Additionally, the hazard map is a primary input for a risk map. A good way to think about the relationship between hazard and risk maps is to answer the age-old question, “If a [landslide occurs] in the woods, does anyone [care]?” The answer is typically no, but on the occasion that the landslide wipes out someone’s driveway, railroad track, or doghouse, the answer becomes YES! A risk map considers whether the infrastructure’s location corresponds with that of high landslide hazard and, sometimes, how much the repair of the damaged infrastructure might cost. For these reasons, risk maps are the ultimate goal for land managers, such as the Oregon Department of Transportation. Knowing the costs in advance allows for improved allocation of resources and better budgeting estimates.
Datasets
The datasets used for this course were:
- Statewide Landslide Information Database for Oregon (SLIDO) – Points representing the locations of historic landslides and polylines representing the extents of landslide scarps.
- Northwest River Forecast Center Rainfall Data – Weather station points with past rainfall amounts and future rainfall predictions.
- Oregon Lidar Consortium Digital Elevation Models (DEMs) – 3-foot-resolution bare-earth elevation rasters.
All datasets were evaluated for various locations in the Oregon Coast Range.
Hypotheses
The hypotheses related to rainfall-induced landslide mapping are as follows:
- Topography and soil strength account for most of a slope’s stability, but these two factors alone are not responsible for most landslides.
- Rainfall is the factor that most often leads to slope failure. A slope is in equilibrium until the addition of pore water pressures from rainfall induces a failure.
These hypotheses must be broken down into more specific hypotheses to address my research question. The specific hypotheses are listed below:
- The adequacy of any topographic data for landslide mapping is determined by its resolution and the scale at which it is evaluated.
- Different elevation derivatives (e.g. curvature, slope, roughness) are better for identifying specific landslide features (see the sketch following this list). For example, one derivative might be better at identifying landslide deposits, while another might be better at identifying the failure surface.
- The intensity and timing of rainfall determines how soil strength is diminished.
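As referenced in the list above, here is a minimal sketch of how such elevation derivatives might be computed from a DEM array. The Laplacian curvature and the 3x3 roughness window are simplifying assumptions; a GIS package would report profile and planform curvature separately.

```python
import numpy as np
from scipy.ndimage import generic_filter

def dem_derivatives(dem, cell_size=0.9):
    """Compute simple elevation derivatives from a bare-earth DEM array.

    dem       -- 2D array of elevations (m)
    cell_size -- grid spacing (m); 0.9 m approximates a 3-ft lidar DEM
    """
    # First derivatives give the steepest-descent slope, in degrees
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    # Second derivatives give a simple (Laplacian) curvature measure
    d2z_dx2 = np.gradient(dz_dx, cell_size, axis=1)
    d2z_dy2 = np.gradient(dz_dy, cell_size, axis=0)
    curvature = d2z_dx2 + d2z_dy2

    # Roughness as the local standard deviation of slope in a 3x3 window
    roughness = generic_filter(slope, np.std, size=3)

    return slope, curvature, roughness
```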
Approaches
Each of the three specific hypotheses was evaluated as coursework this quarter, and their roles in the landslide mapping process are shown in Figure 2. Hypothesis one was addressed using Fourier transforms, hypothesis two was addressed using principal component analysis (PCA), and hypothesis three was evaluated using kriging. A hot spot analysis, unrelated to the hypotheses, was also performed to identify locations of more costly landslide activity.
Figure 2: Relationship between hypotheses and the landslide mapping process.
Results
Documentation for the implementation and results associated with the hot spot and kriging analyses has been provided in previous blog posts, so only PCA and Fourier transforms will be discussed here.
Principal Component Analysis
The purpose of performing a principal component analysis was to determine which topographic derivatives were most closely associated with the crest of a landslide scarp. The values used as inputs to the PCA were the mean slope, standard deviation of slope, profile curvature, and planform curvature corresponding with the location of each scarp polyline. Table 1 shows the results of the PCA.
Table 1: Coefficients resulting from principal component analysis.
| Principal Component | Slope | Profile Curvature | Standard Deviation of Slope | Planform Curvature |
| --- | --- | --- | --- | --- |
| 1 | 0.99 | 0.00 | 0.00 | -0.16 |
| 2 | 0.00 | 0.80 | 0.59 | 0.04 |
| 3 | 0.16 | -0.06 | 0.02 | 0.98 |
| 4 | -0.01 | -0.59 | 0.80 | -0.05 |
Table 1 shows that the first principal component is strongly correlated with slope, while the second principal component is strongly correlated with profile curvature and the standard deviation of slope. Table 1 was not considered further because it relies on the assumption that the scarp polylines represent the true location of landslide scarps, which was later determined to be unlikely. The PCA results still provide useful information, as the strong correlations of both profile curvature and standard deviation of slope with the second principal component spurred an additional investigation.
Having two variables strongly correlated with the same principal component implies that the two variables are also correlated with each other. To confirm this, profile curvature and standard deviation of slope were compared directly (Figure 3). The results show a nearly linear relationship between the two variables, so standard deviation of slope was excluded from subsequent analyses related to landslide scarps.
Figure 3: Comparison of profile curvature and standard deviation of slope.
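For reference, below is a minimal sketch of how loadings like those in Table 1 can be produced with scikit-learn. The file name is a placeholder, and the columns are assumed to follow Table 1's order; neither reflects the actual project files.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Each row is one scarp polyline; columns are the four derivatives
# in Table 1's order (placeholder file, not the project data)
X = np.loadtxt("scarp_derivatives.csv", delimiter=",", skiprows=1)

# Standardize so derivatives with larger units do not dominate
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=4).fit(X_std)

# Rows of components_ are loading vectors like those in Table 1
print(pca.components_)
print(pca.explained_variance_ratio_)

# Directly check the correlation that motivated Figure 3:
# profile curvature (column 1) vs. standard deviation of slope (column 2)
print(np.corrcoef(X_std[:, 1], X_std[:, 2])[0, 1])
```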
Fourier Transforms
Fourier transforms use a weighted sum of pairs of sine and cosine functions to represent some finite function. Each paired sine and cosine function has a unique frequency, which is plotted against its amplitude to develop what is termed the frequency domain. In common practice, the frequency domain is used to identify dominant frequencies and to remove frequencies associated with noise in the data. In this project, Fourier transforms were used to determine the dominant frequencies of topography (a digital elevation model, with results in Figure 4), which in turn provide its characteristic period. Knowing the period of a topographic feature is a useful way of determining the scale at which it may be identified across an entire landscape.
Figure 4: Example of the frequency domain of topography.
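Below is a minimal sketch of how a one-dimensional frequency domain like that in Figure 4 might be produced from a DEM transect with NumPy; the transect file and cell size are assumptions for illustration.

```python
import numpy as np

# One row of a bare-earth DEM treated as an elevation transect
# (placeholder file; extraction from the actual raster is not shown)
elevation = np.loadtxt("dem_transect.txt")
cell_size = 0.9  # grid spacing in meters, approximating a 3-ft DEM

# Remove the mean so the zero-frequency term does not dominate
signal = elevation - elevation.mean()

# Real-valued FFT: amplitude versus spatial frequency (cycles/meter)
amplitudes = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(signal.size, d=cell_size)

# Period (meters) of the strongest non-zero frequency
dominant = frequencies[np.argmax(amplitudes[1:]) + 1]
print("dominant period:", 1.0 / dominant, "m")
```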
The primary shortcoming of this approach is that most topography is dominated by very low frequencies (long periods): the signal is overwhelmed by broad, long-wavelength relief, which makes clear identification of small landslide features impossible. Future work filtering the frequency domain will be necessary before any conclusions may be drawn from this approach.
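One possible starting point for that filtering is a simple high-pass cutoff in the frequency domain; the 50 m cutoff wavelength below is an arbitrary value for illustration, not a tested threshold.

```python
import numpy as np

def high_pass(signal, cell_size=0.9, cutoff_wavelength=50.0):
    """Suppress long-wavelength relief in an elevation transect by
    zeroing frequencies below a cutoff, then inverse transforming."""
    spectrum = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(signal.size, d=cell_size)
    # Zero out wavelengths longer than the cutoff (low frequencies)
    spectrum[freqs < 1.0 / cutoff_wavelength] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)
```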
Significance
The significance of this work has two major aspects:
- Land managers and the public benefit from landslide maps because they communicate the potential costs of building in certain locations.
- The statistical framework can provide a better way to threshold continuous natural data to improve consistency in the implementation of landslide mapping procedures.
The two aspects come together in that the consistent production of landslide maps will yield products that are easier to interpret. Improved interpretation of the maps will hopefully influence future construction and mitigation for existing infrastructure.
Course Learning
The primary research-oriented lessons learned during this course are:
- Combinations of software programs are often necessary to efficiently complete a task. While some software may have almost infinite capabilities, the time needed to implement some approaches may favor the use of other software.
- Most programming languages have significant libraries of code that are already written. While an individual may have the ability to write code to perform a task, unnecessary time is spent rewriting what someone else has already done. Often, that same time can be used to explore additional opportunities that may lead to innovative ideas.
From these two lessons, it should be evident that I value efficiency. The breadth of my problem is great, and what I was able to accomplish during this course is only a small piece of it (see Figure 2). On top of only scratching the surface of the problem, many of my efforts also ended without meaningful results. Despite these failures, several approaches that I first thought would not apply to my work surprised me. For this reason, my greatest lesson learned is that it is important to try many different approaches to the same problem; some may work and some may not. Improved efficiency simply makes it possible to implement more analyses.
Statistical Learning
Of the activities performed during this course, the hot spot analysis and kriging were univariate analyses. Based on these two analyses, several advantages and limitations of univariate statistical approaches are listed below.
Advantages:
- Relatively easy to apply (in terms of time required and interpretation of results).
- May reveal patterns that are not obvious, which was most evident in the results of my hot spot analysis.
Limitations:
- Require large sample sizes, which was also most evident in the results of my hot spot analysis.
- Kriging was particularly sensitive to geographic outliers.
- Sensitive to the spatial distribution of sampling sites. Geographic biases, such as the selection of only landslides along roadways in my hot spot analysis, may produce deceptive results. I would not trust isolated sampling sites.
- A single variable cannot model many processes.
Other statistical approaches performed during this course involved transformations that brought the data into unfamiliar forms. Both the Fourier frequency domain and principal component loadings are abstract notions that can only be interpreted with specific knowledge.