Research Question

Over the duration of this course, my research question has taken on a form different from that presented in my opening blog post, but still equally valuable to my research. Instead of asking how I could bring statistics into a single landslide hazard analysis, I am now asking how statistics may be used to add consistency to the entire landslide mapping process (Figure 1).


Figure 1: Correspondence of landslide map types and selected inputs.

Mapping landslides can be a complicated task that, along the way, incorporates a sequence of three map types:

  1. Inventory – the mapped extents of past landslides. These extents can be a single polygon, or several polygons representing the features that compose a landslide, such as scarps and deposits.
  2. Hazard – mapped predictions of the probability of landslide occurrence or the amount of ground deformation associated with the advent of some natural hazard, such as heavy rainfall or seismic ground motions.
  3. Risk – a mapped combination of landslide hazard and the vulnerability and exposure of infrastructure. Typical risk maps attempt to express the costs incurred by a landslide’s occurrence in a particular location.

In addition to these three types, there is also susceptibility mapping. Susceptibility maps, which show where ground conditions may be conducive to landsliding, are useful for some applications, but they are not necessary in this context.

Inventory, hazard, and risk maps should be viewed as a progression, as each new map depends on the content of its predecessor. A lack of geotechnical design parameters (e.g. friction angle, depth to groundwater, soil stratigraphy) at the regional scale requires that landslide inventories be used to back-calculate conditions at failure. These conditions can then be interpolated across the region to improve the inputs to a hazard map. This approach has many imperfections, but it often represents the most informed analysis available.
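To make the back-calculation step concrete, here is a minimal Python sketch for the simplest case: an infinite, cohesionless slope with slope-parallel seepage, solved for the friction angle that puts the slope exactly at failure. The unit weights and the 20-degree slope are illustrative assumptions, not values from my study.

```python
import math

def backcalculate_friction_angle(slope_deg, gamma_sat=19.0, gamma_w=9.81):
    """Back-calculate the friction angle (degrees) that puts an infinite,
    cohesionless slope with slope-parallel seepage exactly at failure (FS = 1).

    FS = (gamma' / gamma_sat) * (tan(phi) / tan(beta)) -> solve for phi at FS = 1.
    Unit weights are in kN/m^3 and are illustrative assumptions.
    """
    gamma_buoyant = gamma_sat - gamma_w          # effective (buoyant) unit weight
    beta = math.radians(slope_deg)               # slope angle
    tan_phi = (gamma_sat / gamma_buoyant) * math.tan(beta)
    return math.degrees(math.atan(tan_phi))

# A 20-degree slope that failed fully saturated implies a friction angle near 37 degrees:
phi = backcalculate_friction_angle(20.0)
```

Repeating this for every inventoried landslide yields a cloud of failure-condition estimates that can then be interpolated regionally.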

Additionally, the hazard map is a primary input for a risk map. A good way to think about the relationship between hazard and risk maps is by answering the age-old question, “If a [landslide occurs] in the woods, does anyone [care]?” The answer is typically no, but on the occasion that the landslide wipes out their driveway, railroad track, or doghouse, the answer becomes YES! A risk map considers whether the infrastructure’s location corresponds with that of high landslide hazard, and sometimes, how much the repair of the damaged infrastructure might cost. For these reasons, risk maps are the ultimate goal for land managers, like the Oregon Department of Transportation. Knowing the costs in advance allows for improved allocation of resources and better budgeting estimates.

Datasets

The datasets used for this course were:

  1. Statewide Landslide Information Database for Oregon (SLIDO) – points representing the locations of historical landslides and polylines representing the extents of landslide scarps.
  2. Northwest River Forecast Center rainfall data – weather station points with past rainfall amounts and future rainfall predictions.
  3. Oregon Lidar Consortium digital elevation models (DEMs) – 3-foot-resolution bare-earth elevation rasters.

All datasets were evaluated for various locations in the Oregon Coast Range.

Hypotheses

The hypotheses related to rainfall-induced landslide mapping are as follows:

  1. Topography and soil strength account for most of a slope's stability, but these two factors are not solely responsible for most landslides.
  2. Rainfall is the factor that most often leads to slope failure. A slope is in equilibrium until the addition of pore water pressures from rainfall induces a failure.

These hypotheses must be broken down into more specific hypotheses to address my research question. The specific hypotheses are listed below:

  1. The adequacy of any topographic data for landslide mapping is determined by its resolution and the scale at which it is evaluated.
  2. Different elevation derivatives (e.g. curvature, slope, roughness) are better for identifying specific landslide features. For example, one derivative might be better at identifying landslide deposits, while another might be better at identifying the failure surface.
  3. The intensity and timing of rainfall determines how soil strength is diminished.

Approaches

Each of the three specific hypotheses was evaluated as coursework this quarter, and their roles in the landslide mapping process are shown in Figure 2. Hypothesis one was addressed using Fourier transforms, hypothesis two using principal component analysis (PCA), and hypothesis three using kriging. A hot spot analysis, unrelated to the hypotheses, was also performed to identify locations of more costly landslide activity.


Figure 2: Relationship between hypotheses and the landslide mapping process.

Results

Documentation for the implementation and results associated with the hot spot and kriging analyses has been provided in previous blog posts, but PCA and Fourier transforms will be discussed here.

Principal Component Analysis

The purpose of performing a principal component analysis was to determine which topographic derivatives were most closely associated with the crest of a landslide scarp. The values used as inputs to the PCA were the mean slope, standard deviation of slope, profile curvature, and planform curvature corresponding with the location of each scarp polyline. Table 1 shows the results of the PCA.

Table 1: Coefficients resulting from principal component analysis.

Principal Component | Slope | Profile Curvature | Standard Deviation of Slope | Planform Curvature
------------------- | ----- | ----------------- | --------------------------- | ------------------
1                   |  0.99 |  0.00             |  0.00                       | -0.16
2                   |  0.00 |  0.80             |  0.59                       |  0.04
3                   |  0.16 | -0.06             |  0.02                       |  0.98
4                   | -0.01 | -0.59             |  0.80                       | -0.05


Table 1 shows that the first principal component is strongly correlated with slope, while the second principal component is strongly correlated with profile curvature and the standard deviation of slope. Table 1 was not considered further because it relies on the assumption that the scarp polylines represent the true locations of landslide scarps, which was later determined to be unlikely. The PCA results still provide useful information, as the strong correlations of both profile curvature and standard deviation of slope with the second principal component spurred an additional investigation.

Having two variables strongly correlated with the same principal component implies that the two variables are also correlated with each other. To confirm this understanding, profile curvature and standard deviation of slope were compared (Figure 3). The results show a nearly linear relationship between the two variables. Based on these results, standard deviation of slope was no longer considered in subsequent analyses related to landslide scarps.
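For readers who want to see the mechanics, below is a minimal sketch of this kind of PCA using numpy and synthetic stand-in data (the real values were extracted in ArcGIS). The deliberately correlated pair mimics the profile curvature / standard deviation of slope relationship described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the four derivatives sampled along scarp polylines:
# slope, profile curvature, standard deviation of slope, planform curvature.
n = 200
slope = rng.normal(30, 8, n)
prof_curv = rng.normal(0, 1, n)
sd_slope = 0.9 * prof_curv + rng.normal(0, 0.2, n)   # deliberately correlated pair
plan_curv = rng.normal(0, 1, n)
X = np.column_stack([slope, prof_curv, sd_slope, plan_curv])

# Standardize, then eigendecompose the correlation matrix; the eigenvectors
# (sorted by eigenvalue) are the principal component loadings, as in Table 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order]               # column i holds the loadings of PC i+1
explained = eigvals[order] / eigvals.sum() # fraction of variance per component
```

When two inputs are nearly collinear, as the synthetic pair is here, they load heavily on the same component, which is exactly the pattern that prompted the follow-up comparison.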


Figure 3: Comparison of profile curvature and standard deviation of slope.

Fourier transforms

Fourier transforms use a weighted sum of pairs of sine and cosine functions to represent some finite function. Each paired sine and cosine function has a unique frequency that is plotted against its amplitude to develop what is termed the frequency domain. In common practice, the frequency domain is used to identify dominant frequencies and to remove frequencies associated with noise in the data. In the case of this project, Fourier transforms were used to determine the dominant frequencies of topography (a digital elevation model, with results in Figure 4), which in turn provide its period. Knowing the period of topography is a useful way of determining the scale at which a feature may be identified across an entire landscape.
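A minimal sketch of the idea, assuming numpy and a synthetic 1-D elevation profile in place of the real DEM (the wavelengths, amplitudes, and 3 ft spacing are illustrative):

```python
import numpy as np

# Synthetic 1-D elevation profile: broad hillslope (long wavelength) plus a
# small periodic feature, sampled on a 3 ft grid like the lidar DEMs.
dx = 3.0                                 # ft between samples
n = 1024
x = np.arange(n) * dx
elev = 50 * np.sin(2 * np.pi * x / 3000) + 2 * np.sin(2 * np.pi * x / 90)

# The real FFT gives the frequency domain: amplitude at each spatial frequency.
amp = np.abs(np.fft.rfft(elev)) / n
freq = np.fft.rfftfreq(n, d=dx)          # cycles per ft

# Dominant frequency (skipping the zero-frequency mean) and its period.
k = np.argmax(amp[1:]) + 1
dominant_period = 1.0 / freq[k]          # ft per cycle
```

The dominant period recovered here is the broad hillslope, not the 90 ft feature, which illustrates the problem noted below: low frequencies swamp the signal of small features unless the frequency domain is filtered first.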


Figure 4: Example of the frequency domain of topography.

The primary shortcoming of this approach is that most topography is dominated by very low frequencies (long periods), meaning the broad-scale relief overwhelms the signal of small landslide features, which makes their clear identification impossible. Future work filtering the frequency domain will be necessary before any conclusions may be drawn from this approach.

Significance

The significance of this work has two major aspects:

  1. Land managers and the public benefit from landslide maps because they show the cost of building in certain locations.
  2. The statistical framework can provide a better way to threshold continuous natural data to improve consistency in the implementation of landslide mapping procedures.

The two aspects come together in that the consistent production of landslide maps will yield products that are easier to interpret. Improved interpretation of the maps will hopefully influence future construction and mitigation for existing infrastructure.

Course Learning

The primary research-oriented lessons learned during this course are:

  1. Combinations of software programs are often necessary to efficiently complete a task. While some software may have almost infinite capabilities, the time needed to implement some approaches may favor the use of other software.
  2. Most programming languages have significant libraries of code that are already written. While an individual may have the ability to write code to perform a task, unnecessary time is spent rewriting what someone else has already done. Often, that same time can be used to explore additional opportunities that may lead to innovative ideas.

From these two lessons it should be evident that I value efficiency. The breadth of my problem is great, and what I was able to accomplish during this course is only a small piece of it (see Figure 2). On top of only scratching the surface of my problem, many of my efforts also ended without meaningful results. Despite these failures, several approaches that I first thought would not apply to my work surprised me. For this reason, my greatest lesson learned is that it is important to try many different approaches to the same problem; some may work and some may not. Improved efficiency simply makes it possible to implement more analyses.

Statistical Learning

Of the activities performed during this course, the hot spot analysis and kriging were univariate analyses. Based on these two analyses, below are several advantages and limitations to univariate statistical approaches.

  1. Relatively easy to apply (in terms of time required and interpretation of results)
  2. May reveal patterns that are not obvious, which is most evident in the results of my hot spot analysis.
  3. Require large sample sizes, which is also most evident in the results of my hot spot analysis.
  4. May be sensitive to geographic outliers, as kriging was in my analysis.
  5. Sensitive to the spatial distribution of sampling sites. Geographic biases, such as the selection of only landslides along roadways in my hot spot analysis, may produce deceptive results. I would not trust isolated sampling sites.
  6. A single variable cannot capture processes controlled by many factors.

Other statistical approaches performed during this course involved transformations that brought data into unfamiliar states. Both the Fourier frequency domain and principal component loadings are abstract notions that can only be interpreted with specific knowledge.

Question

How is the occurrence of landslides in western Oregon related to the rate and timing of rainfall? The Northwest River Forecast Center (NWRFC) archives 6-hour rainfall accumulation from more than 2000 recording stations throughout the Pacific Northwest (Figure 1). The challenge is that landslides do not tend to occur in the immediate proximity of these stations, and spatial interpolation must be performed to estimate rainfall amounts at landslide locations.


Figure 1: Location of rainfall recording stations provided by the NWRFC.

The Tool and its Implementation

Kriging was selected over inverse distance weighting to better account for variations in rainfall from east to west. The kriging was performed in Matlab to allow better control over the inputs and to simplify the task of kriging every 6-hour measurement for December (124 interpolations).

Kriging works by weighting measured values using a semivariogram model. Several semivariograms were examined to better identify which model would best fit the dataset (Figure 2).

[Panels, left to right: spherical, Gaussian, exponential]

Figure 2: Examples of semivariogram fits to NWRFC rainfall data.

Based on Figure 2, the exponential semivariogram appeared to be the best choice. Another adjustable input is the search radius (i.e. how far the model looks for data points). This value was also varied to illustrate the effect of a poorly chosen search radius (Figure 3).


Figure 3: Examples of varying the search radius (lag distance). Search radii from left to right: 1 degree, 5 degrees, 0.2 degrees.
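The three semivariogram shapes compared above can be written out directly. Below is a Python sketch of the standard spherical, exponential, and Gaussian models; the nugget, sill, and range values are illustrative, not fitted to the NWRFC data.

```python
import numpy as np

def spherical(h, nugget, sill, rang):
    """Spherical semivariogram: rises as a cubic, flattens exactly at the range."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rang - 0.5 * (h / rang) ** 3)
    return np.where(h < rang, g, sill)

def exponential(h, nugget, sill, rang):
    """Exponential semivariogram: approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    return nugget + (sill - nugget) * (1 - np.exp(-3 * h / rang))

def gaussian(h, nugget, sill, rang):
    """Gaussian semivariogram: parabolic near the origin (very smooth fields)."""
    h = np.asarray(h, dtype=float)
    return nugget + (sill - nugget) * (1 - np.exp(-3 * (h / rang) ** 2))

# All three share a nugget, sill, and range; they differ in shape near the
# origin, which is what the figure compares. Evaluate at a few lags (degrees):
h = np.array([0.0, 0.5, 1.0, 2.0])
g_exp = exponential(h, nugget=0.1, sill=1.0, rang=1.0)
```

The practical difference is behavior at short lags: the Gaussian model assumes very smooth variation near each station, while the exponential model allows rougher small-scale variability, which tends to suit rainfall data.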

Once each of the 124 surfaces was produced, values were extracted at the locations of major landslides from this past winter. The extracted values were later used to produce rainfall-versus-time plots, which are described in the next section.
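The extraction step amounts to sampling each kriged grid at a point, which can be sketched as bilinear interpolation. Everything below (grid extent, rainfall values, and the coordinates standing in for the Highway 42 slide) is synthetic and illustrative.

```python
import numpy as np

def sample_bilinear(grid, lats, lons, lat, lon):
    """Bilinearly sample a regular lat/lon grid at one point.
    `lats` and `lons` must be evenly spaced and ascending."""
    i = (lat - lats[0]) / (lats[1] - lats[0])   # fractional row index
    j = (lon - lons[0]) / (lons[1] - lons[0])   # fractional column index
    i0, j0 = int(np.floor(i)), int(np.floor(j))
    di, dj = i - i0, j - j0
    return ((1 - di) * (1 - dj) * grid[i0, j0]
            + (1 - di) * dj * grid[i0, j0 + 1]
            + di * (1 - dj) * grid[i0 + 1, j0]
            + di * dj * grid[i0 + 1, j0 + 1])

# Hypothetical kriged rainfall surface for one 6-hour step; in practice all
# 124 surfaces would be sampled at each landslide location.
lats = np.linspace(42.0, 46.5, 90)
lons = np.linspace(-124.5, -121.0, 70)
rain = np.random.default_rng(1).gamma(2.0, 2.0, size=(lats.size, lons.size))
hwy42_rain = sample_bilinear(rain, lats, lons, 43.05, -124.05)
```

Collecting the sampled value from each of the 124 surfaces at one location gives exactly the rainfall-versus-time series plotted in the next section.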

Results

To simplify results for this project, only one landslide is shown. The Highway 42 landslide occurred on December 23, 2015, closing the highway for nearly one month and costing the Oregon Department of Transportation an estimated $5 million to repair. Rainfall versus time profiles were produced for three popular semivariograms (spherical, exponential, and Gaussian) to gauge the influence of choosing one method over another (Figure 4).


Figure 4: Comparison of results obtained from the different semivariograms and PRISM.

Figure 4 shows little effect due to changing the semivariogram model, which is likely a result of having limited variability in rainfall measurements and the distribution of recording stations near the location of the Highway 42 landslide.

To verify the results of this exercise, PRISM daily rainfall estimates were downloaded for the corresponding time period, and compared (Figure 4). This comparison shows that, while the PRISM data does not capture spikes in rainfall amount, the overall accumulation of rainfall appears to be similar, implying that kriging was effective for this application.


Question

The Statewide Landslide Information Database for Oregon (SLIDO, Figure 1) is a GIS compilation of point data representing past landslides. Each point is associated with a number of attributes, including repair cost, dimensions, and date of occurrence. For this exercise, I asked whether SLIDO could be subjected to a hot spot analysis and, if so, whether the results would be insightful.


Figure 1: SLIDO (version 3.2).

The Tool

Hot spot analysis is a tool in ArcGIS that spatially identifies areas of high and low concentrations of an input weighting parameter. The required input is either a vector layer with a numerical attribute that can serve as the weighting parameter, or a vector layer whose features indicate an equal weight of occurrence. Outputs are a z-score, p-value, and confidence-level bin for each feature of the input layer.
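Under the hood, the ArcGIS tool computes the Getis-Ord Gi* statistic. Below is a minimal numpy sketch of that statistic using binary distance weights; the coordinates, repair costs, planted cluster, and distance band are all made up for illustration.

```python
import numpy as np

def getis_ord_gi_star(coords, values, dist):
    """Getis-Ord Gi* z-scores, the local statistic behind hot spot analysis.
    Binary weights: every point within `dist` of point i (including i itself)
    gets weight 1. Large positive z = hot spot; large negative z = cold spot."""
    x = np.asarray(values, dtype=float)
    n = x.size
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= dist).astype(float)        # n x n binary spatial weight matrix
    wi = w.sum(axis=1)                   # sum of weights for each point
    num = w @ x - xbar * wi
    den = s * np.sqrt((n * wi - wi ** 2) / (n - 1))
    return num / den

# Hypothetical repair-cost points with a cluster of expensive slides planted
# in one corner, loosely mimicking the SLIDO exercise.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(100, 2))
cost = rng.lognormal(10, 1, 100)
cost[(coords[:, 0] < 3) & (coords[:, 1] > 7)] *= 20   # planted hot spot
z = getis_ord_gi_star(coords, cost, dist=2.0)
```

Because each z-score pools the costs of all neighbors within the distance band, a single expensive landslide cannot create a hot spot on its own, which matches the behavior noted in the discussion below.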

Description of Exercise

Performing the hot spot analysis involved more than simply running the tool with SLIDO as an input and a weighting field selected. Selecting an input field was easier said than done, as the SLIDO attribute table is only partially completed. Based on a survey of fields in the SLIDO attribute table, it was clear that repair cost was the best choice. All points having a repair cost were then exported to a new layer, which was then input to the hot spot analysis. An important note is that this step greatly reduced the number of points and their scatter, so the output map looks slightly different from Figure 1.

Outputs

The output of this exercise is a comparison of SLIDO points colored by their repair cost with SLIDO points colored by confidence level bin (Figure 2).


Figure 2: Comparison of coloring by hot spots to simply coloring by cost.

Discussion

The second map in Figure 2 shows the presence of a major hot spot and a major cold spot regarding landslide costs in Oregon. The figure shows that, on average, landslides in the northwest corner of the state are more expensive. This observation can only be made because there appears to be a similar density of points, located at similar distances away from their neighbors, across the entire network of points. The figure also shows that single high-cost landslides do not play a major role in the positioning of hot spots, which is a nice advantage of the hot spot analysis.

In general, I think that the hot spot analysis did a good job illustrating a trend that may not have been obvious in the original data.

Bonus Investigation

In the hot spot documentation, it is stated that the analysis is not well suited to small datasets. An example of performing a hot spot analysis on a small dataset is provided in Figure 3. While there may be a trend in the points colored by normalized infiltration rate, the hot spot map shows no significant points.


Figure 3: Hot spot analysis on a small dataset.

  1. Description of Research Question:

The Oregon Coast Range routinely plays host to disastrous landslides. The primary reason for these landslides is that the range provides a unique combination of high annual precipitation with the presence of weak marine sediments (Olsen et al. 2015). During winter storms, it is not uncommon for major transportation corridors to become inoperable, impacting local economies and the livelihoods of residents (The Oregonian 2015a, 2015b). Overall, landslides in Oregon cost an average of $10 million annually, with losses from particularly severe storms having cost more than $100 million (Burns and Madin 2009).

While these rainfall-induced landslides may sometimes be large, deep-seated failures, they most frequently occur in the form of shallow translational failures. These shallow landslides typically occur in the upper few meters of the soil profile, and may result in heavy damage to forest access roads or the temporary closure of major roads.

Recently, I developed a limit equilibrium slope stability model for use in mapping shallow landslides during rainfall. In its current form, the model is a deterministic equation that computes a factor of safety against failure for each cell of a digital elevation model (DEM). The problem with this approach is that it fails to account for spatial and temporal variation of the input parameters, and it only considers a single DEM resolution. My research question is to explore how incorporating a probabilistic framework, one that expresses the confidence in each input and applies the model at multiple scales, influences the predictive power of the model.
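As context for that cell-by-cell calculation, the standard infinite-slope factor of safety with pore water pressure takes the following form (shown here as the common textbook expression, which may differ in detail from my model):

```latex
FS = \frac{c' + \left(\gamma z \cos^2\beta - u\right)\tan\phi'}{\gamma z \sin\beta \cos\beta}
```

where c' is the effective cohesion, γ the soil unit weight, z the depth to the failure surface, β the slope angle, u the pore water pressure, and φ' the effective friction angle. Failure is predicted when FS ≤ 1; rainfall enters through u, which is why pore pressure growth during storms can push an otherwise stable cell past failure.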

  2. Datasets:

The dataset analyzed for this project consists of three parts:

  1. Data from Smith et al. (2013), who performed hydrologic monitoring of a clear-cut hillslope in the Elliot State Forest of southwestern Oregon. Monitoring was performed over a three-year period, with measurements of rainfall, volumetric water content, and negative pore water pressure taken at hourly increments. Volumetric water content and negative pore water pressure were measured in eight separate soil pits, with each pit instrumented at three depths between 0 and 3.0 meters.
  2. Lidar derived DEM from the Oregon Lidar Consortium for the Elk Peak quadrangle in southwestern Oregon.
  3. The Statewide Landslide Information Database for Oregon (SLIDO) corresponding to the Elk Peak quadrangle.
  3. Hypotheses:

The existing model, despite being insufficient to meet the goals of this project, has provided valuable insight into the influence of rainfall on slope instability. As in other slope stability methods, topography and soil strength are expected to account for most of a slope's stability. These two factors combined are expected to bring soils to a critical state, but not a state of failure. The addition of rainfall will then determine whether slopes fail or not. This approach should be most interesting when using the model to forecast landslide hazards based on predicted weather.

  4. Approaches:

I am not clear on exactly what types of analyses need to be undertaken to further my project. My hope is that the advice from peers and assignments associated with this course will help me choose the necessary steps, given my set of goals. I anticipate that most work will be performed in either ArcGIS or Matlab.

  5. Expected outcome:

This project is expected to produce a statistical model that estimates the probability of failure for a given set of conditions. The model is intended for use in mapping applications, and the primary outcome will be rainfall-induced landslide hazard maps for the Elk Peak quadrangle.

  6. Significance:

Accurate hazard maps allow land managers and homeowners to better understand the risk posed by landslides. This method is expected to go a step further by using rainfall predictions to produce pre-storm maps, which will provide hazard maps specific to a severe rainfall event. Maps of this nature would be especially important because they would allow agencies like the Oregon Department of Transportation to know where resources might be needed before any damage has actually occurred.

  7. Your level of preparation:
    1. I have extensive experience with ArcGIS and ModelBuilder from coursework and research during my master’s degree. I have also served as a TA for the OSU CE 202 course (a civil engineering course on GIS), which gave me greater abilities in troubleshooting ArcGIS and working with ModelBuilder.
    2. My experience with GIS programming in Python is moderate, and mainly the result of taking GEO 578.
    3. I have no experience with R.

References

Burns, W.J., and Madin, I.P. (2009). “Protocol for Inventory Mapping of Landslide Deposits from Light Detection and Ranging (LIDAR) Imagery.” Oregon Department of Geology and Mineral Industries, Special Paper 42.

Olsen, M.J., Ashford, S.A., Mahlingam, R., Sharifi-Mood, M., O’Banion, M., and Gillins, D.T. (2015). “Impacts of Potential Seismic Landslides on Lifeline Corridors.” Oregon Department of Transportation, Report No. FHWA-OR-RD-15-06.

Smith, J.B., Godt., J.W., Baum, R.L., Coe, J.A., Burns, W.J., Lu, N., Morse, M.M., Sener-Kaya, B., and Kaya, M. (2013). “Hydrologic Monitoring of a Landslide-Prone Hillslope in the Elliot State Forest, Southern Coast Range, Oregon, 2009-2012.” United States Geological Survey, Open File Report 2013-1283.

The Oregonian (2015a). “U.S. 30 closes and reopens in various locations due to landslides, high water.” December 17, 2015. <http://www.oregonlive.com/portland/index.ssf/2015/12/high_water_closes_one_us_30_ea.html>

The Oregonian (2015b). “Landslide buckles Oregon 42, closing it indefinitely,” December 25, 2015. <http://www.oregonlive.com/pacific-northwest-news/index.ssf/2015/12/landslide_buckles_oregon_42_cl.html>