Cluster analysis of watershed lithology in Oregon coastal systems

The question that I asked for Exercise 3 was: how similar are the lithological profiles of Oregon coastal watersheds?

I used cluster analysis to answer this question. Broadly, cluster analysis groups observations so that within-group variance is minimized and between-group variance is maximized. There are many clustering algorithms, and most rely on some form of numerical optimization. For this exercise, I used hierarchical clustering, which builds a nested hierarchy of groups from the numerical attributes of each observation in a dataset. My dataset was the lithological profile of 35 watersheds in the Oregon Coast Range. A lithological profile is the percentage of a watershed's area underlain by each lithology type: metamorphic, plutonic, sedimentary, surficial sediments, tectonic, and volcanic.

Methods

Computing lithological profile for each watershed
1. Downloaded “Oregon Geologic Data Compilation – 2015” from the Oregon Spatial Data Library. https://spatialdata.oregonexplorer.info/geoportal/details;id=e71e1897f5864b689a3a4a131287a309
2. Used ArcMap to dissolve the geology layer on the GEN_LITH_TY field, which classifies each geologic unit as one of: metamorphic, plutonic, sedimentary, surficial sediments, tectonic, or volcanic.
3. Used ArcMap to intersect the geology data with the study watersheds. The Intersect tool subsets the dissolved lithology layer so that it includes only data bounded by the study watersheds.
  1. Ran Intersect on several shapefiles: wshd4, wshd5, and all Coos watersheds.
  2. The output of the Intersect tool is a shapefile in which each discrete lithological polygon is a single feature. As a result, there was often more than one polygon per lithology in each watershed.
4. Used the Calculate Geometry tool in the ArcMap attribute table to calculate the area (in square kilometers) and the centroid latitude and longitude of each watershed polygon.
5. Exported the attribute table of each output shapefile and standardized the columns in Excel. Exported the result as CSV and reshaped the table in R so that each row represented one watershed and its lithological profile. Used dplyr to group by lithological type and sum the percentages, then used tidyr (spread function) to make each lithology a column.
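The reshaping in step 5 can be illustrated with a minimal sketch. It is written in Python rather than the R (dplyr/tidyr) workflow described above, and it assumes that each percentage is a share of total watershed area. The function name `lithology_profiles`, the watershed IDs, and the polygon areas are all hypothetical.

```python
from collections import defaultdict

# Lithology classes from the GEN_LITH_TY field, in a fixed column order.
LITHOLOGIES = ["metamorphic", "plutonic", "sedimentary",
               "surficial sediments", "tectonic", "volcanic"]

def lithology_profiles(polygons):
    """Turn (watershed_id, lithology, area_km2) records into one row of
    percentages per watershed, with one column per lithology."""
    area = defaultdict(lambda: defaultdict(float))
    for watershed, lithology, km2 in polygons:
        # Several polygons can share a lithology, so areas are summed.
        area[watershed][lithology] += km2
    profiles = {}
    for watershed, by_lithology in area.items():
        total = sum(by_lithology.values())  # assumed to be greater than zero
        profiles[watershed] = [100.0 * by_lithology.get(name, 0.0) / total
                               for name in LITHOLOGIES]
    return profiles

# Made-up polygons for two hypothetical watersheds.
polygons = [
    ("A", "sedimentary", 30.0),
    ("A", "sedimentary", 10.0),
    ("A", "volcanic", 10.0),
    ("B", "volcanic", 45.0),
    ("B", "surficial sediments", 5.0),
]
profiles = lithology_profiles(polygons)
for watershed, row in profiles.items():
    print(watershed, row)
```

Each row sums to 100, and lithologies absent from a watershed appear as zeros, which is the wide layout the clustering step needs.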
Cluster analysis
1. Used the daisy function in R (package: cluster) to compute all pairwise dissimilarities between the 35 study watersheds. Dissimilarities were calculated as Euclidean distances.
2. Performed hierarchical cluster analysis (HCA) on the dissimilarity matrix computed in step 1. I used the agnes function (package: cluster) with Ward’s method, which at each step merges the pair of clusters that produces the smallest increase in total within-cluster variance, measured with squared Euclidean distances.
3. Validated the hierarchical clustering using eclust (package: factoextra) and fviz_silhouette. The former runs the cluster validation; the latter only visualizes its results.
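To make steps 1 and 2 concrete, here is a rough Python sketch of Euclidean dissimilarities followed by Ward-style agglomeration. It is not the daisy/agnes code used for the analysis: it is a naive implementation that recomputes centroids at every step, and the merge "cost" it reports is the increase in within-cluster sum of squares, which is not necessarily on the same scale as the heights agnes reports. The five two-column profiles (percent sedimentary, percent volcanic) are invented for illustration.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dissimilarity_matrix(rows):
    """All pairwise Euclidean distances (the role daisy plays in step 1)."""
    return [[math.sqrt(sq_dist(p, q)) for q in rows] for p in rows]

def centroid(rows, members):
    return [sum(rows[i][k] for i in members) / len(members)
            for k in range(len(rows[0]))]

def ward_cost(rows, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(rows, a), centroid(rows, b)
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist(ca, cb)

def ward_clustering(rows):
    """Repeatedly merge the cheapest pair; return (left, right, cost) per merge."""
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        cost, i, j = min(
            (ward_cost(rows, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((sorted(clusters[i]), sorted(clusters[j]), cost))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical profiles: [percent sedimentary, percent volcanic].
rows = [[100, 0], [96, 4], [88, 12], [10, 90], [20, 80]]
D = dissimilarity_matrix(rows)
merges = ward_clustering(rows)
for left, right, cost in merges:
    print(left, "+", right, "cost", round(cost, 2))
```

With these made-up values, the last merge joins the three mostly sedimentary rows with the two mostly volcanic rows, mirroring the two-branch structure a dendrogram would show.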

Results

The HCA produced a fairly successful clustering, with an agglomerative coefficient of 0.96. The agglomerative coefficient measures the strength of the clustering structure; values close to one indicate strong structure, meaning that observations join tight groups at dissimilarities that are small relative to the final merge. The cluster validation showed that all but two watersheds were adequately grouped (Figure 1).
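As a small illustration of how an agglomerative coefficient is computed, the Python sketch below follows the definition given in the cluster package documentation: for each observation, take the height at which it first merges, divide by the height of the final merge, and average one minus that ratio. The merge list and its heights are hypothetical and are not the results of this analysis.

```python
def agglomerative_coefficient(merges):
    """merges: list of (left_members, right_members, height), in merge order."""
    final_height = merges[-1][2]
    first_height = {}
    for left, right, height in merges:
        for side in (left, right):
            # An observation merges for the first time while it is still a singleton.
            if len(side) == 1:
                first_height.setdefault(side[0], height)
    ratios = [h / final_height for h in first_height.values()]
    return sum(1.0 - m for m in ratios) / len(ratios)

# Hypothetical merge history for five observations.
example_merges = [
    ([0], [1], 4.0),
    ([3], [4], 10.0),
    ([2], [0, 1], 12.0),
    ([0, 1, 2], [3, 4], 100.0),
]
ac = agglomerative_coefficient(example_merges)
print(round(ac, 2))
```

The coefficient approaches one when every observation joins a group at a height that is small compared with the final merge.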

Figure 1. Results from the cluster analysis silhouette plot.
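For readers unfamiliar with silhouette plots, the sketch below computes silhouette widths in plain Python. It is an illustration of the general technique, not the eclust/fviz_silhouette code used here, and the profiles and cluster labels are invented. A width near one means an observation sits well inside its cluster; a negative width is the usual signal that it lies closer to another cluster.

```python
import math

def silhouette_widths(rows, labels):
    """Silhouette width per observation; needs at least two distinct labels."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(rows[i], rows[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(rows[i], rows[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

# Hypothetical profiles: [percent sedimentary, percent volcanic].
# The last row is deliberately given the "wrong" label.
rows = [[100, 0], [96, 4], [88, 12], [10, 90], [20, 80], [30, 70]]
labels = ["sed", "sed", "sed", "volc", "volc", "sed"]
widths = silhouette_widths(rows, labels)
for label, width in zip(labels, widths):
    print(label, round(width, 2))
```

In this made-up example only the mislabeled last row gets a negative width, which is the kind of bar that stands out in a silhouette plot.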

Figure 2. Dendrogram of the hierarchical clustering results. Three numbers are written below each final branch: the top is the watershed ID, the middle is the percent sedimentary lithology (blue), and the bottom is the percent volcanic lithology (purple).

The HCA produced two main groups with many sub-branches (Figure 2). By comparing the lithology percentages to the clusters, I determined that one main branch contains watersheds dominated by volcanic lithology and the other contains watersheds dominated by sedimentary lithology. Eight of the 35 watersheds are primarily volcanic, 26 are primarily sedimentary, and one watershed does not fit this classification (Watershed 31: 85% surficial sediments).

Critique of Method

This method was very useful because it let me characterize the lithology of my study watersheds quantitatively and then identify dissimilarities among their lithological profiles. The results will help me compare hydrologic regime characteristics across coastal watersheds and examine the controls that drive those hydrologic processes. For example, the dendrogram in Figure 2 shows that there are five watersheds with 100% sedimentary lithology and five watersheds with predominantly volcanic geology. Such statistics will help me stratify my study design and guide my flow regime analysis.

GEOG 566

Advanced spatial statistics and GIScience
