The goal of this project was to use R to run statistical operations on data from ESRI shapefiles, and see about getting the results of these operations back into a format that can be used in ArcGIS. This was my first experience using R, and I was surprised at how easy it was (or at least parts of it), though having experience with Python and Java helps. This may be pretty basic for those who already have used R, but hopefully it will be a useful introduction for neophytes like myself.
An easy way to import ESRI shapefile data into R is through the use of the maptools package. You can install the package in the R gui by selecting Packages>Install Package(s), selecting your mirror site (we have one at USA (OR)), and finding maptools in the list. You’ll also need the rgl, spatstat, sm, and misc3d packages to run all the commands here. Note that Digital Earth uses R 3.0.1, for which spatstat is not available, so some of these commands are not possible in the lab.
For this exercise, we’ll be using the Mount Hood earthquake data (hood3d.shp):
To read a shapefile, you can use one of the following commands in R, depending on the type of shapefile:
readShapeLines(file.choose())
readShapePoints(file.choose())
readShapePoly(file.choose())
The file.choose
command is an alternative to hard-coding a file path and name; it will bring up a file selection dialog that you can use to interactively select your file.
Run the following command to create a SpatialPointsDataFrame object called shape, which will contain the geometry and attributes of the shapefile:
shape <- readShapePoints(file.choose())
You can view the data with the summary
or attributes
commands:
Note that the X, Y, and Z coordinates are named here as coords.x1, coords.x2, and coords.x3. We can assign each to a more intuitively-named vector (or list for those who are more used to Python):
x <- shape$coords.x1
y <- shape$coords.x2
z <- shape$coords.x3
Now we can do a number of things with these points. One of the more fun options is to compute the 3d density of these points and create density isosurfaces. This uses the kde3d (kernel density in 3d) and contour3d functions in the misc3d package.
Our data is in a UTM projection, which means that the units are meters, which means that the distances between points are large numbers. I haven’t yet been able to get good results using the raw data in the kde3d function, but do get good results when dividing the input values by 1000:
mx <- x/1000
my <- y/1000
mz <- z/1000
Through trial and error, I got decent results using the following parameters:
with(shape, {
d <- kde3d(mx, my, mz, h=0.2, n = 150, lims=c(range(mx),range(my),range(mz)))
contour3d(d$d, 0.02, d$x, d$y, d$z,
color = "red", color2 = "gray", scale=TRUE
)
})
The result displays in an interactive window that can be zoomed and rotated:
There is another package that depends on the misc3d package, called sm. It has a density function that calls contour3d and configures some more advanced parameters behind the scenes. It’s also somewhat simpler to use:
m <- cbind(x,y,z)
sm.density(m, panel=TRUE, panel.plot = TRUE)
Note that the raw x, y, and z coordinates worked here, though it is necessary to create a matrix with them first. The result:
Notice the slider for the size of kernel used in the density calculation, as well as the bounding frame and 3d axes. There are a number of arguments that can be used in both the density and contour3d functions to customize the appearance of the data and the layout. By either using them or by writing your own R code, you could conceivably make 3d density plots to whatever level of sophistication you wish.
In addition to visualizing your data, you can perform whatever statistical analysis you would use with any other dataset in R. For example, we can easily make histograms of depth or magnitude:
layout(matrix(c(1,2),1,2,byrow=TRUE))
hist(shape$Depth, breaks=20)
hist(shape$Magnitude, breaks=40)
The layout function allows for the creation of two side-by-side plots when given a matrix of one row and two columns; otherwise the second plot would overwrite the first:
Histograms can just as easily be made in ArcGIS, but there are much more complex data operations available in R. For example, it is possible to approximate the above density surfaces and get them back to ArcScene with nearest neighbor distances.
First, we make a 3d points object using the x, y, and z values as well as their ranges. In this case, this object is called p3. Then, use spatstat’s nndist.pp3 function with p3 as an argument to get a list of distances. You can then send the result back to shape; simply referring to a column that doesn’t exist yet will create it:
p3 <- pp3(x,y,z, as.box3(range(x), range(y), range(z)))
dists <- nndist.pp3(p3)
shape$nndist<-dists
The final bit of R code is very similar to the first:
writePointsShape(shape, file.choose())
This brings up the same dialog as before, which allows you to name the output file. R will warn you that the file does not exist and ask you to make it, though you’ll need to make sure that you give the file a .shp extension.
If you look at the results of the summary function from earlier, you can see that the projection information did not get imported with the attribute data. However, the coordinates are all the same as the original shapefile, so you can simply define the coordinate system to be the same as the original. If you have the original loaded, its coordinate system will appear in the Layers list in the Spatial Reference selection dialog:
The attribute table for this shapefile includes new columns, including nndist, the one we just added:
You could use this new column to adjust the color or size of the points. Although it is not possible, as far as I know, to create a 3D isosurface in ArcScene, you can approximate it by creating 3D buffers, which can use a field value as the radius. Since nndist is small in clustered data, and we would prefer larger spheres for our clusters, it’s best to create a new field calculated from the first. In the table options menu, add a field. Make it a float value. Right-click on the header after it is created, select Field Calculator, and enter your conversion. After some trial and error, I found that the conversion that I liked best was 1/nndist*20000.
Once you create the column, you can use the Buffer 3D tool found in 3D Analyst Tools> 3D Features to create the buffers. You’ll want to use the size field rather than a constant value for distance:
Now the clusters can be viewed in ArcScene:
So, that’s about it for this project. These operations were accomplished with very little code; most of the work is done by the packages and functions already in R, and it’s possible to make your own as well. The most difficult part of this process is learning how to make the various functions work. A great resource is the help(“f”) function, where f is a function you are trying to use. Running this command will bring up the documentation for that function, usually with examples.
Dr. Narayanaraj kindly provided some example R code that served as the starting point for my research in this project. I’ll upload the files to the Data folder in the S drive.
The seismic data used in this exercise is from the Advanced National Seismic System catalog, maintained by the University of California Berkely.