The goal of this project was to use R to run statistical operations on data from ESRI shapefiles, and to see about getting the results back into a format that can be used in ArcGIS.  This was my first experience using R, and I was surprised at how easy it was (at least in parts), though having experience with Python and Java helps.  This may be pretty basic for those who have already used R, but hopefully it will be a useful introduction for neophytes like myself.

An easy way to import ESRI shapefile data into R is through the maptools package.  You can install the package in the R GUI by selecting Packages > Install Package(s), selecting your mirror site (we have one at USA (OR)), and finding maptools in the list.  You’ll also need the rgl, spatstat, sm, and misc3d packages to run all the commands here.  Note that Digital Earth uses R 3.0.1, for which spatstat is not available, so some of these commands are not possible in the lab.
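
If you prefer working from the console, the equivalent commands should look something like this (install once, then load each package with library):

install.packages(c("maptools", "rgl", "spatstat", "sm", "misc3d"))
library(maptools)   # shapefile import/export
library(misc3d)     # kde3d and contour3d
library(sm)         # sm.density
library(spatstat)   # pp3 and nndist.pp3
library(rgl)        # interactive 3D device used for the isosurfaces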

For this exercise, we’ll be using the Mount Hood earthquake data (hood3d.shp):

[Figure: the Mount Hood earthquake data]

To read a shapefile, you can use one of the following commands in R, depending on the type of shapefile:

readShapeLines(file.choose())
readShapePoints(file.choose())
readShapePoly(file.choose())

The file.choose command is an alternative to hard-coding a file path and name; it will bring up a file selection dialog that you can use to interactively select your file.
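
For comparison, the hard-coded version would look something like this (the path here is just a placeholder):

readShapePoints("C:/data/hood3d.shp")   # hypothetical path; file.choose() saves you from typing it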

Run the following command to create a SpatialPointsDataFrame object called shape, which will contain the geometry and attributes of the shapefile:

shape <- readShapePoints(file.choose())

You can view the data with the summary or attributes commands:
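
summary(shape)      # coordinate ranges and a summary of each attribute field
attributes(shape)   # the underlying structure, including the coordinates and attribute data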

[Figure: summary of the shape object]

Note that the X, Y, and Z coordinates are named here as coords.x1, coords.x2, and coords.x3.  We can assign each to a more intuitively named vector (or list, for those who are more used to Python):
x <- shape$coords.x1
y <- shape$coords.x2
z <- shape$coords.x3

Now we can do a number of things with these points.  One of the more fun options is to compute the 3d density of these points and create density isosurfaces.  This uses the kde3d (kernel density in 3d) and contour3d functions in the misc3d package.

Our data is in a UTM projection, so the units are meters and the coordinate values are large numbers.  I haven’t yet been able to get good results feeding the raw coordinates to the kde3d function, but I do get good results when dividing the input values by 1000:

mx <- x/1000
my <- y/1000
mz <- z/1000

Through trial and error, I got decent results using the following parameters:

with(shape, {
  # Estimate the 3D kernel density on a 150 x 150 x 150 grid,
  # with a bandwidth of 0.2 (in the rescaled kilometer units)
  d <- kde3d(mx, my, mz, h = 0.2, n = 150, lims = c(range(mx), range(my), range(mz)))
  # Draw the isosurface at a density level of 0.02
  contour3d(d$d, 0.02, d$x, d$y, d$z,
            color = "red", color2 = "gray", scale = TRUE)
})

The result displays in an interactive window that can be zoomed and rotated:

[Figure: 3D density isosurface rendered by contour3d]

There is another package, called sm, that builds on misc3d.  Its sm.density function calls contour3d and configures some of the more advanced parameters behind the scenes, and it’s somewhat simpler to use:

m <- cbind(x, y, z)                               # combine the raw coordinates into a three-column matrix
sm.density(m, panel = TRUE, panel.plot = TRUE)    # interactive density plot with a kernel-size slider

Note that the raw x, y, and z coordinates work here, though it is necessary to combine them into a matrix first.  The result:

[Figure: sm.density isosurface plot]

Notice the slider for the size of kernel used in the density calculation, as well as the bounding frame and 3d axes.  There are a number of arguments that can be used in both the density and contour3d functions to customize the appearance of the data and the layout.  By either using them or by writing your own R code, you could conceivably make 3d density plots to whatever level of sophistication you wish.
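
For example, replacing the contour3d call in the block above with something like the following would draw a nested set of semi-transparent surfaces at several density levels (the levels, colors, and alpha values here are only illustrative):

contour3d(d$d, level = c(0.01, 0.02, 0.04), d$x, d$y, d$z,
          color = c("yellow", "orange", "red"),
          alpha = c(0.2, 0.4, 1), scale = TRUE)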

In addition to visualizing your data, you can perform whatever statistical analysis you would use with any other dataset in R.  For example, we can easily make histograms of depth or magnitude:

layout(matrix(c(1,2),1,2,byrow=TRUE))
hist(shape$Depth, breaks=20)
hist(shape$Magnitude, breaks=40)

The layout function allows for the creation of two side-by-side plots when given a matrix of one row and two columns; otherwise the second plot would overwrite the first:

[Figure: side-by-side histograms of depth and magnitude]
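
When you’re finished with the side-by-side arrangement, a single call restores the default of one plot per page:

layout(1)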

Histograms can just as easily be made in ArcGIS, but much more complex data operations are available in R.  For example, nearest neighbor distances make it possible to approximate the density surfaces above and bring that information back into ArcScene.

First, we make a 3d point pattern object using the x, y, and z values as well as their ranges.  In this case, this object is called p3.  Then, use spatstat’s nndist.pp3 function with p3 as an argument to get a vector of nearest neighbor distances.  You can then send the result back to shape; simply referring to a column that doesn’t exist yet will create it:

p3 <- pp3(x, y, z, as.box3(range(x), range(y), range(z)))   # 3D point pattern inside its bounding box
dists <- nndist.pp3(p3)                                     # nearest neighbor distance for each point
shape$nndist <- dists                                       # store the distances as a new attribute column

The final bit of R code is very similar to the first:

writePointsShape(shape, file.choose())

This brings up the same dialog as before, which allows you to name the output file.  R will warn you that the file does not exist and ask whether to create it; make sure you give the file a .shp extension.
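
In a script you can skip the dialog and pass the output path directly; the path below is just a placeholder:

writePointsShape(shape, "C:/data/hood3d_nndist.shp")   # hypothetical output path, with the .shp extension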

If you look at the results of the summary function from earlier, you can see that the projection information did not get imported with the attribute data.  However, the coordinates are all the same as the original shapefile, so you can simply define the coordinate system to be the same as the original.  If you have the original loaded, its coordinate system will appear in the Layers list in the Spatial Reference selection dialog:

[Figure: defining the coordinate system in the Spatial Reference selection dialog]

The attribute table for this shapefile includes some new columns, among them nndist, the one we just added:

[Figure: attribute table showing the nndist column]

You could use this new column to adjust the color or size of the points.  Although it is not possible, as far as I know, to create a 3D isosurface in ArcScene, you can approximate it by creating 3D buffers, which can use a field value as the radius.  Since nndist is small in clustered data, and we would prefer larger spheres for our clusters, it’s best to create a new field calculated from the first.  In the table options menu, add a field.  Make it a float value.  Right-click on the header after it is created, select Field Calculator, and enter your conversion.  After some trial and error, I found that the conversion that I liked best was 1/nndist*20000.
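
Alternatively, since the distances are already sitting in R, you could compute an equivalent column there before the writePointsShape step above (bufsize is just a made-up field name; shapefile field names are limited to 10 characters):

shape$bufsize <- 1 / dists * 20000   # larger values where points are tightly clustered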

Once you create the column, you can use the Buffer 3D tool found in 3D Analyst Tools > 3D Features to create the buffers.  You’ll want to use the size field rather than a constant value for distance:

[Figure: the Buffer 3D tool]

Now the clusters can be viewed in ArcScene:

[Figure: buffered earthquake clusters in ArcScene]

So, that’s about it for this project.  These operations were accomplished with very little code; most of the work is done by the packages and functions already in R, and it’s possible to make your own as well.  The most difficult part of this process is learning how to make the various functions work.  A great resource is the help(“f”) function, where f is a function you are trying to use.  Running this command will bring up the documentation for that function, usually with examples.
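
For example:

help("kde3d")   # documentation and examples for the 3D kernel density function
?contour3d      # the question-mark shorthand does the same thing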

Dr. Narayanaraj kindly provided some example R code that served as the starting point for my research in this project.  I’ll upload the files to the Data folder in the S drive.

The seismic data used in this exercise is from the Advanced National Seismic System catalog, maintained by the University of California, Berkeley.

LiDAR point information is usually available as a set of ASCII or LAS data files:

[Figure: a directory of LAS files]

ArcGIS only supports LAS data files; to use ASCII LiDAR data with Arc, you’ll need to use an external tool to convert to LAS.

LAS files cannot be added directly; they must be combined into a LAS dataset that sets a consistent symbology and spatial reference for the entire collection.  To create a LAS dataset, go to ArcCatalog, right-click the folder you want to store the dataset in, and select New > LAS Dataset:

[Figure: creating a LAS dataset in ArcCatalog]

Note that you will need 3D Analyst or Spatial Analyst activated to do this.  I recommend checking all the extensions to be sure your tools run the first time.

Right-click the dataset in ArcCatalog and choose Properties.

Ensure your dataset has an appropriate coordinate system in the XY Coordinate System tab, which you’ll need to get from the metadata for the LiDAR.  Next, activate the LAS Files tab and click Add Files or Add Folders.  Once you are done adding files, activate the Statistics tab and press the Calculate button.

[Figure: LAS dataset properties with statistics calculated]

At this point, you can import your data into either ArcMap or ArcScene.  There are pros and cons to both.  As far as I’ve been able to determine, it is impossible to plot 3D point clouds in ArcMap over a DEM or map base.  ArcScene can do this, and it can also color points by intensity in 3D view, but unlike ArcMap it offers no way to adjust point size and only very limited control over point colors, at least in the few days I’ve spent with it.

Some LAS datasets will include RGB information in addition to intensity, which allows 3D true-color visualizations.

[Figure: Middle Lookout Creek LiDAR point cloud colored by intensity in ArcScene]

This image shows the Middle Lookout Creek site in the HJ Andrews Experimental Forest as a point cloud colored by intensity in ArcScene.  The creek and some log jams are visible on the right, and a road is visible on the left.

To convert LiDAR points to a DEM, you’ll need to convert the dataset to a multipoint feature class first.  Open ArcToolbox and go to 3D Analyst Tools > Conversion > From File > LAS to Multipoint.  Select the specific LAS files you want.  It’s likely that you’ll want to use the ground points only.  The usual class code for ground points is 2, but you’ll want to check this by coloring the points by class.

Once you’ve created the multipoint feature, you need to interpolate the values between the points.  There are several ways to do this, each with advantages and disadvantages.  Popular methods are Spline and Kriging.  Spline generates a smooth surface but can create nonexistent features, and it doesn’t handle large variations over shorter-than-average distances very well.  Kriging is hard to get right without experience in what the parameters do, and it can take a long time to achieve the best results, but it attempts to take spatial autocorrelation into account.  In general, Kriging is better for relatively flat areas, and Spline is better for sloped areas.  Inverse Distance Weighting is popular, but it produces results similar to a very poorly configured (for terrain, anyway) Kriging interpolation.  I find that a safe but time-consuming bet is Empirical Bayesian Kriging, which can be found in the toolbox under Geostatistical Analyst Tools > Interpolation, along with a few other advanced interpolation methods that I am not as experienced with.  If anyone else is familiar with these, I’d welcome a post explaining how best to use them.