Thursday, December 19, 2013

GIS II Post 3: Geocoding Frac Sand Mines of Wisconsin

Goals and objectives

This exercise focuses on finding the addresses of frac sand mines, normalizing those addresses for geocoding, and learning about the different forms and sources of error.

Outline:
  1. Download and explore data from the Trempealeau County Land Records Division
  2. Download an updated list of mines from the WisconsinWatch website
  3. Connect to the geocoding service from ESRI
  4. Geocode the mines with street addresses using the ESRI address locator
  5. Connect to the department ArcGIS server
  6. Geocode the mines with PLSS manually
  7. Compare your results with the results of your colleagues in class

Methods  

This lab started at the Trempealeau County Land Records Division, where a database of Trempealeau County was downloaded. The database contained a variety of boundary feature classes, along with emergency system features, recreational classes, and a few transportation feature classes. Next, a list of existing sand mines in Wisconsin was downloaded from WisconsinWatch.org. On closer review it was clear that the addresses were not normalized, so they had to be prepared before the mine locations could be geocoded. Geocoding is the process of finding the geographic coordinates, often expressed as latitude and longitude, associated with other geographic data such as street addresses or ZIP codes. Table 1 shows the mine locations before normalization. Some had complete street addresses, others were in Public Land Survey System (PLSS) format, and many were incomplete versions of one or the other.

Table 1 Shows the 14 mines I needed to find. The data is not ready for geocoding.

These gaps could result from inaccurate records or from error. There are three types of error: 1) gross, 2) systematic, and 3) random. The first type, gross error, is simply a mistake or blunder: writing down the wrong value, misreading an instrument, and so on. Gross errors are not limited to people, and the only way to catch them is with careful procedures and persistent checking of our work. Systematic errors are those that can be accounted for with mathematical models, because they follow a pattern. Say a remote sensing instrument consistently records values that are off because of bad calibration: if the calibration problem can be understood and modeled, the error is systematic and can be corrected. Systematic errors usually affect accuracy. The final type is random error, which cannot be controlled. Random errors are introduced in small amounts at each stage of data collection and processing. Random error cannot be corrected, but it can be accounted for: summary statistics of repeated measurements, such as the mean, median, and mode, reduce its effect.
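To make the three error types concrete, here is a small Python simulation. Every number in it is hypothetical: a "true" distance of 100 m, a 2.5 m calibration bias standing in for systematic error, Gaussian noise for random error, and one deliberately mis-recorded value as a gross error. It is a sketch of the ideas above, not part of the lab workflow.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical true distance (m)
BIAS = 2.5          # hypothetical calibration offset -> systematic error
NOISE_SD = 0.5      # hypothetical spread of random error (m)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Each reading = truth + the same systematic bias + fresh random noise.
readings = [TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(1000)]

# Gross error: one reading was written down with the decimal point in the wrong place.
readings[10] *= 10

# Checking the work: drop anything wildly far from the median (the blunder).
med = statistics.median(readings)
checked = [r for r in readings if abs(r - med) < 10 * NOISE_SD]

# Averaging many readings shrinks random error, but the bias survives.
avg = statistics.mean(checked)

# A modeled systematic error can be subtracted out.
corrected = avg - BIAS

print(f"kept {len(checked)} of {len(readings)} readings")
print(f"mean error before bias correction: {avg - TRUE_VALUE:+.3f} m")
print(f"mean error after bias correction:  {corrected - TRUE_VALUE:+.3f} m")
```

Averaging shrinks the random scatter but leaves the 2.5 m bias untouched; only modeling the bias removes it. The median is used to screen out the blunder because, unlike the mean, a single wild value barely moves it.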

Whatever the reason for the incomplete addresses, the mines still had to be found. Although the spreadsheet had a field naming the company in charge of each mine, it usually didn't help locate the mines. Instead, searches of city websites and blogs, aerial photographs, and intuition were combined to locate the addresses. As a class we geocoded all of the mines, but each of us was individually responsible for 14. Table 2 shows my 14 mines in a format normalized for geocoding.

Table 2 Shows the data as normalized as possible with the provided information.
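The sorting step behind Table 2 can be sketched in Python. The three input strings below are made up to mimic the styles described above (a full street address, a PLSS description, and an incomplete entry), and the regular expressions only cover those exact patterns; real records would need more cases. Anything that matches neither pattern is flagged for manual research, which is where the web searches and aerial photographs came in.

```python
import re

# Hypothetical raw entries in the mixed styles described above (not real mine records).
RAW = [
    "W12345 County Road X, Sampletown, WI 54000",
    "SE 1/4 NW 1/4 Sec. 12, T20N, R8W",
    "near Sampletown",
]

# "street, city, ST 12345"
STREET_RE = re.compile(
    r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
)
# Optional "QQ 1/4 Q 1/4", then section, township (north), and range (east/west).
PLSS_RE = re.compile(
    r"(?:(?P<qq>[NS][EW])\s*1/4\s+(?P<q>[NS][EW])\s*1/4\s+)?"
    r"Sec\.?\s*(?P<sec>\d{1,2}),?\s*T(?P<twp>\d{1,3})N,?\s*R(?P<rng>\d{1,2})(?P<rdir>[EW])",
    re.IGNORECASE,
)

def normalize(raw: str) -> dict:
    """Split a raw location string into fields, or flag it as incomplete."""
    text = " ".join(raw.split())  # collapse stray whitespace
    m = STREET_RE.match(text)
    if m:
        return {"type": "street", **m.groupdict()}
    m = PLSS_RE.search(text)
    if m:
        return {"type": "plss", **m.groupdict()}
    return {"type": "incomplete", "raw": text}  # needs manual research

for entry in RAW:
    print(normalize(entry))
```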

Once the addresses were found, we uploaded them to a shared class folder and waited for the rest of the class to contribute. Once everyone had uploaded their mines, it was time to begin geocoding.
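Conceptually, many street-address locators place a point by interpolating along a street segment whose ends carry a known address range. The Python sketch below shows that idea on a single straight segment. The segment, its address range, and the function name are hypothetical, and real locators typically also handle odd/even sides of the street, side offsets, and fuzzy matching, so treat this as an illustration rather than how ArcMap works internally.

```python
from typing import Tuple

Point = Tuple[float, float]  # projected x, y (units assumed to be meters)

def interpolate_address(house_number: int, from_addr: int, to_addr: int,
                        start: Point, end: Point) -> Point:
    """Hypothetical helper: place a house number proportionally along a straight
    street segment whose address range runs from from_addr (at start) to
    to_addr (at end)."""
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    if not low <= house_number <= high:
        raise ValueError("house number is outside this segment's address range")
    if from_addr == to_addr:
        return start  # degenerate range: nothing to interpolate
    t = (house_number - from_addr) / (to_addr - from_addr)  # fraction along segment
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))

# Hypothetical segment: addresses 100-198 along 1,000 m of road running due east.
print(interpolate_address(149, 100, 198, (0.0, 0.0), (1000.0, 0.0)))  # (500.0, 0.0)
```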

Mines with complete street addresses were geocoding-friendly, and a point was added for each without a hitch. Mines described only in PLSS format, however, had to be located manually. This meant using the "pick address from map" option in ArcMap's geocoding tools. To do this, a shapefile of PLSS quarter-quarter sections was added and given a hollow symbol. This grid, together with the Identify tool, allowed a systematic approach to locating each mine. Table 3 shows the match score for the 14 addresses I geocoded.
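The manual PLSS step was done by eye with the quarter-quarter grid, but the underlying geometry is simple subdivision of a section into quarters and quarter-quarters. This hypothetical Python helper estimates the center of a quarter-quarter section, assuming an idealized square section with nominal one-mile (1,609.344 m) sides and a known southwest corner in a projected coordinate system. Real sections are often irregular, so the PLSS shapefile remains the authority; this only shows the arithmetic.

```python
from typing import Tuple

SECTION_SIDE_M = 1609.344  # nominal one-mile section side; real sections vary
QUARTERS = {"NE", "NW", "SE", "SW"}

def qq_centroid(sw_x: float, sw_y: float, quarter: str, quarter_quarter: str,
                side: float = SECTION_SIDE_M) -> Tuple[float, float]:
    """Hypothetical helper: center of e.g. the SE 1/4 of the NW 1/4 of a section,
    treating the section as a perfect square with its SW corner at (sw_x, sw_y)."""
    x, y, size = sw_x, sw_y, side
    for part in (quarter, quarter_quarter):  # quarter first, then quarter-quarter
        part = part.upper()
        if part not in QUARTERS:
            raise ValueError(f"expected one of {sorted(QUARTERS)}, got {part!r}")
        size /= 2  # each subdivision halves the cell's side length
        if part[1] == "E":
            x += size  # east half starts halfway across
        if part[0] == "N":
            y += size  # north half starts halfway up
    return (x + size / 2, y + size / 2)

print(qq_centroid(0.0, 0.0, "NW", "SE", side=1.0))  # (0.375, 0.625)
```

Note the argument order: a description written "SE 1/4 NW 1/4" means the SE quarter-quarter of the NW quarter, so the quarter ("NW") is passed first.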

Table 3 The "score" field (0 to 100) shows how closely each input address matched an address in the locator's reference data.
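ESRI computes its match scores internally, and this sketch does not reproduce that algorithm. As a loose analogy only, Python's standard-library `difflib.SequenceMatcher` can turn the similarity of two address strings into a 0-100 number, which shows why normalizing case and spacing matters before matching. The function name and addresses are hypothetical.

```python
from difflib import SequenceMatcher

def similarity_score(input_address: str, candidate: str) -> int:
    """Hypothetical 0-100 string-similarity score (an analogy, not ESRI's method)."""
    a = " ".join(input_address.upper().split())  # normalize case and spacing
    b = " ".join(candidate.upper().split())
    return round(100 * SequenceMatcher(None, a, b).ratio())

print(similarity_score("W12345  County Road x", "W12345 COUNTY ROAD X"))  # 100
print(similarity_score("W12345 County Road X", "W12354 County Rd X"))
```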

Next, a new shapefile was created containing all of the class's mine locations except the mines I had located myself. This shapefile was then queried to select only the mines that shared a Unique ID with the mines I had found. The selection was then used with the Point Distance tool to test how accurate the mine locations were, by measuring how far my points were from my classmates' points for the same mines. Figure 1, shown below, illustrates how the Point Distance tool works.


Figure 1, The Point Distance tool takes two input point datasets and computes the distance from each input point to the points in the near dataset (optionally limited to a search radius). The output is a table that records the input point (INPUT_FID), the near point (NEAR_FID), and the distance between the two points.
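The selection-plus-Point-Distance workflow can be mimicked in plain Python to show what the output table contains. Everything here is hypothetical: the FIDs, the UNIQUE_ID values, and the coordinates, which are assumed to be projected and in meters so that straight-line (planar) distance is meaningful. Because every input point is compared with every near point, the table grows as the number of inputs times the number of near points, which helps explain how the output can run to hundreds of rows; filtering on a matching UNIQUE_ID keeps only the comparisons of the same mine.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # projected x, y; units assumed to be meters

def point_distance(inputs: Dict[int, Point], near: Dict[int, Point],
                   search_radius: Optional[float] = None) -> List[Tuple[int, int, float]]:
    """Mimic the output table: one (INPUT_FID, NEAR_FID, DISTANCE) row per pair
    of points, optionally keeping only pairs within search_radius."""
    rows = []
    for in_fid, (x1, y1) in inputs.items():
        for near_fid, (x2, y2) in near.items():
            dist = math.hypot(x2 - x1, y2 - y1)  # straight-line (planar) distance
            if search_radius is None or dist <= search_radius:
                rows.append((in_fid, near_fid, dist))
    return rows

# Hypothetical data: FID -> (UNIQUE_ID, (x, y)).
mine = {0: ("M-01", (1000.0, 2000.0)), 1: ("M-02", (5000.0, 5000.0))}
classmates = {
    0: ("M-01", (1003.0, 2004.0)),  # same mine, nearly the same spot
    1: ("M-02", (8000.0, 9000.0)),  # same mine, far off
    2: ("M-99", (1500.0, 2500.0)),  # a mine that was not assigned to me
}

# Attribute query: keep only classmates' points whose UNIQUE_ID is one of mine.
my_ids = {uid for uid, _ in mine.values()}
selected = {fid: xy for fid, (uid, xy) in classmates.items() if uid in my_ids}

table = point_distance({fid: xy for fid, (_, xy) in mine.items()}, selected)

# Keep only rows that compare the same mine (matching UNIQUE_ID).
same_mine = [(i, n, d) for i, n, d in table if mine[i][0] == classmates[n][0]]
for row in same_mine:
    print(row)
```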

Results

After both geocoding methods were performed, my mines finally had a spatial component. Image 1 shows their distribution. Another interesting distribution emerged when mapping all the mines that shared a UNIQUE_ID with the sand mines assigned to me.

Image 1 Shows most of the 14 mines I found spread across West-Central Wisconsin.

Figure 2 Shows the spatial distribution of
all my mines and those that shared the
same UNIQUE_ID field.

Table 4 shows just how close, and how far, my classmates' locations were from my own for the same mines.
Table 4 A selection from the 750+ results the point distance tool gave as an output.

Most of the locations were similar, but some were off by thousands of meters. Image 2 shows one such mine, which is clearly not in the correct location. Some of the mines had to be added manually, and in a case like this one the map scale used while placing the point could have been to blame.

Image 2 A geocoded mine in the middle of an urban area, a clear error.
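Once the distances are in hand, a simple threshold separates close agreement from clear errors like the one in Image 2. The distances and the 1,000 m cutoff below are hypothetical, chosen only for illustration. The median is reported because a few wild values drag the mean much further than the median.

```python
import statistics

# Hypothetical distances (m) between my point and a classmate's point for the same mine.
distances = [4.2, 12.0, 35.5, 8.1, 2650.0, 19.7, 4100.0]

THRESHOLD_M = 1000.0  # assumed cutoff for "clearly in the wrong place"

suspect = [d for d in distances if d > THRESHOLD_M]
print(f"median offset: {statistics.median(distances):.1f} m")  # 19.7 m
print(f"{len(suspect)} of {len(distances)} pairs differ by more than {THRESHOLD_M:.0f} m")
```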

Conclusion

In conclusion, this was an effective exercise for building an important skill set. Geocoding is relatively straightforward, but when not all of the data is provided, creative and critical thinking are needed. Becoming familiar with the geocoding process, as well as the many types and sources of error, is an important first step toward mastery. As larger projects and more complex geocoding jobs come along, it will be important to remember the foundations of geocoding and how to avoid error.

Sources

Trempealeau County Land Records Division
http://www.tremplocounty.com/landrecords/

WisconsinWatch.org
http://www.wisconsinwatch.org/2012/07/22/map-frac-sand-july-2012/
