Data Collection & Creation

This guide provides information about how to gather data from existing collections as well as methods to create your own data based on your research goals.

Methods of Data Creation - GIS

Manual Digitizing

This is traditionally the most common way to convert paper-based sources of spatial information (e.g. maps) to digital data. The paper map is taped to a digitizing table (or tablet, as the smaller digitizers are known). Usually four to six initial points with known coordinates are logged. Ideally these points are locations such as the intersections of graticule lines. In the absence of an overlying grid system, points are taken from identifiable locations such as street intersections or landmarks.

The data is then digitized by tracing the features of interest with a mouse-like handheld device called a puck. Once all the features are traced, the newly acquired data is transformed from table units (the coordinates of the digitizing table) to real-world units using an algorithm. This algorithm takes the known table coordinates of the initial points and warps the data to match the real-world coordinates assigned to those points.

The error in the adjustment from table units to real-world coordinates is reported as root-mean-square (RMS) error and average error. The RMS value reflects the size of the mismatch that remains at the initial points after the transformation, and so indicates the precision of the digitized data.
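
The transformation and its RMS error can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular GIS package: it assumes a six-parameter affine transformation fitted by least squares, and the function names and control point values are made up for the example.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system m * x = v by Gaussian elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        # Partial pivoting: move the largest remaining value onto the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (a[i][3] - s) / a[i][i]
    return x

def fit_affine(table_pts, world_pts):
    """Least-squares fit of X = a*x + b*y + c and Y = d*x + e*y + f."""
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (wx, wy) in zip(table_pts, world_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * wx
            aty[i] += row[i] * wy
    return solve3(ata, atx), solve3(ata, aty)

def apply_affine(params, pt):
    """Convert one point from table units to real-world units."""
    (a, b, c), (d, e, f) = params
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)

def rms_error(params, table_pts, world_pts):
    """Root-mean-square distance between transformed and known positions."""
    squared = []
    for t, w in zip(table_pts, world_pts):
        px, py = apply_affine(params, t)
        squared.append((px - w[0]) ** 2 + (py - w[1]) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical control points: table inches and the matching map coordinates.
table = [(1.0, 1.0), (1.0, 9.0), (11.0, 9.0), (11.0, 1.0)]
world = [(5100.0, 2100.0), (5100.0, 2900.0), (6100.0, 2900.0), (6102.0, 2100.0)]

params = fit_affine(table, world)
print("RMS error in map units:", rms_error(params, table, world))
```

An affine fit needs at least three control points; logging four to six leaves redundant measurements, which is what makes a nonzero RMS error possible and meaningful.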

Factors contributing to this error include human error, shrinkage or other physical alteration of the paper map, and projection differences.

Heads-Up Digitizing

With the proliferation of low-cost sources of digital imagery and large-format scanners, heads-up digitizing is becoming a popular method of digital conversion. Also known as on-screen digitizing, this method involves digitizing directly on top of an orthorectified image such as a satellite image or an aerial photograph. The features of interest are traced from the image. The benefit of this over manual digitizing is that no transformation is needed to convert the data into the needed projection. In addition, the accuracy of the derived dataset is inherited from the accuracy of the source image. Heads-up digitizing is also used to extract data from scanned and georeferenced maps.

Coordinate Geometry (COGO)

Coordinate geometry is a keyboard-based method of spatial data entry. It is most commonly used to enter cadastral or land record data. The method is highly precise because the database is created from the actual survey measurements of the property lines. Distances and bearings are entered into the GIS from the original surveyor plats, and the GIS software builds the digital vector file from these values.
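
As a rough illustration of how distances and bearings become vector coordinates, the sketch below walks a traverse from a starting corner. It assumes plane coordinates, bearings in quadrant notation (for example S 30 W) and azimuths measured clockwise from north; the function names and the sample parcel are hypothetical.

```python
import math

def bearing_to_azimuth(ns, angle_deg, ew):
    """Convert a quadrant bearing such as ('S', 30, 'W') to an azimuth in degrees."""
    if ns == "N":
        return angle_deg if ew == "E" else 360 - angle_deg
    return 180 - angle_deg if ew == "E" else 180 + angle_deg

def traverse(start, courses):
    """Return the vertices reached by following (azimuth_deg, distance) courses."""
    x, y = start
    points = [(x, y)]
    for azimuth_deg, distance in courses:
        a = math.radians(azimuth_deg)
        x += distance * math.sin(a)  # easting change
        y += distance * math.cos(a)  # northing change
        points.append((x, y))
    return points

def misclosure(points):
    """Distance between the last vertex and the starting vertex."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.hypot(x1 - x0, y1 - y0)

# Hypothetical square parcel, 100 units on a side, read from a plat.
plat = [("N", 0, "E", 100.0), ("N", 90, "E", 100.0),
        ("S", 0, "E", 100.0), ("N", 90, "W", 100.0)]
courses = [(bearing_to_azimuth(ns, ang, ew), dist) for ns, ang, ew, dist in plat]
parcel = traverse((5000.0, 2000.0), courses)
print(parcel)
print("misclosure:", misclosure(parcel))
```

A closed parcel should end where it started, so a large misclosure usually points to a keying mistake or an error in the plat itself.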

Geocoding

Geocoding is another keyboard-based method. Geocoding uses addresses from a flat file (such as a .dbf file, MS Access database or Excel spreadsheet) to create x,y coordinate locations interpolated from a geocodable spatial database. These spatial databases are most commonly street centerline files but can be other types. The resulting geocoded database is a point file. Geocoding can be done in most GIS software, such as ArcGIS, QGIS and ILWIS. It can also be done online using services such as Google's geocoding API and ArcGIS Online. In the learning resources you will find some tutorials on how to carry out this procedure.
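
The interpolation step can be pictured with a small sketch. It assumes each street centerline segment stores the address range it covers along with its start and end coordinates; the function name and values are illustrative only, and real geocoders also match street names, handle odd and even sides of the street, and offset the point from the centerline.

```python
def geocode_on_segment(house_number, from_addr, to_addr, start_xy, end_xy):
    """Place a house number along a straight segment by linear interpolation."""
    if from_addr == to_addr:
        t = 0.5  # a single-address segment: use the midpoint
    else:
        t = (house_number - from_addr) / (to_addr - from_addr)
    if not 0.0 <= t <= 1.0:
        raise ValueError("house number is outside this segment's address range")
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)

# Hypothetical segment covering addresses 100 to 200.
print(geocode_on_segment(150, 100, 200, (0.0, 0.0), (100.0, 0.0)))
```

Because the position is interpolated rather than surveyed, the resulting point is only as good as the address ranges in the centerline file.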

Global Positioning Systems (GPS)

GPS is a highly accurate navigation system that uses signals from satellites to determine a location on the Earth’s surface, irrespective of weather conditions. It was originally devised in the 1970s by the U.S. Department of Defense for military purposes. By combining the signals from several satellites, a data logger can pinpoint the holder’s location. Depending on the unit, the locational accuracy can reach the millimeter level. Combined with attribute data entered at the time of collection, GPS is a rapid and accurate method of data collection. For more information, see Trimble’s excellent tutorial on GPS.

Surveying and mapping was one of the first commercial adaptations of GPS, as it provides a latitude and longitude position directly without the need to measure angles and distances between points.

However, it hasn’t entirely replaced surveying field instruments such as the theodolite, Electronic Distance Meter, or the more modern Total Station. This is due to the cost of the technology and the need for GPS to be able to ‘see’ the satellites, which restricts its use near trees and tall buildings.

In practice, GPS technology is often incorporated into a Total Station to produce complete survey data. GPS receivers used for baseline measurements are generally more complex and expensive than those in common use, and require a high-quality antenna.

There are three methods of GPS measurement that are utilised by surveyors.

  • Static GPS Baseline. Static GPS is used for determining accurate coordinates for survey points by simultaneously recording GPS observations over a known and an unknown survey point for at least 20 minutes. The data is then processed in the office to provide coordinates with an accuracy of better than 5 mm, depending on the duration of the observations and satellite availability at the time of the measurements.
  • Real Time Kinematic (RTK) Observations. This is where one receiver remains in one position over a known point – the Base Station – and another receiver moves between positions – the Rover Station. The position of the Rover can be computed and stored within a few seconds, using a radio link to provide a coordinate correction. This method gives similar accuracy to baseline measurements within 10km of the base station.
  • Continuously Operating Reference Stations (CORS). This is where a survey-quality GPS receiver is permanently installed in a location as a starting point for any GPS measurements in the district. Common users of CORS are mining sites, major engineering projects and local governments. Surveyors’ GPS receivers can then collect field data and combine it with the CORS data to calculate positions. Many countries have a CORS network that is used by many industries. Australia’s CORS network is the Australian Regional GPS Network, which uses an online processing system to deliver data over the internet within 24 hours and gives positions to within a few centimetres. Local CORS networks are also used to provide instant positions similar to the RTK method by using a mobile phone data link to provide a coordinate correction to the surveyor and their rover.
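
The idea of a coordinate correction shared between a base station and a rover can be shown with a deliberately simplified sketch. It assumes both receivers see the same error at the same moment and works on final coordinates; actual RTK and CORS processing operates on the satellite observations themselves, so the function names and numbers here are illustrative only.

```python
def base_correction(base_known, base_observed):
    """Difference between the surveyed base position and what GPS reports there."""
    return tuple(k - o for k, o in zip(base_known, base_observed))

def correct_rover(rover_observed, correction):
    """Apply the base station's correction to a rover position."""
    return tuple(r + c for r, c in zip(rover_observed, correction))

# Hypothetical easting/northing values in metres.
base_known = (1000.0, 2000.0)      # surveyed coordinates of the base station
base_observed = (1001.2, 1999.1)   # what the base receiver currently reports
rover_observed = (1501.2, 2499.1)  # what the rover currently reports

correction = base_correction(base_known, base_observed)
print(correct_rover(rover_observed, correction))
```

The assumption that both receivers share the same error weakens as the rover moves away from the base, which is consistent with the distance limit mentioned for RTK above.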

Total Station

A Total Station is a surveying instrument that integrates an electronic theodolite with an electronic distance meter.

A theodolite uses a movable telescope to measure angles in both the horizontal and vertical planes. Traditionally they are manual instruments that come in two types – transit, which rotates in a full circle in the vertical plane, and non-transit, rotating in a half-circle.

Total Stations use electronic transit theodolites in conjunction with a distance meter to read the slope distance from the instrument to a particular spot. They thus combine two essential surveying instruments in one and, when used with other technology such as mapping software, can deliver the ‘total’ surveying package, from measuring to mapping.
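
To show how one angle-and-distance reading turns into a mapped point, the sketch below reduces a single observation to coordinates. It assumes the horizontal angle has already been oriented to an azimuth (clockwise from north) and that the vertical angle is a zenith angle (0 degrees straight up, 90 degrees horizontal); the function and parameter names are hypothetical, and corrections such as earth curvature and refraction are ignored.

```python
import math

def reduce_observation(station, instrument_height, target_height,
                       azimuth_deg, zenith_deg, slope_distance):
    """Return (easting, northing, elevation) of the observed target."""
    az = math.radians(azimuth_deg)
    zen = math.radians(zenith_deg)
    horizontal = slope_distance * math.sin(zen)  # distance in the horizontal plane
    rise = slope_distance * math.cos(zen)        # height difference along the line of sight
    easting = station[0] + horizontal * math.sin(az)
    northing = station[1] + horizontal * math.cos(az)
    elevation = station[2] + instrument_height + rise - target_height
    return (easting, northing, elevation)

# Hypothetical setup: station at (500, 800, 25), sighting slightly uphill to the north-east.
print(reduce_observation((500.0, 800.0, 25.0), 1.5, 1.8, 45.0, 88.0, 120.0))
```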

Image Processing

Geodatasets can be derived from digital imagery. Most commonly, satellite imagery is used in a process called supervised classification, in which a user selects a sample of pixels whose type is known (vegetation species, land use, etc.). Using a classification algorithm, remote sensing software such as R, ERDAS or ENVI then classifies the digital image into these named categories based on the sample pixels. In contrast to the other methods discussed, supervised classification results in a raster dataset.
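
A toy version of supervised classification is sketched below. The guide does not name a particular algorithm, so this example assumes a simple minimum-distance-to-means classifier; the class names, band values and function names are invented for illustration.

```python
def train_means(samples):
    """Average the sample pixels of each class, band by band."""
    means = {}
    for label, pixels in samples.items():
        bands = len(pixels[0])
        means[label] = tuple(
            sum(p[b] for p in pixels) / len(pixels) for b in range(bands)
        )
    return means

def classify_pixel(pixel, means):
    """Assign the class whose mean is closest to the pixel in band space."""
    return min(
        means,
        key=lambda label: sum((v - m) ** 2 for v, m in zip(pixel, means[label])),
    )

def classify_image(image, means):
    """Classify every pixel; the result is a raster of class labels."""
    return [[classify_pixel(px, means) for px in row] for row in image]

# Hypothetical two-band training samples (for example red and near-infrared).
samples = {
    "water": [(10, 5), (12, 6)],
    "vegetation": [(40, 90), (44, 95)],
}
image = [[(11, 5), (42, 92)],
         [(13, 8), (39, 88)]]

print(classify_image(image, train_means(samples)))
```

The output has the same rows and columns as the input image, which is why the result is a raster rather than a vector dataset.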

 

Lidar

Lidar, which stands for Light Detection And Ranging, is a remote sensing technique that measures the distance to a target using a pulsed laser. Differences in laser return times and wavelengths can then be used to build 3D models, digital elevation models (DEM) and digital surface models (DSM). Most commonly, Lidar has been flown on aircraft to scan the ground surface. Newer ground-based laser scanners capture objects on the surface from a ground-level perspective.
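
The link between return time and distance can be illustrated with a short sketch. It assumes the pulse travels at the speed of light in a vacuum and that the recorded time covers the trip out and back; the function names and grid values are made up, and the second function shows one common use of the derived surfaces, subtracting a DEM from a DSM to get object heights.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def range_from_return_time(seconds):
    """Distance to the target from the two-way travel time of a laser pulse."""
    return SPEED_OF_LIGHT * seconds / 2.0

def height_above_ground(dsm, dem):
    """Cell-by-cell difference between a surface model and a bare-earth model."""
    return [[s - g for s, g in zip(s_row, g_row)] for s_row, g_row in zip(dsm, dem)]

# A pulse that returns after 2 microseconds.
print(range_from_return_time(2e-6))

# Hypothetical 2 x 2 elevation grids in metres.
dsm = [[112.0, 105.5], [101.0, 100.0]]
dem = [[100.0, 100.5], [101.0, 100.0]]
print(height_above_ground(dsm, dem))
```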

 

GIS Data Collection (Download)

You can find GIS data available for download from many different sources and websites. One important part of this process is getting the metadata. The metadata is the file that tells you more about the data: how it was created, the source, date, author and coordinate system. With this metadata you will be able to judge whether the data fits your needs in terms of scale, error and method. In this section you will find some links where you can search for and download spatial data.