Geographic Information Systems Data Input and Editing
Mark Foley mark.foley@dit.ie
Data Input and Editing
Learning Outcomes
Explain the difference between analogue and digital data sources for GIS
Give examples of different methods of data encoding
Describe how paper maps are digitized
Explain how remotely sensed images are imported into GIS
Describe some of the problems that may be faced when encoding spatial data
Give examples of methods of data editing and conversion
Outline the process required to create an integrated GIS database
Encoding
Getting your data into a computer
Problems
Lots of formats both analogue and digital
Error-prone therefore requires editing
May need to be
Re-formatted
Re-projected
Generalized
Matched/joined
Analogue & Digital Data
Analogue
Usually in paper form
Maps, stats, aerial photographs etc.
Need to be converted to digital form before use
Digital
Maps, stats etc., as above
Data from databases
Data collected from devices such as GPS
Format, scale & resolution will vary
The data stream
Methods of data input
Keyboard entry
Manual or OCR
Manual digitizing
Acquiring data from paper maps
Registration
Digitizing features
Adding attributes
Major source of positional errors
Digitizing table and PC workstation
Point and stream mode digitizing
Point mode – the person digitizing decides where to place each individual point so as to represent the line as accurately as possible within the accepted tolerances of the digitizer. Points are placed closer together where the line is most complex and where it changes direction. Points are placed further apart where the line is less complex or made up of straight line segments.
Stream mode – the person digitizing decides on a time or distance interval at which the digitizing hardware registers each point as the cursor is moved along the line. Points end up closer together where the line is most complex only because the person digitizing slows the cursor down to follow the line more accurately. Points are placed further apart where the line is less complex or made up of straight line segments, which allows the cursor to be moved more quickly.
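As a rough illustration of the distance-interval variant of stream mode, the sketch below thins a densely sampled cursor trace so that a point is recorded only after the cursor has moved a set distance. The names `stream_sample` and `min_dist` are assumptions made for this example, not part of any digitizer's software.

```python
import math

def stream_sample(trace, min_dist):
    """Record a point only once the cursor has moved at least
    min_dist from the last recorded point (distance-interval mode)."""
    if not trace:
        return []
    kept = [trace[0]]  # the first cursor position is always recorded
    for x, y in trace[1:]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) >= min_dist:
            kept.append((x, y))
    return kept

# Hypothetical cursor trace along a straight line
trace = [(0, 0), (0.4, 0), (1, 0), (1.5, 0), (2.1, 0)]
sampled = stream_sample(trace, 1.0)
```

A time-interval version would instead record the cursor position at fixed clock ticks, which is what makes point spacing depend on how fast the cursor is moved.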
Bézier curves and splines
Bézier curves (in the cubic form commonly used for digitizing) are defined by four points: a start and end point (nodes) and two control points. When a curve between two points is digitized as a Bézier curve, the control points mathematically determine the arc (path) of the curve on leaving the start point and on arriving at the end point. Bézier curves are used in many vector drawing/drafting packages and in page description languages such as Adobe PostScript.
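The way two nodes and two control points determine a curve can be sketched with the standard cubic Bézier formula. The function names and the sample coordinates below are assumptions chosen for illustration only.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1].
    p0 and p3 are the end nodes; p1 and p2 are the control points."""
    u = 1.0 - t
    # Bernstein weights for the four defining points
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)

def bezier_polyline(p0, p1, p2, p3, n=16):
    """Approximate the curve with n straight segments (n + 1 vertices)."""
    return [cubic_bezier(p0, p1, p2, p3, i / n) for i in range(n + 1)]

# Hypothetical arc from (0, 0) to (1, 0) pulled upwards by its control points
vertices = bezier_polyline((0, 0), (0, 1), (1, 1), (1, 0), n=4)
```

The curve passes through the two nodes but generally not through the control points, which only steer its direction at each end.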
Splines are mathematically interpolated curves defined by a finite number of control points (e.g. P0–P3). They can be exact (i.e. pass exactly through the control points) or approximating (i.e. pass close to the control points without necessarily touching them).
Automatic digitizing
Scanning
Appropriate where raster is required
Automatic line following
Problems with scanning
Optical distortion
Unwanted information (coffee stains, annotations etc.)
Selection of scanning tolerances to ensure important data are encoded and background is ignored
File formats produced
Amount of editing required
Electronic data transfer
Issues
Data from many sources
Heterogeneous data formats
Conversion
Georeferencing
Most GIS software handles these issues
Role of metadata in publishing and finding suitable data
Digitizing software
On-screen digitizing
Data editing
Detecting and correcting errors
Attribute and spatial data checking
Re-projection, transformation and generalization
Edge matching and rubber sheeting
Developing an integrated database
Detecting and correcting errors
3 main sources of error
Errors in source data
Errors introduced during encoding
Errors propagated during data transfer and conversion
Common errors
Missing entities
Duplicate entities
Mis-located entities
Missing labels
Duplicate labels
Under/overshoots, loops & spikes etc.
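Some of these errors can be flagged automatically. The sketch below is a minimal example under simple assumptions: lines are lists of (x, y) tuples, correctly connected lines share exactly equal end coordinates, and the function names are invented for this illustration. It reports dangling end points (candidates for undershoots and overshoots) and duplicated lines.

```python
from collections import Counter

def dangling_nodes(lines):
    """End points used by exactly one line end: candidates for
    undershoots or overshoots that need checking."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1
        counts[line[-1]] += 1
    return sorted(pt for pt, n in counts.items() if n == 1)

def duplicate_lines(lines):
    """Indices of lines that repeat an earlier line's geometry,
    regardless of digitizing direction."""
    seen, dups = set(), []
    for i, line in enumerate(lines):
        # Direction-independent key: the smaller of the two orderings
        key = min(tuple(line), tuple(reversed(line)))
        if key in seen:
            dups.append(i)
        else:
            seen.add(key)
    return dups

# Hypothetical data: the third line stops short of (1, 0); the fourth repeats the first
lines = [[(0, 0), (1, 0)], [(1, 0), (1, 1)], [(1, 1), (0.9, 0.0)], [(1, 0), (0, 0)]]
```

A dangling node is not always an error (a cul-de-sac is a legitimate one), so results of this kind are a list of places to inspect rather than a list of mistakes.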
Examples of spatial error in vector data
Attribute and spatial data checking
Impossible values
Range checks
Extreme values
Cross check against source document
Internal consistency
Statistics, totals and means
Scattergrams
Check any data item that departs markedly from regression line
Trend surfaces
Highlight points that depart from the norm
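Two of the checks above, range checks and departures from a regression line, can be sketched as follows. This is a minimal illustration; the function names, the threshold `k` and the sample values are assumptions rather than a prescribed method.

```python
import statistics

def range_check(values, lo, hi):
    """Indices of values that fall outside the allowed range [lo, hi]."""
    return [i for i, v in enumerate(values) if not (lo <= v <= hi)]

def regression_outliers(xs, ys, k=2.0):
    """Indices of points whose residual from the least-squares line
    exceeds k standard deviations of all the residuals."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if sd > 0 and abs(r) > k * sd]

# Hypothetical percentages: -1 and 120 are impossible values
bad_percentages = range_check([5, -1, 120, 40], 0, 100)

# Hypothetical paired attributes following y = 2x, with one miskeyed value at index 5
xs = list(range(10))
ys = [0, 2, 4, 6, 8, 30, 12, 14, 16, 18]
suspects = regression_outliers(xs, ys)
```

Flagged items should then be cross-checked against the source document, since an extreme value may be genuine.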
Examples of original data problems and the corrected data after processing
Radius Topology Feature Snapping
Filtering noise from a raster data set
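One common way to filter noise from a categorical raster is a modal (majority) filter over a moving window. The sketch below assumes a 3 x 3 window and a raster held as a list of rows; the function name is invented for this example, and this may not be the filter shown in the original figure.

```python
from collections import Counter

def modal_filter(grid):
    """Replace each cell with the most common value in its 3 x 3
    neighbourhood; edge cells use whichever neighbours exist."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input is not modified
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out

# Hypothetical land-cover raster with one stray cell of class 9
noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
cleaned = modal_filter(noisy)
```

Filtering of this kind removes isolated cells but also erases genuinely small features, so the window size is a trade-off.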
Re-projection, transformation and generalization
Provide a common frame of reference
Translation and scaling
Varying coordinate systems
Varying scales
Create a common origin
Shift (dx, dy) coordinate values
Rotation
Coordinates rotated to fit common grid using trigonometry
Generalize to common scale
Topological mismatch between data in different projections
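Translation, scaling and rotation of coordinates can be combined in one simple transformation. The sketch below is illustrative only: the function name, the parameter names and the order of operations (scale, then rotate about the origin, then shift) are assumptions, and it does not perform a map re-projection.

```python
import math

def transform(points, dx=0.0, dy=0.0, scale=1.0, angle_deg=0.0):
    """Scale, rotate anticlockwise about the origin, then shift by (dx, dy)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in points:
        xs, ys = x * scale, y * scale        # bring to the common scale
        xr = xs * cos_a - ys * sin_a         # rotate to fit the common grid
        yr = xs * sin_a + ys * cos_a
        out.append((xr + dx, yr + dy))       # shift to the common origin
    return out

# Hypothetical point doubled in scale and moved to a new origin
shifted = transform([(1, 1)], dx=10, dy=20, scale=2)
# Hypothetical point rotated by 90 degrees
rotated = transform([(1, 0)], angle_deg=90)
```

Changing the order of the three steps changes the result, so the same order has to be applied to every data set being brought into the common frame of reference.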
Douglas-Peucker Algorithm
Join start/end nodes of a line with a straight line
Examine the perpendicular distance from this straight line to vertices along digitized line
Discard points within a certain threshold of the straight line
Move the straight line to join the start point to the vertex that lay the greatest distance from the original straight line
Repeat until no points remain that are closer than the threshold distance
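The steps above can be sketched in code. The version below is the commonly published recursive formulation of Douglas-Peucker, which keeps the farthest vertex when it exceeds the threshold and then processes the two halves of the line separately; it is a sketch under that assumption, and the function names and `tolerance` parameter are chosen for this example.

```python
import math

def perpendicular_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:  # a and b coincide, fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def douglas_peucker(points, tolerance):
    """Thin a line, keeping only vertices that depart from the
    straight-line approximation by more than tolerance."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [start, end]  # every intermediate vertex is discarded
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the shared vertex appears only once

# Hypothetical digitized line with small wobbles around a straight segment
thinned = douglas_peucker([(0, 0), (1, 0.05), (2, 0), (3, 0.02), (4, 0)], 0.1)
```

A larger tolerance removes more vertices, which is why repeated thinning with increasing thresholds progressively simplifies the line.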
The results of repeated line thinning
Edge matching and rubber sheeting
Edge matching
Process of joining separately digitized but adjacent map sheets
Solve mismatches at sheet boundaries
Solve incompatible classifications
Rebuild topology
Eliminate redundant sheet boundaries
Rubber sheeting
“Stretch” a map as if it were drawn on a rubber sheet
“Tack down” points that are accurately placed, stretch others to fit using control points
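One simple way to approximate this stretching is to move each point by an inverse-distance-weighted average of the control point displacements. The sketch below uses that approach purely as an illustration; GIS packages may use other methods (for example piecewise transformations over triangles), and the function name, the `controls` structure and the `power` parameter are assumptions.

```python
import math

def rubber_sheet(points, controls, power=2.0):
    """Shift each point by the inverse-distance-weighted average of the
    control displacements. controls is a list of
    ((x_from, y_from), (x_to, y_to)) pairs; a "tacked down" point is a
    control whose from and to coordinates are identical."""
    if not controls:
        return list(points)
    out = []
    for x, y in points:
        wsum = sx = sy = 0.0
        exact = None
        for (fx, fy), (tx, ty) in controls:
            d = math.hypot(x - fx, y - fy)
            if d == 0:  # the point sits on a control: move it exactly
                exact = (tx, ty)
                break
            w = 1.0 / d ** power
            wsum += w
            sx += w * (tx - fx)
            sy += w * (ty - fy)
        out.append(exact if exact is not None else (x + sx / wsum, y + sy / wsum))
    return out

# Hypothetical controls: (0, 0) is tacked down, (10, 0) should move to (11, 0)
controls = [((0, 0), (0, 0)), ((10, 0), (11, 0))]
stretched = rubber_sheet([(0, 0), (5, 0), (10, 0)], controls)
```

Points near a control move almost as far as that control does, while points near a tacked-down location barely move.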
Developing an integrated database
Important to ensure quality is maintained and that data is up-to-date