Geographic Information Systems Data Quality Issues
Mark Foley mark.foley@dit.ie
Data Quality Issues
Learning Outcomes
Explain the key concepts and terminology associated with data error and quality
Describe errors in spatial data
List types of error that arise in GIS
Outline typical sources of error in a GIS project
Explain how GIS errors can be modelled and traced
Describe how errors in GIS can be managed
Error
Flaws in data
Individual or persistent / widespread
Accuracy
Within tolerances
Precision
Does not imply accuracy
Spurious accuracy
Bias
Systematic variation
Resolution
Smallest feature that can be displayed
Generalisation
Simplifying complexity
Completeness
Spatial
Temporal
Sample data
Field checking
Compatibility
Resolution
Data type
Accuracy versus precision
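The distinction can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the control-point coordinate, the 5 m offset and the noise levels are all hypothetical values chosen for illustration.

```python
import random
import statistics


def summarise(measurements, true_value):
    """Return (bias, spread): mean error and standard deviation of the readings."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


rng = random.Random(42)
true_x = 100.0  # hypothetical true easting of a control point, in metres

# Precise but inaccurate: tightly clustered readings with a 5 m systematic bias.
precise_biased = [true_x + 5.0 + rng.gauss(0, 0.1) for _ in range(200)]

# Accurate but imprecise: centred on the true value, but widely scattered.
accurate_noisy = [true_x + rng.gauss(0, 3.0) for _ in range(200)]

bias_a, spread_a = summarise(precise_biased, true_x)
bias_b, spread_b = summarise(accurate_noisy, true_x)

print(f"precise but biased : bias={bias_a:+.2f} m, spread={spread_a:.2f} m")
print(f"accurate but noisy : bias={bias_b:+.2f} m, spread={spread_b:.2f} m")
```

The first set has a small spread (high precision) but a large bias (low accuracy). Quoting its coordinates to the nearest centimetre would be an instance of spurious accuracy.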
Resolution and generalization of raster datasets
Scale-related generalization
Sources of Error
Understanding and modelling of reality
Source data
Recording
Instrument calibration
Classification
Data encoding
Digitizing errors
Data editing and conversion
Automatic error correction
Format conversion
Data processing and analysis
Scale
Aggregation
Classification
Data output
Poor communication
Conceptual view of uncertainty
Two mental maps of the location of Northallerton in the UK
Multiple representations of Tryfan, North Wales
Problems with remotely sensed imagery: (left) example of a satellite image with cloud cover (A), shadows from topography (B), and shadows from cloud cover (C); (right) an urban area showing a building leaning away from the camera
Land use change detection using remotely sensed imagery
Digitizing errors
Topological errors in vector GIS: (a) effects of tolerances on topological cleaning and (b) topological ambiguities in raster to vector conversion
Vector to raster classification error
Topological errors in vector GIS: (a) loss of connectivity and creation of false connectivity and (b) loss of information
Effect of grid orientation and origin on rasterization
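The effect of grid origin can be sketched with a rectangle rasterized on two grids that differ only in where they start. This is an illustrative sketch under stated assumptions: the rectangle, the cell size and the "cell centre inside the feature" rule are hypothetical choices, and grid orientation (rotation) is not covered here.

```python
import math


def rasterize_rect(xmin, ymin, xmax, ymax, cell, origin_x, origin_y):
    """Return the set of (col, row) cells whose centre lies inside the rectangle."""
    cells = set()
    col_start = math.floor((xmin - origin_x) / cell)
    col_end = math.ceil((xmax - origin_x) / cell)
    row_start = math.floor((ymin - origin_y) / cell)
    row_end = math.ceil((ymax - origin_y) / cell)
    for col in range(col_start, col_end + 1):
        cx = origin_x + (col + 0.5) * cell  # x coordinate of the cell centre
        for row in range(row_start, row_end + 1):
            cy = origin_y + (row + 0.5) * cell  # y coordinate of the cell centre
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                cells.add((col, row))
    return cells


# Hypothetical 2.5 x 2.5 unit feature, rasterized with 1-unit cells.
rect = (0.2, 0.2, 2.7, 2.7)
true_area = (rect[2] - rect[0]) * (rect[3] - rect[1])

grid_a = rasterize_rect(*rect, cell=1.0, origin_x=0.0, origin_y=0.0)
grid_b = rasterize_rect(*rect, cell=1.0, origin_x=0.4, origin_y=0.4)

print(f"true area          : {true_area:.2f}")
print(f"grid origin (0, 0) : {len(grid_a)} cells")
print(f"grid origin (.4,.4): {len(grid_b)} cells")
```

With these values the same feature occupies 9 cells on the first grid and 4 on the second, while its true area is 6.25 square units, so neither raster area matches the vector area.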
Generation of sliver polygons
Finding and modelling errors in GIS
Checking for errors
Visual checking
Statistical checking: test for extreme values (outliers), etc.
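A statistical check for extreme values might look like the sketch below. The 1.5 x interquartile-range rule is one common convention and is an assumption here, not something the slides prescribe. The spot heights are invented, with two values that look like misplaced decimal points.

```python
import statistics


def flag_extremes(values, k=1.5):
    """Return values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical spot heights in metres; 2130.0 and 21.4 are likely keying errors.
spot_heights = [212.0, 215.5, 209.8, 214.1, 211.3,
                2130.0, 213.7, 210.9, 216.2, 21.4]

flagged = flag_extremes(spot_heights)
print("values to field-check:", flagged)
```

A flagged value is only a candidate error. It still needs visual or field checking, because genuine extremes (a cliff or a quarry, for example) would be flagged too.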
Error modelling
Monte Carlo simulation
Random noise introduced
Process repeated & results compared
Distribution of results gives idea of possible error
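The steps above can be sketched for a simple case: the area of a digitized polygon whose vertices carry positional error. This is a minimal illustration under stated assumptions. The parcel, the 1 m Gaussian noise level and the number of runs are hypothetical, and a real study would use an error model fitted to the data source.

```python
import random
import statistics


def polygon_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def monte_carlo_area(vertices, sigma, runs, seed=0):
    """Repeat the area calculation with random noise added to every vertex."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        results.append(polygon_area(noisy))
    return results


# Hypothetical 100 m x 100 m parcel, with 1 m positional noise on each vertex.
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
areas = monte_carlo_area(parcel, sigma=1.0, runs=1000)

mean_area = statistics.mean(areas)
sd_area = statistics.stdev(areas)
print(f"nominal area : {polygon_area(parcel):.0f} m^2")
print(f"simulated    : mean={mean_area:.0f} m^2, sd={sd_area:.0f} m^2")
```

The standard deviation of the simulated areas gives an idea of how much the reported area could vary because of positional error alone.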
Visibility analysis example
Point-in-polygon categories of containment
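One way to derive such categories is to compare a point's distance from the polygon boundary with an error tolerance (an epsilon band). The sketch below is an assumption-laden illustration: the category names, the `epsilon` parameter and the test polygon are hypothetical and are not taken from the figure itself.

```python
import math


def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_boundary(pt, poly):
    """Shortest distance from pt to any edge of the polygon."""
    px, py = pt
    best = math.inf
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = 0.0
        else:
            t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg_len_sq))
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best


def containment_category(pt, poly, epsilon):
    """Classify pt using an epsilon band of positional uncertainty."""
    d = distance_to_boundary(pt, poly)
    if d == 0:
        return "ambiguous"
    inside = point_in_polygon(pt, poly)
    if d > epsilon:
        return "definitely in" if inside else "definitely out"
    return "possibly in" if inside else "possibly out"


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
for p in [(5.0, 5.0), (9.5, 5.0), (10.0, 5.0), (10.5, 5.0), (15.0, 5.0)]:
    print(p, "->", containment_category(p, square, epsilon=1.0))
```

The tolerance should reflect the known positional accuracy of the boundary and the point data. A larger `epsilon` moves more points into the "possibly" categories.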
Simulating effects of DEM error and algorithm uncertainty on derived stream networks
Simulating the effects of DEM error and algorithm uncertainty on radio communications in the Happy Valley area
Simulating uncertainty in the siting of nuclear waste facilities
Bootstrapping or ‘leave one out’ analysis
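The leave-one-out variant can be sketched for an interpolated surface: each sample point is withheld in turn, its value is predicted from the remaining points, and the prediction is compared with the measured value. The use of inverse distance weighting (IDW), the `power` parameter and the sample heights are all assumptions for illustration; bootstrapping proper (resampling with replacement) is not shown.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, sz) samples."""
    num = 0.0
    den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return sz  # exactly on a sample point: return its value
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den


def leave_one_out_errors(samples, power=2.0):
    """Predict each sample from all the others; return predicted minus observed."""
    errors = []
    for i, (sx, sy, sz) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        errors.append(idw(sx, sy, rest, power) - sz)
    return errors


# Hypothetical spot heights: (x, y, elevation in metres).
heights = [(0.0, 0.0, 120.0), (10.0, 0.0, 125.0), (0.0, 10.0, 118.0),
           (10.0, 10.0, 131.0), (5.0, 5.0, 124.0), (2.0, 8.0, 119.5)]

errors = leave_one_out_errors(heights)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"leave-one-out RMSE: {rmse:.2f} m")
```

Large individual errors point to samples that the rest of the data cannot reproduce, which may indicate either a measurement error or genuine local variation.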
Managing error
Lineage
History of development of dataset from source to present format