Last week we looked at some data management principles and investigated the use of FME as a way to manage your data flow for projects. We had a quick look at the infamous shapefile and how ESRI and ALL other pieces of GIS software use it.
We also quickly made use of our first relational database type of data by using spatialite files. Along the way we downloaded KML files and you have installed PostGIS on your VMs.
Today we are going to dig a bit deeper into these files. We will revisit the shapefile and KML, and explore spatialite data. If all goes well, we will also play around with PostGIS.
Quick reminder of Simple Features Specification
Data model as defined by the simple features specification
As a reminder, we are basically working with vector data (in the GIS world, that is) that follows the Simple Features Specification from the Open Geospatial Consortium.
If we have a quick look at pages 14 to 17 of the document, we can see the basic definition of spatial features as well as some spatial operation definitions.
Many define this as a data model, and it is. The data model is loosely based on features that are built by connecting points: points are joined into lines, and lines are closed into polygons. It does not account for mathematically defined objects such as an arc (e.g. a three-point arc). This does not mean that GIS software cannot handle these types of data, but the data model we are using (i.e. file types following the simple features spec) only stores data as connected points.
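To make the points-to-lines-to-polygons idea concrete, here is a minimal Python sketch that writes the three basic geometry types as well-known text (WKT), the text notation used by the simple features spec. The function names are made up for this illustration and plain (x, y) tuples stand in for points; this is not how any particular GIS package does it.

```python
# Hypothetical helpers: each geometry is just an ordered list of (x, y) points.

def point_wkt(point):
    x, y = point
    return f"POINT ({x} {y})"

def linestring_wkt(points):
    # A line is two or more points connected in order.
    if len(points) < 2:
        raise ValueError("a line needs at least two points")
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"LINESTRING ({coords})"

def polygon_wkt(ring):
    # A polygon boundary is a line that closes back on its first point.
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    if len(ring) < 4:
        raise ValueError("a closed ring needs at least four points")
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"

corners = [(0, 0), (1, 0), (1, 1)]
print(point_wkt(corners[0]))
print(linestring_wkt(corners))
print(polygon_wkt(corners))
```

Notice that there is nowhere to say "this edge is a curve". An arc would have to be approximated by adding more points along it.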
Other definitions of data models may be derived from how data is stored, for instance how you manage it. The logic of how you store your data (how you set up a folder tree) is a definition of a data model. People working with you would have to know the logic behind the model to find and store data.
Data structures, on the other hand, are more granular and can be as basic as how data is stored on a computer (i.e. we can go as far as bits and bytes). We may think of modelling the data in a way that makes sense (using the simple features spec for instance), but we need to store it at a level that can be retrieved by software. There is an often blurred line between where a data model ends and a data structure begins. We do not have to worry about the difference too much in this course, but we should have a rudimentary understanding of the two.
Wikipedia has a nice quick definition and examples of data structures. These definitions are written from a computer science perspective, but many of the types discussed are used in GIS.
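As a small illustration of the bits-and-bytes end of things, the sketch below packs a single point into well-known binary (WKB), the binary encoding defined alongside the simple features spec: one byte for byte order, a four-byte geometry type code, then x and y as eight-byte doubles. The function names and the coordinates are illustrative assumptions only.

```python
import struct

def point_to_wkb(x, y):
    # 1 = little-endian byte order, 1 = geometry type Point, then x and y.
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(data):
    # The first byte tells us how to read the rest.
    endian = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only understands Point geometries")
    return x, y

wkb = point_to_wkb(-122.75, 53.9)   # illustrative coordinates
print(len(wkb), "bytes:", wkb.hex())
print(wkb_to_point(wkb))
```

The model says "a point has an x and a y". The structure says exactly which bytes hold them and in what order.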
Data models and structures in GIS vector file formats
Let’s take a look at Wikipedia’s definition of the shapefile. We can see that there are a number of similarities between the shapefile model and that of the simple features spec (not bad for a data format created in the early 90s).
Within the document on Wikipedia, we see a reference to the ESRI coverage (specifically around topology – remember topology?!!) that had a different data model for vectors. It focused on creating a topological two-dimensional model.
The model for this format is a file system layout: a folder for each layer and an info folder for the attributes. The data structure is a bunch of binary files in these folders.
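If you want to see the shapefile's data structure for yourself, the main .shp file starts with a fixed 100-byte header. The sketch below reads the fields described in ESRI's published shapefile description (file code 9994, file length, version, shape type and bounding box). The function name is made up, and the header used here is built by hand so the example runs without a real file.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}

def parse_shp_header(header):
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian; the rest is little-endian.
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    version, shape_type = struct.unpack("<ii", header[28:36])
    bbox = struct.unpack("<4d", header[36:68])  # xmin, ymin, xmax, ymax
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    return {
        "version": version,
        "file_length_bytes": length_words * 2,  # length is stored in 16-bit words
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "bbox": bbox,
    }

# Hand-built stand-in header: a polygon file with no records.
fake_header = (
    struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
    + struct.pack("<ii", 1000, 5)
    + struct.pack("<4d", 0.0, 0.0, 10.0, 10.0)
    + bytes(32)  # Z and M ranges, unused here
)
info = parse_shp_header(fake_header)
print(info)
```

To try it on real data, read the first 100 bytes of any .shp file opened in binary mode and pass them to the function.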
KML – KMZ
Open up the KML file you downloaded last week in a text editor and look through it to see the logic behind how data is stored in this file type. Does it follow the simple features specification?
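Because KML is just XML, you can also poke at it with a few lines of Python. The sketch below pulls the name, geometry type and coordinates out of each Placemark using only the standard library. The sample document and the function name are invented for this illustration; your downloaded file will be larger and may use other elements.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample document, standing in for the file you downloaded.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample point</name>
      <Point><coordinates>-122.75,53.9,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Sample line</name>
      <LineString><coordinates>-122.75,53.9,0 -122.70,53.95,0</coordinates></LineString>
    </Placemark>
  </Document>
</kml>"""

def list_placemarks(kml_text):
    root = ET.fromstring(kml_text)
    found = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        for geom_tag in ("Point", "LineString", "Polygon"):
            geom = placemark.find(f".//kml:{geom_tag}", KML_NS)
            if geom is None:
                continue
            text = geom.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
            # Coordinates are "lon,lat[,alt]" tuples separated by whitespace.
            coords = [tuple(float(v) for v in c.split(",")) for c in text.split()]
            found.append((name, geom_tag, coords))
    return found

placemarks = list_placemarks(SAMPLE_KML)
for name, geom_tag, coords in placemarks:
    print(name, geom_tag, coords)
```

Compare what comes out with the simple features types: points and lines built from ordered coordinates again.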
Open up this KMZ file of Scott’s.
We can load this directly into QGIS, but we need to have a deeper look at it. Unzip the file using the unzip command or a GUI on your Linux box, or 7-Zip on osmotar.
What is the logic behind data storage, i.e. images and the actual KML data? Is this a structure or a model, or both?
Download this SAIF file from last week.
Again, we need to unzip this file. Let’s bang around in what we have just unzipped. What about this logic? It predates any XML-based GIS files as well as any OGC standards. Pretty well organised for a couple of guys from Surrey, BC!
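Since a KMZ is just a zip archive, you can also look inside it without unzipping it first. This sketch separates the KML documents from everything else (typically images) in an archive. The function name and the archive contents are made up; the archive is built in memory so the example runs on its own.

```python
import io
import zipfile

def describe_kmz(kmz_file):
    """Split the archive members into KML documents and other resources."""
    with zipfile.ZipFile(kmz_file) as archive:
        names = archive.namelist()
    kml_docs = [n for n in names if n.lower().endswith(".kml")]
    resources = [n for n in names
                 if not n.lower().endswith(".kml") and not n.endswith("/")]
    return kml_docs, resources

# Build a tiny stand-in KMZ in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc.kml", "<kml xmlns='http://www.opengis.net/kml/2.2'/>")
    archive.writestr("files/photo.png", b"not a real image")
buffer.seek(0)

kml_docs, resources = describe_kmz(buffer)
print("KML documents:", kml_docs)
print("Other resources:", resources)
```

With a real file, pass its path to describe_kmz instead of the in-memory buffer.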
Let’s create a spatialite file in QGIS as we did last week. Grab some data from wherever, e.g. from the Open Data portal for the city of Prince George.
Now install spatialite gui on your VM. Scott will walk you through this (he has some personal habits when installing, i.e. needing aptitude installed first).
Once installed, open the GUI and connect to your spatialite file (database).
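A spatialite file is an SQLite database underneath, so besides the GUI you can peek at it with Python's built-in sqlite3 module. The sketch below only lists table names and does not load the spatial extension, so spatial functions are not available. The function name and the "parks" table are hypothetical, and an in-memory database stands in for your file.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]

# Stand-in database; with a real file use sqlite3.connect("<your file>").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (id INTEGER PRIMARY KEY, name TEXT, geometry BLOB)")

tables = list_tables(conn)
print(tables)
conn.close()
```

On the database you created in QGIS you should expect to see your layer alongside a number of metadata tables (for example geometry_columns and spatial_ref_sys) that record which columns hold geometry and in which coordinate system.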
PostGIS needs to be installed on your VM. Let’s install it following the same methods as above.
We will play around with this for the rest of the tutorial.