This lab is the first of three parts discussing how we use spatial data. We all have a great deal of experience in GIS and data, but are not necessarily managing or understanding the data we are working with. We are going to look at the structure of some of the data formats we use in the GIS lab and discuss what benefits each may have.
The data for this lab is in /home/labs/geog413/data/datastructures
- Review different data formats
- Optimize the use of databases by preliminary data handling
- Review a bit of FME
- Look at integrating interactive media – such as photos
- Start a discussion around data management
Understanding Data Structures by examining Simple Features
There will be plenty of technical material discussed in this section – take notes…
List of Considerations when deciding which data type (format) to use
We need to think of a few topics that are associated with how different spatial data are put together in respective formats such as:
Uses of the format – e.g. KML files are really only intended to work with Google Earth, but GML files are designed to be a complete standard format for sharing spatial data (both are XML based)
How are the geometries designed (i.e. simple features or drawing-rule based: simple features vs CAD methods)
How are attributes associated with the spatial data (i.e. limitations with Shape files vs database type of storage)
Are the data indexed to speed up performance, and what effect do these indexes have on storage
How are the other data types extended – i.e. what about topology!!, networks, linear segmentation
What are the limitations of the file types
GIS Data file formats
There are a ton of GIS file formats for both raster and vector types. Look at the list from FileInfo. This is a crazy list – if you travel down the list you may recognize many types of files you have already worked with in your work in the GIS Lab. We are going to explore a number of these files in order to get an understanding of data formats and data structures.
Let’s look at another list of GIS formats from Wikipedia: Wikipedia list of GIS Formats
Looking at Shapefiles and other simple feature file types
The Ubiquitous Shapefile
If we click on the shape file link in the Wikipedia page, we can see a description of the file we are most familiar with. Let’s review what we know of this format. If you look down the page you can see a link to ESRI’s publication of the shape file format (describing how the guts of it work). We are going to discuss the idea of how simple features are used in GIS and other fields by reviewing some of the links in the Wikipedia page as well as others. Below are some of the links we will be looking at. This will be a bit of a class-led discussion of theory and application.
Shape file Library – ESRI – 1999 – https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
Simple Features Specification – OGC – 2011 http://portal.opengeospatial.org/files/?artifact_id=25355
Feature Class Basics – ESRI – 2016 – http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/feature-class-basics.htm
Limitations of Shapefile – http://desktop.arcgis.com/en/arcmap/latest/manage-data/shapefiles/geoprocessing-considerations-for-shapefile-output.htm#GUID-B845DF9F-78C9-439F-9674-2BFEEFF8D58E
Shapefiles are a binary data type – in other words, you cannot open them in a text editor (although try the command more bcparks.dbf from a terminal in the /home/labs/geog413/data/bc folder – you will see attribute text information). We will be looking at text data that represent simple features using different types of text data.
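To see what "binary" means in practice, here is a minimal Python sketch that decodes the fixed 100-byte header found at the start of any .shp file, following the layout in the ESRI whitepaper linked above. The function name `parse_shp_header` and the synthetic `demo_header` are made up for illustration; the demo header is built in memory so the sketch runs without the lab data.

```python
import struct

# Shape type codes from the ESRI shapefile description (2D types only).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header):
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # The file code and file length are big-endian integers...
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    # ...while version, shape type and bounding box are little-endian.
    version, shape_code = struct.unpack("<2i", header[28:36])
    bbox = struct.unpack("<4d", header[36:68])
    return {
        "file_code": file_code,          # 9994 for a shapefile
        "file_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_code, f"other ({shape_code})"),
        "bbox": bbox,                    # xmin, ymin, xmax, ymax
    }


# Synthetic header (not real lab data): a polygon file with a 0..10 by 0..20 extent.
demo_header = (
    struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
    + struct.pack("<2i", 1000, 5)
    + struct.pack("<4d", 0.0, 0.0, 10.0, 20.0)
    + bytes(32)  # Z and M ranges, unused here
)
info = parse_shp_header(demo_header)
print(info)
```

To try it on real data, read the first 100 bytes of a .shp file opened in `"rb"` mode and pass them to `parse_shp_header` (this assumes a .shp file sits alongside bcparks.dbf in the bc folder).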
Exercise 1 Text Data:
Using a text file to create a point layer – This should be a simple task for students at this level, but if you have forgotten how to do it, Scott will illustrate it. We will use the old postal code data we have had hanging around for some time to create points by:
- Open QGIS
- Layer –> Add Delimited Text Layer
- Navigate to /home/labs/geog413/data/datastructures/text_data and load the postal.txt file
- Determine the parameters in the interface and load into the canvas
Now open the postal code file in a text editor – such as gedit on Linux or Notepad++ on Windows. What do you see there – not much to it. How does it compare to others using the list of considerations?
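What the delimited text loader does can be sketched in a few lines of Python: read each row, pull out the two coordinate columns, and treat everything else as attributes. The column names, the tab delimiter and the sample values below are assumptions made up for this sketch; check the real postal.txt for its actual layout.

```python
import csv
import io

# Made-up sample rows; the real postal.txt may use other columns or delimiters.
SAMPLE = (
    "postal_code\tlatitude\tlongitude\n"
    "A1A1A1\t53.8931\t-122.8142\n"
    "B2B2B2\t53.9171\t-122.7497\n"
)


def rows_to_points(text, x_field, y_field, delimiter="\t"):
    """Turn delimited text into (WKT point, attribute dict) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    features = []
    for row in reader:
        # Remove the coordinate columns; what is left are the attributes.
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        features.append((f"POINT ({x} {y})", dict(row)))
    return features


points = rows_to_points(SAMPLE, x_field="longitude", y_field="latitude")
for wkt, attributes in points:
    print(wkt, attributes)
```

Notice that nothing in the text file itself says which columns are coordinates or what coordinate system they use. That knowledge lives with the person loading it, which is worth keeping in mind for the considerations list.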
Lines as text – Using the text editor open the pgroads.gen file. What does this data set represent – what is the feature type of this potential layer (this type of file is called an ESRI ArcInfo Generate file)? Drag it and the pgbound.gen file into QGIS.
Open the pgroads_attributes.csv file in QGIS (just drag it in). How can you use the text information from the CSV file to create a road line layer with its attributes?
How does the generate text method work in the consideration list?
Foreshadowing – by creating the line layer with its attributes you are simulating the creation of the ESRI coverage. This type of layer has topology built in.
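One way to picture the link between a generate file and its attribute table is the sketch below. It assumes the usual generate layout for lines (an ID line, then one x,y pair per line, then END, with a final END closing the file) and a CSV whose key column matches those IDs. The column names `ID` and `NAME` and all sample values are hypothetical; check pgroads_attributes.csv for the real ones.

```python
import csv
import io

# Made-up sample in the generate line layout: ID, coordinates, END ... END.
SAMPLE_GEN = """1
0.0,0.0
10.0,5.0
END
2
10.0,5.0
20.0,5.0
30.0,8.0
END
END
"""

# Hypothetical attribute table keyed on the same IDs.
SAMPLE_CSV = "ID,NAME\n1,First Ave\n2,Second Ave\n"


def parse_generate_lines(text):
    """Return {line_id: [(x, y), ...]} from generate-format line data."""
    lines, current_id, coords = {}, None, []
    for raw in text.splitlines():
        token = raw.strip()
        if not token:
            continue
        if token.upper() == "END":
            if current_id is None:
                break  # an END with no open line closes the file
            lines[current_id] = coords
            current_id, coords = None, []
        elif current_id is None:
            current_id = token  # first line of each block is the ID
        else:
            x, y = (float(v) for v in token.replace(",", " ").split()[:2])
            coords.append((x, y))
    return lines


def join_attributes(lines, csv_text, key_field):
    """Pair each line (as WKT) with the CSV row that shares its ID."""
    rows = {row[key_field].strip(): row for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for line_id, coords in lines.items():
        wkt = "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"
        joined.append((wkt, rows.get(line_id, {})))
    return joined


roads = parse_generate_lines(SAMPLE_GEN)
joined = join_attributes(roads, SAMPLE_CSV, key_field="ID")
for wkt, attributes in joined:
    print(attributes.get("NAME"), wkt)
```

The geometry and the attributes only meet through the shared ID, which is the same idea as a table join in QGIS.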
Exercise 2: XML Based text data
Open Google Earth and create a polygon feature with a web image link in the file. See if you can do this, GIS Professionals…
- Open Google Earth
- Right click on Temporary Places –> Add –> Polygon –> draw it leaving the description panel open
- Add a name and description and a web image location.
- Close the panel (press OK)
- Right click on the new layer and save place as –> to your working folder
Drag this KML file into the QGIS project and also open it in a text editor.
What does this file provide with regard to the simple features specification? How does it compare to a shapefile or the other text files?
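If you want to poke at the KML outside of a text editor, here is a minimal sketch using Python's standard XML parser. The sample document is made up (the image URL is a placeholder) and only mimics the general shape of a Google Earth polygon placemark; your saved file will carry extra styling elements.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample: one polygon placemark with an image link in the description.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Placemark>
    <name>Test polygon</name>
    <description><![CDATA[<img src="https://example.com/photo.jpg"/>]]></description>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>-122.81,53.89,0 -122.80,53.89,0 -122.80,53.90,0 -122.81,53.89,0</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</Document>
</kml>"""


def read_placemarks(kml_text):
    """Return name, description and coordinate tuples for each Placemark."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        coords_text = pm.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
        # Each whitespace-separated tuple is lon,lat[,alt].
        ring = [tuple(float(v) for v in c.split(",")) for c in coords_text.split()]
        placemarks.append({
            "name": pm.findtext("kml:name", default="", namespaces=KML_NS),
            "description": pm.findtext("kml:description", default="", namespaces=KML_NS),
            "coords": ring,
        })
    return placemarks


placemarks = read_placemarks(SAMPLE_KML)
for pm in placemarks:
    print(pm["name"], len(pm["coords"]), "vertices")
    print(pm["description"])
```

Note how the image link is just HTML text tucked inside the description element, and how the ring repeats its first vertex to close the polygon.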
Drag Scott’s Walk file from the GPX folder into QGIS and also open it in a text editor (tracks, track points and waypoints). How are the spatial and attribute data stored here?
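GPX stores things differently from KML, as this small sketch shows. It assumes a GPX 1.1 file (a GPX 1.0 file uses a different namespace URI), and the sample track below is invented rather than taken from the walk file.

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample: one waypoint and a two-point track.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="sketch">
  <wpt lat="53.8930" lon="-122.8140"><name>Start</name></wpt>
  <trk><name>Walk</name><trkseg>
    <trkpt lat="53.8930" lon="-122.8140"><ele>780.0</ele><time>2020-01-01T10:00:00Z</time></trkpt>
    <trkpt lat="53.8940" lon="-122.8150"><ele>782.5</ele><time>2020-01-01T10:01:00Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def read_gpx(text):
    """Return (waypoints, track_points) from a GPX 1.1 document."""
    root = ET.fromstring(text)
    waypoints = [
        (w.findtext("gpx:name", default="", namespaces=GPX_NS),
         float(w.get("lat")), float(w.get("lon")))
        for w in root.iterfind("gpx:wpt", GPX_NS)
    ]
    track_points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        track_points.append({
            # Position lives in XML attributes of the point element...
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            # ...while elevation and time are child elements.
            "ele": float(pt.findtext("gpx:ele", default="nan", namespaces=GPX_NS)),
            "time": pt.findtext("gpx:time", default="", namespaces=GPX_NS),
        })
    return waypoints, track_points


waypoints, track_points = read_gpx(SAMPLE_GPX)
print(waypoints)
print(track_points)
```

Compare this to the KML sketch idea: here each point is its own element with lat and lon as attributes, rather than one long coordinate string per geometry.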
We have left GML data for later in the lab – when we get to play with FME!!
Non-Simple Features data types
Exercise 3 – ESRI Coverage and ASCII data:
Let’s open a new project in Q to start this section.
Relational models of data formats are heavily used for corporate data management. We will be looking at this next lab when we work with a relational database, but ESRI’s data format for decades (not that the shape file has not been around for decades..), was the coverage. Instead of the wide open object model of the simple features, the relational model makes use of a strict data structure. This coverage model is very powerful in that it has topology built into it (what Scott calls old style topology), but it has limitations as well – it only works in 2D for instance.
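To make "topology built in" a bit more concrete, here is a toy sketch of the arc-node idea: each arc records its from-node, to-node, and the polygon on its left and right. The table below is invented (two adjacent square polygons, 10 and 11, with 0 standing in for the outside area) and is not read from the campus coverage.

```python
from collections import defaultdict

OUTSIDE = 0  # stand-in ID for the area outside all polygons (an assumption of this sketch)

# (arc_id, from_node, to_node, left_poly, right_poly)
ARCS = [
    (1, 1, 2, OUTSIDE, 10),  # outer boundary of polygon 10
    (2, 2, 1, OUTSIDE, 11),  # outer boundary of polygon 11
    (3, 1, 2, 10, 11),       # shared boundary, stored once
]


def polygon_neighbours(arcs):
    """Polygons are neighbours when one arc has them on its left and right."""
    neighbours = defaultdict(set)
    for _arc_id, _from_node, _to_node, left, right in arcs:
        if OUTSIDE not in (left, right) and left != right:
            neighbours[left].add(right)
            neighbours[right].add(left)
    return dict(neighbours)


def arcs_of_polygon(arcs, poly_id):
    """List the arcs that form the boundary of one polygon."""
    return [arc[0] for arc in arcs if poly_id in (arc[3], arc[4])]


neighbours = polygon_neighbours(ARCS)
print(neighbours)
print(arcs_of_polygon(ARCS, 10))
```

Adjacency falls straight out of the table with no geometry calculation at all, and the shared boundary exists only once. A simple features polygon layer would store that edge twice, once in each polygon.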
There is a coverage called campus in the coverages directory of the datastructure folder. You can bring it into QGIS by the usual drag, or by adding a vector layer, but choose directory instead of a file (and select Arc Info Coverage). When bringing it into QGIS it asks you what layers you want from it (as we saw with the GPX file).
Once the layers are in the Table of Contents of Q, have a look at the four layers present and consider the following questions:
Why can there be more than one feature type in this model (as a matter of fact there always has to be at least points and lines in the layer)?
Are there other layers that should have been available for loading as well?
What data are the four layers representing (i.e. the two point layers)?
How are the attributes in the line and polygon layers connected?
The campus layer is a directory on the file system – why is there also an info directory with all the other layers (which are also folders)?
Any idea what files such as arc0037.nit and pal.adf are used for?
How does this datatype fall out in the considerations list?
The ESRI interchange data format e00 file
The ESRI interchange format was the way ESRI shared coverage layers. It was robust enough to be used to export both raster (Arc Grid data) and vector data (Coverages and TINs). If the data was really large, the export would span multiple files (i.e. e00, e01, e02..), and it was either in text (ASCII) or binary format.
Open the roads.e00 file from the e00 folder in QGIS. What do you notice about it once it is loaded?
Open the same file in a text editor.
What do you see here? Crazy amount of data! How does this contain the data?
If this data represents coverage formats – is it topology sound?
DEM ASCII file
Look at the contents of the *.asc file in the dem_ascii directory in a text editor, and also load it into QGIS. This is our first look at raster data.
Can you think of a way you can clip out a corner (say a 30×30 pixel sample) of the file before you bring it into a piece of software? Let’s open it in LibreOffice Calc (or Excel if you wish).
What steps did you take (following along with Scott)?
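The spreadsheet route works, and the same clip can also be sketched in Python. This assumes the common six-line ASCII grid header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) with one grid row per text line; check the lab file first, since some files use xllcenter/yllcenter or omit the NODATA line. The tiny grid below is made up.

```python
# Made-up 4 x 3 grid in the ESRI ASCII grid layout.
SAMPLE_ASC = """ncols 4
nrows 3
xllcorner 500000
yllcorner 5970000
cellsize 25
NODATA_value -9999
1 2 3 4
5 6 7 8
9 10 11 12
"""


def clip_top_left(text, size):
    """Keep the top-left size x size cells and fix the header to match."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = value
    rows = [line.split() for line in lines[6:]]

    nrows, ncols = int(header["nrows"]), int(header["ncols"])
    keep_rows, keep_cols = min(size, nrows), min(size, ncols)
    cell = float(header["cellsize"])
    # The first data row is the northern edge, so dropping southern rows
    # moves the lower-left corner north; the western edge does not move.
    new_yll = float(header["yllcorner"]) + (nrows - keep_rows) * cell

    out = [
        f"ncols {keep_cols}",
        f"nrows {keep_rows}",
        f"xllcorner {header['xllcorner']}",
        f"yllcorner {new_yll}",
        f"cellsize {header['cellsize']}",
        f"NODATA_value {header['nodata_value']}",
    ]
    out += [" ".join(row[:keep_cols]) for row in rows[:keep_rows]]
    return "\n".join(out) + "\n"


clipped = clip_top_left(SAMPLE_ASC, 2)
print(clipped)
```

The step that is easy to forget in a spreadsheet is the header: if you delete rows and columns but leave ncols, nrows and yllcorner alone, the clipped file will either fail to load or land in the wrong place.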
Exercise 4: CAD Drawings
Open up the roads.dxf file in QGIS. This is an unremarkable layer, but if we try to open it in the FME viewer we can see the uses of CAD data. Let’s give it a shot.
Back on osmotar, open all three files (one at a time) in the FME Data Inspector. Two of them are AutoCAD formats and one is Bentley MicroStation Design.
What types of data do we get here?
Are these simple features?
Can you determine the units?
Is there a projection?
What is this data depicting (what is it in the real world)?
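A text DXF file can also be inspected directly, since it is just alternating lines of a numeric group code and a value (0 starts a new item, 8 names a layer, 10 and 20 hold x and y). The sketch below tallies entity types and layers in the ENTITIES section. The sample content is invented, and this only applies to ASCII DXF; the DWG and DGN files are binary and need a tool like FME.

```python
from collections import Counter

# Made-up minimal ENTITIES section: one LINE and one POINT.
SAMPLE_DXF = """0
SECTION
2
ENTITIES
0
LINE
8
ROADS
10
0.0
20
0.0
11
10.0
21
5.0
0
POINT
8
SIGNS
10
1.0
20
2.0
0
ENDSEC
0
EOF
"""


def dxf_pairs(text):
    """Split ASCII DXF text into (group_code, value) pairs."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    return [(int(lines[i]), lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def summarize_entities(text):
    """Count entity types and collect layer names in the ENTITIES section."""
    entity_types, layers = Counter(), set()
    in_entities = False
    expecting_section_name = False
    for code, value in dxf_pairs(text):
        if expecting_section_name:
            # The pair right after SECTION names the section (group code 2).
            in_entities = code == 2 and value == "ENTITIES"
            expecting_section_name = False
        elif code == 0 and value == "SECTION":
            expecting_section_name = True
        elif code == 0 and value == "ENDSEC":
            in_entities = False
        elif in_entities and code == 0:
            entity_types[value] += 1
        elif in_entities and code == 8:
            layers.add(value)
    return entity_types, layers


entity_types, layers = summarize_entities(SAMPLE_DXF)
print(dict(entity_types))
print(sorted(layers))
```

Look at what is missing from the pairs: there is no slot for a projection, and features are grouped by drawing layer rather than carrying an attribute table.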
Working with FME to better understand data structures
FME SAIF File
You may recall using FME translator to work with the saf format in labs or projects previously in the GIS Lab. We are now going to spend some time working with FME Workbench to better understand data structures, but also have some fun with one of the best pieces of software available. This software is made in Surrey BC (Burnaby when Scott first started in the lab and Dale and Don started Safe Software). Our GIS Lab was one of the first users of the software (before it was fun to use). Scott has been too lax in using FME and did not even realize it ran on Linux!! As he did not get it on the lab machines for the lab, we are going to use osmotar – but fear not, it is one of the few things that runs well on osmotar.
Let’s replicate the translation you did in the last assignment – but using Workbench instead. We are going to perform some simple tasks that just scratch the surface of the power the software has by:
- Converting a saf file to shape file
- Looking at the data structure of the saf file to alter our translation
- Add to our workspace to create a DEM directly within the workbench
- Create GML output format – then review the format
Exercise 5: Simple use of Workbench
Open FME workbench and hit the generate button. You will get an interface whereby you can specify your inputs and outputs. Translate a TRIM saf file to a shape file by:
- Generate workspace interface
- Find the saf file (/home/labs/geog413/data/datastructures/safe_fme/saf/93g096.saf) -> the interface should automatically detect input type
- Select the output type
- Select output folder
- Click OK
- Keep all the features
- OK again
You should now be presented with a connected set of inputs and outputs. If you hit the run button, you will get all the TRIM layers. Run the translation and look at the data in QGIS.
Exercise 6: Data Structure again – the saf file
Use the data structure for your benefit
Let’s now break down the saf data format that was created for use by the provincial government for exchanging TRIM data. This is a unique data type that reflects the object approach that the software designers at Safe Software utilized in creating the FME suite of tools used globally.
Make a directory called saf_fme in your datastructure directory. Once that is made, copy the file 93G096.saf from the safe_fme folder to it. Now open that file in the 7zip program on osmotar or the archive manager on Linux (why this program?) by right clicking on it and choosing 7zip.
What do you see in the file? What do the osn files represent? Look at the classdef.csn file: does it mean anything to you? Look at the *.dir file: how do the entries relate to the osn files?
You can edit the exports.dir file by extracting it to your folder –> editing it in Notepad++ (see next sentence) –> right click on it –> 7zip add to archive –> add it back to the saf file. Remove all the objects in the file except: GlobalMetadata, ReferenceSystem, breakline, DEMPoint, island,lake reservoir, rivers stream,road and sandgravel. Now open the saf file in FME Workbench. Use the workspace dialogue to convert it to a shape file. What do you notice about this file compared to other saf files used above?
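The extract, edit and re-add steps can also be scripted. The sketch below assumes the saf file is an ordinary zip archive (which is what opening it in 7zip suggests) and rewrites the archive with one member replaced, since Python's zipfile module cannot edit a member in place. The demo archive is built in memory with placeholder contents; the member names other than exports.dir and classdef.csn are invented, and the real exports.dir layout is not reproduced here.

```python
import io
import zipfile


def replace_member(archive_bytes, member, new_data):
    """Return a copy of a zip archive with one member's contents replaced."""
    source = zipfile.ZipFile(io.BytesIO(archive_bytes))
    out_buffer = io.BytesIO()
    with source, zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for entry in source.infolist():
            # Copy every member as-is except the one being swapped out.
            data = new_data if entry.filename == member else source.read(entry.filename)
            target.writestr(entry.filename, data)
    return out_buffer.getvalue()


# Build a stand-in archive with placeholder contents (not real SAIF data).
demo_buffer = io.BytesIO()
with zipfile.ZipFile(demo_buffer, "w") as demo:
    demo.writestr("exports.dir", b"placeholder: original directory listing")
    demo.writestr("classdef.csn", b"placeholder: class definitions")
    demo.writestr("road.osn", b"placeholder: object data")

edited = replace_member(demo_buffer.getvalue(), "exports.dir",
                        b"placeholder: trimmed directory listing")

with zipfile.ZipFile(io.BytesIO(edited)) as check:
    member_names = check.namelist()
    new_listing = check.read("exports.dir")
print(member_names)
print(new_listing)
```

For the lab you would read the bytes of your own copy of the saf file, pass in your edited exports.dir, and write the result to a new file name so the original stays untouched.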
Create a DEM and polygons
Open your new saf file into workbench the same way you did with the original TRIM dataset. What is different?
Now experiment with the Transformer Gallery to build the sandbars, lakes and islands into polygons and create a DEM. First hint – you will need to use the RasterDEM generator and also add a second writer in the Navigator Pane (use a manual definition file). Scott will illustrate.
Exercise 7: Data structures GML file
Within the project we started in FME, let’s add a second destination type – GML2. Follow along with Scott, as hopefully this will be review. Once the GML file is created, look at it in an editor. Can you figure out how the features are laid out? Rerun the FME flow, but add the rivers to the output GML file. Does this mean that GML can handle different feature types? Can it handle 3 dimensional layers? What is GML? Who was responsible for its creation? What country was it created in? Would it be easy to remove/update features using a text editor?
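To help with reading the GML output, here is a small sketch that walks a GML 2 style document and reports each feature's type, geometry type and coordinate dimension. The sample is hand-written with a made-up `lab` namespace and feature names; the file FME writes will use its own element names and schema, so treat this as a reading aid only.

```python
import xml.etree.ElementTree as ET

GML_NS = {"gml": "http://www.opengis.net/gml"}
GEOMETRY_TYPES = {"Point", "LineString", "Polygon"}

# Hand-written GML 2 style sample; the lab namespace and names are invented.
SAMPLE_GML = """<FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:lab="http://example.com/lab">
  <gml:featureMember>
    <lab:lake>
      <lab:name>Demo Lake</lab:name>
      <gml:polygonProperty><gml:Polygon>
        <gml:outerBoundaryIs><gml:LinearRing>
          <gml:coordinates>0,0 10,0 10,10 0,0</gml:coordinates>
        </gml:LinearRing></gml:outerBoundaryIs>
      </gml:Polygon></gml:polygonProperty>
    </lab:lake>
  </gml:featureMember>
  <gml:featureMember>
    <lab:river>
      <gml:lineStringProperty><gml:LineString>
        <gml:coordinates>0,0,700 5,5,695</gml:coordinates>
      </gml:LineString></gml:lineStringProperty>
    </lab:river>
  </gml:featureMember>
</FeatureCollection>"""


def local_name(tag):
    """Strip the {namespace} prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_gml(text):
    """Return (feature type, geometry type, dimension) for each feature."""
    root = ET.fromstring(text)
    summary = []
    for member in root.iterfind(".//gml:featureMember", GML_NS):
        feature = member[0]  # the single feature element inside the member
        geometry = next(local_name(el.tag) for el in feature.iter()
                        if local_name(el.tag) in GEOMETRY_TYPES)
        coords = feature.find(".//gml:coordinates", GML_NS).text.split()
        # Tuples are space separated; values inside a tuple are comma separated.
        dimension = len(coords[0].split(","))
        summary.append((local_name(feature.tag), geometry, dimension))
    return summary


summary = summarize_gml(SAMPLE_GML)
for row in summary:
    print(row)
```

Run something like this against your own output (adjusting for whatever element names you find in it) and compare the result with what you worked out by eye in the editor.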