Introduction

We live in the information age, and we often need ways of wading through huge amounts of data. As computers get faster, they can in many cases work through that data faster than we can. It is often very advantageous to automate repetitive tasks in GIS so that you can do more analysis in the same amount of time.

What is a script?

A script is simply a set of instructions to be completed. Scripting is very similar to building a model, as we looked at in the beginning of the class. When we talk about automation, the conversation moves from “how do I complete this task?” to “how do I complete this task many times?”

To write a script you need to know what you want to do (develop a model). Each of those steps can then be implemented in code. Our goal here is not to learn every command line tool, but to think about what we want to accomplish and know how to find the commands to do it.

The Tools we are using today

Bash: Also known as the Bourne-again Shell, Bash is the default interactive shell on Debian. Bash allows us to run various commands and write shell scripts that automate the running of these tasks.

GDAL: The Geospatial Data Abstraction Library provides a set of tools that we will use for working with spatial data.

The Motivation

Let’s look at a scenario where you are working at a company, and your boss asks you to convert some GPX files from a GPS into KML files that can be viewed in Google Earth. Being the GIS wiz that you are, you open the files in QGIS and save them as KML, and all is well, until your boss finds a few thousand more files to convert.

Unix File Permissions

On Unix-based operating systems, file permissions work a bit differently than on Windows. On Windows we use Access Control Lists, where we select which users or groups have which permissions in a list. On Unix operating systems, however, we use the file mode.

Owner: Every file in Unix has an owner and a group. You can change who owns a file with the chown command, which has the format

chown <user>:<group> <filename>

Permissions: File permissions are set using a bit mask (a series of 0s and 1s that switch options off or on). There are three sets of three permissions. The sets are ordered Owner, Group, Everyone, and the permissions within each set are read, write, execute. Each set of three bits is written as a single digit, so a full mode is three digits long. Permissions are set with chmod in the format “chmod <mode> <filename>” (e.g. chmod 755 filename).

Permission Bit Masks

Permissions starting with * should be avoided.

Permissions         Bitmask   Decimal
No Access           000       0
* Execute           001       1
* Write             010       2
* Write & Execute   011       3
Read Only           100       4
Read and Execute    101       5
Read and Write      110       6
Full Permissions    111       7
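To see how the three digits of a mode map onto the table, here is a minimal sketch that sets a mode on a scratch file and reads it back. It assumes GNU stat (the version shipped with Debian); the scratch file is created with mktemp and removed at the end, so no real data is touched.

```shell
# Create a throwaway file so no real data is touched
scratch=$(mktemp)

# 7 = read, write and execute (owner)
# 5 = read and execute (group)
# 4 = read only (everyone)
chmod 754 "$scratch"

# With GNU stat, %a prints the mode as digits and %A prints it as letters
mode=$(stat -c '%a' "$scratch")
letters=$(stat -c '%A' "$scratch")
echo "$mode $letters"

rm -f "$scratch"
```

The letters read left to right in the same Owner, Group, Everyone order as the digits, after a leading character that marks the file type.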

Most of the files in your home directory are likely either 644, 640 or 600.

Conversation topic: Why might these permissions be common?

When setting up a webserver the most common file permissions are 755 and 754. Why might the files be set like this?

As we are writing scripts to automate our workflows, it may also be beneficial to simply add the execute permission. This can be done with the command

chmod +x <filename>
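As a small illustration of what the execute permission changes, the sketch below creates a scratch script, checks whether it is executable before and after chmod +x, and then runs it. The file name hello.sh and the scratch directory are made up for the example.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
script="$workdir/hello.sh"   # hypothetical script name

# Write a two-line script into the new file
printf '#!/bin/sh\necho "hello from script"\n' > "$script"

# [ -x file ] is true only when the file can be executed
if [ -x "$script" ]; then before="executable"; else before="not executable"; fi

chmod +x "$script"

if [ -x "$script" ]; then after="executable"; else after="not executable"; fi
echo "before: $before, after: $after"

# Now the script can be run directly by its path
output=$("$script")
echo "$output"

rm -rf "$workdir"
```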

GDAL

GDAL can be used to perform many tasks; a complete list of its programs is available here: https://gdal.org/programs/index.html.

For this task we will be taking a closer look at ogr2ogr. In GDAL, all vector data is handled by the OGR drivers, a name kept mostly for historical reasons, so you can think of ogr2ogr as “vector to vector”.

By clicking on the program you will be taken to a page showing a synopsis, a description of what the program does, and the arguments that can be used with it.

You can download the data for this lab to your VM from https://fileshare.gis.unbc.ca/index.php/s/FRzbbZo6myJidcB. You will need both the GPX and images folders. Once you have downloaded them, unzip them and place them in ~/data/ (you will need to create this directory; do not save data in your git repository!).

Make a directory for the KML files beside the GPX directory.

In this case the basic command we will be using is

ogr2ogr -f KML ~/data/KML/<output.file> ~/data/GPX/<input.file>

Try it now by replacing <input.file> with one of the GPX files to be converted and <output.file> with the name of the KML file you want to create.
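For reference, a filled-in version of the command might look like the sketch below. The file name track.gpx is hypothetical (substitute one of your own files), and DATA_DIR is a variable introduced here only so the paths are written once; it defaults to the ~/data directory used in this lab. The guard simply skips the conversion if ogr2ogr or the input file is not present.

```shell
# Assumption: data lives in ~/data as described above
DATA_DIR="${DATA_DIR:-$HOME/data}"

input="$DATA_DIR/GPX/track.gpx"    # hypothetical input file name
output="$DATA_DIR/KML/track.kml"   # same name with the new extension

# -p creates the directory only if it does not already exist
mkdir -p "$DATA_DIR/KML"

if command -v ogr2ogr >/dev/null 2>&1 && [ -f "$input" ]; then
    # -f names the output format; the output path comes before the input path
    ogr2ogr -f KML "$output" "$input"
else
    echo "Skipping: ogr2ogr or $input not found"
fi
```

Note the argument order: ogr2ogr takes the destination first and the source second, which is the reverse of what many people expect.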

Automation Using Bash

Now that we can do something once, let’s try to do it multiple times. Create a file in your favorite text editor (if you don’t have a favorite, try Visual Studio Code), name the file gpx2kml.sh, save the file, and run the command

chmod +x gpx2kml.sh

In this case chmod changes the file mode, +x adds the execute permission to the file mode, and gpx2kml.sh is the file we are altering.

Start by giving the file the following contents

#!/bin/bash
# List every file in the GPX directory and print each name
FILES=$(ls ~/data/GPX)
for File in $FILES
do
	echo "$File"
done

Enter ./gpx2kml.sh in the terminal; you should see the name of each GPX file printed, one per line.
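Looping over the output of ls works for simple names but splits any file name that contains a space. A common alternative, sketched here as a variation rather than as the lab's method, is to loop over a glob pattern. The scratch directory and file names below are made up for the demonstration.

```shell
# Build a scratch directory with made-up file names, one containing a space
gpx_dir=$(mktemp -d)
touch "$gpx_dir/trail one.gpx" "$gpx_dir/trail_two.gpx"

count=0
for path in "$gpx_dir"/*.gpx
do
    # If nothing matches, the pattern itself is returned; skip it
    [ -e "$path" ] || continue

    # basename strips the directory part, leaving just the file name
    name=$(basename "$path")
    echo "$name"
    count=$((count + 1))
done
echo "Found $count files"

rm -rf "$gpx_dir"
```

With a glob, each loop variable holds a full path, so quoting it ("$path") keeps names with spaces in one piece.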

Next we need to look at how to separate file extensions from file names in Bash. This is done using a process called parameter expansion

parameter="filename.extension"

echo $parameter
#  Output should be "filename.extension"

echo "${parameter%.*}" #  %.* removes the last . and everything after it
#  Output should be "filename"

echo "${parameter##*.}" #  ##*. removes everything up to and including the last .
#  Output should be "extension"

echo "$parameter".newextension #  Example of bash concatenation
#  Output should be "filename.extension.newextension"
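The difference between the single and doubled operators only shows up when a name contains more than one dot. The following sketch uses a made-up file name, track.2021.gpx, to compare all four forms.

```shell
parameter="track.2021.gpx"   # made-up file name with two dots

# %  removes the shortest match of the pattern from the end
without_ext="${parameter%.*}"
# %% removes the longest match of the pattern from the end
first_part="${parameter%%.*}"
# #  removes the shortest match of the pattern from the start
after_first_dot="${parameter#*.}"
# ## removes the longest match of the pattern from the start
extension="${parameter##*.}"

echo "$without_ext"       # track.2021
echo "$first_part"        # track
echo "$after_first_dot"   # 2021.gpx
echo "$extension"         # gpx
```

For file conversion the pair you usually want is % (drop only the final extension) and ## (keep only the final extension).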

Using the examples above how might you alter the for loop to complete the file conversion? Please try to work on this yourself before viewing the answer.

Some old but useful tools

“|” The pipe operator: the pipe is the vertical bar. On most computers it is typed by pressing [Shift] + \ (the key right above Enter), though some laptops or non-US keyboard layouts may be slightly different.

The pipe operator is used to pipe the output of one command to a second command, and is used in a format like this

<command1> | <command2>
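As a quick illustration using only standard tools, the sketch below chains three pipes together. The input lines are made up; sort orders them, uniq drops adjacent duplicates, and wc -l counts what is left.

```shell
# printf writes three lines, two of which are identical
# sort puts the duplicates next to each other
# uniq collapses adjacent duplicate lines into one
# wc -l counts the lines that reach it
unique_count=$(printf 'b\na\nb\n' | sort | uniq | wc -l)
echo "Unique lines: $unique_count"
```

Each command only sees the output of the command to its left, which is what lets small tools be combined into larger operations.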

grep (https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html)

Grep is a tool we can use to search the output of commands using regular expressions. There is no need for you to know what a regular expression (also called RegEx) is at this point; for now it is enough to know that grep is a search tool.

As an example run the following command

gdalinfo ~/data/images/DJI_0556.JPG

You will notice that there is a lot of text here, but what if we only wanted to know the aperture value of the camera? We can pipe this output into grep and search for aperture

gdalinfo ~/data/images/DJI_0559.JPG | grep EXIF_Aperture

Notice that we now only get one line back.
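If you want to experiment with grep without the images at hand, the sketch below searches a few made-up lines written in the style of gdalinfo metadata. The values are invented for the example and are not taken from the lab data.

```shell
# Made-up lines in the style of gdalinfo metadata output (not real data)
sample='  EXIF_ApertureValue=(2.8)
  EXIF_GPSAltitude=(900.5)
  EXIF_Model=ExampleCam'

# Keep only the lines that contain the search text
matches=$(printf '%s\n' "$sample" | grep EXIF_Aperture)
echo "$matches"

# -i ignores upper and lower case when matching
matches_ci=$(printf '%s\n' "$sample" | grep -i exif_aperture)

# -c prints the number of matching lines instead of the lines themselves
match_count=$(printf '%s\n' "$sample" | grep -c EXIF_)
echo "Lines containing EXIF_: $match_count"
```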


awk (https://www.gnu.org/software/gawk/manual/gawk.html): the last tool we are going to look at is for separating text by delimiters (characters that mark where one piece of data ends and the next begins). This is important because when we automate our processing we will want to get values and assign them to variables without the surrounding text.

In this lab we really don’t want to delve far into awk; we just want to learn enough to read some attributes. For our purposes it will be enough for you to understand that in the code we are removing everything before the ‘(’ and after the ‘)’.

gdalinfo ~/data/images/DJI_0559.JPG | grep EXIF_Aperture | awk '{split($0, a, "[()]"); print a[2]}'

For the keen among you who really want to know: split breaks up text. It takes an input ($0, the line passed in by the pipe), an output array (we are arbitrarily naming it a), and a delimiter (any character inside the square brackets is considered a separator). Then we come to print, which returns the 2nd element of a (the first element is the text before the ‘(’, beginning with “EXIF_Aperture”, which we don’t care about).
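To see what split puts in each element, the sketch below runs the same awk program on a single made-up line, so it works without the image files. The sample value 2.8 is invented for the example.

```shell
# Made-up line in the style of the grep output above (not real data)
line='  EXIF_ApertureValue=(2.8)'

# a[1] holds everything before the "(" (including the leading spaces)
prefix=$(printf '%s\n' "$line" | awk '{split($0, a, "[()]"); print a[1]}')

# a[2] holds the text between "(" and ")"
value=$(printf '%s\n' "$line" | awk '{split($0, a, "[()]"); print a[2]}')

echo "prefix: $prefix"
echo "value:  $value"
```

Wrapping the pipeline in $( ) is what lets us store the extracted value in a variable for later use.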

Making decisions in our scripts

The last skill we are going to take on at this point is making decisions about how to process our data, and we are going to do this with the if statement.

Let’s take a look at the code line by line

# Get a list of all files in images folder, save to variable files
files=$(ls ~/data/images)

for file in $files # Do an operation on every file in the directory
do
    # Use gdalinfo to extract the altitude where photos were taken
    # Searching for "EXIF_GPSAltitude=" (with the =) avoids also matching EXIF_GPSAltitudeRef
    value=$(gdalinfo ~/data/images/"$file" | grep 'EXIF_GPSAltitude=' | awk '{split($0, a, "[()]"); print a[2]}')
    echo "$value" # Prints the value we just extracted
    ###
    # The next line has several parts going on
    # "$value > 854" Is the altitude higher than 854? (EXIF stores GPS altitude in metres)
    # echo command is used to pipe this question to bc (basic calculator)
    # Basic calculator returns 1 for true, 0 for false
    # expression inside $() is evaluated first
    # -eq is the test operator for equals, -eq 1 is the same
    # as asking, is the question we sent to bc true
    ###
    if [ "$(echo "$value > 854" | bc)" -eq 1 ]
    then # If the above question is true we can process based on that information
        echo "High Altitude"
    else # If the statement is false is there something else we should do
        echo "Low Altitude"
    fi # fi, is if backwards and tells the computer we do not have conditions past here
done # done ends the loop
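One case the loop above does not cover is a photo with no altitude tag, where the extracted value is empty and the comparison fails with an error. The sketch below shows one way to handle it. It is a variation, not the lab's method: classify_altitude is a hypothetical helper name, and awk is swapped in for bc to do the decimal comparison, since awk is already used in this lab.

```shell
# Hypothetical helper: report whether a value is above a threshold (default 854)
classify_altitude() {
    value="$1"
    threshold="${2:-854}"

    if [ -z "$value" ]; then
        # -z is true when the string is empty, i.e. no altitude was found
        echo "Unknown"
    elif awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
        # awk exits with status 0 (success) when v is greater than t
        echo "High Altitude"
    else
        echo "Low Altitude"
    fi
}

classify_altitude 900.5   # prints High Altitude
classify_altitude 120     # prints Low Altitude
classify_altitude ""      # prints Unknown
```

Here the if statement tests the exit status of awk directly, rather than comparing printed output with -eq as the bc version does.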

GIS as a mindset

The goal of today’s lab was not to make you an expert in shell, but rather to help inform the way you think about problems. If you know what steps you want to accomplish, you can always use your favorite search engine to find ways to accomplish a specific task. You may have noticed similarities between what we have done and model builder, and for good reason: in model builder we use a GUI to define a processing model, and a script is at its heart the text-based representation of a data processing model.

When thinking about automation in your workflow, you want to break down your problem into simple repetitive steps and decide what data these steps need to be applied to. Automation can be extremely beneficial for routine tasks, but it can also be dangerous if outputs are not checked.

Most of the GIS work you will do is comprised of basic operations such as thresholds, clipping, intersections, buffers, etc. Ultimately, at this point we are trying to think about how to solve complex spatial operations by chaining together these simple operations.
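One simple check after a batch run is to compare how many files went in with how many came out. The sketch below is a minimal example of that idea; count_files is a hypothetical helper, and the scratch directories with made-up files stand in for ~/data/GPX and ~/data/KML.

```shell
# Hypothetical helper: count files with a given extension directly inside a directory
count_files() {
    find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l
}

# Scratch directories with made-up files stand in for the real data folders
work=$(mktemp -d)
mkdir "$work/GPX" "$work/KML"
touch "$work/GPX/a.gpx" "$work/GPX/b.gpx" "$work/GPX/c.gpx"
touch "$work/KML/a.kml" "$work/KML/b.kml"

# $(( )) turns the text printed by wc into a plain number
inputs=$(( $(count_files "$work/GPX" gpx) ))
outputs=$(( $(count_files "$work/KML" kml) ))
missing=$((inputs - outputs))

if [ "$missing" -eq 0 ]; then
    echo "All $inputs files converted"
else
    echo "Warning: $missing of $inputs files have no output"
fi

rm -rf "$work"
```

A matching count does not prove every output is correct, but a mismatch is a cheap early warning that something in the batch failed.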

Practice