Specifically we will use PostgreSQL, with the PostGIS extensions. During your career you may also come across ArcGIS Enterprise Databases, Oracle, Microsoft SQL, MySQL, or others. We have chosen PostgreSQL because of the PostGIS extensions, and it is open source, however the basic principals leared here will apply to other databases though the command syntax will change slightly.

CRUD

Data bases have 4 basic operations Create, Read, Update & Delete

These operations can be done to Tables, Users, and most commonly Records, a record in a database is a single line in the table, in our case we can typically think of a record as a point, or as a polygon.

We have already created data last tutorial. And we will now continue off by reading the data everyone should be able to

Reading Data

In pgadmin4 select your database created on monday and lets start with the first command:

SELECT * FROM da_boundary;

If you view geometry you should see the entire province

The next command we are going to look at is the WHERE command, you can think of it as being similar to a search tool. What we will do next is show only census subdivisions that have a type city.

SELECT * FROM da_boundary
-- Do note that in SQL single quotes mark strings
-- Standard quotes are treated as column names
WHERE csdtype = 'CY'; -- Do note that in SQL single quotes mark strings

You should now have a map of the cities in BC

At this point we have polygons for cities but now we want to find their populations, you will notice that there is no population data in this table, but that is OK, because we also have a blocks_pop table that we will join to this table much the same way you would join a csv to a shape file in qgis.

Lets start by looking at a diagram from https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Scroll down to near the bottom where you will see a set of Venn diagrams. For our purposes A is the polygons and B is the Population data.

We want the pologons that have population data (this should be all of them as they are from the same dataset), so we will be ussing an inner join. And the code looks something a little like this.

SELECT * FROM da_boundary
INNER JOIN blocks_pop ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY';

I would like to make a note here, that for every boundary, there is multiple population blocks, but for every blocks_pop there is only one da_boundary. To account for this the id’s in da_boundary are 8 digits, while in blocks_pop they are 11 digits (The first 8 matching da_boundary). Can anyone think of another solution to this that would not require the use of SUBSTRING on join?

So at this point we have a large data table lets look at combining rows to make this more human friendly. We will be using a process known as aggregation. If you want more details feel free to read this https://www.postgresql.org/docs/13/functions-aggregate.html but the TLDR is we can group lines and then take averages, count, sum, etc the rows that there are multiple of in a group.

Try running this querry, and then share what error you got, and if you have a theory about why you are getting this error in the chat.

SELECT * FROM da_boundary
INNER JOIN blocks_pop 
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY' GROUP BY cdname;

Next we are going to calculate the population of each census subdivision by summing the population column

SELECT csdname, SUM(population) FROM da_boundary
INNER JOIN blocks_pop 
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY'
GROUP BY csdname;

And lets make this more readable by sorting the results by population

SELECT csdname, SUM(population) FROM da_boundary
INNER JOIN blocks_pop
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY'
GROUP BY csdname
ORDER BY SUM(population) DESC;

But we are in GIS class not statistics lets apply agregation to the geometries!

SELECT csdname, SUM(population) AS "Population", 
ST_Union(da_boundary.geom) AS "Boundary" FROM da_boundary
INNER JOIN blocks_pop
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY'
GROUP BY csdname
ORDER BY SUM(population) DESC;

And take a look at the polygons produced now, notice how now there is only a single polygon per city.

Hands on:

Continuing off of this example extend this code to provide the population density for each city. Some tips first off don’t worry about units at this stage relative density is good enough. Also you can find the function to calcuate area on this page https://postgis.net/docs/reference.html isn’t ([ctrl] + f) great!

What city has the highest population density?

SELECT csdname, SUM(population) AS "Population", 
SUM(population)/ST_Area(ST_Union(da_boundary.geom)) AS "Pop Density" FROM da_boundary
INNER JOIN blocks_pop
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
WHERE csdtype = 'CY'
GROUP BY csdname
ORDER BY SUM(population)/ST_Area(ST_Union(da_boundary.geom)) DESC;

Break Time!

Time to streach our legs.

Using location as an attribute

Next we are going to start looking at some Fire data we will start off by looking as how many fires there were per year in the Prince George Census District

SELECT csdname, fire_year, COUNT(fires) FROM da_boundary
LEFT JOIN fires
ON ST_Intersects(da_boundary.geom, fires.geom)
WHERE csdname = 'Prince George'
GROUP BY csdname, fire_year

There are several changes here we are going to look at, first we did a left join, for now we will think of this as we want everything to show up from da_boundary not just when it overlaps. The catch is it does not know what years are missing so we still don’t see them. However next week we will look at coss joins as a way of addressing this.

Now lets look at the GROUP BY, we are grouping by csdname, and fire_year, now the csdname is not strictly needed as there is only one possiblility, however when using GROUP BY, SELECT may only contain columns in the group, or derrived from an agrigate function, allowing Prince George to show up in the results is just being nice to our future selves when we want to epand this query.

Then finally we get to the SELECT statment, where we are simply counting the number of times a fire polygon.

The 3 Way Join

The next section of code is the most complicated we have seen to date, but fear not we will look at it one step at a time.

Our goal is to determin how much population has been affected by wildfire each year. But lets not get too far ahead, one of the great things about code is it’s like lego and we can start small

SELECT fire_year, SUM(population) AS "Population" FROM blocks_pop
LEFT JOIN fires
ON ST_Intersects(blocks_pop.geom, fires.geom)
WHERE fire_year >= 2015
GROUP BY fire_year
ORDER BY fire_Year;

Given this code we can see that we are summing the population of blocks that have had a fire in them, grouped by years. But at this point we have no place names, this is where the 3 way join comes into play, we need the da_boundary table to get the name information, and we can simply add the 2nd join and a group by csdname like so: (Note this querry will not come back quickly, paitence is a virute in GIS)

SELECT csdname, fire_year, SUM(population) AS "Population" FROM blocks_pop
LEFT JOIN da_boundary
ON da_boundary.dauid = SUBSTRING(blocks_pop.dbuid, 1 , 8)
LEFT JOIN fires
ON ST_Intersects(blocks_pop.geom, fires.geom)
WHERE fire_year >= 2015
GROUP BY csdname, fire_year
ORDER BY csdname, fire_Year;

Hand on!

For the last part of the lab I would like everyone to work on a query where we calculate the burn area for each year in the census subdivision Fraser-Fort George that is:

WHERE cdname = 'Fraser-Fort George'

Hint, you will need to join fires to da_boundary.

To understand area take a look at this script:

SELECT ST_SRID(geom) FROM fires
LIMIT 1;

Units in PostGIS always match the SRID of the geometry, in this case 3005, which is BC Albers, so ST_Area will return square meters.

Your solution should be in Hectare.

Have a Great Day!