Andrew Jones

Home   Blog   Projects   Tutorials   Map Gallery

Automating GIS Workflows: Creating a Python Script to Automate Areal Proportion Analysis


Python has become an indispensable tool in automating Geographic Information Systems (GIS) workflows due to its versatility and the rich ecosystem of libraries it offers. By leveraging libraries such as ArcPy for Esri’s ArcGIS platform or GeoPandas for open-source GIS, Python enables users to script and automate complex geospatial tasks with ease. This includes automating data processing, performing spatial analyses, and generating maps. With Python, GIS professionals can create custom tools and workflows that streamline repetitive tasks, ensure consistency, and enhance productivity. Additionally, Python’s integration with web mapping libraries like Folium or Plotly facilitates the creation of interactive and dynamic geospatial visualizations, further expanding its utility in the GIS domain.


In this tutorial, the workflow for areal proportion analysis will be automated. Areal proportion analysis involves examining the relative sizes of different spatial units within a geographic area to understand their distribution and impact. This type of analysis is particularly useful for estimating demographic characteristics across varying geographic scales, such as between census blocks and zip codes.


Figure 1 below provides a visual depiction of areal proportion analysis, where smaller spatial units are aggregated into larger units.


Figure 1. A Visual Depiction of Areal Proportion Analysis

This tutorial utilizes ArcPy and SSUtilities, so it is specifically designed for ArcGIS Pro. In many cases, other GIS operations may require additional libraries such as NumPy, GeoPandas, or others, depending on the specific analysis or functionality needed.


The workflow for Areal Proportion Analysis


Before attempting to automate a GIS workflow, it is essential to understand how the workflow operates. The best way to achieve this is by manually executing the steps to familiarize yourself with the process, understand the necessary nuances, and identify which geoprocessing tools are involved.


Areal proportion analysis requires two spatial layers: one smaller layer with population (or numeric) values, and one larger layer, which may be empty (i.e., no numeric values).


  1. Prepare the Smaller Layer : In the smaller spatial layer, add a field called “AREA” and calculate its geometry in your desired unit (e.g., square miles).
  2. Intersect the Layers : Next, intersect the two layers. Give the new intersected layer an easily identifiable name.
  3. Add and Calculate Geometry in the Intersected Layer : In the newly intersected layer, add a field called “NEWAREA” and calculate its geometry in the same unit used for the “AREA” field in the smaller layer.
  4. Calculate Population Proportions : Add another new field called “NEWPOP” and calculate its value using the formula:
    [POP] * [NEWAREA] / [AREA]
    This formula scales the population from the smaller units to the intersected areas based on their relative sizes.
  5. Summing the Population Values : The “NEWPOP” field needs to be summed up to the aggregated units in the larger spatial layer. This can be done using the Summary Statistics tool, which sums the "NEWPOP" field for each feature in the original larger spatial layer.
    This new table should be saved with an identifiable name, such as POP_TABLE.
  6. Join and Export : Finally, to save the changes to a permanent feature class, join the summary table to the larger spatial layer and export it as a new shapefile or feature class.

These steps are visualized in a modelbuilder simulation below (Figure 2).


Figure 2. A Modelbuilder depiction of the Areal Proportion Analysis Workflow

Writing the Script in Python


With the workflow established, a Python script can now be created to automate the process. To write this script, it is important to use a Python interface, such as IDLE, which will facilitate the development and testing of the script. Since this tutorial is focused on ArcGIS applications, the script will be written using ArcPy, although a similar script could be written in QGIS with PyQGIS.


When working in a separate Python interface, the arcpy module must first be imported into the script. This can be done by adding the command:



import arcpy
import SSUtilities

Figure 3. Import Additional Python Modules

Additionally, the SSUtilities module will be imported to construct a results table that will display once the script has completed its execution. This module is useful for creating summary reports and tracking the status of the script during its run.


Next, it is important to set up the environment settings for the script. These settings define the workspace, enable overwriting of existing outputs, and ensure that outputs are added to the map. The following ArcPy commands can be used to configure these settings:



from arcpy import env # Bring in the workspace
arcpy.env.overwriteOutput = True
arcpy.env.addOutputsToMap = True

Figure 4. Establish the Environmental Settings for the Workspace

At this point, it is important to consider what should be used as an input parameter. Certainly, the small and large spatial layers should be considered parameters. Similarly, if working with multiple numeric fields, there should be a specific field used in the analysis. Finally, a “to” and “from” join field should be identified to join the large layer with the intersected layer. These will be identified using the “arcpy.GetParameterAsText(n)” command.



blockLayer = arcpy.GetParameterAsText(0) # Small spatial layer (with population or numeric data)
aggregateLayer = arcpy.GetParameterAsText(1) # Large spatial layer (to aggregate data to)
populationField = arcpy.GetParameterAsText(2) #  The numeric field (e.g., population)
blockJoinField = arcpy.GetParameterAsText(3) # "From" field in the smaller layer (used to join)
aggregateJoinField = arcpy.GetParameterAsText(4) # "To" field in the larger layer (used to join)

Figure 5. Establish User-Defined Parameters

These parameters make the script more flexible, allowing users to easily customize the input layers and fields based on their specific dataset. Once the input parameters are set up, proceed with the geoprocessing steps and calculations as described in the workflow.


It is important to set the workspace for the feature layer dynamically to ensure that the script works in different environments or locations without needing to hardcode paths. This can be achieved using the "Describe" command in ArcPy, which provides information about the properties of a dataset, such as its path, data type, and other properties. The "Describe" function dynamically assigns the workspace to the feature layer.


To assign a relative system path for the workspace, use the r"" (raw string) syntax, which treats backslashes in file paths correctly without needing to escape them. By setting the workspace dynamically, the script becomes more flexible and portable, eliminating the need for hardcoded file paths.



desc = arcpy.Describe(blockLayer) # Get the path of the input layer dynamically
workSpace = r"" + desc.path # Extracts the directory path of the layer

Figure 6. Automatically Extract the File Location of the Input Block Layer

The larger spatial layer must be converted into a feature layer to enable editing or use in a geoprocessing workflow. This conversion is performed using the “MakeFeatureLayer” tool. First, a name, such as “aggregateLayer,” should be assigned to the layer. Then, the “MakeFeatureLayer” tool from the "Management" toolbox can be executed to create the feature layer.



Aggregate_Layer_Feature_Layer = "aggregateLayer"
arcpy.management.MakeFeatureLayer(in_features=aggregateLayer, out_layer=Aggregate_Layer_Feature_Layer)

Figure 7. Transform the Aggregate Layer into a Feature Layer

At the end of the workflow, the table must be joined back to the larger spatial layer. To facilitate this, a field that matches the join field in the smaller spatial layer needs to be added to the larger layer. This can be accomplished using the “Management.Add_Field” command.



Aggregate_Layer_Feature_Layer_JField = arcpy.management.AddField(in_table=Aggregate_Layer_Feature_Layer, field_name=blockJoinField, field_type="TEXT", field_is_required="NON_REQUIRED")[0] 

Figure 8. Add a Field to the Aggregate Layer to Enable Joining it to the Block Layer

An area field representing the area of each unit on the smaller spatial layer must be created.



Block_with_Area_Field = arcpy.management.AddField(in_table=blockLayer, field_name="Area", field_type="DOUBLE", field_is_required="NON_REQUIRED")[0]

Figure 9. Add a Field Representing the Current Area in Square Miles to the Block Layer

The area field is then calculated using the Calculate Geometry tool. In Python, this can be done with the expression !shape.area@squaremiles!.



Block_Filled_Area_ = arcpy.management.CalculateField(in_table=Block_with_Area_Field, field="Area", expression="!shape.area@squaremiles!", expression_type="PYTHON_9.3")[0]

Figure 10. Calculate the Area of the Block Layer in Square Miles

The two spatial layers can now be intersected using the “Analysis.Intersect” tool from ArcPy. Since this operation will create a new feature layer, a system path must be assigned to the resulting file. The existing "workSpace" variable can be concatenated with "\\" and the string "Block_Intersect" to define the name of the new layer.



Aggregate_Block_Intersect_ = workSpace + "\\" + "Block_Intersect" 
arcpy.analysis.Intersect(in_features=[[Block_Filled_Area_, ""], [Aggregate_Layer_Feature_Layer_JField, ""]], out_feature_class=Aggregate_Block_Intersect_)

Figure 11. Intersect the Block Layer and Aggregate Layer

A field representing the area of the intersected spatial layer is added.



Aggregate_Block_Intersect_Empty_New_Area_ = arcpy.management.AddField(in_table=Aggregate_Block_Intersect_, field_name="New_Area", field_type="DOUBLE")[0]

Figure 12. Add a Field Representing the Intersected Area in Square Miles to the Intersect Layer

The geometry of the added field is calculated.



Aggregate_Block_Intersect_Populated_Area_ = arcpy.management.CalculateField(in_table=Aggregate_Block_Intersect_Empty_New_Area_, field="New_Area", expression="!shape.area@squaremiles!", expression_type="PYTHON_9.3")[0]

Figure 13. Calculate the Area of the Intersect Layer in Square Miles

Another field is added to store the areal proportion population.



Aggregate_Block_Intersect_Empty_New_Pop_ = arcpy.management.AddField(in_table=Aggregate_Block_Intersect_Populated_Area_, field_name=populationField, field_type="LONG")[0]

Figure 14. Add a Field Representing the Areal Proportion Population in the Intersect Layer

The field is then calculated. To define the mathematical relationship in the field calculator, the expression is enclosed in quotes with "f" used to represent the formula. Visual Basic is used for simplicity in the calculation.



Aggregate_Block_Intersect_Filled_Population_ = arcpy.management.CalculateField(in_table=Aggregate_Block_Intersect_Empty_New_Pop_, field=populationField, expression=f"[{populationField}] * [New_Area] / [Area]", expression_type="VB")[0]

Figure 15. Calculate the Areal Proportion Population of the Intersect Layer

As the workflow nears completion, the summary statistics table must be created. A desktop path and a name must first be defined for the table, similar to the "Block_Intersect" feature layer created earlier. The "Statistics" tool, which is another analysis tool, is then used. To calculate the sum, "SUM" should be specified for the statistics field. The field by which the values will be grouped for summation is the “aggregateJoinField,” which is one of the user-defined inputs.



Aggregate_Population_Table = workSpace + "\\" + "AGG_TABLE" 
arcpy.analysis.Statistics(in_table=Aggregate_Block_Intersect_Filled_Population_, out_table=Aggregate_Population_Table, statistics_fields=[[populationField, "SUM"]], case_field=[aggregateJoinField])

Figure 16. Summarize the Areal Proportion Population of the Intersect Layer

Finally, the join between the larger spatial layer and the summary statistics table can be executed.



Aggregate_Population_Layer_ = arcpy.management.AddJoin(in_layer_or_view=Aggregate_Layer_Feature_Layer, in_field=blockJoinField, join_table=Aggregate_Population_Table, join_field=aggregateJoinField, join_type="KEEP_ALL")[0]

Figure 17. Join the Areal Proportion Population Table to the Aggregate Layer

The analytical workflow is now complete. However, using the SSUtilities package, general information about the tool can still be generated. This involves creating a small table that details the input spatial layers as well as the output. This can be accomplished by utilizing a few strings and lists.



header = "Areal Proportion Analysis"
# Report the input list of parameters first
row1 = [ "The Block Layer: ", blockLayer ]
row2 = [ "The Aggregate Layer: ", aggregateLayer  ]
row3 = [ "The Output Dbase File Name: ", Aggregate_Population_Table ]
total = [ row1, row2, row3 ]

Figure 18. An Export Message that Summarizes the User's Inputs

To save the joined larger spatial layer, export it to a shapefile or geodatabase. The entire script is depicted below and can be downloaded here.



##----------------------------------------------------------------------
## AggPopEstimator.py
## Created on: 5-14-2024
## Last Modified on: 5-15-2024
## Created by: Andrew Jones
## Usage: AggPopEstimator(blockLayer), AggPopEstimator(aggregateLayer),
## AggPopEstimator(populationField), AggPopEstimator(blockJoinField), AggPopEstimator(aggregateJoinField)
##              In order to use this script, the user needs to have a census block layer 
##              containing population data. The user must also have a larger layer that will 
##              act as the aggregation layer. 
##              Additionally, the user needs to identify the desired population field as 
##              well as the fields used to join the table to the aggregate layer. Typically 
##              this is the same field as the block field. Remember to setup the parameters or this tool will not function!
##              So far, this tool has successfully functioned using census blocks, block groups, tracts, and zip codes. 
## Description: This script is designed to automate an areal proportion analysis
##              and aggregate the population values of census blocks to larger geographic units. 
##              The results are stored in a dBase table which can then be joined to the larger geographic unit.
##              Areal proportion analysis assumes that the population within census blocks is evenly dispersed.
## Note: This tool may result in an error message if it is run more than once while the aggregate table 
##       is on the active ArcGIS Pro Table of Contents. Remove the aggregate table from the Table of Contents to avoid this.
##       Additionally, this tool was designed to work with census boundary files. Using it on other boundaries may create unexpected
##       results. 
## ----------------------------------------------------------------------
import arcpy
import SSUtilities
from sys import argv

# Set the environmental options 
from arcpy import env # Bring in the workspace
arcpy.env.overwriteOutput = True
arcpy.env.addOutputsToMap = True

# Script tool arguments (Input - Global variables)
blockLayer = arcpy.GetParameterAsText(0)
aggregateLayer = arcpy.GetParameterAsText(1)
populationField = arcpy.GetParameterAsText(2)
blockJoinField = arcpy.GetParameterAsText(3)
aggregateJoinField = arcpy.GetParameterAsText(4)

# Set the workspace of the feature layer to be dynamic
desc = arcpy.Describe(blockLayer)
workSpace = r"" + desc.path

# Make the feature layer for the aggregate layer
Aggregate_Layer_Feature_Layer = "aggregateLayer"
arcpy.management.MakeFeatureLayer(in_features=aggregateLayer, out_layer=Aggregate_Layer_Feature_Layer)

# Creata the join field small spatial layer in the larger spatial layer
Aggregate_Layer_Feature_Layer_JField = arcpy.management.AddField(in_table=Aggregate_Layer_Feature_Layer, field_name=blockJoinField, field_type="TEXT", field_is_required="NON_REQUIRED")[0] 

# Add the area field to the block field
Block_with_Area_Field = arcpy.management.AddField(in_table=blockLayer, field_name="Area", field_type="DOUBLE", field_is_required="NON_REQUIRED")[0]

# Calculate the added area field 
Block_Filled_Area_ = arcpy.management.CalculateField(in_table=Block_with_Area_Field, field="Area", expression="!shape.area@squaremiles!", expression_type="PYTHON_9.3")[0]

# Intersect the block and aggregate layers 
Aggregate_Block_Intersect_ = workSpace + "\\" + "Block_Intersect" 
arcpy.analysis.Intersect(in_features=[[Block_Filled_Area_, ""], [Aggregate_Layer_Feature_Layer, ""]], out_feature_class=Aggregate_Block_Intersect_)

# Add the new area field to the intersected layer
Aggregate_Block_Intersect_Empty_New_Area_ = arcpy.management.AddField(in_table=Aggregate_Block_Intersect_, field_name="New_Area", field_type="DOUBLE")[0]

# Calculate the new area field  
Aggregate_Block_Intersect_Populated_Area_ = arcpy.management.CalculateField(in_table=Aggregate_Block_Intersect_Empty_New_Area_, field="New_Area", expression="!shape.area@squaremiles!", expression_type="PYTHON_9.3")[0]

# Add the new population field to the intersected layer
Aggregate_Block_Intersect_Empty_New_Pop_ = arcpy.management.AddField(in_table=Aggregate_Block_Intersect_Populated_Area_, field_name=populationField, field_type="LONG")[0]

# Use Areal Proportion to calculate the new population 
Aggregate_Block_Intersect_Filled_Population_ = arcpy.management.CalculateField(in_table=Aggregate_Block_Intersect_Empty_New_Pop_, field=populationField, expression=f"[{populationField}] * [New_Area] / [Area]", expression_type="VB")[0]

# Run summary statistics to aggregate the population to the aggregate layer
Aggregate_Population_Table = workSpace + "\\" + "AGG_TABLE" 
arcpy.analysis.Statistics(in_table=Aggregate_Block_Intersect_Filled_Population_, out_table=Aggregate_Population_Table, statistics_fields=[[populationField, "SUM"]], case_field=[aggregateJoinField])

# Join the table to the aggregate layer
Aggregate_Population_Layer_ = arcpy.management.AddJoin(in_layer_or_view=Aggregate_Layer_Feature_Layer, in_field=blockJoinField, join_table=Aggregate_Population_Table, join_field=aggregateJoinField, join_type="KEEP_ALL")[0] 

# Create Output Text Table
# So the input and output can be reported in the tool report window
header = "Create Unique Value Dbase file"
# Report the input list of parameters first
row1 = [ "The Block Layer: ", blockLayer ]
row2 = [ "The Aggregate Layer: ", aggregateLayer  ]
row3 = [ "The Output Dbase File Name: ", Aggregate_Population_Table ]
total = [ row1, row2, row3 ]

# Construct a table so it can be reported in the tool result window
tableOut = SSUtilities.outputTextTable(total,header=header,pad=1)
arcpy.AddMessage(tableOut)


Figure 19. The Completed Script

To enable the script to function like a geoprocessing tool in ArcGIS Pro, several parameters must be defined under the script’s property settings.


  1. In ArcGIS Pro, create a new toolbox in the Catalog pane.
  2. Right-click the toolset and select New > Script.
  3. Copy the Python code from IDLE and paste it into the newly created script.
  4. Save the script.

Once the script is saved, the necessary parameters can be defined through the script's properties to allow for input and output handling, ensuring it functions as a geoprocessing tool within ArcGIS Pro.


Setting up the Script Parameters


Each variable that uses the "arcpy.GetParameterAsText" function must be defined for the script to be fully functional. For both spatial layers, the input parameters must be assigned the "Feature Layer" data type. The remaining three parameters are field inputs, which should be defined as appropriate field types (Figure 20).


Figure 20. The Parameter Settings for the Python Script inside ArcGIS Pro

Some Cartographic Products Created with Areal Proportion Analysis


The most obvious example of the use of areal proportion analysis is working with census geographies smaller than the county level. These applications are detailed in my Site Selection project and Census Data tutorial.


This python script was particularly useful in the Site Selection project. In it, I had to use areal proportion multiple times: three times to create the population change map on the census block group level (2000 – 2010, 2010 – 2020, 2000 – 2020). In doing so, I was able to estimate the population change in Warren County, KY for my Site Selection project. Areas with positive growth were weighed higher as site selection criteria (Figure 21).


Figure 21. Estimated Population Change between 2000 and 2020 in Warren County

In the same Site Selection project, I applied service area analysis multiple times for each candidate fire station. To view the population within each distance break, areal proportion analysis was used repeatedly (more than 5 times) to estimate census block values aggregated to the distance bands. Simply put, this is another example of why automating repetitive geospatial workflows is so valuable.


List of Figures


Figure 1. A Visual Depiction of Areal Proportion Analysis

Figure 2. A Modelbuilder depiction of the Areal Proportion Analysis Workflow

Figure 3. Import Additional Python Modules

Figure 4. Establish the Environmental Settings for the Workspace

Figure 5. Establish User-Defined Parameters

Figure 6. Automatically Extract the File Location of the Input Block Layer

Figure 7. Transform the Aggregate Layer into a Feature Layer

Figure 8. Add a Field to the Aggregate Layer to Enable Joining it to the Block Layer

Figure 9. Add a Field Representing the Current Area in Square Miles to the Block Layer

Figure 10. Calculate the Area of the Block Layer in Square Miles

Figure 11. Intersect the Block Layer and Aggregate Layer

Figure 12. Add a Field Representing the Intersected Area in Square Miles to the Intersect Layer

Figure 13. Calculate the Area of the Intersect Layer in Square Miles

Figure 14. Add a Field Representing the Areal Proportion Population in the Intersect Layer

Figure 15. Calculate the Areal Proportion Population of the Intersect Layer

Figure 16. Summarize the Areal Proportion Population of the Intersect Layer

Figure 17. Join the Areal Proportion Population Table to the Aggregate Layer

Figure 18. An Export Message that Summarizes the User's Inputs

Figure 19. The Completed Script

Figure 20. The Parameter Settings for the Python Script inside ArcGIS Pro

Figure 21. Estimated Population Change between 2000 and 2020 in Warren County

References


What is ArcPy?—ArcGIS Pro | Documentation. (n.d.). https://pro.arcgis.com/en/pro-app/latest/arcpy/get-started/what-is-arcpy-.htm