Andrew Jones


New Urban Areas in Columbia, MO 2000-2024

Detecting Land Use Change with Google Earth Engine


Remote sensing has emerged as a powerful tool for monitoring land use change, providing valuable insights into environmental and urban dynamics. By utilizing satellite imagery and aerial photography, researchers can analyze variations in land cover over time, detecting shifts from natural landscapes to urban developments, agricultural expansion, or deforestation. This technology allows for the collection of large-scale data, enabling the assessment of changes in land use patterns across diverse regions. Moreover, remote sensing facilitates the identification of trends and the impacts of human activities on ecosystems, aiding in effective land management and policymaking. Through advanced analytical techniques, such as machine learning and image classification, remote sensing continues to enhance our understanding of the complex interplay between human development and environmental sustainability.


This project is focused on depicting urban development in Boone County, Missouri between 2000 and 2024 using remote sensing methods. Boone County, located in the center of Missouri, has experienced enormous population growth since the 1940s. Its county seat, Columbia, holds over two-thirds of its population. Alongside this growth comes the expansion of the City of Columbia and the construction of new subdivisions in both Columbia and Boone County. The final cartographic product will depict newly developed urban and suburban areas in Boone County.


Census Data and Municipal Boundaries


Since 1980, both Boone County and its seat, Columbia, have experienced significant population growth. Boone County’s population has nearly doubled, while the City of Columbia’s has more than doubled (Table 1). Since 2000, the county has grown by over 54,000 residents and the city by nearly 45,000. Much of this growth is attributed to the development of new suburban areas.


Table 1. Boone County and Columbia Population Change since 1980
Year     Boone County Population (Change)     City of Columbia Population (Change)
1980     100,376 (24.1%)                      62,061 (6.0%)
1990     112,379 (12.0%)                      69,101 (11.0%)
2000     135,343 (20.5%)                      84,531 (22.3%)
2010     162,642 (20.1%)                      108,500 (28.4%)
2020     183,610 (12.9%)                      126,654 (16.4%)
2023*    189,643 (3.3%)                       129,330 (2.4%)

Municipal boundaries in Boone County were obtained from the US Census Bureau. These datasets are essential for tracking the growth of municipalities and for later preparing a land use classification map, which will serve as a reference to identify areas that transitioned to suburban development.


Overlaying the 2000 and 2020 US Census TIGER boundaries gives a simple illustration of how Columbia expanded over this period. Notably, Columbia grew in all directions, and many of the smaller towns also experienced growth (Figure 1). This growth is consistent with the strong population increase shown in Table 1.


Municipality Changes
Figure 1. Boone County Municipality Growth between 2000 and 2020

However, the nature of this expansion, particularly in terms of developed neighborhoods, is not easily discernible. Figure 1 only illustrates areas that have been incorporated into nearby municipalities; new suburban areas may also exist outside the City of Columbia. This is where a land use classification map becomes particularly useful in identifying specific areas of development.


Preparing Data for a Supervised Land Use Classification Model


To create a land use classification map, a machine learning algorithm must be used to convert satellite imagery into a land use classification raster. This process involves reducing a complex dataset into a limited number of classes. To achieve this, either a supervised or unsupervised algorithm can be employed (Figure 2). For this study, supervised classification will be used so that each land use class is defined by known, representative samples.


Supervised Unsupervised
Figure 2. Supervised vs Unsupervised Classification (Mishra, 2020)

In supervised classification, training data—hand-drawn or selected—provides representative samples for each land use class. For this study, the land use classes included "water," "urban," "treetop," "agriculture," and "pastoral." Typically, at least fifty samples per class are recommended, although more may be required in practice (Lillesand et al., 2004). In this study, 200 observations were collected for each class, resulting in 1000 total observations (Figure 3). These observations were placed on a 2004 orthophoto from the Missouri Spatial Data Information Service, given latitude and longitude values, and then exported to a spreadsheet.
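The export step described above can be sketched in Python. This is a minimal illustration and not the script used for the project: it assumes the spreadsheet was saved as CSV with hypothetical column names `class`, `latitude`, and `longitude`, and the two sample rows are made up for demonstration.

```python
import csv
import io
import json


def training_csv_to_geojson(csv_text):
    """Convert rows of (class, latitude, longitude) into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            # The column name "class" is an assumption, not taken from the source data.
            "properties": {"class": row["class"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Two made-up sample rows (not taken from the real training set).
sample_csv = (
    "class,latitude,longitude\n"
    "urban,38.9517,-92.3341\n"
    "water,38.9000,-92.4000\n"
)
collection = training_csv_to_geojson(sample_csv)
print(json.dumps(collection, indent=2))
```

A point file in this form can be uploaded to a cloud platform or opened directly in most desktop GIS software.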


Training Data
Figure 3. Boone County Image Classification Training Data

Since this project spans from 2000 to 2024, sourcing remote sensing imagery for these years is essential. However, the Missouri Spatial Data Information Service only offers orthophotos from 2004 and 2022. Additionally, attempting to classify large files like these on a personal computer would be time-consuming and inefficient. In this case, it is more effective to pursue a cloud-based solution. One particularly useful platform for collecting, analyzing, and disseminating remote sensing imagery is Google Earth Engine.


Google Earth Engine


Google Earth Engine (GEE) is a powerful cloud-based platform designed for analyzing and visualizing geospatial data. It provides access to a vast repository of satellite imagery, geospatial datasets, and other environmental data, enabling researchers, scientists, and developers to perform large-scale spatial analysis and monitoring. With tools for processing and analyzing data from sources like NASA, USGS, and various global satellite networks, GEE supports applications in fields such as environmental monitoring, disaster management, agriculture, and climate change. Its cloud infrastructure allows users to conduct complex analyses efficiently without needing local computational resources. Additionally, Google Earth Engine facilitates collaboration and sharing of data and results through its online interface and APIs.


Incorporating GEE into this workflow simplifies many potential challenges. Through GEE, satellite imagery spanning many years is readily available for analysis, which makes obtaining imagery much easier than searching through individual local, municipal, and statewide sources. Moreover, GEE can run complex land use classification analyses in the cloud, significantly reducing the time required for processing.


GEE Console
Figure 4. The Google Earth Engine Console

The caveat to the advantages of GEE is that its Code Editor requires JavaScript to perform commands and operations. While this presents a steeper learning curve for users, there are many example scripts that can be modified to perform common remote sensing workflows. Additionally, because an entire workflow lives in a script, sharing the JavaScript code makes it easy to reproduce workflows and satellite imagery products.


Selecting a Satellite, Imagery, and Spectral Bands for Land Classification


For this project, Landsat satellite imagery will be used. Landsat is a joint NASA/USGS program that provides the longest continuous space-based record of Earth’s land in existence. Since the study period spans more than twenty years, imagery is drawn from two satellites: Landsat 5 for 2000 and Landsat 8 for 2024. The imagery from 2000 is a composite between June 1st and September 1st, and the imagery from 2024 is a composite between April 1st and June 1st. There is no particular significance to these dates; they are simply the clearest overall composites selected from those years.


One key aspect of remote sensing analysis is selecting the appropriate spectral bands of a satellite. Spectral bands are the distinct ranges of electromagnetic wavelengths that a satellite's sensors record. The choice of spectral bands significantly affects the appearance of satellite imagery, with certain combinations being ideal for highlighting features such as urban sprawl, forest cover, water bodies, and other spatial phenomena. Since the 2000 imagery is from Landsat 5 and the 2024 imagery is from Landsat 8, different band combinations must be used for each satellite to achieve optimal results in urban classification. Figure 5 below shows imagery from 2000 and 2024 using a simple RGB color scheme: this corresponds to bands 3, 2, and 1 on Landsat 5 and bands 4, 3, and 2 on Landsat 8.


Landsat RGB
Figure 5. Simple RGB Landsat Imagery for 2000 and 2024

For the land classification algorithm to be effective, each of the land use types must appear distinct from one another. The traditional band combinations used to identify urban areas in Landsat imagery include bands (1, 4, 5) for Landsat 5 and bands (7, 6, 4) for Landsat 8. Both band sets were tested with the land use classification model, but ultimately, another set of bands created better results.


To identify unique features in Landsat 5 imagery, Gautam et al. (2017) suggested a different set of bands. For this study, the Landsat 5 imagery was classified using bands 4, 5, 6, which correspond to near infrared (0.76 - 0.90 μm), shortwave infrared 1 (1.55 - 1.75 μm), and thermal infrared 1 (10.40 - 12.50 μm). This combination of spectral bands renders imagery that depicts urban areas in purple, tree cover in brown, water in dark blue, agriculture in tan or green, and pasture in light green.


A similar focus on the infrared sensors was applied to the Landsat 8 imagery. The imagery was ultimately classified using bands 5, 6, 10, which correspond to near infrared (0.85 - 0.88 μm), shortwave infrared 1 (1.57 - 1.65 μm), and thermal infrared 1 (10.60 - 11.19 μm). This helped mitigate the dark appearance of the Landsat 8 imagery. Figure 6 below depicts the altered imagery side by side.


Landsat Special
Figure 6. Specific Landsat Imagery to Detect Urban Features for 2000 and 2024

Next, the remote sensing imagery can be classified using GEE.


Classifying Imagery in Google Earth Engine


The training data from Figure 3 and the spectral band combinations from Figure 6 can be used as parameters for the land classification algorithm. Additionally, a rectangular boundary can be drawn to limit the export results to areas around Boone County. The results of the algorithm are presented in Figure 7 below. The JavaScript code to replicate this workflow can be found at the bottom of this web page.


Raw Land Use
Figure 7. The Land Use Layers from Google Earth Engine

Visually, the land classification algorithm performed well in terms of identifying actual urban areas, water, and treetops. The difference between pastoral and agriculture was more subtle, so those two classifications are more commingled. One challenge with Columbia is the amount of tree cover in some old neighborhoods west of downtown: these neighborhoods may be classified as something other than urban due to the dense tree cover present. Fortunately, since this study is looking at newly developed neighborhoods, this should not present too many issues.


While a simple visual analysis is helpful in quickly determining whether a land classification algorithm was successful, there are also proper methods of quantifying the algorithm’s quality. A confusion matrix can be used to evaluate the quality of the land use classification scheme with more statistical rigor.


The Confusion Matrix


To evaluate the effectiveness of the land use classification scheme, a confusion matrix was generated along with the imagery export. A confusion matrix compares actual values to predicted values, with the diagonal elements (from the top-left to the bottom-right) representing the correctly classified values, or true positives (Figure 8). The off-diagonal elements indicate errors, reflecting either false positives or false negatives.


Confusion Matrix
Figure 8. The Confusion Matrix

There are a few important terms to understand concerning confusion matrices. The overall accuracy is the number of correctly classified values (the diagonal of the matrix) divided by the total number of values. This measure provides a general assessment of the quality of a land classification algorithm. Additionally, there are the User's Accuracy (Precision) and Producer's Accuracy (Recall). The user's accuracy is the probability that a value predicted to be in a certain class is truly in that class. The producer's accuracy is the probability that a value truly in a certain class was classified as that class.
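These measures can be computed directly from the matrix, as in the short Python sketch below. The 3-class matrix is hypothetical and not taken from this project, and the layout (rows as actual classes, columns as predicted classes) is an assumption; some tools use the transposed layout, which swaps precision and recall.

```python
def accuracy_metrics(matrix):
    """Return overall accuracy, per-class precision, and per-class recall.

    Assumes rows are actual (reference) classes and columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total

    # Precision (user's accuracy): correct cells divided by the column total.
    precision = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    # Recall (producer's accuracy): correct cells divided by the row total.
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    return overall, precision, recall


# Hypothetical 3-class matrix, 50 reference samples per class.
example = [
    [45, 5, 0],
    [10, 35, 5],
    [0, 5, 45],
]
overall, precision, recall = accuracy_metrics(example)
print(f"overall accuracy: {overall:.3f}")
print("precision:", [round(p, 3) for p in precision])
print("recall:", [round(r, 3) for r in recall])
```

In this example, 125 of the 150 samples sit on the diagonal, so the overall accuracy is about 83%, while the middle class has the weakest recall at 70%.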


The confusion matrix for the 2000 land use classification can be found in Figure 9 below. The matrix images were created using an online confusion matrix calculator by Marco Vanetti (2007). The overall accuracy for the 2000 classification was 83.162%, with the highest accuracy being reported in the water, urban, and tree cover classes. These tend to be distinct, so it makes sense that the algorithm was able to identify them successfully. The agriculture and pasture classes fared worse, which is understandable given that these two are more difficult to separate from one another. In particular, agriculture classification is difficult due to the various appearances of different crops as well as harvest times.


2000 Land Use Matrix
Figure 9. 2000 Land Use Classification Confusion Matrix

The overall accuracy for the 2024 classification was 83.681% (Figure 10), which is roughly the same as the 2000 result in Figure 9 above. It is important to note that the same set of training data was used for both of these classifications, so it would have been surprising if Figure 10 had a significantly different overall accuracy. The per-class results were quite similar as well: water, urban, and tree cover classification fared well, whereas agriculture and pasture were more difficult to correctly classify.


2024 Land Use Matrix
Figure 10. 2024 Land Use Classification Confusion Matrix

To evaluate the quality of the land classification further, Cohen’s kappa coefficient can be used. It measures the agreement between the classification and the truth values while accounting for the agreement that would be expected by chance. The coefficient takes a value between -1 and 1, where 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.


In this case, the confusion matrices both reported a kappa value of around 0.79, indicating that there is substantial agreement between the classification and the truth values.
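As a rough illustration of how a kappa value is derived, the Python sketch below computes it from a confusion matrix. The 3-class matrix is hypothetical. The final lines check the reported value under an assumption the source does not confirm: that the reference samples were split evenly among the five classes, which makes the chance agreement exactly 0.2.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Expected chance agreement from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 3-class matrix, 50 reference samples per class.
example = [
    [45, 5, 0],
    [10, 35, 5],
    [0, 5, 45],
]
print(round(cohens_kappa(example), 2))  # 0.75

# Check of the 2000 result, ASSUMING five equally sized reference classes,
# so that the expected chance agreement is 1/5.
overall_accuracy_2000 = 0.83162
balanced_kappa = (overall_accuracy_2000 - 0.2) / (1 - 0.2)
print(round(balanced_kappa, 2))  # 0.79
```

Under that assumption, an overall accuracy of about 83% corresponds to a kappa of about 0.79, which matches the value reported above.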


Refining the Land Use Rasters in ArcGIS Pro


A few tools in ArcGIS Pro are helpful in cleaning up the land use classification layers. By applying a majority filter and boundary clean, the land use layers have fewer single erroneous pixels and the boundaries between classification types are smoothed and generalized.
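The idea behind a majority filter can be shown with a small pure-Python sketch. This is a simplified stand-in and not the ArcGIS Pro tool itself: it assumes an 8-cell neighborhood, replaces a cell only when at least five of its eight neighbors share one value, and leaves edge cells untouched. The class codes in the example grid are hypothetical.

```python
from collections import Counter


def majority_filter(grid):
    """Replace each interior cell with the value held by at least 5 of its 8 neighbors."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]  # copy so reads always use the original values
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            value, count = Counter(neighbors).most_common(1)[0]
            if count >= 5:
                result[r][c] = value
    return result


# A single stray "2" pixel surrounded by class "1" (hypothetical class codes).
noisy = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
]
cleaned = majority_filter(noisy)
for row in cleaned:
    print(row)
```

The isolated pixel is absorbed into the surrounding class, which is the kind of single-pixel error the cleanup step is meant to remove.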

The final result for the 2000 land use classification is presented below in Figure 11. Visually, it has performed well at reflecting the different land cover types.


2000 Land Use
Figure 11. Boone County 2000 Land Use

Similarly, in Figure 12 below, the 2024 land use classification is presented. In comparison to the previous figure, the urban areas have notably expanded, particularly in Ashland, Centralia, Columbia, Hallsville, and Sturgeon. Less tree cover is present: this may be due to increased development, but it is also possible that there was less vegetative growth because the 2024 satellite imagery is from spring rather than summer.


2024 Land Use
Figure 12. Boone County 2024 Land Use

With the land classification layers prepared, it is now possible to use raster tools to reclassify the land use layers as "urban" or "not urban" and then calculate the difference in the urban class from 2000 to 2024. The remaining layer will represent new urban areas that developed between 2000 and 2024.
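The reclassify-and-difference step can be sketched on small grids in Python. The class code for "urban" and the two tiny rasters are hypothetical, and a real workflow would use raster tools on the full layers; the sketch only shows the logic.

```python
URBAN = 2  # hypothetical class code for "urban"


def to_binary(grid, urban_code=URBAN):
    """Reclassify a land use grid to 1 (urban) or 0 (not urban)."""
    return [[1 if value == urban_code else 0 for value in row] for row in grid]


def new_urban(land_use_2000, land_use_2024, urban_code=URBAN):
    """Return 1 where a cell is urban in 2024 but was not urban in 2000, else 0."""
    before = to_binary(land_use_2000, urban_code)
    after = to_binary(land_use_2024, urban_code)
    # The difference is +1 for new urban, 0 for no change, -1 for urban lost;
    # only the +1 cells are kept.
    return [
        [1 if (b - a) == 1 else 0 for a, b in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


# Hypothetical 2 x 3 rasters of class codes.
land_use_2000 = [[1, 2, 3], [4, 4, 2]]
land_use_2024 = [[2, 2, 3], [2, 4, 4]]
print(new_urban(land_use_2000, land_use_2024))  # [[1, 0, 0], [1, 0, 0]]
```

Cells that were already urban in 2000, or that changed away from urban, drop out, leaving only newly urbanized cells.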


Final Products


With the land use layers prepared, urban growth in Columbia is depicted in Figure 13. As shown earlier in Figure 1, Columbia and other municipalities expanded in all directions. The neighborhoods that experienced the most development by land area include Hominy Branch, Kings Meadow, Mexico Gravel, Stonecrest, Thornbrook, and Vanderveen Crossing. Note that this map is a simplified rendition, as it only displays areas over 60,000 square feet.
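For a sense of scale, the minimum display area can be converted into Landsat pixels. The short Python calculation below assumes the standard 30-meter pixel size of Landsat multispectral bands; the source does not state the export resolution.

```python
SQ_FT_TO_SQ_M = 0.09290304   # exact: (0.3048 m) squared
PIXEL_SIZE_M = 30            # assumed export resolution in meters

min_area_sq_ft = 60_000
min_area_sq_m = min_area_sq_ft * SQ_FT_TO_SQ_M
pixels = min_area_sq_m / PIXEL_SIZE_M ** 2

print(f"{min_area_sq_m:.1f} square meters")  # 5574.2 square meters
print(f"{pixels:.1f} pixels")                # 6.2 pixels
```

Under that assumption, the threshold removes patches smaller than roughly six contiguous pixels, which are the patches most likely to be classification noise.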


Urbanization since 2000
Figure 13. Urbanization in Boone County since 2000

An interactive Leaflet web map below displays the final results of the project (Figure 14). The red polygons denote areas that have experienced urbanization since 2000. Note that while polygons outside of Boone County were removed from the layer, some areas that suggest urbanization are actually false positives from the land classification algorithm earlier.



Figure 14. Urbanization in Boone County since 2000 -- Interactive Leaflet Web Map

Discussion and Some Final Thoughts


This project represented a common workflow in remote sensing analysis, namely detecting urban growth with remote sensing imagery, land classification analysis, and raster analysis.


Generally, the land classification algorithm performed well at identifying areas that developed since 2000. These were clearly delineated in the Leaflet web map. There was some degree of error as highlighted in the confusion matrix section, though these errors generally did not detract from accurately detecting areas of urban growth.


There remains the question of whether the land use classification in this project could be considered of good quality. Olson (2008) contends that while modern users are comfortable with an 80% overall accuracy rate in classified imagery, the traditional standard has been an 85% overall accuracy rate. Both confusion matrices revealed an accuracy rate of around 83%, which indicates that, while the models were good at identifying new urban growth, a further refinement of training data might assist in removing false hits and better capturing urban areas.


This project stemmed from my interest in urban geography and my time living in Columbia, MO, when I worked at the Missouri House of Representatives. Initially, this project was going to be focused on Bowling Green, KY, but since I had already done the site selection and network analysis project on Bowling Green, I thought it would be better to use a different location. I never had the opportunity to take a full remote sensing class at Western Kentucky University, so it was quite a learning experience to go through and prepare this material.


As with all projects on this site, further mapping and graphical enhancements may be periodically added.

GeoJSON Links

Image Classification Training Data

Boone County Urban Growth Areas

Links to Google Earth Engine JavaScript


2000 Land Use Classification

2024 Land Use Classification

List of Figures and Tables


Figure 1. Boone County Municipality Growth between 2000 and 2020

Figure 2. Supervised vs Unsupervised Classification

Figure 3. Boone County Image Classification Training Data

Figure 4. The Google Earth Engine Console

Figure 5. Simple RGB Landsat Imagery for 2000 and 2024

Figure 6. Specific Landsat Imagery to Detect Urban Features for 2000 and 2024

Figure 7. The Land Use Layers from Google Earth Engine

Figure 8. The Confusion Matrix

Figure 9. 2000 Land Use Classification Confusion Matrix

Figure 10. 2024 Land Use Classification Confusion Matrix

Figure 11. Boone County 2000 Land Use

Figure 12. Boone County 2024 Land Use

Figure 13. Urbanization in Boone County since 2000

Figure 14. Urbanization in Boone County since 2000 -- Interactive Leaflet Web Map

Table 1. Boone County and Columbia Population Change since 1980

References

How to interpret a confusion matrix for a machine learning model. (2025). https://www.evidentlyai.com/classification-metrics/confusion-matrix

Marco Vanetti. Confusion matrix online calculator. (2007). https://marcovanetti.com/pages/cfmatrix/?noc=5

Denise Nedea. Confusion Matrix Calculator. (2020). MDApp. https://www.mdapp.co/confusion-matrix-calculator-406/

Anonym. (2023). The many band combinations of Landsat 8. NV5 Geospatial. https://www.nv5geospatialsoftware.com/Learn/Blogs/Blog-Details/the-many-band-combinations-of-landsat-8

Gautam, V., Murugan, P., & Annadurai, M. (2017). A New Three Band Index for Identifying Urban Areas using Satellite Images. Global Civil Engineering Challenges in Sustainable Development and Climate Change [ICGCSC-17]. https://www.researchgate.net/publication/315447944_A_New_Three_Band_Index_for_Identifying_Urban_Areas_using_Satellite_Images

Kevin Butler. (2019). Band Combinations for Landsat 8. ArcGIS Blog. https://www.esri.com/arcgis-blog/products/product/imagery/band-combinations-for-landsat-8/

Lillesand, T.M., Kiefer, R.W. and Chipman, J.W. (2004). Remote Sensing and Image Interpretation. 5th Edition, John Wiley, New York.

Mishra, S. S. (2020). GETTING TO KNOW ABOUT IMAGE CLASSIFICATION - ( PART 2 ) => Discussing about methods and types of image classification. https://www.linkedin.com/pulse/getting-know-image-classification-part-2-discussing-mishra

Olson, C. E., Jr. (2008). Is 80% Accuracy Good Enough? ASPRS Pecora 17: The Future of Land Imaging . . . Going Operational, Denver, Colorado, United States of America. https://www.asprs.org/a/publications/proceedings/pecora17/0026.pdf

T-Test, Chi-Square, ANOVA, Regression, Correlation. . . (2025). https://datatab.net/tutorial/cohens-kappa

Common landsat Band Combinations. (2021). USGS. https://www.usgs.gov/media/images/common-landsat-band-combinations