Jennings Anderson's Diary

Recent diary entries

In 2018, researchers Daniel Bégin, Rodolphe Devillers, and Stéphane Roche published a paper titled “The life cycle of contributors in collaborative online communities - the case of OpenStreetMap.” A key takeaway from this paper was this density plot of a contributor’s first and last edit:

Contributor Lifecycles from Bégin et al.

Plotted this way, temporal trends emerge as vertical or horizontal lines that mark when many users started or stopped mapping. The paper also published this table to describe the events in OSM history that were being captured:

See full entry

Location: Grant, Salem, Marion County, Oregon, 97311, United States

Maximum number of hours spent editing OSM on any day by a single user

Figure 1: The maximum number of hours spent editing OSM in a single day by any user, depending on the total number of days they have ever mapped.


A curious question was raised at SotM this past weekend in the discussion following a talk on developing an Automated approach to identifying corporate editing activity. In that work, Veniamin Veselovsky found an ingenious method of time-shifting a mapper’s editing pattern to determine a remote mapper’s local time zone. Doing this for many mappers, he is able to determine specific temporal signatures that have proven helpful in training models to identify paid editors in OSM. Temporal signatures are very powerful in OSM analysis: I used them to characterize editing in North America and found that Amazon’s temporal mapping signature is out of sync with local US mappers because Amazon’s editors are primarily mapping during business hours in SE Asia.
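To make the time-shifting idea concrete, here is a minimal Python sketch. It is not Veselovsky's method, only an illustration under a strong assumption: that a mapper edits mostly during a fixed local daytime window. The window (09:00 to 17:00) and the function names are hypothetical.

```python
from collections import Counter


def hourly_histogram(utc_hours):
    """Count edits per UTC hour of day (0-23)."""
    counts = Counter(utc_hours)
    return [counts.get(hour, 0) for hour in range(24)]


def estimate_utc_offset(hist, day_start=9, day_end=17):
    """Try every whole-hour offset and keep the one that places the
    most edits inside the assumed local window [day_start, day_end)."""
    best_offset, best_score = 0, -1
    for offset in range(-11, 13):
        score = sum(
            count
            for utc_hour, count in enumerate(hist)
            if day_start <= (utc_hour + offset) % 24 < day_end
        )
        if score > best_score:  # ties keep the first offset found
            best_offset, best_score = offset, score
    return best_offset


# Made-up mapper who edits once per hour from 01:00 to 08:00 UTC.
offset = estimate_utc_offset(hourly_histogram([1, 2, 3, 4, 5, 6, 7, 8]))
```

With real data the histogram is noisy and several offsets can tie, so a guess like this is better treated as one feature among many than as a definitive time zone.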

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

State of the States 2020 - Mapping USA Talk

Posted by Jennings Anderson on 21 May 2021 in English. Last updated on 24 May 2021.

Here are the slides from my Mapping USA Talk:

Cartogram

Cartogram of total edits per state in 2020. Color represents total edits per sq. km. – in equal sized bins (QGIS calculated quantiles) from purple to green to yellow (Less insightful, but more aesthetic)
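For readers unfamiliar with quantile classification, the sketch below shows the general idea of equal-count bins using Python's standard library. It is not QGIS's implementation, and the density values are made up for illustration.

```python
import statistics
from bisect import bisect_right


def quantile_breaks(values, n_bins):
    """Return the n_bins - 1 cut points that split values into
    roughly equal-sized groups."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bin(value, breaks):
    """Return the 0-based bin index for a value given sorted cut points."""
    return bisect_right(breaks, value)


# Hypothetical edits-per-sq-km values for six states.
edits_per_sq_km = [10, 20, 30, 40, 50, 60]
breaks = quantile_breaks(edits_per_sq_km, n_bins=3)
bins = [assign_bin(v, breaks) for v in edits_per_sq_km]
```

Each bin index would then map to one color on the purple to green to yellow ramp.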

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

Recent articles and blog posts about paid editing in OSM have renewed interest in the topic on social media and OSM discussion channels. The data and numbers presented in these discussions primarily come from a paper I co-authored in 2019, and are now outdated. This diary post presents new, updated figures.

Paid editing in OSM is receiving new attention in light of two articles in the past few months that are reporting on the phenomenon. Both articles heavily cite numbers from our 2019 Corporate Editing in the Evolving Landscape of OpenStreetMap paper:

These articles have prompted some discussion on Twitter from the larger OSM Community. What’s missing in these follow-up threads, however, are updated figures regarding the editing over the past two years.

This post only presents updated figures relevant to paid editing in OSM and observational analysis. As the OSM research community continues to expand, stay tuned for more in-depth research in this space, such as novel ways to identify undisclosed paid editors and occupational mappers, new community-detection algorithms based on editing patterns, and further investigations of the mapping interactions between paid and unpaid editors. At the end of this diary post I include a glossary with some of the terms used in both this and previous posts of mine, such as paid editing, professional editing, and occupational editing. These terms are becoming more common in this research space, so I am hoping to better introduce and define them.

Quantifying Edits

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

Cartogram

Cartogram showing the number of OSMF survey responses per country

Per Country Editing Activity Since 2020

The OSMF just released the results of the 2021 Community Survey. To normalize the survey results by country, I computed per-country editing counts going back to January 1, 2020. This post shares these results. If you’d like to see the analysis notebook and the queries used to do this, you can see them here.
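As a rough illustration of what normalizing by country can look like, the Python sketch below divides survey responses by an activity count. It assumes normalization means a simple ratio; the country codes and numbers are invented, and `responses_per_thousand` is a hypothetical helper, not code from the analysis notebook.

```python
def responses_per_thousand(responses, activity):
    """Survey responses per 1,000 units of editing activity, per country.
    Countries with no recorded activity are skipped rather than divided by zero."""
    rates = {}
    for country, n_responses in responses.items():
        total = activity.get(country, 0)
        if total > 0:
            rates[country] = 1000 * n_responses / total
    return rates


# Invented numbers, for illustration only.
survey_responses = {"AA": 50, "BB": 10, "CC": 5}
editing_activity = {"AA": 100_000, "BB": 5_000}
rates = responses_per_thousand(survey_responses, editing_activity)
```

In this made-up case, country BB has fewer responses than AA in absolute terms but a higher response rate relative to its editing activity.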

Identifying a Mapper’s Home Country

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

Did you know that OSM data is available as an open dataset on Amazon Web Services? Updated weekly, the files are transcoded into the .orc format, which can be easily queried by Amazon Athena (PrestoDB). These files live on S3, and anyone can create a database table that reads from them. There is no need to download or parse any OSM data; that part is done!

In this post, I will walk through a few example queries of the OSM changeset history using Amazon Athena.

For a more complete overview of the capabilities of Athena + OSM, see this blog post by Seth Fitzsimmons. Here I will only cover querying the changeset data.

1. Create The Changeset Table

From the AWS Athena console, ensure you are in the N. Virginia Region. Then, submit the following query to build the changesets table:

CREATE EXTERNAL TABLE changesets (
    id BIGINT,
    tags MAP<STRING,STRING>,
    created_at TIMESTAMP,
    open BOOLEAN,
    closed_at TIMESTAMP,
    comments_count BIGINT,
    min_lat DECIMAL(9,7),
    max_lat DECIMAL(9,7),
    min_lon DECIMAL(10,7),
    max_lon DECIMAL(10,7),
    num_changes BIGINT,
    uid BIGINT,
    user STRING
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/changesets/';

This query creates the changeset table, reading data from the public dataset stored on S3.

2. Example Query

To get started, let’s explore a few annually aggregated editing statistics. You can copy and paste this query directly into the Athena console:

SELECT YEAR(created_at) as year, 
      COUNT(id)        AS changesets,
      SUM(num_changes) AS total_edits,
      COUNT(DISTINCT(uid)) AS total_mappers
FROM changesets 
WHERE created_at > date '2015-01-01'
GROUP BY YEAR(created_at)
ORDER BY YEAR(created_at) DESC

I will break down this query line-by-line:

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

OSMUS Community Chronicles

Posted by Jennings Anderson on 30 October 2020 in English. Last updated on 3 November 2020.

Exploring the growth and temporal mapping patterns in OSM in North America

The following figures are from my OSMUS Connect 2020 Talk. Additionally, I’ve included the relevant queries to reproduce these datasets from the OSM public dataset on AWS (See this blog post). For this work, I used a bounding box that encompasses North America.

Starting with the big picture…

This year we are averaging about 900 active mappers each day, with significant growth in the past few years:

Number of Daily Active Mappers

SELECT 
    DATE_TRUNC('day',created_at) as day,
    COUNT(DISTINCT(uid)) as user_count
FROM changesets
WHERE min_lat >  13.0 AND max_lat <  80.0 AND min_lon > -169.1 AND max_lon < -52.2
GROUP BY DATE_TRUNC('day',created_at)

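For readers who want to see the logic of this query outside of SQL, here is a small Python sketch of the same idea: keep changesets whose bounding box falls entirely inside the North America box, then count distinct users per day. The record layout (dicts with the table's column names) and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Bounding box values copied from the WHERE clause of the query.
NORTH_AMERICA = {"min_lat": 13.0, "max_lat": 80.0, "min_lon": -169.1, "max_lon": -52.2}


def inside(cs, box=NORTH_AMERICA):
    """True if the changeset's bounding box lies strictly inside the box."""
    keys = ("min_lat", "max_lat", "min_lon", "max_lon")
    if any(cs.get(k) is None for k in keys):
        return False  # changesets without a bounding box are ignored
    return (cs["min_lat"] > box["min_lat"] and cs["max_lat"] < box["max_lat"]
            and cs["min_lon"] > box["min_lon"] and cs["max_lon"] < box["max_lon"])


def daily_active_mappers(changesets, box=NORTH_AMERICA):
    """Map each day to the number of distinct uids with a changeset in the box."""
    users_by_day = defaultdict(set)
    for cs in changesets:
        if inside(cs, box):
            users_by_day[cs["created_at"].date()].add(cs["uid"])
    return {day: len(uids) for day, uids in sorted(users_by_day.items())}


def row(uid, when, lat, lon):
    """Build a tiny made-up changeset around a point."""
    return {"uid": uid, "created_at": when, "min_lat": lat, "max_lat": lat + 0.1,
            "min_lon": lon, "max_lon": lon + 0.1}


SAMPLE = [
    row(1, datetime(2020, 10, 1, 8), 40.0, -105.3),
    row(1, datetime(2020, 10, 1, 9), 40.0, -105.3),   # same user, same day
    row(2, datetime(2020, 10, 1, 23), 46.5, -112.0),
    row(3, datetime(2020, 10, 2, 12), 49.4, 8.7),     # outside the box
    row(2, datetime(2020, 10, 2, 13), 46.5, -112.0),
]
daily = daily_active_mappers(SAMPLE)
```

Note that a changeset spanning the edge of the box is excluded by this filter, just as it is in the SQL.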
How did we get here?

This next graph quantifies a mapper’s first edit in North America by month. For example, in August 2009, 1,700 contributors edited in North America for the first time. In January 2017, close to 7,000 contributors edited in North America for the first time.
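A first-edit-by-month count like this can be sketched in a few lines of Python: find each user's earliest changeset, then tally those timestamps by month. The input format (pairs of uid and timestamp, already filtered to North America) is an assumption for illustration.

```python
from collections import Counter
from datetime import datetime


def first_edit_month_counts(changesets):
    """Count how many users made their first edit in each YYYY-MM month.
    `changesets` is an iterable of (uid, created_at) pairs."""
    first_seen = {}
    for uid, created_at in changesets:
        if uid not in first_seen or created_at < first_seen[uid]:
            first_seen[uid] = created_at
    return Counter(ts.strftime("%Y-%m") for ts in first_seen.values())


# Made-up changesets: user 1 returns years later but is only counted once.
sample = [
    (1, datetime(2009, 8, 3)),
    (1, datetime(2017, 1, 5)),
    (2, datetime(2017, 1, 9)),
    (3, datetime(2017, 1, 20)),
    (2, datetime(2016, 12, 31)),
]
new_mappers = first_edit_month_counts(sample)
```

Each user contributes to exactly one month, no matter how many times they edit afterwards.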

See full entry

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59601, United States

HOT Summit & State of the Map 2019

Posted by Jennings Anderson on 26 September 2019 in English. Last updated on 22 September 2020.

This past week, the 2019 HOT Summit was followed by State of the Map in Heidelberg, Germany. First, a big thank you and congratulations on a job well done to all of the organizing committee and folks in Heidelberg that made these events possible!

I had the opportunity to both lead a workshop at the HOT Summit on Thursday and participate in the academic track at State of the Map on Sunday. I’m writing this post to share a few resources and results from these talks, compiled all in one place.

1. HOT Workshop: Hands On Experience Extracting Meaningful OSM Data by Using Amazon Athena with AWS Public Datasets

This workshop was designed to show the analytical power of Amazon Athena with a large dataset like OSM. The workshop description was as follows:

Learn how to use Amazon Athena with AWS Public Datasets to query large amounts of OSM data and extract meaningful results. We will explore the maintenance behavior of contributors after HOT mapping activations and learn how the map gets maintained, what happens after validation, if the data grows stale, and if a local community emerges. This 200 level workshop is hands on and requires familiarity with SQL. Familiarity with data science tools such as Python and Jupyter Notebooks is helpful, but not required. Sample code will be made available at the state that participants can modify and ask their own questions of the data.

Grace Kitzmiller (AWS) & Jennings Anderson (University of Colorado Boulder)

The workshop included 10 prepared Jupyter Notebooks that contained all of the code to parse the results of an Athena query and generate a number of graphs and maps, such as the following graph which shows the cumulative number of users who have edited in Tacloban, Philippines.

See full entry

Location: Neuenheimer Feld, Neuenheim, Heidelberg, Baden-Württemberg, 69120, Germany

At State of the Map US a few weeks ago in Minneapolis, Minnesota, Seth and I presented a session titled:

PostCards from the Edge: A Tour of OSM Data Analyses + Visualizations

The recording and description of the presentation are available here.

Our goal was to curate a collection of OSM data visualizations from over the years that tell the story of OSM’s evolution, both as a map and a community, as well as highlight a few innovative data visualizations that show new ways to interact with OSM data to learn more about an area of the map.

We produced this spreadsheet (same as the table below) with links and author information for each of the visualizations that we showed and discussed in the talk. Since many of them are interactive, we chose to link to the original source:

See full entry

State of the Map US 2018: OpenStreetMap Data Analysis Workshop

Posted by Jennings Anderson on 5 December 2018 in English. Last updated on 10 December 2018.

(This is a description of a workshop Seth Fitzsimmons and I put on at State of the Map US 2018 in Detroit, Michigan. Cross-posting from this repository)

Workshop: October 2018

Workshop Abstract

With an overflowing Birds-of-a-Feather session on “OSM Data Analysis” the past few years at State of the Map US, we’d like to leave the nest as a flock. Many SotM-US attendees build and maintain various OSM data analysis systems, many of which have been and will be presented in independent sessions. Further, better analysis systems have yet to be built, and OSM analysis discussions often end with what is left to be built and how it can be done collaboratively. Our goal is to bring the data-analysis back into the discussion through an interactive workshop. Utilizing web-based interactive computation notebooks such as Zeppelin and Jupyter, we will step through the computation and visualization of various OpenStreetMap metrics.

tl;dr:

We skip the messy data-wrangling parts of OSM data analysis by pre-processing a number of datasets with osm-wayback and osmesa. This creates a series of CSV files with editing histories for a variety of US cities which workshop participants can immediately load into example analysis notebooks to quickly visualize OSM edits without ever having to touch raw OSM data.

1. Background

OpenStreetMap is more than an open map of the world: it is the cumulative product of billions of edits by nearly 1M active contributors (and another 4M registered users). Each object on the map can be edited multiple times. Each time the major attributes of an object are changed in OSM, the version number is incremented. To get a general idea of how many major changes exist in the current map, we can count the version numbers for every object in the latest osm-qa-tiles. This isn’t every single object in OSM, but includes nearly all roads, POIs, and buildings.
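As a small illustration of this counting, the Python sketch below summarizes a list of version numbers. It assumes the version of each object has already been extracted into a plain list of integers; the values shown are invented.

```python
from collections import Counter


def version_summary(versions):
    """Summarize object version numbers: how many objects there are,
    the sum of their versions, and how many objects sit at each version."""
    histogram = Counter(versions)
    return {
        "objects": len(versions),
        "total_versions": sum(versions),
        "histogram": dict(sorted(histogram.items())),
    }


# Invented example: two untouched objects, one edited once, one edited four times.
summary = version_summary([1, 1, 2, 5])
```

The sum of version numbers gives a rough count of major changes behind the current map, while the histogram shows how much of the map has never been revised.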

See full entry

Location: Goss-Grove, Boulder, Boulder County, Colorado, 80309, United States

Watching the Map Grow: State of the Map US Presentation

Posted by Jennings Anderson on 27 November 2017 in English. Last updated on 28 November 2017.

SOTMUS Logo

At State of the Map US last month, I presented my latest OSM analysis work. This is work done in collaboration between the University of Colorado Boulder and Mapbox. You can watch the whole presentation here or read on for a summary followed by extra details on the methods with some code examples.

OpenStreetMap is Constantly Improving

At the root of this work is the notion that OSM is constantly growing. This makes OSM uniquely different from other comparable sources of geographic information. To this extent, static assessments of quality notions such as completeness or accuracy are limited. For a more holistic perspective of the constantly evolving project, this work focuses on the growth of the map over time.

Intrinsic Data Quality Assessment

See full entry

Location: Goss-Grove, Boulder, Boulder County, Colorado, 80309, United States

How many contributors are active in each Country?

I recently put together this visualization of users editing per Country along with some other basic statistics. This analysis is done with tile-reduce and osm-qa-tiles. I’m sharing my code and the procedure here.

Users by Country

This interactive map depicts the number of contributors editing in each Country. The Country geometries are in a fill-extrusion layer, allowing for 3D interaction. Both the height and the color of each Country scale with the number of editors. Additional Country-level statistics such as number of buildings and kilometers of roads are also computed.

Procedure

These numbers are all calculated with OSM-QA-Tiles and tile-reduce. I started with the current planet tiles and used this Countries geojson file for the Country geometries to act as boundaries.

Starting tile reduce:

See full entry

Location: Goss-Grove, Boulder, Boulder County, Colorado, 80309, United States

OSM Contributor Analysis - Entry 2: Annual Summaries of User Edits

Posted by Jennings Anderson on 6 July 2016 in English. Last updated on 7 July 2016.

Over the past two weeks I have been trying out some new methods to uncover user focus on the map. Investigating this idea of user focus includes questions like:

  • Are there areas where a specific user edits more frequently or regularly?
  • Are there multiple contributors who focus on the same areas?
  • Do these activities correlate to “map gardening”?

To answer these questions, I’ve put together an interactive map, similar to How Did You Contribute to OSM by Pascal Neis, but with the added ability to compare multiple users through the years.

Check it out Here: OSM Annual User Summary Map

Please Note: Requires recent versions of Google Chrome (recommended) or Firefox (>=35).

How does it work?

Using the annual snapshot osm-qa-tiles, I have calculated the following statistics for each user’s visible edits at the end of each year on a per-tile basis:

  • Number of total edits
  • Number of buildings
  • Number of amenities
  • Kilometers of roads
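The statistics listed above can be sketched in Python as a single pass over features, grouped by tile and user. This is an illustration, not the tile-reduce code used for the map: the feature layout (dicts with `tile`, `user`, `tags`, and optional `coords` as lon/lat pairs) is hypothetical, and road length is approximated with the haversine formula.

```python
import math
from collections import defaultdict


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def line_length_km(coords):
    """Length of a line given as a list of (lon, lat) pairs."""
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))


def summarize(features):
    """Aggregate visible features into per-(tile, user) statistics."""
    stats = defaultdict(lambda: {"edits": 0, "buildings": 0, "amenities": 0, "road_km": 0.0})
    for feature in features:
        entry = stats[(feature["tile"], feature["user"])]
        entry["edits"] += 1
        tags = feature["tags"]
        if "building" in tags:
            entry["buildings"] += 1
        if "amenity" in tags:
            entry["amenities"] += 1
        if "highway" in tags and feature.get("coords"):
            entry["road_km"] += line_length_km(feature["coords"])
    return dict(stats)
```

Here an "edit" means one visible object last touched by that user, which matches the idea of visible edits in a snapshot rather than a full editing history.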

See full entry

Location: Goss-Grove, Boulder, Boulder County, Colorado, 80309, United States

OpenStreetMap Data Analysis: Entry 1

Posted by Jennings Anderson on 20 June 2016 in English. Last updated on 29 June 2016.

Howdy OpenStreetMap, I am excited to share that I am working as a Research Fellow with Mapbox this summer! As a research fellow, I am looking to better understand contributions to OSM.

For my first project, I have been using the tile-reduce framework to summarize per-tile visible edits from the Historical OSM-QA-Tiles. These historical tiles are a snapshot of what the map looked like at the time listed on the link.

With this annual resolution, we can visualize the edits (those edits that were visible at the end of that year) that happened on each tile. So far, I’ve summarized them as a) number of editors, b) number of objects, and c) recency of the latest edit (relative to that year).

The OSM-QA-Tiles are all generated at zoom level 12, which separates the world into more than 5 million tiles. Some tiles have few objects while others have more than ten thousand.
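To show how a location maps to a zoom level 12 tile, here is a short Python sketch of the standard Web Mercator (slippy map) tile formula. The function name is hypothetical, and the clamping at the edges of the grid is a convenience added for this example.

```python
import math


def lonlat_to_tile(lon, lat, zoom=12):
    """Return the (x, y) tile containing a lon/lat point at the given zoom,
    using the standard Web Mercator tiling scheme."""
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge (or beyond the Mercator limits) stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


center_tile = lonlat_to_tile(0.0, 0.0)
```

Grouping objects by the tile returned here is what makes per-tile summaries such as editor counts possible.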

So far I have created two interactive maps to investigate OpenStreetMap editing behavior at this tile-level analysis:

1. Editor Density (Number of editors active on a tile)

2. Edit Recency (Time since last edit on the tile)

Editor Density

This map highlights tiles where multiple editors have been active. The most active editors in most cases are automated bots, especially in the more recent years. For best results, moving the slider in the bottom left for Minimum Users Per Tile to 2 or 3 will exclude most of these automated edits.

Examples

See full entry

Location: Logan Circle/Shaw, Ward 2, Washington, District of Columbia, United States