On October 12, 2018, the New York Times published “A Map of Every Building In America,” an article on a remarkable achievement from Microsoft. The map offers insight into the structure and form of our impact on the environment, as well as the influence of history, geology, and culture on the manmade landscape. In addition to a map of every building in the nation, the article includes an interactive map that lets readers enter any zip code or city to explore their local area (see the Charlottesville view above).
The article highlighted recent Microsoft research that automatically created an open source database of about 125 million building footprints across the United States. The work pushed the limits of the technology by combining machine learning with cloud-based computing, and the results are highly accurate. Microsoft reports that its “… metrics show that in the vast majority of cases the quality is at least as good as data hand digitized buildings in OpenStreetMap. It is not perfect, particularly in dense urban areas but it is still awesome.”
Has the problem of mapping building footprints been solved? To understand what Microsoft’s effort means for Charlottesville, we can compare the Microsoft results with Charlottesville Open Data’s Existing Structure Layer, which contains all the addressable structures within the City of Charlottesville.
Microsoft’s building footprint detection model makes a number of assumptions about buildings. Among other criteria, a building must have edges at least three meters long, and its corners are generally assumed to be 90 degrees. These assumptions lead to successful extractions in suburban neighborhoods and for isolated buildings, as seen in the image below. For most buildings the two datasets agree closely on orientation, but the Charlottesville data captures more detail along the building edges.
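The two assumptions above can be expressed as a simple geometric check on a footprint polygon. The sketch below is hypothetical, not Microsoft’s code: it assumes a footprint is a list of (x, y) vertices in a projected coordinate system measured in meters, with the first vertex not repeated at the end, and the 10-degree angle tolerance is an illustrative choice rather than a published parameter.

```python
import math


def edge_lengths(coords):
    """Length of each edge of a closed polygon given as [(x, y), ...] in meters."""
    n = len(coords)
    return [math.dist(coords[i], coords[(i + 1) % n]) for i in range(n)]


def corner_angles(coords):
    """Angle in degrees (0 to 180) between the two edges meeting at each vertex.

    Reflex corners (e.g. the inner corner of an L-shaped building) also read
    as 90 degrees here, which is what a right-angle check wants.
    """
    n = len(coords)
    angles = []
    for i in range(n):
        px, py = coords[i - 1]          # previous vertex (wraps to the last one)
        cx, cy = coords[i]              # current vertex
        nx, ny = coords[(i + 1) % n]    # next vertex
        ax, ay = px - cx, py - cy
        bx, by = nx - cx, ny - cy
        cos_theta = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        angles.append(math.degrees(math.acos(cos_theta)))
    return angles


def meets_footprint_assumptions(coords, min_edge_m=3.0, angle_tolerance_deg=10.0):
    """Hypothetical check: every edge >= min_edge_m and every corner near 90 degrees."""
    if len(coords) < 3:
        return False
    # Checking edges first also guarantees no zero-length edges reach corner_angles.
    if any(length < min_edge_m for length in edge_lengths(coords)):
        return False
    return all(abs(a - 90.0) <= angle_tolerance_deg for a in corner_angles(coords))


house = [(0, 0), (12, 0), (12, 9), (0, 9)]
shed = [(0, 0), (2, 0), (2, 2), (0, 2)]                      # edges only 2 m long
l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
skewed = [(0, 0), (10, 0), (14, 6), (4, 6)]                  # parallelogram corners

for name, poly in [("house", house), ("shed", shed), ("l_shape", l_shape), ("skewed", skewed)]:
    print(name, meets_footprint_assumptions(poly))
```

Under these assumptions the house and the L-shaped building pass, while the small shed fails the edge-length rule and the skewed parallelogram fails the right-angle rule. This illustrates why small or irregular structures are harder for a model built around such constraints.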
As Microsoft readily admits, its algorithm performs less well in dense urban areas. In the image below, Microsoft captures the east end of the Downtown Mall as a single building, fusing both sides of the street as well as the alleys between buildings.
The City of Charlottesville collects detailed building outlines as part of its current business processes, so the Microsoft buildings are unlikely to replace the current data. That said, it is interesting to compare the datasets to see where they differ. Each difference is informative: it may point to an error in one of the datasets, or it may have a legitimate explanation. The image below shows specific differences between the two datasets: Microsoft contains a series of long, thin structures (yellow) not found in the Charlottesville data, and the City of Charlottesville has three new buildings (green) not found in the Microsoft data.
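One simple way to flag differences like these is a centroid test: a footprint in one dataset counts as matched if its centroid falls inside some footprint of the other dataset, and anything left unmatched is a candidate for review. This is a minimal sketch, not necessarily how the comparison above was made. It assumes both datasets have already been loaded into the same projected coordinate system as dictionaries mapping hypothetical IDs to vertex lists, and it uses a brute-force search suited to a neighborhood-sized area rather than a whole state.

```python
def centroid(coords):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    area2 = 0.0          # twice the signed area
    cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def contains(coords, point):
    """Ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def unmatched(source, target):
    """IDs of footprints in `source` whose centroid lies in no footprint of `target`."""
    return [
        fid for fid, poly in source.items()
        if not any(contains(other, centroid(poly)) for other in target.values())
    ]


# Hypothetical toy data in meters; IDs and coordinates are made up for illustration.
city = {
    "C1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "C2": [(20, 0), (30, 0), (30, 10), (20, 10)],
    "C3": [(50, 0), (60, 0), (60, 10), (50, 10)],     # a newer building
}
microsoft = {
    "M1": [(1, 1), (11, 1), (11, 11), (1, 11)],       # slightly offset outline
    "M2": [(20, 0), (30, 0), (30, 10), (20, 10)],
    "M3": [(0, 30), (40, 30), (40, 33), (0, 33)],     # long, thin structure
}

print("only in city data:", unmatched(city, microsoft))
print("only in Microsoft data:", unmatched(microsoft, city))
```

Running the test in both directions mirrors the image: unmatched Microsoft footprints play the role of the yellow structures and unmatched city footprints the green ones. The centroid test is deliberately simple. The centroid of a strongly concave footprint can fall outside its own outline, and a fused polygon like the Downtown Mall example may place its centroid in the street, so a more careful comparison would likely measure polygon overlap (for example, intersection over union) with a geometry library.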
Some of these differences may stem from dissimilar data collection specifications. Microsoft collects all building footprints visible in imagery, while the Charlottesville data represents only addressable buildings, that is, buildings with street addresses. Whether a building is addressable cannot be determined from imagery.
Other differences may relate to the date of the imagery used for data collection, since no national imagery dataset is collected at a single date and time. The three buildings are prominent enough that Microsoft’s model would likely have captured them had they appeared in its imagery, so the City of Charlottesville may have had access to more current imagery or other data sources when mapping them.
For the City of Charlottesville, Microsoft’s building footprints offer a readily accessible and interesting source of data for evaluating the city’s structure data. While lacking the detail of the city’s footprints, the Microsoft data may be used to confirm existing structures and identify potential missing structures. For counties and towns lacking Charlottesville’s resources, the national Microsoft data could serve as an initial building footprint dataset, greatly reducing the cost of creating one from scratch. In any event, Microsoft’s automatic mapping of building footprints is a remarkable technical achievement, both in its extraction capabilities and in its scaling to a national level.
If you would like to compare the Microsoft data with the City of Charlottesville’s data for your neighborhood, use this web tool I created.