Category Archives: General

Spatially weighted averages in R with sf

Spatial joins allow to augment one spatial dataset with information from another spatial dataset by linking overlapping features. In this post I will provide an example showing how to augment a dataset containing school locations with socioeconomic data of their surrounding statistical region using R and the package sf (Pebesma 2018). This approach has the drawback that the surrounding statistical region doesn’t reflect the actual catchment area of the school. I will present an alternative approach where the overlaps of the schools’ catchment areas with the statistical regions allow to calculate the weighted average of the socioeconomic statistics. If we have no data about the actual catchment areas of the schools, we may resort to approximating these areas as circular regions or as Voronoi regions around schools.

Read More →

A tip for the impatient: Simple caching with Python pickle and decorators

During testing and development, it is sometimes necessary to rerun tasks that take quite a long time. One option is to drink coffee in the mean time, the other is to use caching, i.e. save once calculated results to disk and load them from there again when necessary. The Python module pickle is perfect for caching, since it allows to store and read whole Python objects with two simple functions. I already showed in another article that it’s very useful to store a fully trained POS tagger and load it again directly from disk without needing to retrain it, which saves a lot of time.

Read More →

About the WZB Data Science Blog

This blog collects some experiences from my daily work in the Data Science field of the WZB. The posts will focus around the following topics:

  • Data extraction / data mining
  • Data visualization
  • Data analysis

Read More →