Author Archives: Markus Konrad

Linkdump #136

R
Python
Other interesting articles, projects and news

Robust web scraping or web API based data collection

There are thousands of articles on the web about web scraping and accessing web APIs. Most of them show you how to extract information from specific elements on a web page or how to communicate with a specific API in order to collect data. For smaller data collection projects, this knowledge may be sufficient, but large scale data collection which must run reliably over days or even weeks brings up additional problems that mainly focus on the robustness of the data collection process. I will try to tackle some of these problems in this post. I will use examples in Python, but the basic concepts can easily be translated to R or other programming languages.

Read More →

Linkdump #135

R
Python
Other interesting articles, projects and news

Spiegel Online news topics and COVID-19 – a topic modeling approach

I created a project to showcase topic modeling with the tmtoolkit Python package: I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic. This topic model is then used to analyze the volume of media coverage regarding the pandemic and how it changed over time.

National daily infection numbers clearly drive the volume of media coverage on COVID-19 during the observation period (January 2020 to end of August 2020) on SPON, which is probably not very surprising. Even though infection rates increased dramatically in the world in summer 2020 (e.g. in Brazil, India and USA), media coverage first decreased and then stayed at a moderate level, indicating that SPON doesn’t respond so much to rising infection rates at an international level.

You can have a look at the report here. All scripts are available in the GitHub repository.

Linkdump #134

R
Python
Other interesting articles, projects and news

Linkdump #133

R
Python
Other interesting articles, projects and news

Linkdump #132

R
Python
Other interesting articles, projects and news

Linkdump #131

R
Python
Other interesting articles, projects and news

Linkdump #130

R
Python
Other interesting articles, projects and news

Using Google Places data to analyze changes in mobility during the COVID-19 pandemic

During the COVID-19 pandemic, it’s apparent that location data gathered by private IT companies and telcos is a primary source for many studies about the effect of mobility restrictions on people’s behaviors and movements. In this blog post, I’d like to have a look at the “popular times” data provided by Google Places. I explain the limitations of this data, show how to gather it and provide some results from data that I fetched during March and April.

Read More →