Category Archives: Apis

Some thoughts about the use of cloud services and web APIs in social science research

In recent weeks I’ve collaborated on the online book APIs for social scientists and contributed two chapters: one about the genderize.io API and one about the GitHub API. The book seeks to provide an overview of web or cloud services and their APIs that might be useful for social scientists. It covers a wide range, from text translation to accessing social media APIs, complete with code examples in R. By harnessing the GitHub workflow model, the book itself is also a nice example of fruitful collaboration via work organization methods that were initially developed in the open source software community.

While working on the two chapters and playing around with the APIs, I once again noticed the double-edged nature of using web APIs in research. Using them can greatly improve research or even enable research that was not possible before. At the same time, data collected from these APIs can introduce bias, and relying on them may cause issues with research transparency and replicability. I noted some of these issues in the respective book chapters and I’ve written about them before (see this article in WZB Mitteilungen, available only in German, written together with Jonas Wiedner), but the two APIs that I covered for the book provide some very practical examples of the main issues when working with web APIs, and I wanted to point them out in this blog post.


Batch transfer GitLab projects with the GitLab API

This is a bit off-topic to be filed under DevOps / workflow automation but I still wanted to share it: We use GitLab at the WZB for collaborative software development and project management and I recently had to transfer all my GitLab projects to a GitLab group.[1] Since transferring a personal project to a group is not something that is done regularly, it’s quite hidden in the project settings and involves a lot of steps. Transferring a project manually with the GitLab web interface means visiting the project page, navigating to the “transfer project” pane in its advanced settings, selecting the group, clicking “Transfer group” and typing a confirmation string. Nobody wants to do this manually with more than a handful of projects. Luckily, GitLab comes with its own well-documented REST API, which can save us a lot of time by letting us automate such tedious tasks.
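As a rough illustration of what such automation could look like, here is a minimal Python sketch using only the standard library. It assumes the GitLab REST API v4 project transfer endpoint (`PUT /projects/:id/transfer` with a `namespace` parameter) and authentication via a personal access token in the `PRIVATE-TOKEN` header. The instance URL, token, project IDs and group name are placeholders, and the function names are hypothetical; check the API documentation of your GitLab version before relying on it.

```python
import urllib.parse
import urllib.request


def build_transfer_request(base_url, token, project_id, namespace):
    """Build (but do not send) a PUT request that transfers a project to a namespace."""
    # Assumed endpoint: PUT /api/v4/projects/:id/transfer
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/transfer"
    # "namespace" is the ID or path of the target group
    data = urllib.parse.urlencode({"namespace": namespace}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"PRIVATE-TOKEN": token},  # personal access token (placeholder)
        method="PUT",
    )


def transfer_projects(base_url, token, project_ids, namespace):
    """Transfer each project in turn and return the HTTP status code per project ID."""
    statuses = {}
    for project_id in project_ids:
        request = build_transfer_request(base_url, token, project_id, namespace)
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(request) as response:
            statuses[project_id] = response.status
    return statuses
```

A call such as `transfer_projects("https://gitlab.example.org", "<token>", [12, 34], "my-group")` would then replace the repeated clicking described above. Separating request construction from sending makes it easy to inspect what would be sent before touching any real project.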


Footnotes

1 In case you don’t know GitLab: It’s similar to GitHub but open-source, and you can install your own instance on your server so that all your data stays within your organization’s IT realm. That’s better for data protection and customizability, and you’re less dependent on the services of an external company.

Robust web scraping or web API based data collection

There are thousands of articles on the web about web scraping and accessing web APIs. Most of them show you how to extract information from specific elements on a web page or how to communicate with a specific API in order to collect data. For smaller data collection projects, this knowledge may be sufficient, but large-scale data collection that must run reliably over days or even weeks brings up additional problems, most of which concern the robustness of the data collection process. I will try to tackle some of these problems in this post. I will use examples in Python, but the basic concepts can easily be translated to R or other programming languages.
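To give a flavor of what robustness can mean in practice, here is a minimal sketch of one common building block: retrying a failing request with exponential backoff. It is not necessarily the approach taken in the full post. The function name `retry_with_backoff` and its parameters are made up for illustration, and `func` stands for any callable that performs a single request.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    The waiting time doubles after every failed attempt (capped at max_delay),
    with a little random jitter added so that parallel workers do not all
    retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
```

In a real data collection script, `exceptions` would usually be narrowed to network-related errors (for example timeouts), so that programming errors still surface immediately instead of being retried.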


Using Google Places data to analyze changes in mobility during the COVID-19 pandemic

During the COVID-19 pandemic, it has become apparent that location data gathered by private IT companies and telcos is a primary source for many studies about the effect of mobility restrictions on people’s behavior and movements. In this blog post, I’d like to have a look at the “popular times” data provided by Google Places. I explain the limitations of this data, show how to gather it and present some results from data that I fetched during March and April.
