Monthly Archives: March 2022


Some thoughts about the use of cloud services and web APIs in social science research

In recent weeks I’ve collaborated on the online book APIs for social scientists and contributed two chapters: one about the genderize.io API and one about the GitHub API. The book seeks to provide an overview of web or cloud services and their APIs that might be useful for social scientists. It covers a wide range, from text translation to accessing social media APIs, complete with code examples in R. By harnessing the GitHub workflow model, the book itself is also a nice example of fruitful collaboration using work organization methods that were originally developed in the open source software community.

While working on the two chapters and playing around with the APIs, I once again noticed the double-edged nature of using web APIs in research. Using them can greatly improve research or even enable research that was not possible before. At the same time, data collected from these APIs can introduce bias, and relying on them may cause problems for research transparency and replicability. I noted some of these issues in the respective book chapters and I’ve written about them before (see this article in WZB Mitteilungen, written together with Jonas Wiedner and available only in German), but the two APIs that I covered for the book provide some very practical examples of the main issues when working with web APIs, and I wanted to point them out in this blog post.
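To make the replicability concern a bit more concrete, here is a minimal sketch in Python (the book itself uses R) of one possible mitigation: storing every API response verbatim together with the query and the retrieval time, so that an analysis can later be re-run on exactly the data that was received. The helper names `build_url`, `make_record` and `fetch` are hypothetical and only serve this illustration. The endpoint and its `name` query parameter are those of the public genderize.io API as I understand it, so treat the details as assumptions rather than a reference implementation.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Public genderize.io endpoint (assumed; check the API docs before relying on it).
API_URL = "https://api.genderize.io"


def build_url(name):
    """Return the request URL for a single first name (hypothetical helper)."""
    return f"{API_URL}?{urlencode({'name': name})}"


def make_record(name, raw_body, retrieved_at=None):
    """Wrap an unmodified response body with provenance metadata."""
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "query": name,
        "url": build_url(name),
        "retrieved_at": retrieved_at.isoformat(),
        "raw_response": raw_body,         # stored verbatim, never edited
        "parsed": json.loads(raw_body),   # convenience copy for analysis
    }


def fetch(name):
    """Query the API (needs network access) and return a provenance record."""
    with urlopen(build_url(name), timeout=10) as resp:
        return make_record(name, resp.read().decode("utf-8"))
```

Calling `fetch("peter")` and appending the returned dictionary to a JSON Lines file would give a dated archive of the raw responses. This does not remove any bias in the service's underlying data, but it documents what the service returned at the time of data collection.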


Continuous Integration testing with GitHub Actions using tox and hypothesis

I recently published a major update for the Python tmtoolkit package for text mining and topic modeling. Since it is a fairly large research software package, I’m using a Continuous Integration (CI) system for automated testing on different platforms. This system makes sure that every code update pushed to the software repository is automatically checked by running the test suite on all three major operating systems (Linux, macOS, Windows). For the recent update of tmtoolkit, I decided to move the CI system from Travis CI to GitHub Actions (GHA), since GHA is directly integrated into GitHub and easy to set up. Still, there are some obstacles to overcome, so this short post shows how to set up GHA for a Python project with a few extra requirements, such as installing system packages on the test runner machine or running tests with tox and hypothesis.
