Category Archives: Web Scraping

A Twitter network of members of the 19th German Bundestag – part II

This is the second part of my project on the Twitter network of members of the Bundestag. After obtaining the necessary data, as explained in part I, we now focus on creating a network graph with links between the representatives’ Twitter accounts for exploratory network analysis.


A Twitter network of members of the 19th German Bundestag – part I

For the R tutorial that I gave at the WZB in the previous semester, I introduced how to query web APIs – specifically the Twitter API – and how to automate data extraction from websites (i.e. web scraping). I showed an example that combined both techniques to get data about the Twitter activities of members of the current (19th) German Bundestag, the federal German parliament. The focus was on the question of “who follows whom” on Twitter. I thought it was a nice little project showing how to use the Twitter API, do web scraping, combine the collected data and do some exploratory network analysis – all within the R environment. So I decided to polish the code a little, put it on GitHub and write two blog posts. The first part, i.e. this one, is all about getting the data.


Web scraping with automated browsers using Selenium

Web scraping, i.e. automated data mining from websites, usually involves fetching a web page’s HTML document, parsing it, extracting the required information, and optionally following links within this document to other web pages to repeat the process. This approach is sufficient for many websites that display information statically, i.e. that do not respond to user interaction dynamically by means of JavaScript. In these cases, web scraping can be implemented with Python packages such as requests and BeautifulSoup. Even interactive elements such as forms can be emulated by observing the HTTP POST and GET data that is sent to the server whenever a form is submitted. However, this approach has limits. Sometimes it is necessary to automate a whole browser in order to scrape JavaScript-heavy websites, as will be shown with a short example in this post.
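
The fetch, parse, extract and follow-links loop for static pages can be sketched roughly as follows. This is only a minimal illustration, not the code from the post: to stay dependency-free it uses Python's standard library (`urllib` and `html.parser`) in place of requests and BeautifulSoup, and the names `LinkAndTitleParser`, `extract`, `fetch`, `crawl`, `fetch_page` and `max_pages` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collect the page title and all link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html, base_url):
    """Parse one document; return its title and its links as absolute URLs."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    links = [urljoin(base_url, href) for href in parser.links]
    return parser.title.strip(), links


def fetch(url):
    """Download a page and decode it to text (assumes UTF-8 if unspecified)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Visit pages breadth-first, following links, and map each URL to its title."""
    seen = set()
    queue = [start_url]
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        title, links = extract(fetch_page(url), url)
        titles[url] = title
        queue.extend(link for link in links if link not in seen)
    return titles
```

Passing the download step in as `fetch_page` keeps the parsing logic testable without network access. A real scraper would also want to restrict the crawl to the target domain, respect robots.txt and pause between requests. None of this helps once the content is generated by JavaScript in the browser, which is where an automated browser comes in.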
