Tag Archives: R-bloggers

A Twitter network of members of the 19th German Bundestag – part II

This is the second part about my project that deals with the Twitter network of members of the Bundestag. After getting the necessary data, which was explained in part 1, we will now focus on creating a network graph with links between the representatives’ Twitter accounts for exploratory network analysis.

Read More →

A Twitter network of members of the 19th German Bundestag – part I

For the R tutorial that I gave at the WZB in the previous semester, I gave an introduction on how to query web APIs – specifically the Twitter API – and automated data extraction from websites (i.e. web scraping). I showed an example that combined both of these techniques for the goal of getting data about the Twitter activities of members of the current (19th) German Bundestag, which is the federal German parliament. The focus was especially on the question “who follows who” on Twitter. I thought it’s a nice little project showing how to use the Twitter API, do web scraping, combine the collected data and do some exploratory network analysis – all within the R environment. So I decided to polish the code a little bit, put in on GitHub and wrote two blog posts. The first part, i.e. this part, is all about getting the data.

Read More →

Zooming in on maps with sf and ggplot2

When working with geo-spatial data in R, I usually use the sf package for manipulating spatial data as Simple Features objects and ggplot2 with geom_sf for visualizing these data. One thing that comes up regularly is “zooming in” on a certain region of interest, i.e. displaying a certain map detail. There are several ways to do so. Three common options are:

  • selecting only certain areas of interest from the spatial dataset (e.g. only certain countries / continent(s) / etc.)
  • cropping the geometries in the spatial dataset using sf_crop()
  • restricting the display window via coord_sf()

I will show the advantages and disadvantages of these options and especially focus on how to zoom in on a certain point of interest at a specific “zoom level”. We will see how to calculate the coordinates of the display window or “bounding box” around this zoom point.

Read More →

Three ways of visualizing a graph on a map

When visualizing a network with nodes that refer to a geographic place, it is often useful to put these nodes on a map and draw the connections (edges) between them. By this, we can directly see the geographic distribution of nodes and their connections in our network. This is different to a traditional network plot, where the placement of the nodes depends on the layout algorithm that is used (which may for example form clusters of strongly interconnected nodes).

In this blog post, I’ll present three ways of visualizing network graphs on a map using R with the packages igraph, ggplot2 and optionally ggraph. Several properties of our graph should be visualized along with the positions on the map and the connections between them. Specifically, the size of a node on the map should reflect its degree, the width of an edge between two nodes should represent the weight (strength) of this connection (since we can’t use proximity to illustrate the strength of a connection when we place the nodes on a map), and the color of an edge should illustrate the type of connection (some categorical variable, e.g. a type of treaty between two international partners).

Read More →

Visualizing graphs with overlapping node groups

I recently came across some data about multilateral agreements, which needed to be visualized as network plots. This data had some peculiarities that made it more difficult to create a plot that was easy to understand. First, the nodes in the graph were organized in groups but each node could belong to multiple groups or to no group at all. Second, there was one “super node” that was connected to all other nodes (while “normal” nodes were only connected within their group). This made it difficult to find the right layout that showed the connections between the nodes as well as the group memberships. However, digging a little deeper into the R packages igraph and ggraph it is possible to get satisfying results in such a scenario.

Read More →

Creating a “balloon plot” as alternative to a heat map with ggplot2

Heat maps are great to compare observations with lots of variables (which must be comparable in terms of unit, domain, etc.). In some cases however, traditional heat maps might not suffice, for example when you want to compare multiple groups of observations. One solution is to use facets. Another solution, which I want to explain here, is to make a “ballon plot” with a fixed grid of rows and columns.

Read More →

Parallel Coordinate Plots for Discrete and Categorical Data in R — A Comparison

Parallel Coordinate Plots are useful to visualize multivariate data. R provides several packages/functions to draw Parallel Coordinate Plots (PCPs):

In this post I will compare these approaches using a randomly generated data set with three discrete variables.

Read More →

Dynamic column/variable names with dplyr using Standard Evaluation functions

Data manipulation works like a charm in R when using a library like dplyr. An often overlooked feature of this library is called Standard Evaluation (SE) which is also described in the vignette about the related Non-standard Evaluation. It basically allows you to use dynamic arguments in many dplyr functions (“verbs”).

Read More →