I recently came across some data about multilateral agreements that needed to be visualized as network plots. The data had some peculiarities that made it harder to create a plot that was easy to understand. First, the nodes in the graph were organized in groups, but each node could belong to multiple groups or to no group at all. Second, there was one “super node” that was connected to all other nodes (while “normal” nodes were only connected within their group). This made it difficult to find a layout that showed both the connections between the nodes and the group memberships. However, by digging a little deeper into the R packages igraph and ggraph, it is possible to get satisfying results in such a scenario.
Linkdump #77
R
- Qualitative Data Science: Using RQDA to analyse interviews
- Interpretable Machine Learning with iml and mlr
Python
- Interpretable Machine Learning with Python
- Visualizing Pandas’ Pivoting and Reshaping Functions
- Basic Machine Learning with SciKit-Learn
- Machine Learning: Facebook’s PyTorch 1.0 aims to merge research and production
Interesting articles, projects and news
- Despite automation: “There will be work”
- Facebook security analyst is fired for using private data to stalk women
- German federal states: Google and others should disclose the criteria of their algorithms
- Google Maps Platform for developers launched
- re:publica 2018: Danah Boyd, algorithms and power
- As AI advances rapidly, More Human Than Human says, “Stop, let’s think about this”
- rough.js – Create graphics with a hand-drawn, sketchy, appearance
Linkdump #76
R
Python
- Bayesian Linear Regression in Python: Using Machine Learning to Predict Student Grades Part 1
- hypertools – A Python toolbox for gaining geometric insights into high-dimensional data
- Pipenv: A Guide to the New Python Packaging Tool
- How to create a 2D game with Python and the Arcade library
Interesting articles, projects and news
Linkdump #75
R
- A Recession Before 2020 Is Likely; On the Distribution of Time Between Recessions
- Writing an R package from scratch
- Developing Your First R Package
- Statistical Inference: A Tidy Approach
Python
- Python 3.7: Introducing Data Classes
- Python: New repository software for PyPI is ready
- backoff – Python library providing function decorators for configurable backoff and retry
Interesting articles, projects and news
- No boundaries for Facebook data: third-party trackers abuse Facebook Login
- Arrow and beyond: Collaborating on next generation tools for open source data science
- Text Embedding Models Contain Bias. Here’s Why That Matters.
- Fake News PSA — DeepFake video of Barack Obama saying things that Obama never said
- Introducing TensorFlow.js: Machine Learning in Javascript
- The little book of LDA
- Deep Painterly Harmonisation — composite and preserve the style of the destination image
- Google works out a fascinating, slightly scary way for AI to isolate voices in a crowd
- Artificial intelligence: Facebook predicts user behavior and uses it to sell ads
Linkdump #74
R
- ggridges – Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in ‘ggplot2’.
- Regular Expressions Every R programmer Should Know
- Neglected R Super Functions
- An overview of keyword extraction techniques
Python
- Array Programming With NumPy
- Introducing TensorFlow Probability
- Creating Map Animations with Python
- python-intervals – Python library for interval arithmetic
Interesting articles, projects and news
- Ten quick tips for teaching programming
- Why Does “=” Mean Assignment?
- Data Science in the Browser
- Code and Data for the Social Sciences: A Practitioner’s Guide
- Targeting: The next generation
- Study: Robots threaten jobs in Germany more than in other industrialized nations
- Blockchain is not only crappy technology but a bad vision for the future
Linkdump #73
News / Articles
R
Python
- Part-of-Speech tagging tutorial with the Keras Deep Learning library
- Pytubes is a library that optimizes loading datasets into memory
- Introducing TensorFlow Hub: A Library for Reusable Machine Learning Modules in TensorFlow
- Markdown Descriptions on PyPI
- ANN Visualizer – A python library for visualizing Artificial Neural Networks (ANN)
- The Python Arcade Library – Arcade is an easy-to-learn Python library for creating 2D video games. It is ideal for people learning to program.
Interesting articles, projects and news
- Interpreting predictive models with Skater: Unboxing model opacity
- How Cambridge Analytica’s Facebook targeting model really worked
- 10 Simple rules for overcoming statistical paralysis
- Good enough practices in scientific computing
- Bad journalism feeds “fake news”
- How creative is artificial intelligence?
Linkdump #72
R
- reticulate: R interface to Python
- Prime Hints For Running A Data Project In R
- Dependencies
- Dependencies and bloat
Python
- New PyCharm 2018.1 supports code cells for scientific development
- textdistance – compute distance between sequences. 30+ algorithms, pure python implementation, common interface.
- Pythonic Data Cleaning With NumPy and Pandas
- Overview of Pandas Data Types
Interesting articles, projects and news
- Commoditisation of AI, digital forgery and the end of trust: how we can fix it
- Study: Are expectations of blockchain and cryptocurrencies too high?
- AI researcher: Deep learning will sooner or later reach its limits
- Video suggests huge problems with Uber’s driverless car program
- Commentary: Don’t tell us AI won’t destroy jobs
- When computers decide over life and death: Who is liable when AI kills?
- Facebook’s Cambridge Analytica scandal, explained
Linkdump #71
R
- Math Notation for R Plot Titles: expression, bquote, & Greek Letters
- Data-driven unit testing for data scientists and quant developers alike
- Generating codebooks in R
- Exploring influence in networks
- R Tip: Introduce Indices to Avoid for() Class Loss Issues
Python
- The Visual Python Debugger for Jupyter Notebooks You’ve Always Wanted
- Snips Python library to extract meaning from text
- Exploring word2vec embeddings as a graph of nearest neighbors
Interesting articles, projects and news
- Historic milestone: Microsoft AI translates Chinese as well as humans
- EU study on “fake news”: Platforms should disclose algorithms
- “Turned into a beast”: UN observers say Facebook shares blame for crimes against minority in Myanmar
- Study: Conservative Americans spread Russian fake news about the presidential election
- Study: False Twitter content spreads faster than the truth
- Study: Precision medicine could bring further disadvantages for underprivileged groups
- Reflections on 4 months of GitHub: my advice to beginners
- The spread of true and false news online
Creating and plotting Voronoi regions for geographic data with geovoronoi
Recently, I’ve worked a lot with geospatial data in Python. One thing we needed for our analysis was to generate Voronoi regions (or “cells”) from a given set of coordinates inside certain administrative boundaries (a country, a state, etc.). Such regions are interesting for spatial analysis because every point inside a Voronoi region is closer to the cell’s “origin point” (the point the cell was generated from) than to any other cell’s origin. As a practical example: in Melbourne, parents can see which school is closest to their home by looking at an online map of the schools’ Voronoi regions.
These regions make it possible to estimate “coverage”: the area of each point’s Voronoi region represents the area theoretically covered by that point. Referring to the Melbourne example: the schools at the edge of the city cover a larger area than those in the city center. This approach of course does not take geographic properties into account, so if there’s a large lake inside a cell, it still counts as part of the covered area. Still, Voronoi tessellation is useful for looking at how the shapes of the Voronoi regions change over time, for example when new schools open or others close. We could then see, for example, whether the coverage of schools in the city center improves over the years while it becomes sparser in rural areas.
So all in all, Voronoi regions can be a very useful tool in spatial data analysis. QGIS provides a tool for Voronoi tessellation but we needed a more flexible approach that also fit into our workflow and could be used in our Python scripts. I decided to write a small Python package named geovoronoi that takes a set of points, a boundary object (the geographic shape enclosing the points – e.g. a country boundary) and then calculates the Voronoi regions using SciPy. These regions are then “cut” to the enclosing shape (using the excellent shapely package). The resulting Voronoi cells can then be used for further calculations (areas, distances, unions, etc.) and can also be visualized on a map.
The package geovoronoi is now available on PyPI (install it with pip install geovoronoi[plotting]) and the source code is available on the WZB’s GitHub page.
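To make the idea of Voronoi cells that are “cut” to a boundary and then used for area calculations more concrete, here is a minimal sketch using only the Python standard library. It is not how geovoronoi works internally (the package relies on SciPy and shapely, as described above). Instead, it builds each cell by clipping the boundary with the perpendicular bisector towards every other point, which only works under the assumption that the boundary is a convex polygon. The function names `clip_half_plane`, `voronoi_cells` and `polygon_area` are made up for this illustration.

```python
# Illustrative sketch only: NOT geovoronoi's implementation.
# Assumption: the boundary is a CONVEX polygon given as a list of (x, y) tuples.

def clip_half_plane(polygon, a, b, c):
    """Keep the part of `polygon` where a*x + b*y <= c (one Sutherland-Hodgman step)."""
    result = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        dp = a * p[0] + b * p[1] - c  # signed "distance" of p from the clip line
        dq = a * q[0] + b * q[1] - c
        if dp <= 0:
            result.append(p)  # p lies inside the half-plane (or on its edge)
        if (dp < 0 < dq) or (dq < 0 < dp):
            # the edge p-q crosses the clip line: add the intersection point
            t = dp / (dp - dq)
            result.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return result


def voronoi_cells(points, boundary):
    """Return one clipped Voronoi cell (list of vertices) per input point."""
    cells = []
    for i, (sx, sy) in enumerate(points):
        cell = list(boundary)
        for j, (ox, oy) in enumerate(points):
            if i == j:
                continue
            # "closer to (sx, sy) than to (ox, oy)" is the half-plane
            # 2*(ox - sx)*x + 2*(oy - sy)*y <= ox^2 + oy^2 - sx^2 - sy^2
            cell = clip_half_plane(
                cell,
                2 * (ox - sx),
                2 * (oy - sy),
                ox * ox + oy * oy - sx * sx - sy * sy,
            )
        cells.append(cell)
    return cells


def polygon_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Toy data: a 4 x 4 square "boundary" with four symmetric points
boundary = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(1, 1), (3, 1), (1, 3), (3, 3)]

cells = voronoi_cells(points, boundary)
areas = [polygon_area(cell) for cell in cells]
print(areas)
```

With this symmetric toy input, each of the four cells covers one quarter of the square. For real geographic boundaries, which are rarely convex, a proper geometry library is the better choice, which is exactly why the package builds on SciPy and shapely.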
Linkdump #70
R
- R Tip: Use drop = FALSE with data.frames
- Cluster Analysis – Naming Pattern in the last Century
- Annotating phylogenetic tree with images using ggtree and ggimage
Python
- Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba
- Python Plotting With Matplotlib (Guide)
- 101 NumPy Exercises for Data Analysis (Python)
- JupyterLab is Ready for Users
- Python & Async Simplified
Interesting articles, projects and news
- Leaks: Right-wing extremists manipulated online debates on the German federal election
- Bad News: Browser game aims to educate about fake news
- Blockchain For Dummies
- Inflated hopes for deep learning in autonomous driving
- China creates digital points system for the “better” human
- Voyages in sentence space
- Data analysis: Facebook hate speech comes from an extremely small group
- Program finds hate tweets autonomously and instantly
- Robot colleague in the supermarket: Walmart robot is well received by employees
- Women go into science careers more often in countries without gender equality
- Artificial intelligence: Google Brain writes Wikipedia articles on its own
- Discovering Types for Entity Disambiguation
- How right-wing trolls hijacked the discussion about an ARD film
- Robo-philosophy: Does artificial intelligence lead to more freedom?

