Author Archives: Markus Konrad

Linkdump #13

R
Python
Interesting articles, projects and news

Linkdump #12

Python
R

Styling individual cells in Excel output files created with pandas

The Python Data Analysis Library pandas provides basic but reliable Excel input and output. However, more advanced features for writing Excel files are missing. Some of these, like conditional formatting, can be achieved with XlsxWriter (see also Improving Pandas’ Excel Output). Sometimes, however, it is necessary to set styles like font or background colors on individual cells on the “Python side”. In this scenario, XlsxWriter won’t work, since “XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes.”

To set styles on individual cells on the Python side, I wrote a small extension for pandas and put it on GitHub, along with some examples. It comes in quite handy, for example, when you are running complicated data validation routines (which you probably don’t want to implement in VBA) and want to highlight the validation results by coloring individual cells in the output Excel sheets.
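
To make the validation use case concrete, here is a minimal sketch. It does not use the extension mentioned above; it swaps in pandas' built-in Styler, which needs jinja2 and openpyxl for the Excel export. The validation rule (no negative numbers), the colors, and the names `invalid_cell_styles` and `write_validated` are assumptions made up for illustration.

```python
import pandas as pd

# Assumed styling for invalid cells: light red fill, dark red font.
INVALID_CSS = "background-color: #ffc7ce; color: #9c0006"


def invalid_cell_styles(df: pd.DataFrame) -> pd.DataFrame:
    """Return one CSS string per cell; negative numbers count as invalid.

    The rule is a hypothetical stand-in for a real validation routine.
    """
    styles = pd.DataFrame("", index=df.index, columns=df.columns)
    for col in df.select_dtypes(include="number").columns:
        styles.loc[df[col] < 0, col] = INVALID_CSS
    return styles


def write_validated(df: pd.DataFrame, path: str) -> None:
    """Write df to an Excel file with invalid cells colored."""
    # axis=None hands the whole frame to the style function at once.
    styled = df.style.apply(invalid_cell_styles, axis=None)
    styled.to_excel(path, engine="openpyxl", index=False)


# Small example frame with one invalid value in the second row.
df = pd.DataFrame({"sample": ["a", "b", "c"], "value": [1.5, -2.0, 3.0]})
styles = invalid_cell_styles(df)
```

Calling `write_validated(df, "validated.xlsx")` should then produce a sheet in which only the cell holding `-2.0` is highlighted. Keeping the validation logic in a plain function that returns a frame of style strings means it can be checked without writing any file.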

Linkdump #11

Python
R
Interesting articles, projects and news

Parallel Coordinate Plots for Discrete and Categorical Data in R — A Comparison

Parallel Coordinate Plots are useful for visualizing multivariate data. R provides several packages and functions for drawing Parallel Coordinate Plots (PCPs).

In this post I will compare these approaches using a randomly generated data set with three discrete variables.


Dynamic column/variable names with dplyr using Standard Evaluation functions

Data manipulation works like a charm in R when using a library like dplyr. An often overlooked feature of this library is Standard Evaluation (SE), which is also described in the vignette about the related Non-standard Evaluation. It allows you to use dynamic arguments in many dplyr functions (“verbs”).


Bringing SVG to life with d3.js

Scalable Vector Graphics (SVG) are great for displaying high-quality, scalable graphics on the web. Most graphics software, like Adobe Illustrator or Inkscape, can export them. The graphics are of course static, but with a little help from the JavaScript data visualization library d3.js, they can be brought to life by animating parts of them or making some elements respond to actions like mouse clicks.

In this post I will explain how to do that using the example of an interactive map for the LATINNO project.


Linkdump #10

Python related
R related
Interesting articles, projects and news

Linkdump #9

Python related
R related

A tip for the impatient: Simple caching with Python pickle and decorators

During testing and development, it is sometimes necessary to rerun tasks that take quite a long time. One option is to drink coffee in the meantime; the other is to use caching, i.e. to save results to disk once they have been calculated and load them from there again when necessary. The Python module pickle is perfect for caching, since it allows you to store and read whole Python objects with two simple functions. I already showed in another article that it’s very useful to store a fully trained POS tagger and load it again directly from disk without needing to retrain it, which saves a lot of time.
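
The idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not necessarily the approach taken in the full post: the decorator name `cached`, the cache file name, and the example function `slow_task` are all hypothetical. The two pickle functions doing the work are `pickle.dump` and `pickle.load`.

```python
import functools
import os
import pickle
import tempfile


def cached(cache_path):
    """Cache a function's return value in a pickle file at cache_path.

    Simplification: arguments are ignored, so this suits one fixed,
    expensive computation per cache file.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Cache hit: load the stored object instead of recomputing.
            if os.path.exists(cache_path):
                with open(cache_path, "rb") as f:
                    return pickle.load(f)
            # Cache miss: compute once, then store the result on disk.
            result = func(*args, **kwargs)
            with open(cache_path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []  # records how often the real computation runs
cache_file = os.path.join(tempfile.mkdtemp(), "slow_task.pickle")


@cached(cache_file)
def slow_task():
    calls.append(1)
    return {"answer": sum(range(1000))}


first = slow_task()   # computed and written to disk
second = slow_task()  # loaded from the pickle file
```

After both calls, `calls` holds a single entry, so the function body ran only once. To force a fresh computation, delete the cache file. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.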
