Category Archives: Io

Styling individual cells in Excel output files created with pandas

The Python Data Analysis Library pandas provides basic but reliable Excel in- and output. However, more advanced features for writing Excel files are missing. Some of these advanced things, like conditional formatting can be achieved with XlsxWriter (see also
Improving Pandas’ Excel Output). However, sometimes it is
necessary to set styles like font or background colors on individual cells on the “Python side”. In this scenario,
XlsxWriter won’t work, since “XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates of datetimes.”

To achieve setting styles on individual cells on the Python side, I wrote a small extension for pandas and put it on github, along with some examples. It comes in quite handy, for example when you are running complicated data validation routines (which you probably don’t want to implement in VBA) and want to highlight the validation results by coloring
individual cells in the output Excel sheets.

Reading textual data from CSV and Excel files correctly with pandas

The pandas library is great for data analysis with Python, but it has some caveats and gotchas. One of it is importing textual data from CSV and Excel files that is automatically converted to numeric values when it only consists of digits. This is mostly a nice feature, but sometimes it is not what you want, for example in the case of codes with leading zeros like a FIPS state code. If you have a column with FIPS state codes in your CSV or Excel file, it will show up as an integer series after importing it with pandas, so the FIPS code of ’03’ will become the integer ‘3’.

To prevent pandas from doing this, a good guess would be specifying the dtype directly so that it doesn’t need to be guessed, but unfortunately this is not supported:

import pandas as pd

df = pd.read_excel('some_excelfile.xls', dtype=object)
>>> ValueError: The 'dtype' option is not supported with the 'python' engine

It also doesn’t work with other “engines” yet, so we need another solution: Converters. You can pass a dict that specifies a conversion function for each column (either by column index or column name). For example, if we want to have strings instead of numeric values in the columns with indices 3 and 7, we could pass a dict with the conversion function str() like this:

converters = {col: str for col in (3, 7)}
df = pd.read_excel('some_excelfile.xls', converters=converters)

pandas will not guess the data type of the columns where a conversion function is defined but will use the output type of the conversion function, so we will have a series of strings with the leading zeros as we wanted it.