Category Archives: Parallel Computing

Vectorization and parallelization in Python with NumPy and Pandas

February 2, 2018 4:25 pm, Markus Konrad

Modern computers are equipped with processors that support fast parallel computation at several levels: vector (or array) operations, which execute the same operation simultaneously on many data elements, and parallel computing, which distributes chunks of data across several CPU cores and processes them in parallel. When working with large amounts of data, it is important to know how to exploit these features, because doing so can reduce computation time drastically. Taking advantage of them usually requires some extra effort during implementation. With packages like NumPy and Python’s multiprocessing module, the additional work is manageable and usually pays off compared to the long waiting times that inefficient large-scale calculations can incur.
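To make the two levels concrete, here is a minimal sketch rather than the code from the full post. It computes Euclidean distances first with a plain Python loop, then with a vectorized NumPy expression, and finally by splitting the arrays into chunks that are handled by separate worker processes. The function names, the example computation and the default of two workers are assumptions chosen for illustration.

```python
import math
from multiprocessing import Pool

import numpy as np


def distances_loop(xs, ys):
    """Baseline: a Python loop that handles one pair of values at a time."""
    return [math.sqrt(x * x + y * y) for x, y in zip(xs, ys)]


def distances_vectorized(xs, ys):
    """Vectorized: NumPy applies each operation to the whole arrays at once."""
    return np.sqrt(xs ** 2 + ys ** 2)


def distances_parallel(xs, ys, n_workers=2):
    """Split both arrays into chunks and process every chunk in its own worker."""
    chunks = zip(np.array_split(xs, n_workers), np.array_split(ys, n_workers))
    with Pool(n_workers) as pool:
        # starmap unpacks each (xs_chunk, ys_chunk) pair into the two arguments
        parts = pool.starmap(distances_vectorized, chunks)
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs, ys = rng.random(100_000), rng.random(100_000)
    # Both variants should agree on the result
    assert np.allclose(distances_vectorized(xs, ys), distances_parallel(xs, ys))
```

For an operation this cheap, the cost of starting processes and copying data between them will likely outweigh any gain; splitting work across cores tends to pay off only when each chunk involves substantial computation.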

Posted in: Parallel computing, Python

Speeding up NLTK with parallel processing

June 19, 2017 5:24 pm, Markus Konrad

When doing text processing with NLTK on large corpora, you often need a lot of patience, since even simple methods like word tokenization take quite some time on large amounts of text. This is because NLTK often does not harness the power of modern multicore computers: the code will run on a single core even if your machine has four. You need to add parallel processing of your documents yourself. Fortunately, this is quite straightforward to implement with Python’s multiprocessing module, and I will show how to do this in this short post.
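The basic idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from the full post: `tokenize` is a hypothetical regex-based stand-in so that the example runs without NLTK or its data files, and in practice a function such as NLTK's `word_tokenize` could be passed to the pool instead. The helper name `tokenize_parallel` and the default of two workers are likewise made up for this example.

```python
import re
from multiprocessing import Pool


def tokenize(text):
    """Stand-in tokenizer (an assumption): extracts runs of word characters.
    With NLTK installed, nltk.word_tokenize could take its place."""
    return re.findall(r"\w+", text)


def tokenize_parallel(docs, n_workers=2):
    """Tokenize each document in a pool of worker processes.
    Pool.map returns the results in the same order as the input documents."""
    with Pool(n_workers) as pool:
        return pool.map(tokenize, docs)


if __name__ == "__main__":
    docs = ["Hello world!", "Parallel processing is fun."]
    for tokens in tokenize_parallel(docs):
        print(tokens)
```

Each document is sent to a worker and its token list is sent back, so the approach is most useful when the per-document work is large relative to that transfer overhead.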

Posted in: NLP & Text Analysis, Parallel computing, Python
