Author Archives: Markus Konrad

Linkdump #69

February 16, 2018 2:29 pm, Markus Konrad

Python

Interesting articles, projects and news

Posted in: Linkdump

Linkdump #68

February 12, 2018 12:38 pm, Markus Konrad

R

Python

Interesting articles, projects and news

Posted in: Linkdump

Vectorization and parallelization in Python with NumPy and Pandas

February 2, 2018 4:25 pm, Markus Konrad

Modern computers are equipped with processors that allow fast parallel computation at several levels: vector or array operations, which execute the same operation simultaneously on many data elements, and parallel computing, which distributes chunks of data across several CPU cores and processes them in parallel. When working with large amounts of data, it is important to know how to exploit these features because doing so can reduce computation time drastically. Taking advantage of them usually requires some extra effort during implementation. With packages like NumPy and Python’s multiprocessing module, the additional work is manageable and usually pays off compared to the long waiting times that inefficient large-scale calculations can cause.
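To make the two levels concrete, here is a minimal sketch rather than the code from the full post. The function names, the toy calculation (x² + 1) and the worker count are assumptions chosen for illustration. It contrasts an element-by-element Python loop with a NumPy array operation, and then splits the data into chunks that a multiprocessing pool handles in separate processes.

```python
from multiprocessing import Pool

import numpy as np


def transform_loop(values):
    """Plain Python loop: handles one element at a time."""
    return [v * v + 1.0 for v in values]


def transform_vectorized(values):
    """NumPy array operation: the whole array is processed in one call."""
    arr = np.asarray(values, dtype=float)
    return arr * arr + 1.0


def transform_parallel(values, n_workers=2):
    """Split the data into chunks and process each chunk in a worker process."""
    chunks = np.array_split(np.asarray(values, dtype=float), n_workers)
    with Pool(processes=n_workers) as pool:
        results = pool.map(transform_vectorized, chunks)
    return np.concatenate(results)


if __name__ == "__main__":
    # The guard is needed so that worker processes do not re-run this part.
    data = np.arange(100_000, dtype=float)
    assert np.allclose(transform_vectorized(data), transform_parallel(data))
```

For a calculation this cheap, the cost of sending chunks to worker processes can outweigh the gain, so it is worth timing both variants on realistic data before committing to the parallel one.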

Posted in: Parallel computing, Python

Slides on Topic Modeling – Background, Hyperparameters and common pitfalls

January 26, 2018 4:56 pm, Markus Konrad

I just uploaded my slides on probabilistic Topic Modeling with LDA. They give an overview of the theory, the basic assumptions and prerequisites of LDA, and some common pitfalls that people often run into when trying out this method for the first time. I also added a Jupyter Notebook that contains a toy implementation of the Gibbs sampling algorithm for LDA, with lots of comments and plots to illustrate each step of the algorithm.
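For a rough idea of what such a toy implementation involves, here is a minimal sketch of collapsed Gibbs sampling for LDA. It is not the code from the notebook: the function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, and the tiny corpus of word IDs are all assumptions made for illustration.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word IDs."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics), dtype=int)    # tokens per doc and topic
    topic_word = np.zeros((n_topics, vocab_size), dtype=int)  # tokens per topic and word
    topic_total = np.zeros(n_topics, dtype=int)               # tokens per topic

    # Start with a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            z = assignments[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other tokens.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                p = p / (topic_total + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the newly sampled topic.
                z[i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Tiny made-up corpus: three documents over a vocabulary of five word IDs.
docs = [[0, 1, 2, 0], [3, 4, 3, 4], [0, 1, 4]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=5)
```

The returned count matrices can be smoothed with `alpha` and `beta` and normalized row-wise to obtain estimates of the document-topic and topic-word distributions.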

Posted in: Machine Learning, NLP & Text Analysis, Presentation slides, Python

Linkdump #67

January 26, 2018 2:13 pm, Markus Konrad

R

Python

Interesting articles, projects and news

Posted in: Linkdump

Linkdump #66

January 19, 2018 2:51 pm, Markus Konrad

R

Python

Interesting articles, projects and news

Posted in: Linkdump

Linkdump #65

January 12, 2018 5:14 pm, Markus Konrad

R

Python

Introduction to Python Ensembles

Interesting articles, projects and news

Posted in: Linkdump

Linkdump #64

January 5, 2018 3:20 pm, Markus Konrad

R

Python

10 Tips for Upgrading to Django 2.0

Interesting articles, projects and news

Posted in: Linkdump

Linkdump #63

December 15, 2017 12:26 pm, Markus Konrad

R

Python

Interesting articles, projects and news

Posted in: Linkdump

Web scraping with automated browsers using Selenium

December 1, 2017 5:26 pm, Markus Konrad

Web scraping, i.e. automated data mining from websites, usually involves fetching a web page’s HTML document, parsing it, extracting the required information, and optionally following links within this document to other web pages to repeat the process. This approach is sufficient for many websites that display information in a static way, i.e. do not respond to user interaction dynamically by means of JavaScript. In these cases, web scraping can be implemented with Python packages such as requests and BeautifulSoup. Even interactive elements such as forms can be emulated by observing the HTTP POST and GET data that is sent to the server whenever a form is submitted. However, this approach has limits. Sometimes it is necessary to automate a whole browser in order to scrape JavaScript-heavy websites, as will be shown with a short example in this post.
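The static case described above can be sketched in a few lines with requests and BeautifulSoup. This is only an illustration, not the example from the full post: the helper names `extract_links` and `scrape`, the sample HTML and the example.com URLs are assumptions.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Parse an HTML document and return its title and absolute link URLs."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return title, links


def scrape(url, timeout=10):
    """Fetch a static page over HTTP and extract its title and links."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return extract_links(response.text, url)


# Parsing step demonstrated on a static snippet, so no network access is needed.
SAMPLE_HTML = """
<html><head><title>Example</title></head>
<body><a href="/about">About</a> <a href="https://example.org/x">X</a></body></html>
"""
title, links = extract_links(SAMPLE_HTML, "https://example.com/start")
```

Calling `scrape` on each returned link repeats the process for further pages. Content that JavaScript inserts after the page has loaded will not appear in `response.text`, which is the limit that makes an automated browser necessary.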

Posted in: Data Mining, Python, Web Scraping
