Multiprocessing Functions for Text Processing

Anyone who followed the post “A small journey in the valley of Natural Language Processing and Text Pre-Processing for German language” saw some of the challenges of building a German text classifier.

One thing that saved me in the pre-processing stage, however, was using multiprocessing to parallelize the pre-processing of the text column. It saved me an incredible amount of time: as a reminder, I had more than 1 million text records, averaging 250 words per record with a standard deviation of 700 words, all processed with an internal library.
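A minimal sketch of the idea, using only the standard library's `multiprocessing.Pool.map`. The internal library is not public, so `preprocess_text` below is a hypothetical stand-in (lowercase plus whitespace cleanup), and `parallel_preprocess` is an illustrative name, not the author's original function. Replace the body of `preprocess_text` with your own per-record logic.

```python
from __future__ import annotations

import multiprocessing as mp
import os
import re

_WHITESPACE = re.compile(r"\s+")


def preprocess_text(text: str) -> str:
    # Hypothetical stand-in for the internal pre-processing library:
    # lowercase the record and collapse runs of whitespace into one space.
    return _WHITESPACE.sub(" ", text.lower()).strip()


def parallel_preprocess(texts: list[str], n_workers: int | None = None) -> list[str]:
    """Apply preprocess_text to every record using a pool of worker processes.

    Pool.map returns results in input order, so the output can be written
    straight back to the original column.
    """
    n_workers = n_workers or os.cpu_count() or 1
    if n_workers == 1 or len(texts) < 2:
        # Starting processes is not worth it for one worker or tiny inputs.
        return [preprocess_text(t) for t in texts]
    # Hand records to workers in batches (about four per worker) to reduce
    # the per-item inter-process communication overhead.
    chunksize = max(1, len(texts) // (n_workers * 4))
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(preprocess_text, texts, chunksize=chunksize)


if __name__ == "__main__":
    # Required on platforms that start workers with "spawn" (Windows, macOS):
    # the pool must only be created from the main module's entry point.
    sample = ["  Hallo   WELT ", "Guten\tMorgen", "Straße  ÜBER"]
    print(parallel_preprocess(sample, n_workers=2))
```

With pandas, a column can be processed the same way, e.g. `df["text_clean"] = parallel_preprocess(df["text"].tolist())`, where `df` is a hypothetical DataFrame. The function passed to the pool must be defined at module level so it can be pickled and sent to the workers.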


That’s it: Simple and straightforward.