Multiprocessing Functions for Text Processing
2020 Apr 01
Anyone who followed the post A small journey in the valley of Natural Language Processing and Text Pre-Processing for German language saw some of the challenges of modeling a German text classifier.
One thing that saved me in the pre-processing stage was using multiprocessing to parallelize the pre-processing of the text column, which cut the running time enormously. To recall the scale: I had more than 1 million text records, averaging 250 words per record with a standard deviation of 700, all processed with an internal library.
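A minimal sketch of the idea, using only Python's standard `multiprocessing` module. The internal library is not shown in this post, so `preprocess_text` is a hypothetical stand-in (lowercasing, punctuation stripping, whitespace collapsing), and the names `parallel_preprocess`, `n_workers` and `chunksize` are assumptions made for illustration, not the original code.

```python
import re
from multiprocessing import Pool, cpu_count

# Hypothetical stand-in for the internal pre-processing library.
_NON_WORD = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")


def preprocess_text(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = _NON_WORD.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def parallel_preprocess(texts, n_workers=None, chunksize=1000):
    """Apply preprocess_text to every record, spread over worker processes.

    Pool.map returns results in the same order as the input, so the output
    can be assigned straight back to the column it came from.
    """
    texts = list(texts)
    if n_workers is None:
        n_workers = cpu_count()
    # With a single worker (or nothing to do), skip the pool overhead.
    if n_workers <= 1 or not texts:
        return [preprocess_text(t) for t in texts]
    # chunksize batches records per task to reduce inter-process traffic.
    with Pool(processes=n_workers) as pool:
        return pool.map(preprocess_text, texts, chunksize=chunksize)
```

With a pandas DataFrame, the call would look something like `df["text_clean"] = parallel_preprocess(df["text"].tolist())`, where the column names are placeholders. On platforms that start workers with spawn (Windows, macOS), that call should sit under an `if __name__ == "__main__":` guard, and the worker function has to be defined at module level so it can be pickled.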
That’s it: Simple and straightforward.