First parallel version of the code
Parallelized using the multiprocessing library. I'm not sure the code
is safe under concurrent access. I think we don't care if tweets are
appended to the files in a different order, but corrupted metadata
files would be a problem. In the first tests the metadata were fine,
but I think this line is probably not safe (two workers could load the
old value and try to update it at the same time, so one of the
increments would be lost and the count would be inconsistent):
"""
metadata_file["files"][file_path]["count"] += increase
"""
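One possible fix, as a minimal sketch only: do the load, the update and the
write-back under a single lock. This assumes the metadata is kept as a JSON
file on disk; increase_count, metadata_path and lock are hypothetical names
and are not part of the current code.

```python
import json
import multiprocessing


def increase_count(metadata_path, file_path, increase, lock):
    """Add `increase` to the count stored for `file_path`.

    Hypothetical helper: assumes the metadata is a JSON file shaped like
    {"files": {<file_path>: {"count": <int>}}}.
    """
    with lock:
        # Load, update and write back inside one critical section, so no
        # other worker can read the old value in between.
        with open(metadata_path) as f:
            metadata_file = json.load(f)
        metadata_file["files"][file_path]["count"] += increase
        with open(metadata_path, "w") as f:
            json.dump(metadata_file, f)


if __name__ == "__main__":
    # The lock has to be created once in the parent and shared by all workers.
    lock = multiprocessing.Lock()
```

The lock only helps if every worker uses the same one, so it would have to be
created in the parent and handed to each worker, for example through the
initializer argument of multiprocessing.Pool. It also serializes the metadata
updates, which should cost little if most of the time is spent elsewhere.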
Apart from that, the code is much faster than before.