Files · ed2c9d7432b4fa34795c7b0e1ef3e68d2f59dd2f · serpucga / migration_scripts

First parallel version of the code · ed2c9d74

authored Jul 15, 2019

Parallelized using the multiprocessing library. I'm not really sure the
code is safe when several worker processes run at the same time. I think
we don't care if tweets are appended to the files in a different order,
but corrupted metadata files would be problematic. In the first tests the
metadata were fine, but I think this line is probably not safe under
concurrency: two processes could read the same old value at the same time
and both write back their own update, so one increment would be lost:

"""
metadata_file["files"][file_path]["count"] += increase
"""

Apart from that, the code is much faster than the previous version.


Files at this commit:
lib
.gitignore
header.txt
pymongoexport_csv.py
pymongoexport_json.py
requirements.txt
utils.py