- 15 Jul, 2019 3 commits
serpucga authored
serpucga authored
Parallelized using the multiprocessing library. I'm not entirely sure the code is safe under concurrency. I think we don't care if tweets are appended to the files in a different order, but corrupted metadata files would be problematic. In the first tests the metadata came out fine, but this line is probably not safe (two processes could read and try to update the old value at the same time, resulting in inconsistencies): """ metadata_file["files"][file_path]["count"] += increase """ Apart from that, the code is much faster than before.
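A minimal sketch of one way to close that race, assuming the counts are kept in a Manager dict shared by the worker processes. The names (`bump`, `worker`, `counts`) and the file path are hypothetical, and the nested metadata structure is flattened to a single dict for brevity:

```python
import multiprocessing as mp

def bump(counts, lock, file_path, increase):
    # "+=" on a shared value is a read-modify-write: without the lock,
    # two workers could read the same old count and one increment
    # would be lost. Holding the lock makes the update atomic.
    with lock:
        counts[file_path] = counts.get(file_path, 0) + increase

def worker(args):
    counts, lock, file_path = args
    bump(counts, lock, file_path, 1)

if __name__ == "__main__":
    manager = mp.Manager()
    counts = manager.dict()  # proxy shared by all worker processes
    lock = manager.Lock()
    with mp.Pool(4) as pool:
        pool.map(worker, [(counts, lock, "pymongodump/tweets.json")] * 100)
    print(dict(counts))  # {'pymongodump/tweets.json': 100}
```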
serpucga authored
Simpler, more elegant, and slightly faster version that iterates the cursors directly instead of building a list of tweets for each page.
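The gain comes from letting pymongo stream the cursor in batches instead of materializing each page. A rough sketch of the two styles, assuming a local Mongo instance; the connection details, database name, and `process` handler are placeholders, not from the commit:

```python
from pymongo import MongoClient

def process(tweet):
    # Stand-in for whatever is done with each document.
    print(tweet.get("id_str"))

client = MongoClient("localhost", 27017)  # connection details assumed
tweets = client["twitter"]["tweets"]      # database name assumed

# Paged style (roughly what the previous version did): each page
# is materialized as a list before being walked.
# page = list(tweets.find().skip(n * size).limit(size))

# Cursor style: pymongo fetches documents lazily, batch by batch,
# so no intermediate per-page list is ever built.
for tweet in tweets.find():
    process(tweet)
```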
- 12 Jul, 2019 3 commits
- 11 Jul, 2019 2 commits
- 10 Jul, 2019 2 commits
serpucga authored
serpucga authored
For the moment the repository contains just one simple script, which dumps the "tweets" collection of a Mongo database to a JSON file inside a "pymongodump" directory created at the time and place of execution. Faster than mongoexport, although the format of the resulting JSON files is somewhat different (adapted to Python's syntax).
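A minimal sketch of what such a dump can look like, assuming pymongo, a database named "twitter", and a single output file; the actual script's connection details, file naming, and serialization are not spelled out in the commit message:

```python
import json
import os

from pymongo import MongoClient

# Output directory is created where the script is run,
# as the commit message describes.
os.makedirs("pymongodump", exist_ok=True)

client = MongoClient("localhost", 27017)  # connection details assumed
tweets = client["twitter"]["tweets"]      # database name assumed

with open(os.path.join("pymongodump", "tweets.json"), "w") as fh:
    for tweet in tweets.find():
        # default=str stringifies BSON types such as ObjectId and
        # datetime; this is one way the output can end up looking
        # "Python-ish" rather than matching mongoexport's format.
        fh.write(json.dumps(tweet, default=str) + "\n")
```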