1. 15 Jul, 2019 3 commits
    • Create one Mongo connection for each process · 33782e37
      serpucga authored
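      A minimal sketch of what "one connection per process" can look like,
      assuming a multiprocessing.Pool whose initializer opens a client in
      each worker (pymongo's MongoClient is not fork-safe, so it must not
      be shared across processes); dump_page and the URI are hypothetical,
      not taken from the repo:
      
      """
      import multiprocessing
      
      from pymongo import MongoClient
      
      client = None  # per-process connection, set in the initializer
      
      
      def init_worker():
          # Runs once in every child process: each worker opens its
          # own MongoClient instead of inheriting a shared one.
          global client
          client = MongoClient("mongodb://localhost:27017")
      
      
      def dump_page(page):
          # Hypothetical task using this worker's own connection.
          tweets = client["twitter"]["tweets"]
          return tweets.count_documents({})  # placeholder work
      
      
      if __name__ == "__main__":
          with multiprocessing.Pool(initializer=init_worker) as pool:
              pool.map(dump_page, range(4))
      """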
    • First parallel version of the code · ed2c9d74
      serpucga authored
      Parallelized using the multiprocessing library. I'm not really sure
      the code is thread safe. I think we don't care if tweets are appended
      to the files in a different order, but the metadata files getting
      corrupted would be problematic. In the first tests the metadata came
      out fine, but I think the following line is probably not safe (two
      processes could load the old value and try to update it at the same
      time, resulting in inconsistencies):
      
      """
      metadata_file["files"][file_path]["count"] += increase
      """
      
      Apart from that, the code is much faster than before.
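      A minimal sketch of one way to make that update safe, assuming the
      counts live in a JSON metadata file and that a multiprocessing.Lock
      is shared with (or inherited by) the workers; paths and names here
      are hypothetical:
      
      """
      import json
      import multiprocessing
      
      metadata_lock = multiprocessing.Lock()  # inherited by forked workers
      
      
      def update_count(metadata_path, file_path, increase):
          # Serialize the whole read-modify-write cycle so that two
          # processes cannot both load the old count and then overwrite
          # each other's update.
          with metadata_lock:
              with open(metadata_path) as f:
                  metadata_file = json.load(f)
              metadata_file["files"][file_path]["count"] += increase
              with open(metadata_path, "w") as f:
                  json.dump(metadata_file, f)
      """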
    • Simpler, more elegant and slightly faster version using the cursors instead of… · 34776b63
      serpucga authored
      Simpler, more elegant and slightly faster version that iterates the cursors directly instead of building a list of tweets for each page.
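      A hedged sketch of the cursor-based idea (database, collection and
      the process_tweet handler are assumptions, not the repo's names):
      
      """
      from pymongo import MongoClient
      
      client = MongoClient("mongodb://localhost:27017")
      tweets = client["twitter"]["tweets"]
      
      
      def process_tweet(tweet):
          pass  # hypothetical per-tweet handler
      
      
      # Iterating the cursor streams documents in server-side batches
      # instead of first materializing each page of tweets in a list.
      for tweet in tweets.find({}, batch_size=1000):
          process_tweet(tweet)
      """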
  2. 12 Jul, 2019 3 commits
  3. 11 Jul, 2019 2 commits
  4. 10 Jul, 2019 2 commits
    • gitignore set to ignore output dir pymongodump · 56c27157
      serpucga authored
    • Initial commit: Mongo to JSON dumper · d1923e7e
      serpucga authored
      For the moment the repository contains just one simple script, which
      dumps the "tweets" collection of a Mongo database to a JSON file in a
      "pymongodump" directory created at the place and time of execution.
      It is faster than mongoexport, although the format of the resulting
      JSON files is somewhat different (adapted to Python's syntax).
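      A minimal sketch of what such a dumper could look like (the URI,
      database name and output layout are assumptions; the real script
      may differ):
      
      """
      import os
      
      from pymongo import MongoClient
      
      # Create the output directory at the place of execution.
      os.makedirs("pymongodump", exist_ok=True)
      
      client = MongoClient("mongodb://localhost:27017")
      
      with open(os.path.join("pymongodump", "tweets.json"), "w") as out:
          for tweet in client["twitter"]["tweets"].find():
              # Writing the Python repr of each document gives a format
              # close to JSON but adapted to Python's syntax (single
              # quotes, ObjectId(...)), unlike mongoexport's output.
              out.write(repr(tweet) + "\n")
      """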