Added get_tweets_from_csv() method

Commit 47f3ea5d authored Mar 20, 2019 by Serbaf
Showing with 91 additions and 4 deletions

HISTORY.rst +25 -0

README.rst +11 -1

requirements_dev.txt +2 -1

setup.py +2 -2

tweet_model/tweet_model.py +51 -0

--- a/HISTORY.rst
+++ b/HISTORY.rst
@@ -6,3 +6,28 @@ History
 ------------------
 * First release on PyPI.
+0.2.0 (2019-03-20)
+------------------
+* Completed the Tweet class that allows the user to make usable instances of a
+  tweet model. Includes initialization of all the Tweet attributes indicated in
+  the Twitter documentation (default to None, unless the user provides a value)
+  and overriding of __getitem__ to provide a dictionary-like access to the
+  information.
+0.3.0 (2019-03-20)
+------------------
+* Added method "get_tweets_from_csv()", which gets a CSV file as an argument
+  and returns a list containing as many Tweet objects as lines (minus the
+  header) in the CSV file. The header of the CSV is used to know which 
+  attributes should be set.
+* The method prints an error message and exits if any item in the header does
+  not match the specification of the Tweet object (for example, the header
+  word "media.sizes.thumb.h" would be valid, but "user.lightsaber.color" would
+  not).
+* At this point, the method took approximately 1.75 s to read and return the
+  contents of a 5.7 MB file as a list of Tweet objects. This could become a
+  problem with very large collections if the running time grows in proportion
+  to the file size (the estimate would be about 25 minutes for a 5 GB file).
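Review note: the 25-minute figure in the entry above follows from a straight linear extrapolation of the measured run. A minimal sketch of that arithmetic, assuming 5 GB is taken as 5000 MB and that running time really does scale linearly with file size (the helper name `estimate_seconds` is hypothetical and not part of the package):

```python
def estimate_seconds(measured_seconds, measured_mb, target_mb):
    """Linearly extrapolate a measured running time to a larger file."""
    return measured_seconds * (target_mb / measured_mb)


# Values from the HISTORY.rst entry: 1.75 s for a 5.7 MB file.
# Assumption: 5 GB is counted as 5000 MB.
seconds = estimate_seconds(1.75, 5.7, 5000)
print(f"{seconds / 60:.1f} minutes")
```

The linear assumption is optimistic for this implementation, since `readlines()` loads the whole file into memory before any Tweet object is built.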
--- a/README.rst
+++ b/README.rst
@@ -28,10 +28,20 @@ Dashboard project.
 Features
 --------
-* TODO
+* A model of a tweet in the form of the Tweet class. Its constructor
+  initializes every possible tweet attribute to None unless the user provides
+  a value for it.
+* The inner objects of a tweet ("user", "entities", "place", etc.) are stored
+  internally as nested dictionaries.
+* The __getitem__() method of Tweet is overridden to allow dictionary-like
+  access to the tweet contents. For example, if "tweet1" is an instance of
+  Tweet, one could do tweet1["id"] to get the id of that tweet, or
+  tweet1["user"]["name"] to get the name of the person that published the
+  tweet.
 Credits
 -------
+Creator: Sergio
 This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
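
Review note: the dictionary-like access described in the new Features list can be illustrated with a reduced stand-in. This is only a sketch. The class below is a hypothetical three-attribute simplification, not the package's real Tweet, which defines many more attributes. Only the `__getitem__` body mirrors the code in this commit.

```python
class Tweet:
    """Simplified stand-in for the package's Tweet class (assumption)."""

    def __init__(self, id=None, text=None, user=None):
        self.id = id
        self.text = text
        # Inner objects are kept as nested dictionaries.
        self.user = user if user is not None else {"id": None, "name": None}

    def __getitem__(self, key):
        # Same approach as in the commit: delegate to attribute lookup.
        return getattr(self, key)


tweet1 = Tweet(id=42, user={"id": 7, "name": "Sergio"})
print(tweet1["id"])            # top-level attribute
print(tweet1["user"]["name"])  # nested dictionary access
```

Because `__getitem__` delegates to `getattr`, asking for a key that is not an attribute raises `AttributeError` rather than the `KeyError` a real dictionary would raise.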

--- a/requirements_dev.txt
+++ b/requirements_dev.txt
@@ -8,4 +8,5 @@ coverage==4.5.1
 Sphinx==1.8.1
 twine==1.12.1
+# App requirements
+tweetmanager-serpucga==1.1.4
--- a/setup.py
+++ b/setup.py
@@ -44,12 +44,12 @@ setup(
    long_description=readme + '\n\n' + history,
    include_package_data=True,
    keywords='tweet_model',
-    name='tweet_model',
+    name='tweet_model_serpucga',
    packages=find_packages(include=['tweet_model']),
    setup_requires=setup_requirements,
    test_suite='tests',
    tests_require=test_requirements,
    url='https://github.com/Serbaf/tweet_model',
-    version='0.2.0',
+    version='0.3.0',
    zip_safe=False,
 )
--- a/tweet_model/tweet_model.py
+++ b/tweet_model/tweet_model.py
 # -*- coding: utf-8 -*-
 """Main module."""
+import sys
+from tweet_manager.lib import format_csv
 class Tweet():
    """
@@ -352,3 +356,50 @@ class Tweet():
    def __getitem__(self, key):
        return getattr(self, key)
+def get_tweets_from_csv(csv_file):
+    """
+    Take one argument: a path pointing to a valid CSV file.
+    The function reads the file, which should be a collection of tweets with a
+    header indicating the tweet fields (user.id, place.bounding_box.type,
+    etc.), and instantiates a new Tweet object for each line in the CSV
+    file, assigning each value in the CSV to the corresponding Tweet attribute.
+    Returns a list of the instantiated Tweet objects.
+    """
+    tweets = []
+    with open(csv_file, 'r') as csv_object:
+        header = csv_object.readline()
+        body = csv_object.readlines()
+    header = format_csv.split_csv_line(header)
+    # Check that the header contains valid fields
+    test_tweet = Tweet()
+    for field in header:
+        field_components = field.split(".")
+        checking_dict = test_tweet.__dict__
+        error_string = ""
+        for component in field_components:
+            error_string += component
+            if (checking_dict is None) or (component not in checking_dict):
+                print('The field in the header "' + error_string + '" is ' +
+                      'not a valid element of a Tweet')
+                sys.exit(1)
+            checking_dict = checking_dict[component]
+            error_string += "."
+    # Go through every tweet in the file, instantiate it using the 'Tweet'
+    # class and add it to the list 'tweets'
+    for line in body:
+        values = format_csv.split_csv_line(line)
+        tweet_contents = {}
+        for i, value in enumerate(values):
+            if value != '':
+                tweet_contents[header[i].replace(".", "__")] = value
+        tweets.append(Tweet(**tweet_contents))
+    return tweets
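
Review note: the header check in `get_tweets_from_csv()` prints a message and calls `sys.exit(1)`, which makes the function hard to reuse from other code. One possible alternative, sketched below, is to collect the invalid fields and raise an exception instead. This is a suggestion and not what the commit does. `SCHEMA` is a hypothetical, heavily reduced stand-in for `Tweet().__dict__`, and the helper names `is_valid_field` and `validate_header` are assumptions.

```python
# Hypothetical reduced schema; the real Tweet has many more attributes.
SCHEMA = {
    "id": None,
    "user": {"id": None, "name": None},
    "media": {"sizes": {"thumb": {"h": None, "w": None}}},
}


def is_valid_field(field, schema):
    """Return True if the dotted field name resolves inside the nested schema."""
    node = schema
    for component in field.split("."):
        if not isinstance(node, dict) or component not in node:
            return False
        node = node[component]
    return True


def validate_header(header, schema=SCHEMA):
    """Raise ValueError listing every header field that is not in the schema."""
    invalid = [field for field in header if not is_valid_field(field, schema)]
    if invalid:
        raise ValueError("Invalid header fields: " + ", ".join(invalid))


validate_header(["id", "user.name", "media.sizes.thumb.h"])  # passes silently
```

Raising an exception leaves the decision to exit with the caller, and reporting all invalid fields at once saves repeated runs when a header has several mistakes.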