Commit 47f3ea5d by Serbaf

Added get_tweets_from_csv() method

parent e5f2cd6f
@@ -6,3 +6,28 @@ History
------------------
* First release on PyPI.
0.2.0 (2019-03-20)
------------------
* Completed the Tweet class that allows the user to make usable instances of a
tweet model. Includes initialization of all the Tweet attributes indicated in
the Twitter documentation (default to None, unless the user provides a value)
and overriding of __getitem__ to provide a dictionary-like access to the
information.
0.3.0 (2019-03-20)
------------------
* Added method "get_tweets_from_csv()", which takes the path to a CSV file as
an argument and returns a list containing as many Tweet objects as lines
(minus the header) in the CSV file. The header of the CSV is used to determine
which attributes should be set.
* The method prints an error message and exits if any item in the header does
not match the specification of the Tweet object (for example, the header
word "media.sizes.thumb.h" would be valid, but "user.lightsaber.color" would
not).
* At this point, the method takes approximately 1.75 s to read and return the
contents of a 5.7 MB file as a list of Tweet objects. This could be
troublesome with very large collections in the future if the running time
grows proportionally with the file size (the estimate would be about 25
minutes for a 5 GB file).
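The 25-minute figure is a linear extrapolation of the single timing quoted in the changelog entry. A minimal sketch of that arithmetic, assuming 1 GB = 1000 MB and that run time scales linearly with file size (the changelog does not report measurements on larger files, so this is an estimate, not a benchmark); `estimate_seconds` is a name made up for the example:

```python
# Linear extrapolation of the reported timing (an assumption, not a benchmark).
MEASURED_SECONDS = 1.75   # time reported for the sample file
MEASURED_MB = 5.7         # size of the sample file
TARGET_MB = 5 * 1000      # 5 GB, taking 1 GB = 1000 MB


def estimate_seconds(target_mb, measured_mb=MEASURED_MB,
                     measured_seconds=MEASURED_SECONDS):
    """Scale the measured time by the ratio of file sizes."""
    return measured_seconds * target_mb / measured_mb


estimated_minutes = estimate_seconds(TARGET_MB) / 60
print(f"Estimated time for 5 GB: {estimated_minutes:.1f} minutes")
```

Under these assumptions the estimate comes out at roughly 25 to 26 minutes, in line with the figure in the changelog.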
@@ -28,10 +28,20 @@ Dashboard project.
Features
--------
-* TODO
+* A model of a tweet in the form of the class Tweet. Its constructor
initializes all the possible tweet attributes to None except those indicated
otherwise.
* The inner objects of a tweet ("user", "entities", "places", etc.) are stored
internally as nested dictionaries.
* The __getitem__() method for Tweet is overridden to allow dictionary-like
access to the tweet contents. For example, if "tweet1" is an instance of
Tweet, one could do tweet1["id"] to get the id of that tweet, or
tweet1["user"]["name"] to get the name of the person that published the
tweet.
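A minimal sketch of the dictionary-like access described in the feature list. `MiniTweet` is a hypothetical, cut-down stand-in for the real Tweet class (which defines many more attributes); only the `__getitem__` delegation to `getattr` mirrors the code in this commit.

```python
class MiniTweet:
    """Hypothetical, reduced stand-in for Tweet with two attributes."""

    def __init__(self, id=None, user=None):
        # Attributes default to None unless the caller provides a value.
        self.id = id
        # Inner objects are kept as nested dictionaries.
        self.user = user

    def __getitem__(self, key):
        # Same delegation as Tweet.__getitem__ in this commit.
        return getattr(self, key)


tweet1 = MiniTweet(id="1234", user={"name": "Sergio"})
print(tweet1["id"])            # 1234
print(tweet1["user"]["name"])  # Sergio
```

Because the lookup goes through `getattr`, an unknown key raises `AttributeError` rather than the `KeyError` a real dictionary would raise.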
Credits
-------
Creator: Sergio
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
......
@@ -8,4 +8,5 @@ coverage==4.5.1
Sphinx==1.8.1
twine==1.12.1
# App requirements
tweetmanager-serpucga==1.1.4
@@ -44,12 +44,12 @@ setup(
long_description=readme + '\n\n' + history,
include_package_data=True,
keywords='tweet_model',
-name='tweet_model',
+name='tweet_model_serpucga',
packages=find_packages(include=['tweet_model']),
setup_requires=setup_requirements,
test_suite='tests',
tests_require=test_requirements,
url='https://github.com/Serbaf/tweet_model',
-version='0.2.0',
+version='0.3.0',
zip_safe=False,
)
# -*- coding: utf-8 -*-
"""Main module."""
import sys
from tweet_manager.lib import format_csv
class Tweet():
    """
@@ -352,3 +356,50 @@ class Tweet():
    def __getitem__(self, key):
        return getattr(self, key)
def get_tweets_from_csv(csv_file):
    """
    Take one argument: a path pointing to a valid CSV file.
    The function reads the file, which should be a collection of tweets with a
    header indicating the tweet fields (user.id, place.bounding_box.type,
    etc.), and instances a new Tweet object for each of the lines in the CSV
    file, assigning each value in the CSV to the corresponding Tweet attribute.
    Returns a list of the Tweet objects instanced.
    """
    tweets = []
    with open(csv_file, 'r') as csv_object:
        header = csv_object.readline()
        body = csv_object.readlines()
    header = format_csv.split_csv_line(header)
    # Check that the header contains valid fields
    test_tweet = Tweet()
    for field in header:
        field_components = field.split(".")
        checking_dict = test_tweet.__dict__
        error_string = ""
        for component in field_components:
            error_string += component
            if (checking_dict is None) or (component not in checking_dict):
                print('The field in the header "' + error_string + '" is ' +
                      'not a valid element of a Tweet')
                sys.exit(1)
            checking_dict = checking_dict[component]
            error_string += "."
    # Go through every tweet in the file, instance it using the 'Tweet' class
    # and add it to the list 'tweets'
    for j in range(len(body)):
        body[j] = format_csv.split_csv_line(body[j])
        tweet_contents = {}
        for i in range(len(body[j])):
            if body[j][i] != '':
                tweet_contents[header[i].replace(".", "__")] = body[j][i]
        tweets.append(Tweet(**tweet_contents))
    return tweets
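The two steps of the function (header validation, then translation of dotted column names into keyword arguments) can be sketched without the `tweet_manager` dependency. This is an illustration under assumptions, not the package's implementation: `SCHEMA` is a hypothetical, heavily reduced stand-in for `Tweet().__dict__`; `is_valid_field`, `row_to_kwargs` and `read_rows` are names made up for the example; the standard `csv` module replaces `format_csv.split_csv_line`; and a `ValueError` replaces the `sys.exit(1)` call.

```python
import csv
import io

# Hypothetical, reduced schema standing in for Tweet().__dict__.
SCHEMA = {
    "id": None,
    "text": None,
    "user": {"id": None, "name": None},
}


def is_valid_field(field, schema=SCHEMA):
    """Return True if every component of a dotted field exists in the schema."""
    current = schema
    for component in field.split("."):
        if not isinstance(current, dict) or component not in current:
            return False
        current = current[component]
    return True


def row_to_kwargs(header, row):
    """Map non-empty cells to keyword names, turning 'user.id' into 'user__id'."""
    return {
        name.replace(".", "__"): value
        for name, value in zip(header, row)
        if value != ""
    }


def read_rows(csv_text):
    """Validate the header, then return one kwargs dict per data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    for field in header:
        if not is_valid_field(field):
            raise ValueError(f'"{field}" is not a valid element of a Tweet')
    return [row_to_kwargs(header, row) for row in reader]


sample = "id,text,user.name\n1,hello,Sergio\n2,,Ana\n"
print(read_rows(sample))
```

Raising an exception instead of exiting leaves the decision about how to handle a bad header to the caller, which may be preferable when the function is used as a library; the commit itself prints a message and exits.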