serpucga / tweet_model
Commit 47f3ea5d
authored Mar 20, 2019 by Serbaf
Added get_tweets_from_csv() method
parent e5f2cd6f
Showing 5 changed files with 91 additions and 4 deletions
HISTORY.rst +25 -0
README.rst +11 -1
requirements_dev.txt +2 -1
setup.py +2 -2
tweet_model/tweet_model.py +51 -0
HISTORY.rst
@@ -6,3 +6,28 @@ History
------------------

* First release on PyPI.

0.2.0 (2019-03-20)
------------------

* Completed the Tweet class, which lets the user create usable instances of a
  tweet model. It initializes all the Tweet attributes listed in the Twitter
  documentation (defaulting to None unless the user provides a value) and
  overrides __getitem__ to provide dictionary-like access to the information.

0.3.0 (2019-03-20)
------------------

* Added method "get_tweets_from_csv()", which takes a CSV file as an argument
  and returns a list containing as many Tweet objects as there are lines
  (minus the header) in the CSV file. The header of the CSV determines which
  attributes are set.
* The method prints an error message and exits if any item in the header does
  not match the specification of the Tweet object (for example, the header
  word "media.sizes.thumb.h" would be valid, but "user.lightsaber.color" would
  not).
* At this point, the method took approximately 1.75 s to read a 5.7 MB file
  and return its contents as a list of Tweet objects. This could become a
  problem with very large collections if run time grows proportionally with
  file size (the estimate would be about 25 minutes for a 5 GB file).
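The 25-minute figure in the last changelog entry is a straight linear extrapolation of the single measured run. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and assuming run time really is proportional to file size, which the changelog itself only treats as a possibility. The helper name `estimate_seconds` is illustrative and not part of the package:

```python
def estimate_seconds(measured_seconds, measured_mb, target_mb):
    """Linearly extrapolate a run time from one (time, size) measurement."""
    return measured_seconds * (target_mb / measured_mb)


# Figures from the changelog: 1.75 s for 5.7 MB, extrapolated to 5 GB
seconds = estimate_seconds(measured_seconds=1.75, measured_mb=5.7, target_mb=5000)
minutes = seconds / 60
print(f"{minutes:.1f} minutes")  # prints "25.6 minutes"
```

With binary units (5 GB = 5120 MB) the same calculation gives a slightly higher figure, so "about 25 minutes" is best read as an order-of-magnitude estimate.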
README.rst
@@ -28,10 +28,20 @@ Dashboard project.
Features
--------
* TODO
* A model of a tweet in the form of the class Tweet. Its constructor
  initializes every possible tweet attribute to None unless a value is
  provided.
* The inner objects of a tweet ("user", "entities", "place", etc.) are stored
  internally as nested dictionaries.
* The __getitem__() method of Tweet is overridden to allow dictionary-like
  access to the tweet contents. For example, if "tweet1" is an instance of
  Tweet, tweet1["id"] returns the id of that tweet and
  tweet1["user"]["name"] returns the name of the user who published it.
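The dictionary-like access described in the list above can be sketched with a cut-down class. `MiniTweet` is a hypothetical stand-in with only three attributes and is not the package's Tweet class. It only assumes, as the diff of tweet_model.py shows, that `__getitem__` delegates to `getattr`:

```python
class MiniTweet:
    """Hypothetical, trimmed stand-in for the package's Tweet class."""

    def __init__(self, id=None, text=None, user=None):
        # Attributes default to None unless a value is provided
        self.id = id
        self.text = text
        # Inner objects are kept as plain nested dictionaries
        self.user = user if user is not None else {"id": None, "name": None}

    def __getitem__(self, key):
        # Delegate subscript access to ordinary attribute lookup
        return getattr(self, key)


tweet1 = MiniTweet(id="1108", user={"id": "42", "name": "Alice"})
print(tweet1["id"])
print(tweet1["user"]["name"])
```

One consequence of delegating to `getattr` is that a missing top-level key raises AttributeError rather than KeyError, while a missing key inside a nested dictionary still raises KeyError.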
Credits
-------
Creator: Sergio
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
requirements_dev.txt
@@ -8,4 +8,5 @@ coverage==4.5.1
Sphinx==1.8.1
twine==1.12.1
# App requirements
tweetmanager-serpucga==1.1.4
setup.py
@@ -44,12 +44,12 @@ setup(
     long_description=readme + '\n\n' + history,
     include_package_data=True,
     keywords='tweet_model',
-    name='tweet_model',
+    name='tweet_model_serpucga',
     packages=find_packages(include=['tweet_model']),
     setup_requires=setup_requirements,
     test_suite='tests',
     tests_require=test_requirements,
     url='https://github.com/Serbaf/tweet_model',
-    version='0.2.0',
+    version='0.3.0',
     zip_safe=False,
 )
tweet_model/tweet_model.py
# -*- coding: utf-8 -*-

"""Main module."""

import sys
from tweet_manager.lib import format_csv


class Tweet():
    """
...
@@ -352,3 +356,50 @@ class Tweet():

    def __getitem__(self, key):
        return getattr(self, key)


def get_tweets_from_csv(csv_file):
    """
    Take one argument: a path pointing to a valid CSV file.
    The function reads the file, which should be a collection of tweets with a
    header indicating the tweet fields (user.id, place.bounding_box.type,
    etc.), and instances a new Tweet object for each of the lines in the CSV
    file, assigning each value in the CSV to the corresponding Tweet attribute.
    Returns a list of the Tweet objects instanced.
    """

    tweets = []

    with open(csv_file, 'r') as csv_object:
        header = csv_object.readline()
        body = csv_object.readlines()

    header = format_csv.split_csv_line(header)

    # Check that the header contains valid fields
    test_tweet = Tweet()
    for field in header:
        field_components = field.split(".")
        checking_dict = test_tweet.__dict__
        error_string = ""
        for component in field_components:
            error_string += component
            if (checking_dict is None) or (component not in checking_dict):
                print('The field in the header "' + error_string + '" is ' +
                      'not a valid element of a Tweet')
                sys.exit(1)
            checking_dict = checking_dict[component]
            error_string += "."

    # Go through every tweet in the file, instance it using the 'Tweet' class
    # and add it to the list 'tweets'
    for j in range(len(body)):
        body[j] = format_csv.split_csv_line(body[j])
        tweet_contents = {}
        for i in range(len(body[j])):
            if body[j][i] != '':
                tweet_contents[header[i].replace(".", "__")] = body[j][i]
        tweets.append(Tweet(**tweet_contents))

    return tweets
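
The header check and the dotted-name handling in get_tweets_from_csv() can be tried without the tweet_manager dependency. The following is a minimal standalone sketch, not the committed code: `SPEC` is a hypothetical, heavily trimmed attribute tree standing in for `Tweet().__dict__`, and `is_valid_field` and `to_keyword` are illustrative helpers. Unlike the committed function, the check returns a boolean instead of printing a message and calling sys.exit(1).

```python
# Hypothetical, trimmed attribute tree; the real Tweet has many more fields
SPEC = {
    "id": None,
    "user": {"id": None, "name": None},
    "media": {"sizes": {"thumb": {"h": None, "w": None}}},
}


def is_valid_field(field, spec):
    """Return True if a dotted header field names a path in the nested spec."""
    node = spec
    for component in field.split("."):
        # A leaf (None) has no children, so any further component is invalid
        if not isinstance(node, dict) or component not in node:
            return False
        node = node[component]
    return True


def to_keyword(field):
    """Turn a dotted header field into the keyword form passed to Tweet(**...)."""
    return field.replace(".", "__")


print(is_valid_field("media.sizes.thumb.h", SPEC))    # prints True
print(is_valid_field("user.lightsaber.color", SPEC))  # prints False
print(to_keyword("user.id"))                          # prints user__id
```

The double underscore is needed because a dot cannot appear in a Python keyword argument name. How the Tweet constructor maps `user__id` back onto the nested "user" dictionary is not shown in this diff.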