Number of tweets in different languages posted around Germany
Collecting the geocoded tweets
Using this script I collected approx. 1.3 million tweets in a week. The tweets are stored one tweet per line and one file per hour, e.g. 2013-05-21T19_51_03.json. The content of such a file looks like this:
{"created_at":"Tue May 21 17:51:09 +0000 2013","id":336901993555709952,"id_str":"336901993555709952","text":"@OmegaBlue69 ...
{"created_at":"Tue May 21 17:51:10 +0000 2013","id":336901996680450048,"id_str":"336901996680450048","text":"Sweet1 ....
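The storage layout described above (one JSON object per line, one file per hour) can be sketched as follows. This is a minimal Python illustration, not the collection script from this post. The helper names hourly_filename, parse_created_at and append_tweet are hypothetical, and the file-name pattern is only inferred from the example name 2013-05-21T19_51_03.json.

```python
import json
from datetime import datetime


def hourly_filename(start: datetime) -> str:
    """Hypothetical helper: build a name like 2013-05-21T19_51_03.json.

    Assumption: the file name is the start time of the hourly file, with
    underscores instead of colons (colons are awkward in file names).
    """
    return start.strftime("%Y-%m-%dT%H_%M_%S") + ".json"


def parse_created_at(value: str) -> datetime:
    """Parse Twitter's created_at format, e.g. 'Tue May 21 17:51:09 +0000 2013'."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


def append_tweet(path: str, tweet: dict) -> None:
    """Append one tweet as a single line of JSON.

    json.dumps without indent never emits raw newlines (they are escaped
    inside strings), so each tweet stays on exactly one line.
    """
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")
```

Keeping one tweet per line means a file can be processed line by line without loading the whole hour into memory. This is also how the extraction step reads it.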
Handling the json-file
The first task extracts the relevant information from these files. The following script reads the JSON files line by line using rjson and writes the coordinates and the language of each tweet to a text file, e.g. "2013-05-21T19_51_03.coords.txt".
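The extraction step can be sketched in Python as a stand-in for the R/rjson script. This sketch makes several assumptions. It reads the Twitter API v1.1 fields "coordinates" (a GeoJSON Point with longitude first) and the tweet-level "lang". It writes tab-separated lon/lat/lang lines. It silently skips lines that are not valid JSON or not geocoded. The function name extract_coords and the exact output layout are hypothetical, so the original script may differ.

```python
import json
from pathlib import Path


def extract_coords(json_path: str) -> str:
    """Write 'lon<TAB>lat<TAB>lang' for each geocoded tweet; return the output path.

    Assumptions: tweets carry a GeoJSON 'coordinates' Point and a 'lang' field.
    """
    src = Path(json_path)
    # e.g. 2013-05-21T19_51_03.json -> 2013-05-21T19_51_03.coords.txt
    dst = src.with_name(src.stem + ".coords.txt")
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated lines, e.g. from an interrupted stream
            point = tweet.get("coordinates")
            if not point or point.get("type") != "Point":
                continue  # tweet is not geocoded
            lon, lat = point["coordinates"][:2]  # GeoJSON order: longitude first
            lang = tweet.get("lang") or "und"  # "und" = undetermined
            fout.write(f"{lon}\t{lat}\t{lang}\n")
    return str(dst)
```

Writing a small text file per hourly JSON file keeps the expensive JSON parsing to a single pass. The later steps only have to read a few short columns.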
Putting it all together
The next script picks up all text files with coordinate information, merges infrequent language levels and does the color-coding. Finally, it creates a simple barplot and stores the data in a data.frame all.data and the colors in a vector cols. In another blog post I describe how this data can be used to create a zoomable map.
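The merging and color-coding step can be sketched in Python, roughly standing in for the R data.frame all.data and the color vector cols. The threshold min_count, the palette, the label "other" and the function names load_languages and lump_and_colour are all hypothetical choices. The original script may lump levels and pick colors differently.

```python
import glob
from collections import Counter

# Hypothetical palette; any list of distinct colors would do.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a",
           "#66a61e", "#e6ab02", "#a6761d", "#666666"]


def load_languages(pattern: str = "*.coords.txt") -> list[str]:
    """Collect the language column from all lon<TAB>lat<TAB>lang files."""
    langs = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                parts = line.rstrip("\n").split("\t")
                if len(parts) == 3:
                    langs.append(parts[2])
    return langs


def lump_and_colour(langs: list[str], min_count: int):
    """Merge languages seen fewer than min_count times into 'other'
    and assign one color per remaining level, most frequent first."""
    counts = Counter(langs)
    merged = Counter()
    for lang, n in counts.items():
        merged[lang if n >= min_count else "other"] += n
    levels = [lang for lang, _ in merged.most_common()]
    cols = {lang: PALETTE[i % len(PALETTE)] for i, lang in enumerate(levels)}
    return merged, cols
```

merged.most_common() gives the bar heights in descending order. Plotting (e.g. with matplotlib's bar) is left out to keep the sketch dependency-free. Lumping rare languages before assigning colors keeps the number of colors small enough to tell apart on a map.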