Archive for June, 2009
Twitter JSON stream parser
So recently I’ve had occasion to parse the Twitter JSON stream, specifically the spritzer stream, for data mining purposes. Turns out this is a pretty difficult problem to solve in most languages: the stream never ends, so you have to handle it one tweet at a time as it arrives. So here’s my Alexandrian solution to this particular Gordian knot, in Bash, because that’s just how I roll.
curl -s --basic --user username:password http://stream.twitter.com/spritzer.json | while read line; do echo "${line}" > temp_tweet ; cat temp_tweet | sed -e 's/=\"/\=\\"/g' | sed -e 's/\">/\\">/g' | ./twitterparse.pl; done
This reads the JSON stream one line at a time and passes each tweet to a Perl script, which does the actual parsing.
Hey kids: Don’t do this. It’s bad. If you must, use a tool like jsawk.
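If you stay in Bash anyway, here is a minimal sketch of the same loop without the temp file. A plain `read` strips backslashes, which is likely why the escaped quotes have to be patched back in with sed; `read -r` leaves them alone. The names `handle_tweet` and `process_stream` are made up for illustration, and `handle_tweet` is only a stand-in for ./twitterparse.pl, so treat this as a sketch under those assumptions rather than a drop-in replacement.

```shell
#!/usr/bin/env bash
# Sketch: feed each line of a newline-delimited JSON stream to a handler.

# Hypothetical stand-in for ./twitterparse.pl: reads one tweet on stdin.
# Here it just echoes the tweet back unchanged.
handle_tweet() {
  cat
}

# Read stdin line by line and hand each non-empty line to handle_tweet.
process_stream() {
  local line
  # IFS= keeps surrounding whitespace; -r keeps backslashes such as \" intact.
  while IFS= read -r line; do
    [ -n "$line" ] || continue   # skip empty lines
    printf '%s\n' "$line" | handle_tweet
  done
}

# Usage, assuming the same endpoint and credentials as the one-liner above:
#   curl -s --basic --user username:password http://stream.twitter.com/spritzer.json | process_stream
```

Because each line reaches the handler untouched, the escaped quotes inside a tweet should not need re-escaping. Swapping the body of `handle_tweet` for a call to the Perl script is the only change needed to wire it up.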