Who's really worth following on Twitter?

I’ve been digging into some of my Twitter-related ideas lately, and on Sunday (while watching playoff football) I finally sat down and hacked out a little something I could share with people. (None of the other Twitter things I’ve been hacking on have any sort of front end for users yet.) What I came up with is a simple word cloud generated from a user’s tweet history, plus a few other little extras. You can play with it on www.halfbite.com.

Once I had something working that felt worth sharing, I sent around some emails to a few people I thought might find it interesting as well. What follows is the tail end of one of those email chains. I’m sharing it here because, over the course of the chain, I end up dumping a lot of plans that I had hoped to eventually write a post about. So here you go:

Hey guys, First, thanks for the feedback on the Twitter tool today. I just wanted to drop you a note that I added the ability to mesh the data from multiple users together into one ‘report’. As an example, this link gives a meshed report from the last 200 tweets of innonate, ceonyc, and jschwa: http://www.halfbite.com/groupjive.php?user=innonate,ceonyc,jschwa&sort=freq You can also use the second form on the http://www.halfbite.com main page to generate these sorts of reports. Thanks again for putting up with me!
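For anyone curious what a ‘meshed’ report might look like under the hood, here is a minimal sketch in Python. It is not the code behind halfbite.com: the stop-word list, the tokenizing rules, and the function names `word_counts` and `group_report` are all assumptions made for illustration. It counts words per user, merges the counts, and sorts by frequency (roughly what `sort=freq` in the link above suggests).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real tool's filtering rules are not described.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "for", "on", "rt"}


def word_counts(tweets):
    """Count lowercase words across a list of tweet texts, skipping stop words."""
    counts = Counter()
    for text in tweets:
        # Drop URLs first so their pieces are not counted as words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        for word in re.findall(r"[a-z0-9']+", text):
            if word not in STOP_WORDS and len(word) > 1:
                counts[word] += 1
    return counts


def group_report(tweets_by_user):
    """Merge per-user counts into one report, most frequent words first."""
    merged = Counter()
    for tweets in tweets_by_user.values():
        merged += word_counts(tweets)
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up tweets, purely for demonstration.
    sample = {
        "user_one": ["Hacking on Twitter tools", "Twitter is fun"],
        "user_two": ["More Twitter hacking http://example.com/x"],
    }
    for word, freq in group_report(sample):
        print(word, freq)
```

Merging `Counter` objects keeps the per-user step and the group step separate, so the same `word_counts` function could serve both the single-user cloud and the group report.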
Nice, Kevin. Why only 200 recent posts? Is that a Twitter API limit?
Yeah - the Twitter API lets you grab up to 200 of a given user’s recent tweets in a single call. I could have made multiple calls per user, and I might update the service to do that in the future, but it already runs slower than I would like, so I really wanted to limit the number of calls and the processing I needed to do. Plus, 200 felt like a reasonable enough chunk of data to get a feel for recent trends (as it turns out, 200 tweets seems to cover at least a month, and often up to three months for less active people). Up next (when I find time), I’m going to use some of the data this system collects to try to identify ‘relevant’ or ‘popular’ words across a collection of users, which is more related to the other Twitter project I’m hacking on (this stuff is actually just a small, quick test and proof of concept for that project).
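One simple way to think about ‘popular’ words across a collection of users is to rank words by how many different users used them, rather than by raw count. The sketch below does only that, and it is an assumption about one possible approach, not the method the email has in mind. The function name `popular_words` and the `min_users` cutoff are made up for illustration.

```python
from collections import Counter


def popular_words(words_by_user, min_users=2):
    """Rank words by how many distinct users used them at least once."""
    user_counts = Counter()
    for words in words_by_user.values():
        # set() so a word one user repeats often still counts once for that user.
        user_counts.update(set(words))
    ranked = [(word, n) for word, n in user_counts.items() if n >= min_users]
    # Most widely used first, then alphabetical for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up word lists, purely for demonstration.
    sample = {
        "user_one": ["python", "python", "twitter"],
        "user_two": ["twitter", "football"],
        "user_three": ["twitter", "python"],
    }
    print(popular_words(sample))
```

Counting users instead of occurrences keeps one very chatty account from dominating the list.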
Couple thoughts… When examining myself, I’m more interested in the long-term trends than short term trends. Also, not sure you need to replicate search.twitter.com’s meme tracking unless you have a really different take. Keep me in the loop!
I agree about analyzing myself: it would be better to go as deep as possible. But my original goal for this little tool was to give me a quick overview of what someone else does with Twitter, more than to tell me what I do with it (i.e., when I get a notice that someone new is following me, I want to do a little research on them before I blindly follow them back; otherwise I get spammed too much and my timeline gets too noisy to listen to).

For the meme tracking, I think my take will be a little different from what anyone else is doing yet, because my plan is to go beyond a simple word count, take into account a few other data points, and also allow you to put a filter or lens on it. Often I’m more interested in things in my little world than I am in things across all of Twitter: what are you, Charlie, etc. hot on right now, rather than what are all Twitter users hot on? The answers probably overlap a lot, but the former question is much more interesting to me in all cases. [From what I can tell, Twitter’s search thing appears to either do a simple word count across all tweets for some given period of time, or just show you what people are searching on the most right now, and then show you those words. Definitely interesting, but really not all that helpful for the type of questions I want to have answered.]

Honestly, I’m not sure if anyone else is doing this stuff yet, because I don’t know everything everyone else is working on related to Twitter. Too many things to keep track of! So I just play with it as it interests me, and worst case I get my own set of tools and knowledge out of it all, right? :-)

Anyway, to give you a little more ‘inside’ information, here’s the working elevator pitch on the related ‘big idea’ I’m working on (so far it’s me and another tech friend working on it VERY part time, and I’ve shared most of these details with Charlie, Darren Herman, and a handful of others for some quick feedback):

In a sense, it’s a bit like building a new Google, but instead of indexing all the world’s data, we would be indexing just the data people think is worth talking about right now. So less inclusive, more immediate; less about direct search results, and more about topics and introducing people to other people and data that they may be interested in but not yet know about.

For a little more detail, here is a list of the ‘what’ or general ‘how’ we plan to do it:

1. Collect tweets that have URIs listed in them from Twitter’s search API.
2. Index the pages that those URIs resolve to, gathering a set of keywords (and some other behind-the-scenes data used for search optimization).
3. Using the collected data set, tie tweets and users together to form ‘topic’ pages and calculate rankings based on who tweeted (# of followers exposed to the URI/topic), how often the URI/topic was tweeted about (i.e., how many different people mention the various links for this given ‘topic’), and how relevant a given keyword is to a ‘topic’.
4. Possibly pass the collected URIs through Digg, Delicious, Alexa, Compete, etc. to gather additional metrics about the web popularity of a given URI/topic.
5. On topic views, besides showing which tweets mentioned the topic and listing out the URIs (with counts of how many times each was listed), we also list out ‘related topics and URIs’ based on keyword analysis, and Twitter users who appear to be interested in this ‘topic’ and might be worth following.

Overall, the beginning is a bit like what Twitturly.com and flaptor.com are doing, but I think it goes a lot further than either of those efforts so far. We don’t just say what the most popular links being mentioned on Twitter right now are; we actually analyze the pages those URLs point to (it doesn’t appear that most other systems are going beyond the actual tweet text yet), and so we also list (and take into the ranking calculations) URIs and topics that are direct matches and others that are either ‘on topic’ or closely related to the topic.

Again, a lot like Google based its initial theory on the value of a link, we are basing a lot of our initial theory on the value of a tweet (with a link), and then we are using a bunch of open data points, custom algorithms, and network effects to produce what we *think*/*hope* will be some really interesting knowledge. At least that’s the working theory so far; we’ll see if/what comes out of it.

BTW - sorry about the length of the email (if I had more time I would edit it to be shorter and make more sense!). Oh, and thanks for being interested. Let me know if you have any ideas, feedback, red flags, or whatever about any of this! :)
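To make the ranking idea in step 3 a bit more concrete, here is a rough Python sketch of how the three signals mentioned (follower reach, number of different people tweeting, and keyword relevance) could be folded into one score. Everything specific in it is an assumption for illustration: the `Mention` record, the `topic_score` function, the equal default weights, and the choice to dampen follower counts with a logarithm. The email does not say how the signals would actually be combined.

```python
import math
from dataclasses import dataclass


@dataclass
class Mention:
    """One tweet that links to a URI belonging to a topic (hypothetical record)."""
    user: str
    followers: int
    uri: str


def topic_score(mentions, keyword_relevance, weights=(1.0, 1.0, 1.0)):
    """Combine reach, distinct tweeters, and keyword relevance into one score."""
    # Count each user once so repeat tweets do not inflate reach or breadth.
    followers_by_user = {}
    for m in mentions:
        followers_by_user[m.user] = max(m.followers, followers_by_user.get(m.user, 0))
    reach = sum(followers_by_user.values())   # followers exposed to the topic
    breadth = len(followers_by_user)          # how many different people tweeted it
    w_reach, w_breadth, w_relevance = weights
    # log1p dampens very large follower counts (an assumed design choice).
    return (w_reach * math.log1p(reach)
            + w_breadth * breadth
            + w_relevance * keyword_relevance)


if __name__ == "__main__":
    # Made-up mentions and relevance value, purely for demonstration.
    mentions = [
        Mention("user_one", 100, "http://example.com/a"),
        Mention("user_one", 100, "http://example.com/b"),
        Mention("user_two", 50, "http://example.com/a"),
    ]
    print(topic_score(mentions, keyword_relevance=0.5))
```

The weights are left as a parameter because the right balance between reach, breadth, and relevance is exactly the kind of thing that would need tuning against real data.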



This is the personal blog of Kevin Marshall (a.k.a Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.

Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.

If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).