Can you get me a quick list of players?

Today’s quick Ruby hack is a script I threw together REAL quick (in about five minutes) to help a friend. Basically, my friend needed a pipe-delimited list of NFL players from sportsline.com. I think he is doing something on his local machine with his fantasy roster, but honestly, I don’t know why he really needed this. In any case, what he wanted was the sportsline.com player ID, the player’s name, the player’s position, and the NFL team they are on. As luck would have it, Sportsline’s player index pages (for example http://www.sportsline.com/nfl/playerindex/A) have just that easily available. So it’s really a simple case of looping through each letter’s listing on sportsline.com and breaking out the data we want. Again, this is a one-off script that I’m running locally and that doesn’t need to be all that fast. That means I can take a bunch of shortcuts, hard-code against my data scenarios, and not really worry about security… so without further ado, here’s the quick hack in full:

```ruby
require 'net/http'
require 'uri'

# The block form of File.open closes players.txt for us when we're done
File.open("players.txt", "w") do |players|
  ('A'..'Z').each do |letter|
    begin
      response = Net::HTTP.get_response(URI.parse("http://www.sportsline.com/nfl/playerindex/" + letter))
      data = response.body
      playerfound = false
      while !playerfound do
        # grab player details if we can; example - <a href="/nfl/players/playerpage/187381"> Abraham, John</a> DE, Atlanta Falcons
        data =~ /"\/nfl\/players\/playerpage\/(\d+)/i
        playerid = $1
        break if playerid.nil? # no player links left on this page
        data =~ /#{playerid}"> ([a-zA-Z' .,-]+)/i
        playername = $1
        data =~ /#{Regexp.escape(playername.to_s)}<\/a> ([a-zA-Z]+),/i
        position = $1
        data =~ / #{position}, ([A-Za-z. ]+)/i
        nflteam = $1
        line = "#{playerid}|#{playername}|#{position}|#{nflteam}"
        players.puts line
        puts line
        # drop everything up to just past this player's ID and keep scanning
        chopspot = data.index(playerid) + 20
        data = data[chopspot..-1] || ""
        playerfound = true if data !~ /"\/nfl\/players\/playerpage\/\d+/
      end
    rescue StandardError => e
      # a bad page only skips its own letter; the rest of the alphabet still runs
      warn "Skipping #{letter}: #{e.message}"
    end
  end
end
```

As you can see, it’s very light on comments (and logic, really), so just to break it down a little bit:

1. Sportsline.com lists players on a separate page for each letter of their last name, so we start by looping through the alphabet.
2. We wrap the work for each letter in a begin/rescue/end block, so if we hit a problem on a given page, the program still goes on to grab the data for the other letters.
3. We use a simple Net::HTTP call to grab the data for each page.
4. We use a handful of regular expressions to pull out the data we want. I could have done these calls/assignments all in one regular expression, but I found it easier to build it up in bits, so I just kept it that way. (In something I was going to spend more time on, I would have collapsed these down into one regex call.)
5. I only write the data out to the file (and screen) if a player ID was found; this way we ignore any junk lines or false matches.

And it’s basically that simple. Here are the first few lines of the generated file:
```
405198|Abdullah, Husain|DB|Minnesota Vikings
405208|Abiamiri, Victor|DE|Philadelphia Eagles
187381|Abraham, John|DE|Philadelphia Eagles
395911|Adams, Anthony|NT|Chicago Bears
1614642|Adams, Chester|G|Chicago Bears
12175|Adams, Flozell|T|Dallas Cowboys
405275|Adams, Gaines|DE|Tampa Bay Buccaneers
517269|Adams, Jamar|DB|Seattle Seahawks
1222573|Adams, Michael|DB|Seattle Seahawks
```
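
Point 4 mentions folding the separate regular expressions into a single regex call. Here’s a rough sketch of what that might look like with Ruby’s String#scan, assuming every entry on a page follows the same markup as the example in the script’s comment (a playerpage link, the name, a closing </a>, then "POS, Team"). The PLAYER_PATTERN constant, the extract_players helper, and the sample string are made up for illustration, and the pattern hasn’t been run against the live Sportsline pages, so treat it as a starting point rather than a drop-in replacement.

```ruby
# Hypothetical single-regex version. Assumes each entry looks like:
#   <a href="/nfl/players/playerpage/187381"> Abraham, John</a> DE, Atlanta Falcons
# Groups: 1 = player ID, 2 = "Last, First", 3 = position,
#         4 = team (lazy, stops at the next tag or the end of the input)
PLAYER_PATTERN = %r{"/nfl/players/playerpage/(\d+)">\s*([A-Za-z' .,-]+?)</a>\s*([A-Za-z]+),\s*([A-Za-z. ]+?)\s*(?=<|\z)}

# Returns one pipe-delimited line per player found in the given HTML
def extract_players(html)
  html.scan(PLAYER_PATTERN).map { |id, name, pos, team| [id, name, pos, team].join("|") }
end

sample = '<a href="/nfl/players/playerpage/187381"> Abraham, John</a> DE, Atlanta Falcons<br>' \
         '<a href="/nfl/players/playerpage/395911"> Adams, Anthony</a> NT, Chicago Bears'

puts extract_players(sample)
# 187381|Abraham, John|DE|Atlanta Falcons
# 395911|Adams, Anthony|NT|Chicago Bears
```

Because scan walks the whole page in one pass, there’s no need for the chop-and-rescan loop; the trade-off is that one pattern is harder to debug than the step-by-step version when the markup doesn’t quite match.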

This is the personal blog of Kevin Marshall (a.k.a. Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.

Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.

If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).