You want me to schedule what?

OK - my weekend of fun was, well, fun! But my friends have all gone home now and it’s time to get back to work. Tops on my plate right now (or at least tops on my brain) is cleaning up the last few important parts to have a real beta version of fubnub.com. As I saw it, to achieve this goal I had to get over one big road block: The ability to schedule your feeds for automatic fubnub'ing. Actually, I’ve had this challenge on my to-do list from the start of the project, but I kept putting it off because my brain was thinking it was going to be a fairly annoying thing to implement. When I say annoying, I mean in the sense that it was going to take some solid design and serious thought before I sat at my keyboard and starting hacking. That is, if I wanted to create a solution that had the potential to scale over time. Here’s what I came up with: 1. Ultimately a scheduled task will run once an hour to fubnub any feeds that should be. Since I’m doing the backend of the rest of this project in Perl, I decided that this scheduled task should also be a Perl program. The biggest challenge with this step is actually providing information to the users about how the system is works in the backend (which I haven’t done yet). The trick is that users can only have feeds published by the hour, the day, the week, or the month…so the first time you publish, you might say to publish every hour or month but it may publish the first feed almost immediately (because you have never published and you just set it up before the program started to run). After the first time the feed gets processed, it should then be on the proper schedule and so I’m hoping it’s not a big deal. But it may create some confusion for people as they get started. Knowing that up front allows me to prepare for handling all the confusion. 2. To actually accomplish scheduled posting, I created one table that stores the details about what feed you want to have automatically posted. This table includes information about what the interval for posting should be as well as when the last time the feed was published. The user populates this table via the web site when they set up the schedule, but it’s really the Perl program in the first step that uses this table. Here’s a brief rundown of the table structure:

scheduled_id - int scheduled_title - varchar scheduled_frequency - varchar scheduled_time - int account_id - int scheduled_lastpublished - datetime delivery_id - int

The trick here is that, to make things ready to scale, I really wanted to limit the number of database calls my Perl program has to make. The easy solution of course would be to first grab all the records, pulling out the interval and determining when the next time a feed should be published is. Then doing a second query pulling out just the details of those records. That’s an easy solution, but it really doesn’t scale for a couple of reasons. First, you have to deal with every single record in the database - many of which won’t be ready for processing this time through the program (and imagine you have a million plus users!). Second, you’ve got to make at least two database calls. And the more database calls you have to make, the harder it is to scale your system. So I wanted to be able to come up with a query that pulls up the details of which feeds need published (and only those feeds) all in one query. And since I control the format and details of what data actually gets into the table (by limiting the options the user has on the front end), I can use that inside knowledge to code against certain expected values. In the end, it took a little bit of brain power and use of some slightly more complex T-SQL than your day-to-day stuff, but here’s the query I came up with:

select   scheduled_id,   scheduled_title,   scheduled.account_id,   scheduled_lastpublished,   command from   scheduled,   delivery,   commands where   case lower(scheduled_frequency)     when ‘hours’ then dateadd(hour, -scheduled_time, getdate())     when 'days’ then dateadd(day, -scheduled_time, getdate())     when 'weeks’ then dateadd(week, -scheduled_time, getdate())     when 'months’ then dateadd(month, -scheduled_time, getdate())   end > isnull(scheduled_lastpublished, dateadd(month, -50, getdate())) and   delivery.delivery_id = scheduled.delivery_id and   delivery.feedtype = commands.feedtype

If you break down the query, you see that it really was a simple matter of a CASE statement, the DATEADD statement, and a ISNULL statement. With a few joins thrown in to actually get at all the data I needed (details of that are omitted from this post because they aren’t relevant to this topic right now) As I mentioned above, I know the small set of intervals a user can pick (hours, days, weeks, months). This means I can hard code against that. I use the user defined time range with the DATEADD feature to move back in time to the appropriate spot. Once I’m there, I just need to see if it’s still more recent than the last published date. Next, since the first time you publish there is no last published date, I use the ISNULL statement to move to a time that will always be less than what a user could pick (in this case 50 months ago). This way, a feed will always get published the first time the program sees the feed is scheduled. The rest of the Perl program I mentioned in step one simply kicks off the fubnub'ing of the feeds and updates the scheduled table as needed (eventually using threads to publish, and wrapping the update calls into as few queries as possible). Once again it’s really that simple - much easier than my brain thought it was going to be! You just need to have that inside knowledge of your system, your data, and think at least a little!

This post has received 39 loves.


ARCHIVE OF POSTS



This is the personal blog of Kevin Marshall (a.k.a Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.

Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.

If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).