scheduled_id - int scheduled_title - varchar scheduled_frequency - varchar scheduled_time - int account_id - int scheduled_lastpublished - datetime delivery_id - int
The trick here is that, to make things ready to scale, I really wanted to limit the number of database calls my Perl program has to make. The easy solution of course would be to first grab all the records, pulling out the interval and determining when the next time a feed should be published is. Then doing a second query pulling out just the details of those records. That’s an easy solution, but it really doesn’t scale for a couple of reasons. First, you have to deal with every single record in the database - many of which won’t be ready for processing this time through the program (and imagine you have a million plus users!). Second, you’ve got to make at least two database calls. And the more database calls you have to make, the harder it is to scale your system. So I wanted to be able to come up with a query that pulls up the details of which feeds need published (and only those feeds) all in one query. And since I control the format and details of what data actually gets into the table (by limiting the options the user has on the front end), I can use that inside knowledge to code against certain expected values. In the end, it took a little bit of brain power and use of some slightly more complex T-SQL than your day-to-day stuff, but here’s the query I came up with:select scheduled_id, scheduled_title, scheduled.account_id, scheduled_lastpublished, command from scheduled, delivery, commands where case lower(scheduled_frequency) when ‘hours’ then dateadd(hour, -scheduled_time, getdate()) when 'days’ then dateadd(day, -scheduled_time, getdate()) when 'weeks’ then dateadd(week, -scheduled_time, getdate()) when 'months’ then dateadd(month, -scheduled_time, getdate()) end > isnull(scheduled_lastpublished, dateadd(month, -50, getdate())) and delivery.delivery_id = scheduled.delivery_id and delivery.feedtype = commands.feedtype
If you break down the query, you see that it really was a simple matter of a CASE statement, the DATEADD statement, and a ISNULL statement. With a few joins thrown in to actually get at all the data I needed (details of that are omitted from this post because they aren’t relevant to this topic right now) As I mentioned above, I know the small set of intervals a user can pick (hours, days, weeks, months). This means I can hard code against that. I use the user defined time range with the DATEADD feature to move back in time to the appropriate spot. Once I’m there, I just need to see if it’s still more recent than the last published date. Next, since the first time you publish there is no last published date, I use the ISNULL statement to move to a time that will always be less than what a user could pick (in this case 50 months ago). This way, a feed will always get published the first time the program sees the feed is scheduled. The rest of the Perl program I mentioned in step one simply kicks off the fubnub'ing of the feeds and updates the scheduled table as needed (eventually using threads to publish, and wrapping the update calls into as few queries as possible). Once again it’s really that simple - much easier than my brain thought it was going to be! You just need to have that inside knowledge of your system, your data, and think at least a little!This post has received 42 loves.
This is the personal blog of Kevin Marshall (a.k.a Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.
Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.
If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).