It’s been a really long time since I’ve posted any actual code up here, so I figured I would share a little snippet that I’ve been using quite a bit in the knowabout.it processing stuff lately.
One of the things that happens a lot on the backend is indexing links or users. To save on processing, I don’t want to loop through every user every time the process runs; I only want to loop through those users that have links waiting to be indexed. Solving that problem is actually pretty easy, because Mongo has a ‘distinct’ command that returns a distinct list of results (in this case, a distinct list of users that have links waiting to be indexed).
The next issue is that, in my setup, the distinct command relies on an index, and that index causes the results to come back sorted (in the order you defined the index, so either ascending or descending). In a lot of cases this is fine, but in my case it means that every time I index links for my users, the people at the start of the alphabet get their work done first. Again, not a huge deal, but the longer the process takes to run (and it takes longer as we acquire more users), the longer the users at the end have to wait for a turn. Technically this isn’t entirely true, since I thread much of this work, but even with threading the initial set of threads goes to the people at the start of the alphabet.
So anyway, what I wanted was a unique list of users from a Mongo collection, sorted in a random order, and that’s basically what this chunk of code does.
I’m sure there are lots of ways to do this better (and more efficiently), but for now it’s a really nice, lightweight way for me to get a list of records in a random order.
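A minimal sketch of the idea in Python, for illustration only and not necessarily the exact code used on knowabout.it: ask Mongo for the distinct values, then shuffle that list on the client. It assumes a pymongo-style collection object with a `distinct(key, filter)` method; the function name, the `rng` parameter, and the field names `user` and `indexed` in the usage note are all hypothetical.

```python
import random


def distinct_in_random_order(collection, field, query=None, rng=random):
    """Return the distinct values of `field`, shuffled on the client.

    `collection` is assumed to behave like a pymongo Collection, i.e. it
    exposes distinct(key, filter). `rng` can be the random module or a
    random.Random instance (handy for reproducible runs).
    """
    # Mongo does the de-duplication; the values may come back in index order.
    values = list(collection.distinct(field, query or {}))
    # Shuffle in place so no part of the alphabet always goes first.
    rng.shuffle(values)
    return values
```

With a real collection this might be called as `distinct_in_random_order(db.links, "user", {"indexed": False})`, where `db.links` and both field names are placeholders for whatever your schema uses. The shuffle happens in memory, so this suits lists of users that fit comfortably in RAM.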
Note: If you don’t need to use the ‘distinct’ command, and all you want is a set of records from Mongo in some random order, the recommended way is to assign a random number to each document in your collection, put an index on that field, and sort on it when querying (I actually use this approach quite successfully for other parts of knowabout.it). The reason I needed the code above is that, logically, you can’t combine a ‘distinct’ call with a ‘random’ function, at least not while expecting reliable, unique results.
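The random-number approach from the note could look roughly like this. It is a sketch under assumptions: the collection is pymongo-style (`find`, `update_one`, `create_index`, and a cursor with `sort`), and the helper names and the field name `rand` are made up for the example.

```python
import random


def assign_random_keys(collection, field="rand"):
    """Store a random float on every document and index that field."""
    # Only _id is needed to address each document for the update.
    for doc in collection.find({}, {"_id": 1}):
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {field: random.random()}},
        )
    # The index lets Mongo do the sort without scanning the whole collection.
    collection.create_index(field)


def find_in_random_order(collection, query=None, field="rand"):
    """Return matching documents sorted by the stored random key."""
    return collection.find(query or {}).sort(field, 1)
```

One thing to keep in mind with this sketch: the order stays the same from query to query until the keys are reassigned, so you would rerun `assign_random_keys` (or refresh keys as documents are written) whenever you want a new ordering.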
Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.
If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).