It’s been a really long time since I’ve posted any actual code up here…so I figured I would share a little snippet that I’ve been using quite a bit in the knowabout.it processing stuff lately.
One of the things that happens a lot on the backend is indexing links or users…to save on processing, I don’t want to just loop through every user ever time the process runs. I only want to loop through those users that have links waiting to be indexed.
Solving that problem is actually pretty easy because Mongo has a ‘distinct’ command that you can use to get a distinct list of results (in this case, I get a distinct list of users that have links waiting to be indexed).
The next issue that we run into though is that the distinct command relies on an index, and that index will cause the results to be sorted (in the order that you defined the index — so either ascending or descending). In a lot of cases this is fine, but in my case it ends up meaning that every time I want to index links for my users the people who are at the start of the alphabet get their work done first.
And again, not a huge deal, but if the process takes a long time to run (which it does the more and more users we acquire) the longer and longer the users at the end have to wait for a turn (technically this isn’t entirely true as I do thread much of this work - but even with threading the initial set of threads are the people at the start of the alphabet).
So anyway - what I wanted was a unique list of users from a mongo collection, sorted in a random order…and that’s basically what this chunk of code does:
use MongoDB;
# define the array we are going to store usernames in
my @accounts = ();
# create a connection to Mongo
my $conn = MongoDB::Connection->new(‘YOUR_CONNECTION_HOST’,{});
# state what mongo instance we want to work with
my $db = $conn->my_test_mongo;
# get a distinct list of users from our collection
my $users = $db->run_command([‘distinct’ => ‘some_collection_that_has_users’, ‘key’ => ‘user’]);
# run_command returns a Hash (where ‘values’ contains the data we want to get at)
while (my ($key, $value) = each(%$users)) {
if ($key =~ /values/) {
foreach my $val (@$value) {
# we are going to use a random number to sort the array (so it can be diff order every time)
$newval = rand() . “||” . $val;
push(@accounts, $newval);
}
}
}
# not really required, but let’s clean up any dangling connection stuff
undef $users;
undef $db;
undef $conn;
# now we have an array of users, sort it (which is based on random numbers)
@accounts = sort(@accounts);
# finally we can loop through our sorted array and do something with each user
foreach my $account (@accounts) {
# we have to get rid of the stuff we used to sort the array (so we just have the user we want to process)
@tmp = split(/||/, $account);
print $tmp[1] . ‘\n’;
}
I’m sure there are lots of ways to do this better (and be more efficient), but for now it’s a really nice, lightweight way for me to get a list of records in a random order.
Note: If you don’t need to use the ‘distinct’ command, and all you want is a random set of records from Mongo is some random order, the recommneded way would be to assign a random number to each document in your collection…put an index on that, and just sort based on it when querying (and I actually do use this approach quite successfully for other parts of knowabout.it). The reason I needed the above code is because, logically, you can’t combine a ‘distinct’ call with a ‘random’ function…at least not while expecting reliable, unique results.