A few MongoDB commands you just gotta know...

I’m a massive MongoDB fanboy. So I thought I would take some time today to share a few of the key mongo commands that I use every day in my work…

DBQuery.shellBatchSize = 300

This handy little command changes how many results the mongo shell prints at a time for your find statements (the default is 20). There are a lot of times when I need/want to quickly review more than 20 records, so I use this command all the time.

By the way, you can mess with the shell's config settings so that the default is larger than 20, but I prefer to just do it ad hoc, as the number of records I *want* returned changes frequently.
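For anyone who does want a bigger default baked in, here is a minimal sketch. It assumes the legacy mongo shell, which runs `~/.mongorc.js` at startup. The value 100 is only an example, and the `typeof` guard is just defensive, so the file stays harmless if something other than the shell ever loads it.

```javascript
// ~/.mongorc.js (assumed location; the legacy mongo shell runs this file at startup)
if (typeof DBQuery !== "undefined") {
  // Print 100 results per batch instead of the default 20 (100 is an arbitrary example).
  DBQuery.shellBatchSize = 100;
}
```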

show collections

This will just show you the list of collections (the thing most like tables in a relational database) currently defined in the database you are connected to. I use this a lot when I first connect, just to do a quick sanity check that I’m connected to the database I think I am, and to quickly remind myself of what collections are already here to work with (plus it reminds me whether I’m using plurals or not for these collection names).
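If you ever want that same sanity check inside a script, something like this rough sketch could work. It leans on the shell helpers `db.getName()` and `db.getCollectionNames()`. The function name and the example database/collection names are made up for illustration.

```javascript
// Sketch: returns the expected collection names that are missing from `db`.
// `db` is the shell's database handle; `expectedDb` and `expectedCollections`
// are hypothetical values you supply yourself.
function missingCollections(db, expectedDb, expectedCollections) {
  if (db.getName() !== expectedDb) {
    throw new Error("connected to '" + db.getName() + "', expected '" + expectedDb + "'");
  }
  var present = db.getCollectionNames();
  return expectedCollections.filter(function (name) {
    return present.indexOf(name) === -1;
  });
}
```

In the shell you would call it with the global handle, e.g. `missingCollections(db, "myapp", ["accounts"])`, where "myapp" is a placeholder name.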

show dbs

This will just show you what databases are available within this instance of mongo. A lot of people probably run a single database per mongo instance, so they probably don’t need this very much.

For me though, I run mongo on a few different servers, and each of those instances actually has a handful of dbs on it (for example, one of my mongo instances currently has dbs for gawk.it, greentile.com, and knowabout.it).

So I use this command all the time when I first connect to mongo just to see which instance I really connected to (there’s that sanity check again!)…and also to make sure that the database I’m expecting to connect to is available.
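The scripted cousin of `show dbs` is the `listDatabases` admin command. Here is a small sketch of the "is my database actually on this instance?" check. The function name and the database name in the usage line are placeholders, not anything from my real setup.

```javascript
// Sketch: true if a database called `wantedName` exists on the connected instance.
// `db` is the shell's database handle; listDatabases replies with a `databases`
// array whose entries each carry a `name` field.
function databaseIsAvailable(db, wantedName) {
  var reply = db.adminCommand({ listDatabases: 1 });
  return reply.databases.some(function (d) {
    return d.name === wantedName;
  });
}
```

Usage in the shell would look like `databaseIsAvailable(db, "myapp")`.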

db.accounts.getIndexKeys()

The secret to a happy mongo experience is ‘indexing’ (BTW - it’s actually the secret to a happy relational database existence too). If you want to find something fast, regardless of how complex your objects actually are, you just need to have a good index.

As I evolve features, or need to search for objects in new ways, I often need to use this command just to see what indexes already exist that I could take advantage of (and I try to stay *very* proactive about managing my indexes).
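`getIndexKeys()` hands back an array of key documents (something like `{ "_id" : 1 }` and `{ "username" : 1 }`). Since a query on a field can generally use an index that *starts* with that field, a quick way to spot candidates might look like the sketch below. The helper name is my own invention.

```javascript
// Sketch: list the existing index key documents whose first key is `field`.
// `coll` is a shell collection handle such as db.accounts.
function indexesLeadingWith(coll, field) {
  return coll.getIndexKeys().filter(function (keys) {
    return Object.keys(keys)[0] === field;
  });
}
```

For example, `indexesLeadingWith(db.accounts, "username")` would return every index on accounts that leads with username.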

As an aside, designing good indexes is actually a *very* important topic because there are some important tradeoffs that you must consider and plan for. I will try to write more about the specifics behind this topic in the near future.

db.accounts.ensureIndex({'username':1})

This is the core command to add a new index (in this case, ascending on the username field of the accounts collection). Again, there are a lot of important things to know about indexes beyond just this basic create command…but assuming you’ve done that work, actually creating them is as simple as this, and I use it all the time.
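One note for newer setups: `ensureIndex` was deprecated in favor of `createIndex` starting with MongoDB 3.0, so the sketch below uses `createIndex`. It pairs the two index commands into a "create only if it isn't there yet" helper. The helper name is hypothetical, and comparing key documents with `JSON.stringify` is a simplification that ignores index options like unique.

```javascript
// Sketch: create an index on `coll` unless one with the same keys already exists.
// Returns true when a new index was requested, false when it was already there.
function createIndexIfMissing(coll, keySpec) {
  var wanted = JSON.stringify(keySpec);
  var exists = coll.getIndexKeys().some(function (keys) {
    return JSON.stringify(keys) === wanted; // simplification: key pattern only
  });
  if (!exists) {
    coll.createIndex(keySpec);
  }
  return !exists;
}
```

Calling `createIndex` directly with an identical spec is also safe to repeat. The main thing the helper adds is the true/false answer, which is handy for logging.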

drop vs remove

Especially when I’m prototyping and testing new stuff, I’ll often fill a collection with dummy objects…play with some code…and then need to clear out that dummy data. Other times, I actually need to remove an object from a production system.

Both of these commands can help, but they have a few crucial differences.

First, the remove command can be used to remove a specific set of objects (e.g. db.accounts.remove({'username':'falicon'}) ) while the drop command is only used to completely remove the collection from the database.

Second, the remove command keeps the existing indexes intact while the drop command also kills any existing indexes that were on that collection.

So you might be thinking that remove is clearly better and more useful than drop…and in many common cases, that is true. However, remove can be *really* slow and generally locks your instance up while it’s processing (because it has to maintain the indexes for every object it deletes) while drop, especially when dealing with large data sets, is wicked fast.

My general rule is that if the data set is massive and I need to drop (or adjust) a majority of the data, it’s often quicker/easier to pull out the data I’m keeping, do a drop, rebuild the indexes, and then re-insert the data that I wanted to keep…especially in a production system that can’t afford the performance hit of a remove on a large dataset. For almost every other scenario, I default to a remove command.
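The pull-out, drop, rebuild, re-insert steps above can be sketched roughly like this. Treat it as an illustration under some loud assumptions: the function name is made up, the keepers are small enough to hold in memory, only index key patterns are carried over (options like unique are not), and nothing here is atomic, so you would not want writes hitting the collection while it runs.

```javascript
// Sketch: keep only the documents matching `keepFilter` by dropping the
// collection and reloading the keepers, instead of running a huge remove.
// `db` is the shell's database handle. Returns the number of documents kept.
function dropAndReload(db, collName, keepFilter) {
  var coll = db.getCollection(collName);

  // 1. Pull out the data we are keeping (assumes it fits in memory).
  var keepers = coll.find(keepFilter).toArray();

  // 2. Remember the index key patterns, skipping the automatic _id index.
  var indexKeys = coll.getIndexKeys().filter(function (keys) {
    var names = Object.keys(keys);
    return !(names.length === 1 && names[0] === "_id");
  });

  // 3. Drop the collection (this also removes its indexes).
  coll.drop();

  // 4. Rebuild the indexes.
  indexKeys.forEach(function (keys) {
    coll.createIndex(keys);
  });

  // 5. Re-insert the keepers (insertMany rejects an empty array, so guard it).
  if (keepers.length > 0) {
    coll.insertMany(keepers);
  }
  return keepers.length;
}
```

A call might look like `dropAndReload(db, "accounts", { active: true })`, where the `active` field is purely a placeholder for whatever marks the objects you want to keep.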

This is the personal blog of Kevin Marshall (a.k.a Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.

Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.
