Practical MongoDB in 10 minutes



Like MySQL, a MongoDB server can hold multiple databases, but instead of tables each database contains "collections". A collection is similar to a table, but without fixed columns. Instead, each record (called a document) is a set of key: value pairs.
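
To make the idea concrete, here is a minimal sketch in plain JavaScript (the language of the mongo shell) that models a collection as an array of documents. It does not talk to a MongoDB server. The first document is the one inserted later in this article; the second document and its `horn` key are made up for illustration.

```javascript
// A collection pictured as a plain array of documents (illustration only).
const unicorns = [
  { name: "Aurora", gender: "f", weight: 450 },
  // Hypothetical second document: it has a key the first one lacks.
  { name: "Example", gender: "m", horn: "spiral" },
];

// Documents in the same collection do not have to share the same keys.
const keysPerDocument = unicorns.map((doc) => Object.keys(doc));
console.log(keysPerDocument);
```

A relational table would force both rows into the same columns; here each document simply carries its own keys.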

Installing MongoDB on Linux

  1. Add the MongoDB repository key:

    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
  2. Add the MongoDB repository:

    echo 'deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
  3. Update the package lists:

    sudo apt-get update
  4. Install MongoDB:

    sudo apt-get install mongodb-10gen
  5. Start MongoDB:

    sudo /etc/init.d/mongodb start
  6. Test MongoDB:

    mongo
    db.test.save({ a: 1 })
    db.test.find()

Congratulations! That is all!


Getting started with MongoDB.

Create a database.

Call

db.help()

to see the full list of methods that we can use in the mongo shell.

Let’s create our first database:

use tutorial

Then we add new data with the insert command:

db.unicorns.insert({name: 'Aurora', gender: 'f', weight: 450})

To see what data we have in the collection, run:

db.unicorns.find()

In this way we can add many documents and work with them.

Load files.

To show MongoDB at work, we load a ready-made data set. Here we use a zip code collection that contains zip code data from the US.

We use "mongoimport" to load the file:

mongoimport -d zipcode -c code …/zips.json

where -d is the database name and -c is the collection name.
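
As a rough picture of what the import does, the sketch below turns a file body with one JSON document per line into an array of documents. It assumes zips.json uses that one-document-per-line layout (the layout mongoimport reads when no --jsonArray option is given). The two lines are copied from the sample output shown later in this article, and `parseLines` is a hypothetical helper, not part of any MongoDB tool.

```javascript
// Two lines in the assumed one-document-per-line layout of zips.json.
const fileBody = [
  '{ "_id" : "47240", "city" : "ADAMS", "loc" : [ -85.473532, 39.331223 ], "pop" : 19250, "state" : "IN" }',
  '{ "_id" : "41201", "city" : "ADAMS", "loc" : [ -82.702437, 37.991375 ], "pop" : 2540, "state" : "KY" }',
].join("\n");

// Hypothetical helper: parse each non-empty line into one document.
function parseLines(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const importedDocs = parseLines(fileBody);
console.log(importedDocs.length, importedDocs[0]._id);
```

Each parsed object corresponds to one document that ends up in the code collection.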

Now we start the mongo shell and switch to our database:

mongo localhost/zipcode
db
db.code.find()

At this step we can see the documents in the code collection, each with the following fields:

  • The _id field holds the zip code as a string.
  • The city field holds the city name. A city can have more than one zip code associated with it, because different sections of the city can each have a different zip code.
  • The state field holds the two letter state abbreviation.
  • The pop field holds the population.
  • The loc field holds the location as a longitude, latitude pair.
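
The field list can be read off a single document. The sketch below uses one document copied from the sample output later in this article as a plain JavaScript object. Judging by the sample values (about -85 and 39 for a city in Indiana), the first element of loc looks like the longitude and the second like the latitude; that reading is an assumption drawn from the data.

```javascript
// One document from the code collection, copied from the sample output.
const zip = { _id: "47240", city: "ADAMS", loc: [-85.473532, 39.331223], pop: 19250, state: "IN" };

// Assumed order, based on the sample values: [longitude, latitude].
const [longitude, latitude] = zip.loc;

// _id is a string, so a zip code with a leading zero (such as "01220") keeps it.
const summary = `${zip.city}, ${zip.state} ${zip._id}: pop ${zip.pop} at lat ${latitude}, lon ${longitude}`;
console.log(summary);
```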

Basic operations.

Let's do a little analysis in this database and answer a series of questions to learn a few basic operations.

  1. Find all cities named Adams in the US and order them by state.
    db.code.find({ city: "ADAMS" }).sort({ state: 1 })

    (1 sorts in ascending order; use -1 for descending)

  2. After completing the operation you will see some results like these:
        { "_id" : "47240", "city" : "ADAMS", "loc" : [ -85.473532, 39.331223 ], "pop": 19250, "state" : "IN" }
        { "_id" : "41201", "city" : "ADAMS", "loc" : [ -82.702437, 37.991375 ], "pop" : 2540, "state" : "KY" }
        { "_id" : "01220", "city" : "ADAMS", "loc" : [ -73.117225, 42.622319 ], "pop" : 9901, "state" : "MA" }
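
For readers who think in plain JavaScript, the query can be pictured as a filter followed by a sort. This is only a sketch over an in-memory array, not how MongoDB executes the query. The three ADAMS documents are the ones listed above; the fourth document is made up to show the filter at work.

```javascript
const docs = [
  { _id: "01220", city: "ADAMS", loc: [-73.117225, 42.622319], pop: 9901, state: "MA" },
  { _id: "47240", city: "ADAMS", loc: [-85.473532, 39.331223], pop: 19250, state: "IN" },
  // Hypothetical document, not from the real data set.
  { _id: "00000", city: "EXAMPLE", loc: [0, 0], pop: 1, state: "ZZ" },
  { _id: "41201", city: "ADAMS", loc: [-82.702437, 37.991375], pop: 2540, state: "KY" },
];

// Counterpart of find({ city: "ADAMS" }): keep only the matching documents.
const adams = docs.filter((doc) => doc.city === "ADAMS");

// Counterpart of sort({ state: 1 }): ascending by state.
// Swapping a and b in the comparison would give the -1 (descending) order.
const byState = [...adams].sort((a, b) => (a.state < b.state ? -1 : a.state > b.state ? 1 : 0));

console.log(byState.map((doc) => doc.state)); // [ 'IN', 'KY', 'MA' ]
```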
    
  3. Find the cities with a population of at least 2 million, grouped by city and state and sorted by population.

    Here we are using the aggregate() helper in the mongo shell.

    db.code.aggregate( [
        { $group: { _id: { city: "$city", state: "$state" }, totalPop: { $sum: "$pop" } } },
        { $match: { totalPop: { $gte: 2000000 } } },
        { $sort: { totalPop: 1 } }
    ] )
    

    (note: $lt – “<”, $lte – “<=”, $gt – “>”, $gte – “>=”,  $ne – “!=” (not equal)).

    Here the $group stage groups the data by the city and state fields and, with the help of the $sum operator, calculates the total population for each unique city and state combination. Then the $match stage filters these documents and keeps only those whose totalPop value is greater than or equal to 2 million. Finally, the $sort stage sorts the results by population, and we get results like this one:

    "result" : [
        {
            "_id" : {
                "city" : "HOUSTON",
                "state" : "TX"
            },
            "totalPop" : 2095918
        },
        ...
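
The three stages can also be traced by hand. The sketch below mimics $group, $match and $sort in plain JavaScript on a tiny in-memory array; it is an illustration of what the stages compute, not of how MongoDB runs them. All four documents and their city and state names are made up.

```javascript
// Made-up zip documents: city "ALPHA" is split across two zip codes.
const zips = [
  { _id: "00001", city: "ALPHA", state: "XX", pop: 1500000 },
  { _id: "00002", city: "ALPHA", state: "XX", pop: 900000 },
  { _id: "00003", city: "BETA", state: "XX", pop: 2000000 },
  { _id: "00004", city: "GAMMA", state: "YY", pop: 50000 },
];

// $group: one bucket per (city, state) pair, summing pop into totalPop.
const buckets = new Map();
for (const { city, state, pop } of zips) {
  const key = `${city}|${state}`;
  const bucket = buckets.get(key) ?? { _id: { city, state }, totalPop: 0 };
  bucket.totalPop += pop;
  buckets.set(key, bucket);
}

// $match with $gte, then $sort ascending by totalPop.
const matched = [...buckets.values()]
  .filter((doc) => doc.totalPop >= 2000000)
  .sort((a, b) => a.totalPop - b.totalPop);

console.log(matched);
```

Neither ALPHA zip code reaches 2 million on its own; the city passes the filter only because $group sums its zip codes first.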
    
  4. Find the cities in the state of Illinois with a population of 10 thousand or less, sort them by city name in ascending order, and limit the output to 5.
    db.code.aggregate( [
        { $group: { _id: { city: "$city", state: "$state" }, totalPop: { $sum: "$pop" } } },
        { $match: { totalPop: { $lte: 10000 } } },
        { $sort: { "_id.city": 1 } },
        { $match: { "_id.state": "IL" } },
        { $limit: 5 }
    ] )
    

    The first two results, to check yourself:

    "result" : [
        {
            "_id" : {
                "city" : "ABINGDON",
                "state" : "IL"
            },
            "totalPop" : 4241
        },
        {
            "_id" : {
                "city" : "ADAIR",
                "state" : "IL"
            },
            "totalPop" : 731
        },
    
  5. Return the average city population by state, starting from the smallest.

    db.code.aggregate( [
         { $group: { _id: { city: "$city", state:  "$state"}, pop: { $sum: "$pop" } } },
         { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } },
         { $sort: { "avgCityPop": 1}}  
         ] )
    

    And we get the states with the smallest average city population first:

    "result" : [
          {
                  "_id" : "ND",
                  "avgCityPop" : 1645.0309278350514
          },
          {
                  "_id" : "SD",
                  "avgCityPop" : 1839.6746031746031
          },
          ...
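
The reason for two $group stages is that the average is taken over cities, not over zip codes. The sketch below starts from per-city totals (that is, the output of the first $group) and mimics the second $group with $avg, followed by the $sort. The documents are made up, and the code is a plain JavaScript illustration rather than MongoDB's implementation.

```javascript
// Made-up per-city totals, standing in for the output of the first $group stage.
const cityTotals = [
  { _id: { city: "ALPHA", state: "XX" }, pop: 1000 },
  { _id: { city: "BETA", state: "XX" }, pop: 3000 },
  { _id: { city: "GAMMA", state: "YY" }, pop: 500 },
];

// Second $group: collect the city totals of each state.
const popsByState = new Map();
for (const { _id, pop } of cityTotals) {
  const pops = popsByState.get(_id.state) ?? [];
  pops.push(pop);
  popsByState.set(_id.state, pops);
}

// $avg over each state's cities, then $sort ascending by avgCityPop.
const averages = [...popsByState.entries()]
  .map(([state, pops]) => ({
    _id: state,
    avgCityPop: pops.reduce((sum, p) => sum + p, 0) / pops.length,
  }))
  .sort((a, b) => a.avgCityPop - b.avgCityPop);

console.log(averages);
```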
    
  6. Find the largest and smallest cities by state.

    db.code.aggregate( [
        { $group: { _id: { city: "$city", state: "$state" }, pop: { $sum: "$pop" } } },
        { $sort: { pop: 1 } },
        { $group: { _id: "$_id.state",
                    largestCity:  { $last: "$_id.city" },
                    largestPop:   { $last: "$pop" },
                    smallestCity: { $first: "$_id.city" },
                    smallestPop:  { $first: "$pop" } } }
    ] )
    

    Here we are using two $group stages. The first groups the data by the city and state fields and calculates the total population for each city and state combination. The second groups the documents, which the $sort stage has ordered by population, by the _id.state field and outputs one document per state. Because the documents arrive in ascending order of population, $last picks the city with the largest population in each state and $first picks the one with the smallest.

    Last two results:

    {
       "_id" : "NE",
       "largestCity" : "OMAHA",
       "largestPop" : 358930,
       "smallestCity" : "LAKESIDE",
       "smallestPop" : 5
    },
    {
       "_id" : "WA",
       "largestCity" : "SEATTLE",
       "largestPop" : 520096,
       "smallestCity" : "BENGE",
       "smallestPop" : 2
    },
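
The interplay of $sort with $first and $last in the last example can be checked on a small array. The sketch below is plain JavaScript over made-up per-city totals; it illustrates the idea (sort ascending, then keep the first and last document seen per state) and is not MongoDB's own implementation.

```javascript
// Made-up per-city totals, standing in for the output of the first $group stage.
const totals = [
  { _id: { city: "BETA", state: "XX" }, pop: 3000 },
  { _id: { city: "GAMMA", state: "YY" }, pop: 500 },
  { _id: { city: "ALPHA", state: "XX" }, pop: 1000 },
  { _id: { city: "DELTA", state: "XX" }, pop: 2000 },
];

// $sort: { pop: 1 } orders the documents by ascending population.
const sorted = [...totals].sort((a, b) => a.pop - b.pop);

// Second $group: the first document seen for a state is its smallest city;
// every later document overwrites the "largest" fields, so the last one wins.
const extremes = new Map();
for (const { _id, pop } of sorted) {
  const entry = extremes.get(_id.state) ?? { _id: _id.state, smallestCity: _id.city, smallestPop: pop };
  entry.largestCity = _id.city;
  entry.largestPop = pop;
  extremes.set(_id.state, entry);
}

const perState = [...extremes.values()];
console.log(perState);
```

A state with a single city, like the made-up YY here, reports that city as both its smallest and its largest.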
    

Conclusion

In this rapid introduction to using MongoDB we have looked at:

  • What Mongo is;

  • How to install it;

  • How to do basic operations in MongoDB.
