Setting up MongoDB for API data collection
More than once I've found myself wasting time and resources by querying APIs to retrieve the same data twice (or thrice, or ...). This problem is easily fixed by running MongoDB locally in a Docker container. Not only do we get an efficient datastore for API responses, but also the ability to query them easily.
Setting up
Assuming you have Docker running on your computer, the first thing to do is pull the Mongo image.
$ docker pull mongo:4.0-xenial
We will want the data to persist even if the container goes down. To accomplish this, create a folder on your computer to mount /data/db to. For example:
$ mkdir -p ~/data/mongo
The last step is to start the container, mounting the folder we just created; remember to expose Mongo's default port, 27017.
$ docker run -d -v ~/data/mongo:/data/db -p 27017:27017 --name mongodata mongo:4.0-xenial
We can check that everything is running smoothly with docker ps:
$ docker ps
Importing some data
I opened an account at OpenWeather to get an API key for testing the setup. The following script connects to the running MongoDB instance using PyMongo, queries some data, and stores it.
Access Mongo shell
Finally, open the Mongo shell,
$ docker exec -it mongodata bash
$ mongo
show the DBs available and query the newly inserted data.
> show dbs
> use weather-database
> show collections
> db.weather.find({})
Outro
Data science projects often require gathering large amounts of data via APIs. In less than five minutes you can have a real database up and running instead of dumping everything into files.
Thanks for reading!