Setting up MongoDB for API data collection
More than once I've found myself wasting time and resources by querying APIs to retrieve the same data twice (or thrice, or ...). This problem is easily fixed by running MongoDB locally in a Docker container. Not only do we get an efficient datastore for API responses, but also the ability to query them easily.
Setting up
Assuming you have Docker running on your computer, the first thing to do is pull the Mongo image.
$ docker pull mongo:4.0-xenial
We will want the data to persist even if the container goes down. To accomplish this, create a folder on your computer to mount /data/db to. For example:
$ mkdir -p ~/data/mongo
The last step is to start the container, mounting the folder we just created; remember to expose Mongo's default port, 27017.
$ docker run -d -v ~/data/mongo:/data/db -p 27017:27017 --name mongodata mongo:4.0-xenial
We can check that everything is running smoothly with docker ps:
$ docker ps
Importing some data
I opened an account at OpenWeather to get an API key for testing the setup. The following script connects to the running MongoDB instance using PyMongo, queries some data, and stores it.
Access Mongo shell
Finally, open the Mongo shell,
$ docker exec -it mongodata bash
$ mongo
show the DBs available and query the newly inserted data.
> show dbs
> use weather-database
> show collections
> db.weather.find({})
Outro
Data science projects often require gathering large amounts of data via APIs. In less than five minutes you can have a real database up and running instead of dumping everything into files.
Thanks for reading!