
Introduction to MongoDB

Introduction

MongoDB is a NoSQL database. In this post, we work through DataCamp's Introduction to MongoDB course and give a brief introduction to MongoDB. This is the first part of a series of posts in which I summarize what I learned from the course.

What is NoSQL?

NoSQL is usually read as "not only SQL" or "non-SQL"; it refers to non-relational databases. These databases, MongoDB among them, grew out of the needs of big data and model data without relational tables. Common approaches include graph (node-edge) systems, key-value stores, and document stores (MongoDB falls in this last group). NoSQL databases are good at scaling up to huge volumes of data, and they store data in non-tabular formats, unlike SQL and other relational database systems.

MongoDB basics:

What is MongoDB?

MongoDB is a free-to-use NoSQL database system that is built to scale and that exposes data in a JSON-like format, which makes it easy to interact with, query, and plug into your applications.

What does a MongoDB database look like?

MongoDB stores data as JSON-like documents. The structure is somewhat like a nested dictionary in Python.

A MongoDB database contains a number of collections. Each collection in turn contains documents, and documents can contain subdocuments, which in turn consist of fields and their values.

You can think of the Python equivalent of a MongoDB database as described in the picture below (credits: DataCamp).

(Image: comparison of MongoDB, JSON, and Python structures.)
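To make the comparison concrete, here is a plain-Python sketch of the same hierarchy. It does not use MongoDB at all; the names client_like, my_database and my_collection and the sample document are made up for illustration and are not taken from the course dataset.

```python
# A plain-Python picture of the hierarchy:
# client -> database -> collection -> document -> subdocument.
client_like = {
    "my_database": {              # database: maps collection names to collections
        "my_collection": [        # collection: behaves like a list of documents
            {                     # document: behaves like a dictionary
                "firstname": "wilhelm",
                "surname": "rontgen",
                "prizes": [       # subdocuments nested inside the document
                    {"category": "physics", "year": "1901"},
                ],
            },
        ],
    },
}

# Walk down the hierarchy one level at a time.
first_document = client_like["my_database"]["my_collection"][0]
print(first_document["prizes"][0]["category"])
```

A real MongoDB collection cannot be indexed with [0] like this; the list is only an analogy for "a bunch of documents".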
 

To access a database, you first have to connect to MongoDB through a database client. The databases under this client can be accessed like dictionary entries: if we have a client named client and a database named my_database, we can access the database using client['my_database'] or client.my_database.

How to access a collection under a database in MongoDB?

Collections are like lists of documents. But a collection is not exactly a list, so we can't index into it by position; instead, for a collection my_collection in my_database, we access it as my_database.my_collection or my_database['my_collection'].

Basic usage:

So the client is like a dictionary of databases, and collections are like lists. Still, we can't use dictionary methods such as keys() to get their names. To do that, we have to use list_database_names() and list_collection_names().

For getting the names of databases from the client, we have to use the following syntax:

database_list = client.list_database_names()

and for getting the names of collections from the database, we have to use the following syntax:

collection_list = client.my_database.list_collection_names()
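The two calls above can be combined into a small helper that maps every database name to its collection names. This is only a sketch: list_names is a hypothetical helper name, and it assumes the client object behaves like a connected pymongo MongoClient.

```python
def list_names(client):
    """Map each database name to the list of its collection names.

    `client` is expected to behave like a connected pymongo.MongoClient.
    """
    return {
        db_name: client[db_name].list_collection_names()
        for db_name in client.list_database_names()
    }

# Usage with a running MongoDB server (requires the pymongo package):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")  # assumed local server
#   print(list_names(client))
```

On a real server the result will typically also include MongoDB's own system databases next to yours.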

The next step is to extract documents based on filters. To extract one document from a collection based on some filter, we can use the following syntax:

document = client.my_database.my_collection.find_one(filter_condition)

What are filters?

Filters are conditions that describe which documents you want to find. For example, suppose we have a dataset of footballers with physical attributes such as height, weight, and speed, and playing statistics such as goals and passes. You may want to see player documents based on these features; those requirements become the filter conditions.

How to provide filter conditions in MongoDB?

In MongoDB, filter conditions are written in a JSON-like format; in Python, this is a dictionary. The DataCamp course uses a Nobel Prize database throughout. In this database, a sample filter condition would be to find someone with the surname rontgen. For that, the filter looks like:

{"surname":"rontgen"}

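and we can use it to find the document as below (my_database and my_collection stand in for the actual database and collection names):

doc = client.my_database.my_collection.find_one({"surname":"rontgen"})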

which will give us Wilhelm Röntgen's record of receiving the Nobel Prize for the discovery of X-rays. Note that the filter value has to match the stored surname exactly, including case.
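To get a feel for what such an equality filter does, here is a simplified in-memory imitation in plain Python. It is not how MongoDB implements filtering; it only covers exact matches on top-level fields. The functions matches and find_one and the sample documents are made up for illustration (the spelling follows the filter above, not the course dataset).

```python
def matches(document, filter_condition):
    """True if every key in the filter is in the document with an equal value."""
    return all(
        key in document and document[key] == value
        for key, value in filter_condition.items()
    )


def find_one(documents, filter_condition):
    """Return the first document that satisfies the filter, or None."""
    for document in documents:
        if matches(document, filter_condition):
            return document
    return None


# Made-up sample documents standing in for a collection.
laureates = [
    {"firstname": "marie", "surname": "curie"},
    {"firstname": "wilhelm", "surname": "rontgen"},
]

print(find_one(laureates, {"surname": "rontgen"}))  # the second sample document
print(find_one(laureates, {"surname": "Rontgen"}))  # None: the match is case-sensitive
```

An empty filter {} places no conditions, so in this sketch it simply returns the first document.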

We will learn more about filters in part 2 of the MongoDB series.
