Introduction to MongoDB

Introduction

MongoDB is a NoSQL database. In this post, we work through DataCamp's Introduction to MongoDB course and give a brief introduction to MongoDB along the way. This is the first post in a series in which I summarize what I learned from the course.

What is NoSQL?

NoSQL is usually read as "not only SQL" and refers to non-relational databases. These systems grew out of big-data workloads and model data without the fixed tables and relations of SQL databases. Common models include graph (node-edge) stores, key-value stores, and document stores; MongoDB is a document store. NoSQL databases are generally designed to scale out under large data volumes, and they store data in non-tabular formats, unlike SQL and other relational database systems.

MongoDB basics:

What is MongoDB?

MongoDB is a NoSQL database system with a free community edition. It is designed to scale, and it exposes data in a JSON-like format that is easy to query and to plug into your applications.

What does a MongoDB database look like?

MongoDB stores data as JSON-like documents. The structure resembles a nested dictionary in Python.

A MongoDB database contains a number of collections. Each collection contains documents, and each document consists of fields, whose values can themselves be subdocuments or arrays.

You can think of the Python equivalent of a MongoDB database as described in the picture below (credit: DataCamp).

[Image: comparison of MongoDB, JSON, and Python structures]
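
As a rough sketch of that comparison in plain Python (no MongoDB involved), a client can be pictured as a dictionary of databases, a database as a dictionary of collections, a collection as a list of documents, and a document as a dictionary. The names and values below (my_database, my_collection, the footballer fields) are made up for illustration.

```python
# Plain-Python picture of the MongoDB hierarchy; all names and values are illustrative.
document = {
    "name": "Player One",                         # a simple field
    "body": {"height_cm": 180, "weight_kg": 75},  # a subdocument (nested dict)
    "goals": 12,
}

my_collection = [document]                        # a collection behaves like a list of documents
my_database = {"my_collection": my_collection}    # a database maps names to collections
client = {"my_database": my_database}             # a client maps names to databases

# Walk from the client down to a nested value.
height = client["my_database"]["my_collection"][0]["body"]["height_cm"]
print(height)
```

This is only an analogy: a real collection is not a Python list and cannot be indexed by position.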
 

To access a database, you first connect to a MongoDB server through a client. The databases under the client can then be accessed like dictionary entries: if we have a client named client and a database named my_database, we can access the database with client['my_database'] or client.my_database.

How to access a collection under a database in MongoDB?

Collections are like lists of documents. But a collection is not exactly a list, so we can't index it by position; instead, a collection my_collection in my_database is accessed as my_database.my_collection or my_database['my_collection'].
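
A minimal sketch of these access patterns with the pymongo driver follows. The helper get_collection is a hypothetical name, and the connection string in main() assumes a server running locally on the default port; adjust both to your setup.

```python
def get_collection(client, db_name, collection_name):
    """Return a collection using bracket access, which works for any name."""
    database = client[db_name]         # equivalent to client.my_database
    return database[collection_name]   # equivalent to database.my_collection


def main():
    # Assumes pymongo is installed (pip install pymongo) and that a server
    # is reachable at this address; the URI is an assumption.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = get_collection(client, "my_database", "my_collection")
    print(collection.full_name)
```

Bracket access is the safer choice when a database or collection name is stored in a variable or is not a valid Python attribute name. Call main() once a MongoDB server is available.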

Basic usage:

The client is like a dictionary of databases, and collections are like lists. Still, we can't read their names the way we would read dictionary keys. Instead, we use list_database_names() and list_collection_names().

For getting the names of databases from the client, we have to use the following syntax:

database_list = client.list_database_names()

and for getting the names of collections from the database, we have to use the following syntax:

collection_list = client.my_database.list_collection_names()
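
The two calls can be combined to get an overview of everything a client can see. The sketch below assumes client is an already connected pymongo MongoClient; list_structure is a hypothetical helper name, not part of pymongo.

```python
def list_structure(client):
    """Map every database name to the names of its collections."""
    return {
        db_name: client[db_name].list_collection_names()
        for db_name in client.list_database_names()
    }
```

On a typical server the result also includes system databases such as admin and local, alongside your own.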

The next step is to extract documents based on filters. To fetch a single document that matches a filter from a collection, we can use the following syntax:

document = client.my_database.my_collection.find_one(filter_condition)

What are filters?

Filters are conditions that describe which documents you want to find. For example, suppose we have a dataset of footballers with body features such as height, weight, and speed, and player features such as goals and passes. If you want to see only the player documents with certain values for these features, those requirements become the filter conditions.

How to provide filter conditions in MongoDB?

In MongoDB, filter conditions are written as JSON-like documents (Python dictionaries when using a Python client). The DataCamp course uses a Nobel Prize database throughout. In this database, a sample filter condition is to find someone with the surname rontgen. The filter looks like this:

{"surname":"rontgen"}

and we can use it to find the document like below:

doc = client.my_database.my_collection.find_one({"surname": "rontgen"})

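which gives us Wilhelm Röntgen's record, the Nobel Prize awarded for the discovery of X-rays. Note that the match is exact and case-sensitive, so the value in the filter has to be spelled the way it is stored in the database.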
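
To make the idea of an equality filter concrete, here is a plain-Python imitation of what find_one does with a simple filter. It is not how MongoDB implements matching and covers only exact equality on top-level fields; the sample documents and the firstname field are made up for illustration.

```python
def matches(document, filter_condition):
    """True when every key in the filter is present in the document with an equal value."""
    return all(
        key in document and document[key] == value
        for key, value in filter_condition.items()
    )


def find_one(documents, filter_condition):
    """Return the first matching document, or None when nothing matches."""
    for document in documents:
        if matches(document, filter_condition):
            return document
    return None


# Made-up sample documents; the real course data has more fields.
laureates = [
    {"firstname": "Marie", "surname": "Curie"},
    {"firstname": "Wilhelm", "surname": "rontgen"},
]

print(find_one(laureates, {"surname": "rontgen"}))
print(find_one(laureates, {"surname": "Rontgen"}))  # None: matching is case-sensitive
```

An empty filter {} matches every document, so find_one then returns the first one it sees.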

We will look at filters in more detail in part 2 of the MongoDB series.
