Introduction to MongoDB

Introduction

MongoDB is a NoSQL database. In this post, I work through DataCamp's introduction to MongoDB course and give a brief introduction to MongoDB along the way. This is the first part of a series of posts in which I summarize what I learned from the course.

What is NoSQL?

NoSQL is usually read as "not only SQL" or "non-SQL"; i.e. it refers to non-relational databases. These databases, MongoDB among them, grew out of big-data workloads and model data without relations between tables. The common models are graph (node-edge) stores, key-value stores, document stores (MongoDB falls in this group) and others. NoSQL databases are built to scale out under huge loads of data, and they store data in non-tabular formats, unlike SQL and other relational database systems.

MongoDB basics:

What is MongoDB?

MongoDB is a NoSQL database system with a free community edition. It is built to scale, and it exposes data as JSON-like documents that are easy to query and plug into your applications.

What does a MongoDB database look like?

MongoDB stores data as JSON-like documents (internally in a binary format called BSON). The structure resembles a nested dictionary in Python.

One MongoDB database contains a number of collections. Each collection in turn contains documents, and each document consists of fields whose values can themselves be subdocuments or arrays.

You can think of the Python equivalent of a MongoDB database as described in the picture below (credit: DataCamp).

[Image: comparison of MongoDB, JSON, and Python structures]
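The same analogy can be written out directly in Python. The sketch below is only a plain nested dictionary, not a real MongoDB connection, and the collection name, field names, and values are made up for illustration.

```python
# A plain-Python picture of the hierarchy: database -> collections -> documents.
# All names and values are hypothetical; no MongoDB server is involved.
my_database = {
    "players": [  # a collection holds many documents
        {
            # a document is a set of field/value pairs
            "name": "Player A",
            "height_cm": 180,
            # a field can hold a subdocument (a nested dictionary)
            "stats": {"goals": 12, "passes": 340},
            # a field can also hold an array (a list)
            "positions": ["forward", "winger"],
        },
        # documents in one collection need not share the same fields
        {"name": "Player B", "height_cm": 175},
    ]
}

first_document = my_database["players"][0]
goals = first_document["stats"]["goals"]  # reach into the subdocument
print(goals)
```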

To access a database, you first connect to a MongoDB server through a client. The databases under this client can then be accessed like dictionary entries; i.e. if we have a client named client and a database named my_database, we can access the database using client['my_database'] or client.my_database.

How to access a collection under the database in MongoDB?

As we can see, collections are like lists of documents. But a collection is not exactly a list, so we can't index into it by position; rather, a collection my_collection in my_database is accessed as my_database.my_collection or my_database['my_collection'].
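Put together, connecting and drilling down to a collection might look like the sketch below. It assumes the pymongo driver and a server on the default local address; the URI and the helper names connect and get_collection are illustrative, not part of the course.

```python
def connect(uri="mongodb://localhost:27017"):
    """Create a client. The URI is an assumed local default; adjust as needed."""
    # Imported inside the function so the helper below can be read and
    # tried out even where pymongo is not installed.
    from pymongo import MongoClient
    return MongoClient(uri)


def get_collection(client, db_name, collection_name):
    """Return the collection using bracket access at both levels."""
    database = client[db_name]        # same as client.my_database for "my_database"
    return database[collection_name]  # same as database.my_collection


# Typical use, assuming a server is running:
#   client = connect()
#   my_collection = get_collection(client, "my_database", "my_collection")
```

The bracket form is handy when the names are stored in variables or are not valid Python identifiers (for example, a name containing a hyphen).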

Basic usage:

Now, the client is like a dictionary of databases, and collections are like lists. But we still can't use dictionary-style calls on them to get the names of what they contain. To do that, we have to use list_database_names() and list_collection_names().

For getting the names of databases from the client, we have to use the following syntax:

database_list = client.list_database_names()

and for getting the names of collections from the database, we have to use the following syntax:

collection_list = client.my_database.list_collection_names()
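The two calls can be combined to get an overview of everything under a client. This is a small sketch that assumes client behaves like a pymongo MongoClient; the helper name list_all_collections is hypothetical.

```python
def list_all_collections(client):
    """Map each database name to the list of its collection names."""
    overview = {}
    for db_name in client.list_database_names():
        # Bracket access returns the database object for that name.
        overview[db_name] = client[db_name].list_collection_names()
    return overview


# Typical use, assuming `client` is a connected pymongo MongoClient:
#   for db_name, collection_names in list_all_collections(client).items():
#       print(db_name, collection_names)
```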

Now, the next thing in line is to extract documents based on filters. To extract one document from a collection based on some filter, we can use the following syntax:

document = client.my_database.my_collection.find_one(filter_condition)

What are filters?

Filters are conditions that describe which documents you want to find. For example, suppose we have a dataset of footballers with body features like height, weight and speed, and player features like goals and passes. You may want to see only the player documents that match certain values of these features; those values then become the filter conditions.

How to provide filter conditions in MongoDB?

In MongoDB, you provide filter conditions in JSON format (Python dictionaries, when working from Python). The DataCamp course uses a Nobel Prize database that we work on throughout the course. In this database, a sample filter condition would be: find someone with the surname rontgen. For that, the filter will look like:

{"surname":"rontgen"}

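and we can use it to find the document like below (with my_database and my_collection standing in for the actual database and collection names):

doc = client.my_database.my_collection.find_one({"surname": "rontgen"})

which will give us Wilhelm Röntgen's record of receiving the Nobel Prize for the discovery of X-rays.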
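To build intuition for what such a filter does, here is a pure-Python sketch of the idea. It is not how MongoDB is implemented: it handles only exact matches on top-level fields, while real filters also support operators, nested fields and array matching. The sample documents are hand-written stand-ins, not the course data.

```python
def matches(document, filter_condition):
    """True if every filter key is in the document with an equal value."""
    return all(
        key in document and document[key] == value
        for key, value in filter_condition.items()
    )


def find_one(documents, filter_condition):
    """Return the first matching document, or None when nothing matches."""
    for document in documents:
        if matches(document, filter_condition):
            return document
    return None


# Hypothetical stand-in documents for illustration only.
sample_documents = [
    {"firstname": "marie", "surname": "curie"},
    {"firstname": "wilhelm", "surname": "rontgen"},
]

found = find_one(sample_documents, {"surname": "rontgen"})
missing = find_one(sample_documents, {"surname": "nobody"})
```

As with find_one in pymongo, a filter that matches nothing gives back None rather than raising an error, and the match is case-sensitive, so the filter has to use the spelling stored in the data.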

We will cover filters in more detail in part 2 of the MongoDB series.
