
Introduction to MongoDB

Introduction

MongoDB is a NoSQL database. In this document, we will work through DataCamp's introduction to MongoDB course and give a brief introduction to MongoDB. This is the first part of a series of posts in which I summarize my findings from the DataCamp course.

What is NoSQL?

NoSQL is usually read as "not only SQL" or "non-SQL"; i.e., it refers to non-relational databases. These databases, MongoDB among them, grew out of big-data workloads and use data models that do not rely on relational tables. The common models are graph (node-edge) systems, key-value stores, document stores (MongoDB falls in this group), and others. NoSQL databases are good at scaling up to huge loads of data, and they store data in non-tabular formats, unlike SQL and other relational database systems.

MongoDB basics:

What is MongoDB?

MongoDB is one of the freely available NoSQL database systems. It is built to scale from its core, and it exposes data in a JSON-like format that is easy to interact with, query, and plug into your applications.

What does a MongoDB database look like?

MongoDB stores data as JSON-like documents (internally in a binary form called BSON). The structure it follows is much like a nested dictionary in Python.

One MongoDB database contains a number of collections, and each collection in turn contains documents. A document is made up of fields whose values can be plain values, lists, or nested subdocuments.

You can think of the Python equivalent of a MongoDB database as shown in the picture below (credits: DataCamp).

[Image: comparison of MongoDB, JSON, and Python data structures]
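As a rough sketch of this analogy, the snippet below models a database with plain Python objects: a dictionary of collections, each collection a list of documents, each document a dictionary. The names my_database and my_collection follow the text; the sample documents and their fields are made up for illustration, and a real MongoDB database is of course not a Python dict.

```python
# A plain-Python picture of the hierarchy: database -> collections -> documents.
# All sample values below are made up for illustration.
my_database = {
    "my_collection": [                       # a collection behaves like a list of documents
        {                                    # a document behaves like a dictionary
            "name": "player one",
            "goals": 12,
            "body": {"height_cm": 180, "weight_kg": 75},  # a subdocument is a nested dictionary
        },
        {"name": "player two", "goals": 7},  # documents need not share the same fields
    ]
}

first_document = my_database["my_collection"][0]
print(first_document["body"]["height_cm"])   # nested field access
```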
 

To access a database, one first has to connect to a MongoDB server via a client. We can then access the databases under this client like the entries of a dictionary; i.e., if we have a client named client and a database named my_database, then we can access the database using client['my_database'] or client.my_database.

How to access a collection under the database in MongoDB?

As we can see, collections are like lists. But since a collection is not exactly a list, we can't just index into it by position; rather, for a collection my_collection in my_database, we access it as my_database.my_collection or my_database['my_collection'].
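The access pattern can be wrapped in a small helper. This is a minimal sketch: get_collection is a hypothetical name, not part of PyMongo, and it relies only on bracket access, so it should work with a PyMongo MongoClient as well as with the nested-dictionary stand-in used here to keep the example runnable without a server.

```python
def get_collection(client, database_name, collection_name):
    """Return the collection found at client[database_name][collection_name]."""
    database = client[database_name]   # with PyMongo: same as client.my_database
    return database[collection_name]   # with PyMongo: same as database.my_collection


# Stand-in for a client: a dictionary of databases (illustrative data only).
fake_client = {"my_database": {"my_collection": [{"surname": "rontgen"}]}}

collection = get_collection(fake_client, "my_database", "my_collection")
print(collection)
```

With PyMongo installed and a server running, the same helper could be called as get_collection(MongoClient(), "my_database", "my_collection"); a server on the default local address is an assumption about your setup.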

Basic usage:

Now, the client is like a dictionary of databases and collections are like lists. But we still can't read their names the way we would read the keys of a dictionary. To do that, we have to use list_database_names() and list_collection_names().

For getting the names of databases from the client, we have to use the following syntax:

database_list = client.list_database_names()

and for getting the names of collections from the database, we have to use the following syntax:

collection_list = client.my_database.list_collection_names()
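Putting the two calls together, a short helper can map every database name to its collection names. This is only a sketch: database_overview is a hypothetical helper name, while list_database_names() and list_collection_names() are the PyMongo methods used above. It expects an already connected client to be passed in.

```python
def database_overview(client):
    """Map each database name on the client to the list of its collection names."""
    overview = {}
    for database_name in client.list_database_names():
        database = client[database_name]   # dictionary-style access to the database
        overview[database_name] = database.list_collection_names()
    return overview
```

With a connected client, print(database_overview(client)) shows one entry per database.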

Now, the next thing in line is to extract documents based on filters. To extract one document from a collection based on some filter, we can use the following syntax:

document = client.my_database.my_collection.find_one(filter_condition)

What are filters?

Filters are basically conditions by which you want to search the dataset. For example, suppose we have a dataset of footballers with body features like height, weight, and speed, and player features like goals, passes, etc. You may then want to see player documents based on these features; those conditions become the filter conditions.

How to provide filter conditions in MongoDB?

In MongoDB, you have to provide filter conditions in JSON format (in Python, as dictionaries). In the current DataCamp course, we have a Nobel Prize database which we work on throughout the course. In this database, a sample filter condition would be: find someone with the surname rontgen. For that, the filter will look like:

{"surname":"rontgen"}

and we can use it to find the document like below:

doc = client.my_database.my_collection.find_one({"surname":"rontgen"})

which will give us Wilhelm Röntgen's record of receiving the Nobel Prize for the discovery of X-rays.
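To make the idea of a filter concrete, here is a plain-Python sketch of what an equality filter means: a document matches when every key in the filter is present in the document with an equal value. This only illustrates the semantics for simple top-level values; it is not how MongoDB is implemented, and the function names and sample documents are made up for the example.

```python
def matches(document, filter_condition):
    """True if every key/value pair of the filter appears in the document."""
    return all(
        key in document and document[key] == value
        for key, value in filter_condition.items()
    )


def find_one(documents, filter_condition):
    """Return the first matching document, or None when nothing matches."""
    for document in documents:
        if matches(document, filter_condition):
            return document
    return None


# Hand-written sample documents (not the real course data).
laureates = [
    {"firstname": "marie", "surname": "curie"},
    {"firstname": "wilhelm", "surname": "rontgen"},
]

print(find_one(laureates, {"surname": "rontgen"}))
print(find_one(laureates, {"surname": "nobody"}))  # no document matches
```

As with PyMongo's find_one, an empty filter {} matches the first document, and a filter that matches nothing yields None.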

We will learn more details about filters in part 2 of the MongoDB series.
