
Posts

Showing posts from January, 2021

What is topic modeling?

Written by: Shyambhu Mukherjee Introduction: Topic modeling is one of the best-known natural language processing tasks. Topic modeling refers to assigning one or more topics to a set of documents. In this article, we will discuss topic modeling at basic, intermediate and advanced levels, and cover multiple Python libraries that can be used to do topic modeling. Summary of the article: In this article, we will first define what topic modeling and topic classification modeling are, how they differ, and give an overview of how to do both. Then we will discuss topic modeling procedures, such as LSA and LDA, in detail. Finally, we will end the discussion with a conclusion and further reading. What is topic modeling? Topic modeling is a type of statistical modeling used to discover which abstract topics are discussed in a set of documents. Topic modeling, by its construc

Issues with current data science mentorship programs in India

Introduction: I have been working with mentorbruh, a close impact mentorship program, for quite some time now, and have had the opportunity to meet many data aspirants and mentor a few. Many of them come with prior experience from internship programs, coaching schools and other types of certification courses. But most of them have one thing in common: "Their knowledge from these programs didn't make them employable." Now, when I say that, it may sound like I am exaggerating, and that it can't be like that. But in this article, I am going to explain three types of programs, and break down in detail why they don't work, fully or partially. Summary of article: I am going to take a deep dive into the well-known programs currently running in India for data science coaching, and will thoroughly explain their flaws and the reasons for their very small to nonexistent success rates. The SMB coaching centers: The lowest tier in data science mentorship is the small and medium businesses

What is lazypredict automl library and how to use it?

Lazypredict: The automl library Introduction: Recently I started my journey with automl, and explored Sberbank's light automl framework as well as auto-EDA with pandas profiling in this post. In this post, I will explore the lazypredict framework written by Shankar Pandala sir. We will first show how to use the library and what outputs we get from it, and then, finally, we will go in depth into the code to see how lazypredict does what it does. Usage: For this part, we will just use the GitHub repo's code example. There are two classes, LazyClassifier and LazyRegressor, for classification and regression respectively. Import the classifier class if your problem is classification, and the regressor class if you have a regression problem. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=123) clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None) models, predictions =

5 mistakes I made in my first year of machine learning and what I learned from them

5 mistakes of an ML beginner Introduction: This is a lightweight, non-technical post. I have been practicing the craft of machine learning for the last 2.5 years. I recently started working with a few data science aspirants, and found them making many of the same mistakes I used to make. In this article, I will share 5 mistakes that I made and that a lot of machine learning and data science beginners make. First mistake: not reading your data: Data science is exciting, and machine learning starts with learning a lot of models and algorithms. Hence, when we start learning machine learning and data science, we often don't learn the most important step. The most important step is to read your data. Now, I used to find it stupid to read the data, because I didn't know what reading data means. Reading data means manually understanding the different patterns in the data, as well as its errors and inconsistencies. Let's say you have text data. Now re

What is DALL.E and how does it create image out of text?

DALL-E: A ground-breaking piece of machine learning news Introduction: It was a Tuesday morning. I woke up and, sipping my morning cup of coffee, found out that OpenAI had dropped a bomb again. This time they didn't stop with language; they created a neural network architecture which takes a text prompt and creates an image for that text. It sent another ripple through the data science and deep learning communities within a few days; within 5 days, thousands of articles, from news to technical pieces, had been written about DALL-E. So here is my take on DALL-E. Sit back and enjoy! Summary: In this article, we are going to go through a basic to intermediate technical review of the DALL-E and CLIP neural networks. This is not a purely technical article, as it caters to both ML and non-ML readers who want to learn about this awesome new thing. What is DALL-E? DALL-E is a version of GPT-3 with 12 billion parameters, which is

Introduction to Rasa: the NLU chatbot framework

Introduction to Rasa Written by: Shyambhu Mukherjee Motivation: With the onset of 2021, I planned to up-skill in chatbot creation. For chatbot creation, there are a number of frameworks available, such as Dialogflow, Rasa and others. I already wrote about what chatbots are, and created a small appointment scheduler chatbot using Dialogflow by following Google's developer course on the same. If you don't know what a chatbot is, read the above-linked article first, and then continue with this post. Summary of the article: In this article, we will first describe what Rasa is and what a typical chatbot's anatomy looks like. Then we will quickly go over a few concepts related to chatbot frameworks. Finally, we will document our process of creating a health care chatbot by providing a step-by-step guide to install, initialize, train and deploy a Rasa bot on a Linux machine. By the end of this post, you will be able to crea

Spacy errors and their solutions

Introduction: There are a bunch of errors in spaCy which never make sense until you get into the depth of them. In this post, we will analyze the attribute error E046 and why it occurs. (1) AttributeError: [E046] Can't retrieve unregistered extension attribute 'tag_name'. Did you forget to call the set_extension method? Let's first understand what the error means at a superficial level. There is a tag_name extension in your code, i.e. you are probably calling doc._.tag_name on a doc object. spaCy suggests that you probably forgot to call the set_extension method. So what to do from here? The problem at hand is that your extension was not created where it should have been. In general, this means that your pipeline is incorrect at some level. So how should you solve it? Look into the pipeline of your spaCy language object. Chances are that the pipeline component which creates the extension is not included in the pipeline. To check the pipe eleme
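
The E046 diagnosis described in the spaCy excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's own code: it assumes a blank English pipeline (`spacy.blank("en")`) and registers the `tag_name` extension by hand, whereas in a real project the registration would normally happen inside the custom pipeline component that is missing from the pipeline.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank English pipeline (tokenizer only, no model download).
nlp = spacy.blank("en")

# List the components currently in the pipeline; a component that should
# register the extension but is absent from this list is a likely culprit.
print(nlp.pipe_names)

doc = nlp("A short example sentence.")

# Reading an extension that was never registered raises AttributeError [E046].
try:
    doc._.tag_name
    raised_e046 = False
except AttributeError as err:
    raised_e046 = True
    print(err)

# Register the extension once (done manually here for illustration only).
if not Doc.has_extension("tag_name"):
    Doc.set_extension("tag_name", default=None)

# After registration, the attribute can be read and written on any Doc.
doc._.tag_name = "example"
print(doc._.tag_name)
```

If the extension is meant to be created by a custom component, checking `nlp.pipe_names` for that component is usually the first thing to try before registering anything manually.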