Machine learning and statistics with python

Posts

Showing posts from November, 2020

Neural network basic introduction and FAQ

Introduction: All of us, when we start out with neural networks, go through pictures of networks, try to understand complex equations, get baffled by the back-propagation equations, and take our time to eventually assimilate what it all stands for. Recently, while mentoring students for our new effort mentorbruh, one of my students asked some pretty interesting yet very basic questions. This post is an effort to write those down so that they can help other students who are going through neural networks for the first time and have the same doubts. But before that, let's quickly brush up on the basics. Abstract of this article: In this post, we are going to take small steps in explaining what a neural network is, what input, output and hidden layers are, and how a node calculates its value. We will also briefly touch on the concepts of bias, activation, the number of hidden layers and all the related artifacts. In the looser second part of this neural network b...
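As a rough illustration of how a node calculates its value, here is a minimal sketch in plain Python. It assumes the common textbook formulation (weighted sum of inputs plus a bias, passed through an activation); the sigmoid activation and the function names node_output and layer_output are choices made for this example, not something taken from the post itself.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias, activation=sigmoid):
    """Value of one node: activation(weighted sum of inputs + bias)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def layer_output(inputs, weight_rows, biases, activation=sigmoid):
    """Value of a whole layer: one node_output per (weights, bias) pair."""
    return [
        node_output(inputs, weights, bias, activation)
        for weights, bias in zip(weight_rows, biases)
    ]


# A hidden layer with two nodes, each reading the same two inputs.
hidden = layer_output(
    inputs=[1.0, 2.0],
    weight_rows=[[0.5, -0.25], [0.1, 0.2]],
    biases=[0.0, 0.3],
)
```

Each hidden or output layer repeats the same computation, using the previous layer's outputs as its inputs.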

The installation saga of opencv

Introduction: It was a Sunday, and I was reluctantly checking in with one of my interns about the opencv project I have been helping them complete. She, Rohini, told me that she was getting a "dll error: module not found" type error in an opencv video operation. At this point I decided to solve the issue by creating a fresh venv, downloading the files, running them and resolving whatever came up. And that is where the issues started. Naive enough: Naive me tried to install opencv with pip3 install opencv. That sadly ends in a 404 error from pypi, as the project is actually published under the name opencv-python. So I ran pip3 install opencv-python; some downloads started, but it stopped again with the message "ModuleNotFoundError: No module named 'skbuild'". Big oof! It turns out skbuild is not a module you need to download yourself; the error is caused by the pip3 version. The solution, found in the github issue here, is to upgrade your pip using pip3 install --u...

How to find subjects and predicates using spacy in german text?

Introduction: I have talked enough about spacy in English. But enough about English; what about, say, German? One of my fellow linguists, who is not a native German speaker, wanted to analyze German texts to find nouns and predicates. In this post, I will introduce you to the German models, show how to download and use them, and we'll finish what my fellow linguist started. Gut, lass uns anfangen (good, let's begin). Download and load a German model: One of the good things about spacy is that changing the language does not significantly change the structure of the code. For German, there are 3 models available among the spacy pretrained models: 'de_core_news_sm', 'de_core_news_md' and 'de_core_news_lg'. The sm, md and lg refer to small, medium and large respectively. For tasks where similarity is not needed and a light model matters more, one can go with the de_core_news_sm model. For higher precision and correct similarity related operations, de_core_news_lg is the preferred mode...

How to do lemmatization using spacy?

Introduction: Lemmatization is the process of reducing a word to its nearest root word, i.e. removing prefixes and suffixes and replacing the word with the nearest base form; e.g. checking --> check, girls --> girl. Lemmatization is one of the most important steps in information retrieval as well as natural language processing. Lemmatizing different words with the same root reduces noise and improves information concentration in both fields. Now, this post is not about explaining how lemmatization or stemming works. For that, refer to this comprehensive post about stemming and lemmatization, which should give you a proper idea of what these processes are and how they work. Now let's talk about spacy. If you don't know what spacy is, start here with introduction to spacy. spacy is one of the best production-level natural language processing libraries, which lets one perform ...

how to continue line in python using backslash?

Introduction: According to the pep-8 style guide, a line in python should be at most 79 characters long. But many times you can't finish a statement within that limit. So what is the solution? Solution: Continue the line by breaking it after an operator or a space. To signal that the next line is part of the same statement, put a backslash ("\") at the end of the line. Example: turn the following line: dataframe = prev_data.groupby(['cust_id','edge_id','edge_date']).mean().reset_index() into: dataframe = prev_data.groupby(['cust_id', \ 'edge_id','edge_date']).mean().reset_index() using a backslash line break. Note that inside parentheses, brackets or braces python already continues the line implicitly, so the backslash in this example is optional; it is required only when the break falls outside any bracket. Now you can write clearer, prettier code using line breaks. Thanks for reading!
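Here is a small self-contained sketch of the two ways to continue a line. The variable names and numbers are made up for illustration. The first statement uses an explicit backslash; the other two rely on Python's implicit continuation inside parentheses and brackets, which is the form pep-8 prefers where it is available.

```python
# Explicit continuation: the backslash must be the very last character
# on the line (no trailing spaces or comments after it).
total_with_backslash = 100 + 200 + 300 + \
    400 + 500

# Implicit continuation: anything inside (), [] or {} may span several
# lines without a backslash.
total_with_parens = (
    100 + 200 + 300
    + 400 + 500
)

# The same rule applies to list literals such as a list of column names.
columns = [
    "cust_id",
    "edge_id",
    "edge_date",
]
```

Both totals are identical; only the layout of the source code differs.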

pandas groupby functions usage and examples

Introduction: Pandas is one of the most basic data processing libraries that data enthusiasts learn and use frequently. We have discussed the 10 most basic functions to know from pandas in a previous post. Although I have known and used groupby for quite some time, there are a lot of tricky details around the groupby functions that we need to learn in order to get the most out of them. The basics: If you are new to pandas, let's go over the groupby basics first. groupby() is a method that groups the data by one or more columns and aggregates some other columns based on those groups. The usual syntax is: pandas.DataFrame.groupby(columns).aggregate_functions() For example, say you have credit card transaction data for customers, with each transaction recorded for each day, and you want to know the amount transacted at a day level. In such a case, you will want to group the data...
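To make the day-level example concrete, here is a minimal sketch. The data and the column names cust_id, date and amount are invented for illustration and are not taken from the post. It groups the transactions by day and aggregates the amount column.

```python
import pandas as pd

# Hypothetical credit card transactions: one row per transaction.
transactions = pd.DataFrame(
    {
        "cust_id": [1, 2, 1, 3, 2],
        "date": [
            "2020-11-01", "2020-11-01",
            "2020-11-02", "2020-11-02", "2020-11-02",
        ],
        "amount": [100.0, 50.0, 25.0, 75.0, 10.0],
    }
)

# Total amount per day: group by the date column, then sum the amounts.
daily_total = transactions.groupby("date")["amount"].sum().reset_index()

# Several aggregates at once, using named aggregation.
daily_summary = (
    transactions.groupby("date")
    .agg(total_amount=("amount", "sum"), n_transactions=("amount", "count"))
    .reset_index()
)
```

reset_index() turns the group key back into an ordinary column, which is usually more convenient for further processing.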

How to create better app layout and deploy your apps in streamlit

Introduction: In our first post, we discussed how to create a basic streamlit application and then showed an nlp usage data application and how it works. In this post, we will discuss how to set up the layout of apps and deploy them via streamlit share. Let's dig in. What is a layout? The layout refers to the structure of the app's front end. We study layouts and deliberately place each and every small component of an app. The reason for creating, studying and actively "optimizing" layout designs is to create the best user experience for every user of the app. The details of such topics are obviously out of scope for this article; but now that the meaning of layout in this context is established, let's proceed with how to create different lay...

what is bigbird models and why is it such a great successor to transformer?

Introduction: Transformer models made big news and then a huge impact in the nlp world. Researchers and industry practitioners across the world took the new idea of language modeling and created new standard models for text processing, classification, natural language generation and many other directions. The transformer architecture is therefore considered an inflection point in the history of nlp research. Issues with initial transformer models: But with great power comes great responsibility. Transformers, too, came with one great responsibility: computation. At the core of transformers is the attention mechanism. Attention, in plain English, is a way to get a signal of the relation between different words/tokens of a text. Transformers use what we call full quadratic attention; i.e. we calculate the relevance of each token with every other token. This simple thing, in turn, increases the computation cost in an O(n^2) order wh...
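The quadratic growth can be seen with a deliberately simplified sketch. It computes only raw dot-product relevance scores between every pair of token vectors (no scaling, softmax or learned projections, all of which real attention has), and the function names are made up for this example.

```python
def attention_scores(token_vectors):
    """Relevance of each token with every other token (n x n scores).

    Simplification: plain dot products stand in for full attention.
    """
    return [
        [sum(q * k for q, k in zip(query, key)) for key in token_vectors]
        for query in token_vectors
    ]


def count_pairwise_scores(n_tokens):
    """Number of scores computed for a sequence of n_tokens tokens."""
    vectors = [[1.0, 0.0] for _ in range(n_tokens)]
    scores = attention_scores(vectors)
    return sum(len(row) for row in scores)


# Doubling the sequence length quadruples the number of scores.
short_cost = count_pairwise_scores(4)
long_cost = count_pairwise_scores(8)
```

Because every token is compared with every other token, a sequence of n tokens needs n * n scores.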

python3 list: creation, addition, deletion

Introduction: Lists are one of the most fundamental data structures in python3. In this article we are going to go through the basics and some of the advanced usage of lists in python, and where you should and should not use them. What is a list? A list is a python data structure which is comparable to a dynamic array in C++ or C. A list normally stores homogeneous elements in python3, but python allows storing dissimilar elements in a list too. In summary, a list is an ordered linear data structure. How to create a list? A list can be created as simply as writing my_list = [], which initiates an empty list. There is also the list() builtin function in python, which likewise creates a list. Empty lists can be created using both [] and list(), but [] is generally a bit faster, since it avoids a name lookup and a function call. A list can also be created with the elements to be put into it, i.e. you can start a list with the elements it is supposed t...
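The creation options described above can be sketched as follows; the variable names are chosen only for this illustration.

```python
# Two ways to create an empty list.
empty_literal = []
empty_builtin = list()

# A list created together with its elements.
numbers = [1, 2, 3]

# Python does not require the elements to share a type.
mixed = [1, "two", 3.0]

# list() also builds a list from any iterable, e.g. a string.
letters = list("abc")
```

Avoid naming a variable list, because that hides the builtin list() for the rest of the scope.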

how to download and use different spacy pipelines?

Introduction: Spacy is a new-age, industrial-grade and computationally economical nlp library which employs full pipelines for nlp work and attempts to democratize natural language processing for smaller companies by publishing high-end work as open source software. For the last 2 months, I have been reading about and using spacy heavily and blogging about it. We have seen in detail how to use spacy in part1, part2, part3 and part4 of the spacy series. But as we have worked mostly with one model, we have never dealt with the different pipelines the spacy library offers. So in this post, I am going to give a short summary of the different models we can use with spacy and the pipelines related to them. Let's dig in. How to download spacy models and use them: All spacy pipelines are d...

pytextrank: module for TextRank for phrase extraction and text summarization

Introduction: We have described spacy in part1, part2, part3, and part4. In this post, we will describe the pytextrank project, built on the spacy structure, which solves phrase extraction and text summarization. Pytextrank is written by Paco Nathan, an American computer scientist based in Texas. Pytextrank is mainly interesting to me for two reasons: (1) it implements the textrank algorithm very nicely in a spacy extension format; (2) the package is easy to use: it abstracts its complexity away from the user and can be used with little to no understanding of the underlying algorithm. Now that I have hopefully given enough motivation to read about and use this package, we will explore the basic usage first, and then dive in to see the inner workings, which will be the more advanced part of this post. How to use pytextrank: pytextrank can be installed via pip3 install pytextrank, as it is listed on pypi. Now once you install it in that manner; t...
