Machine learning and statistics with python

Posts

Showing posts from July, 2020

Huggingface transformers library exploration: part 1, summarization

Introduction: In the NLP field, one of the hottest topics right now is the transformer architecture. Built on attention and pooling in a completely new architecture, transformer-based models have been delivering improvements every year. These models are generally hard to understand and implement, so people look for a good library that already implements them. One such library is huggingface's transformers library, which I have recently started to explore. transformers supports both PyTorch and TensorFlow. To install transformers on Linux, you can just type pip install transformers and it will download and install. Now, I will go through the quick tour and try out a couple of examples from it. Quick tour: From the quick tour, I have decided to try out the summarization task. I had a Microsoft-related text downloaded on my PC earlier. Now, I will try out different summarization tasks with the easy-to-use pipeline structure. A pipeline ...

basic commands in ubuntu console and ec2 instances

Introduction: For the last couple of days, I have been working on EC2 instances, and I have noticed that there is no good question-and-answer resource for EC2 instances. So in this blog I will try to note down some of the common but useful commands you can use on EC2 instances while working on AWS cloud machines. Ubuntu specific normal commands: 1. ls ls is used to see the lists ...

Is it fine to start learning Deep learning with a basic knowledge in machine learning?

A small story: A junior data scientist walks up to a senior and asks, "My model accuracy is not good; can you take a look?" The senior looks into the model and sees that it is a self-attention model without pre-training, trained on a 400-row dataset. This can happen if you delve into deep learning too fast without some experience in machine learning. But then how much knowledge of machine learning is needed to start working on deep learning? Why are people rushing to deep learning? Deep learning, the part of machine learning built on neural networks, started back in the 1950s. After two winters in deep learning, just now, when processor speeds are at their maximum and computation cost is at its lowest of all time, deep learning is booming in both academia and industry. From Tesla to OpenAI, from Stanford to MIT, academics everywhere are working on exciting new things, the different new architectures are now too many to e...

How to upgrade your python version to 3.7 in ubuntu to avoid problems?

Some days ago I wanted to download a package, which gave an error 'doesn't contain python>=3.7'; this meant that I didn't have a Python version greater than or equal to 3.7. So I went ahead and installed it, and faced certain problems. That's why I am writing this article. One line work: The only thing you need to do for this is write in bash: sudo apt update -y and then sudo apt install python3.7 Now this part is important. In many tutorials, you may have seen the following lines: sudo update-alternatives --install /usr/bin/python python3 /usr/bin/python3.7 2 followed by sudo update-alternatives --config python3 When you run this last line, tutorials will tell you to select 3.7 as the default option. That is where your system may go wrong. I did the same, and then the following three problems came up: (1) There is a red circle in the notification bar with a white dash inside it. This says that some error happened while checking for updates. (2) You will not be able to access the software cen...
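Before installing anything, it can help to confirm which interpreter version is actually running. The following is only a minimal sketch: the helper name meets_minimum is made up for illustration, and it relies on nothing beyond sys.version_info from the standard library.

```python
import sys

def meets_minimum(version, minimum=(3, 7)):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)

# Report whether the running interpreter satisfies a 'python>=3.7' requirement.
current = sys.version_info
print(f"Running Python {current.major}.{current.minor}; "
      f"meets python>=3.7: {meets_minimum(current)}")
```

Running this with the interpreter that pip uses shows whether the 'python>=3.7' requirement is the real cause of the error.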

Some nice open-source python repositories to work

I have been looking to contribute to some nice data science repositories in Python. I have searched for nice repos that are in their beginning phases, or rather at a stage where they can still be improved, so that amateurs like me in machine learning, with almost no open-source commits, can do a pretty good job contributing to them. Keeping that in mind, I am starting to list some repos like that: (1) dython: This is a Python data tools repo. Although it is published now and has almost no open issues, it looks open-ended enough that creative people can add a lot of other tools and functionality to it, make it more mainstream, and thereby advance both their own progress and the package's. (2) simpletransformers: This is Simple Transformers, written by Thilina Rajapakse. This is an amazing package which uses huggingface's transformers library and then combines its high level knowledge requirements into its intrinsic progra...

Are communication students eligible for machine learning?

Communication students, meaning students of electrical communications and electronics engineering, are an elite stream of engineers who generally have above-average depth in mathematics, signal processing and probability theory. Today, in this video we are going to review whether communication students are eligible for machine learning even with that background. The good: Communication students are generally taught 2-3 courses of engineering mathematics and 1-2 compulsory courses in probability and numerical approximation. Along with all this mathematics, through the usual engineering physics courses and electronics calculations, they get good exposure to mathematical problem setting, problem solving, numerical algorithms and several other tools a machine learning expert needs. What is more important for further advanced experiences in machine learning is understanding of diffe...

How do I carry out regression analysis with a sample size of only 28 and number of variables (including DV) 14?

For regression, this is far too few samples. It is advisable to use 20–30 samples per variable. Therefore, what you can do here is compute the correlation of each independent variable with the dependent variable, choose the independent variable most highly correlated with the dependent variable, and build a one-variable model. You can also try principal component analysis on the sample to create 3 effective variables that capture most of the variance, but I suppose PCA is also not that effective at a sample size like yours. I wonder whether your main task is really regression, because such small data are seen mostly in neuroscience and psychology, where the main task is to find underlying factors rather than to make predictions. If your reasons are similar, then resort to tests like ANOVA, MANOVA, rank tests and others, and design them carefully enough to detect the effects you are looking for. Finally, if you have...
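The "pick the most correlated variable and fit a one-variable model" idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: the data are synthetic (28 rows, 13 predictors, with y built from x3 on purpose), and the helper names pearson and fit_line are made up.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic stand-in data (an assumption, not real data): 28 rows, 13 predictors.
rng = random.Random(0)
n_rows = 28
predictors = {f"x{i}": [rng.gauss(0, 1) for _ in range(n_rows)] for i in range(1, 14)}
# By construction, y depends only on x3 plus a little noise.
y = [2.0 * v + rng.gauss(0, 0.1) for v in predictors["x3"]]

# Pick the predictor with the largest absolute correlation with y.
best = max(predictors, key=lambda name: abs(pearson(predictors[name], y)))
intercept, slope = fit_line(predictors[best], y)
print(best, round(slope, 2), round(intercept, 2))
```

With real data, the selection step itself uses the outcome, so the fitted model's apparent quality will tend to look better than it really is; with 28 rows that caveat matters.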

Collection of python packages for Neuropsychology

Introduction: Recently I have started to gain a bit of interest in the field of neuropsychology. For that reason, I have begun a summary of Python models and packages available for doing some standard tasks in the field. (1) autoreject: autoreject is a package to automatically reject bad/erroneous M/EEG data. If I examine it, I will give more details. (2) Drift diffusion package: This is a package written for a computational model called the drift diffusion model, which is often used in cognitive neuroscience and psychology. If I examine it, I will give more details. (3) Fieldtrip toolbox converter: This is a toolbox used to convert MEG and EEG data of different formats from the FieldTrip toolbox to the MNE toolbox in Python. I will update on further examination. (4) MNE: This is the main MNE toolbox in Python. It is used for all head-modeling-related analysis in Python. I will definitely go through analyzing this. This is a notably relevant resource for head mod...

do machine learning experts use more maths or more libraries?

I don't consider myself a machine learning expert yet, but in my still-short career I have had quite a bit of experience working with machine learning experts and veterans. Some of them have PhDs in mathematics, while others have worked extensively as senior software engineers. Therefore, I think it is easy for me to answer whether machine learning experts use more maths or more libraries. The answer is one you didn't expect! They use more libraries and read more maths. If you call a person a machine learning expert when he/she can do a lot of machine learning work efficiently, then such a person certainly has expertise in both. Generally, machine learning experts are handed more open-ended problems, which require both a theoretical solution and thorough pipelines for cleaning, feature processing, modeling, tuning, and creating interpretable results and visualizations. And a person who can do so surely needs a lot of experience in maths, libraries and fra...

calculate condition number and determinant in R and python

Introduction: If you are working on machine learning, more often than not you will need to perform different types of matrix manipulations in R/Python. I will mention some such manipulations and the corresponding functions to use in R and Python. Finding the condition number: The condition number is generally used to assess the stability as well as the non-singularity of a matrix. It is defined as the ratio of the largest singular value to the smallest singular value (in terms of absolute value). If a matrix has a condition number of more than 1000, it is generally considered to be an unstable matrix. To find the condition number of a matrix in R, we use the kappa() function. For normal use, kappa takes two parameters. The first input to kappa has to be the matrix. The second one is the exact parameter. exact is set to FALSE by default. In this setting, a computationally cheap approximation of the condition number is obtained and returned. If you set exact to TRUE, then the me...
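On the Python side, the same definition can be checked with NumPy. This is a rough sketch under some assumptions: the two matrices are made up for illustration, and the helper condition_number is a hypothetical name. It uses np.linalg.svd, np.linalg.cond and np.linalg.det, and may differ from what the full post goes on to use.

```python
import numpy as np

# Two small example matrices (made up for illustration).
well_conditioned = np.array([[1.0, 2.0], [3.0, 4.0]])
nearly_singular = np.array([[1.0, 2.0], [2.0, 4.0001]])

def condition_number(matrix):
    """Ratio of the largest to the smallest singular value."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return singular_values.max() / singular_values.min()

for name, m in [("well_conditioned", well_conditioned),
                ("nearly_singular", nearly_singular)]:
    # np.linalg.cond uses the 2-norm by default, which matches the
    # singular-value ratio computed by condition_number.
    print(name, condition_number(m), np.linalg.cond(m), np.linalg.det(m))
```

The second matrix has a determinant close to zero and a condition number far above the 1000 rule of thumb, which is the kind of matrix the text calls unstable.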

GAM model : PyGAM package details Analysis and possible issue resolving

Introduction: Picture credit to Peter Laurinec. I have been studying the PyGAM package for the last couple of days. Now, I am planning to thoroughly analyze the code of the PyGAM package, with descriptions of the GAM model and sources wherever necessary. This is going to be a long post and very technical in nature. Pre-requisites: To understand the code of the PyGAM package, you first have to learn what a GAM model is. GAM stands for generalized additive model, i.e. it is a type of statistical modeling where a target variable Y is roughly represented by an additive combination of a set of different functions. As a formula, it can be written as: $g(E[Y]) = f_1(x_1) + f_2(x_2) + f_3(x_3, x_4) + \dots$ where g is called a link function and the f_i are different types of functions. In technical terms, in a GAM model, theoretically the link-transformed expectation of the target variable is assume...
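To make the structure of the formula concrete, here is a small standard-library sketch. It is not PyGAM code: the component functions f1, f2, f3 are hypothetical stand-ins for the smooth functions a GAM would learn from data, and a log link is assumed purely as an example.

```python
import math

# Hypothetical component functions; in a fitted GAM these would be learned smooths.
def f1(x1):
    return 0.5 * x1

def f2(x2):
    return math.sin(x2)

def f3(x3, x4):
    return 0.1 * x3 * x4

def additive_predictor(x1, x2, x3, x4):
    """Right-hand side of the GAM formula: f1(x1) + f2(x2) + f3(x3, x4)."""
    return f1(x1) + f2(x2) + f3(x3, x4)

def expected_y(x1, x2, x3, x4, inverse_link=math.exp):
    """E[Y] = g^{-1}(additive predictor); math.exp is the inverse of a log link."""
    return inverse_link(additive_predictor(x1, x2, x3, x4))

eta = additive_predictor(2.0, 0.0, 1.0, 3.0)   # 1.0 + 0.0 + 0.3
mu = expected_y(2.0, 0.0, 1.0, 3.0)            # exp(eta)
print(eta, mu)
```

Passing inverse_link=lambda v: v gives the identity link, in which case the expected value is simply the additive predictor.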
