For wanna-be statisticians

Introduction:

Today, I found a question on Reddit asking what you have to read to become a statistician. I started writing an answer and quickly realized that it would work well as a checklist for anyone on a self-teaching journey. One of the main problems in any self-teaching journey is that you do not know exactly what to read or where to stop unless you have a clear checklist.
So here is the answer, and you can build your checklist from it.

For statistics, make sure you have a good probability background, i.e. you understand random variables, expectations, variances, pdfs, cdfs, moment generating functions, techniques for solving probability questions, modes of convergence and the other basics of probability. You will also need a good linear algebra background, as much of statistics relies on matrices and vector spaces. Once you have that, you can balance taking MOOCs with reading the topics taught in those courses from standard statistics books. Now, I am assuming that you want to be a statistician. Therefore, you will want to tick off the following checklist:

(1) Descriptive statistics:
Histograms, bar charts, box plots, stem-and-leaf plots and so on, plus describing and framing basic real-world problems in statistical terms. This is the descriptive level.
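For the descriptive level in point (1), here is a minimal sketch using only Python's standard library. The `scores` data is made up for illustration, and `stem_and_leaf` is a hypothetical helper written for this post, not a library function.

```python
import statistics
from collections import defaultdict

# Made-up exam scores, purely for illustration.
scores = [52, 57, 61, 63, 63, 68, 70, 74, 74, 74, 79, 81, 85, 92]

def stem_and_leaf(values):
    """Return stem-and-leaf rows (stem = tens digit, leaf = units digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(rows.items())
    ]

# Basic numerical summaries of the same data.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

plot = stem_and_leaf(scores)
print("\n".join(plot))
print(summary)
```

Reading the plot row by row gives the same information as a rough histogram while keeping every individual value visible.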

(2) Diagnostic and predictive statistics:
This requires knowing sufficient and ancillary statistics, maximum likelihood estimation (MLE) and the method of moments (MoM), hypothesis testing with t, z, F, chi-square, goodness-of-fit and other tests, univariate and bivariate relations, and the correlation and dependence of variables and their effects. These help a statistician understand a problem situation. Predictive statistics means knowing the different types of regression, i.e. linear, logistic, multiple linear etc., and their details. Prediction basically means fitting the data to a specified pattern and then predicting the outcome for new observations.
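To make the "fit a pattern, then predict" idea in point (2) concrete, here is a small sketch of simple linear regression by ordinary least squares, written by hand with no libraries. The data and the names `fit_line` and `predict` are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted response for a new value of x."""
    return intercept + slope * x

# Made-up data: five (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
next_value = predict(intercept, slope, 6)
print(intercept, slope, next_value)
```

In practice you would use a library such as statsmodels or scikit-learn, which also report standard errors and test statistics; the hand-written version is only meant to show what the fit computes.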

(3) Forecasting and time series analysis:
These are branches of statistics in their own right. Forecasting and time series analysis are used to understand, model and predict quantities that depend on time, which makes them far more interesting. There are a number of models under both, so a good amount of time is needed to study them.
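As one small example of a forecasting model from point (3), here is a sketch of simple exponential smoothing, one of the most basic methods in this family. The `sales` series and the smoothing weight `alpha = 0.5` are made-up assumptions; real work would choose `alpha` from the data.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1},
    starting from the first observation.
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Made-up monthly sales figures.
sales = [10, 12, 14, 16]

levels = exponential_smoothing(sales, alpha=0.5)
# The one-step-ahead forecast is simply the last smoothed level.
forecast = levels[-1]
print(levels, forecast)
```

Note that on a steadily rising series like this one the forecast lags behind the data, which is why models with a trend component exist.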

(4) Bayesian statistics and non-parametric studies:
Although these fall under predictive and diagnostic statistics, many books and courses do not go into them while covering regression and other parametric stuff. Bayesian statistics needs a good amount of probability, but once learned it introduces you to a large area of modern statistics. Also, since real data does not always satisfy our assumptions, in practice a lot of work is done with non-parametric methods.
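For the Bayesian half of point (4), the simplest worked case is the Beta-Binomial update: a Beta(a, b) prior on a success probability, combined with observed successes and failures, gives a Beta(a + successes, b + failures) posterior. The sketch below assumes a uniform Beta(1, 1) prior and made-up coin-flip counts; the function names are hypothetical.

```python
from fractions import Fraction

def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return Fraction(a, a + b)

# Uniform prior Beta(1, 1); made-up data: 7 heads and 3 tails.
post_a, post_b = update_beta(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(post_a, post_b)

print(post_a, post_b, posterior_mean)
```

The posterior mean sits between the prior mean (1/2) and the observed proportion (7/10), and moves toward the data as more observations arrive.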

(5) Sample surveying: This is not as important as the others. But a statistician may be asked to handle surveys and similar work in a company or in academia, and research samples often have to be collected by the researchers themselves, so a good understanding of the underlying sampling techniques is also good to have.
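As an illustration of point (5), here is a sketch of simple random sampling without replacement, estimating a population mean together with its standard error using the finite population correction. The population of the numbers 1 to 100, the sample size and the seed are all made-up assumptions, and `srs_estimate` is a hypothetical helper.

```python
import math
import random
import statistics

def srs_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Returns (sample mean, standard error), where the standard error
    includes the finite population correction (1 - n / N).
    """
    rng = random.Random(seed)             # fixed seed for reproducibility
    sample = rng.sample(population, n)    # sampling without replacement
    N = len(population)
    mean = statistics.mean(sample)
    fpc = 1 - n / N
    se = math.sqrt(fpc * statistics.variance(sample) / n)
    return mean, se

# Made-up population: the integers 1..100 (true mean 50.5).
population = list(range(1, 101))

mean, se = srs_estimate(population, n=20)
census_mean, census_se = srs_estimate(population, n=100)
print(mean, se)
print(census_mean, census_se)
```

When the sample is the whole population, the correction factor drives the standard error to zero, which matches the intuition that a census has no sampling error.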

So, I think you now have a sense of the things you need to go through. The topics are roughly in order of increasing difficulty, and the later ones are less essential for a working statistician. But then again, if you are self-teaching, why be a bad teacher and leave out part of the syllabus!

For linear algebra, you may follow Michael Artin's Algebra, which covers linear algebra. For basic probability, it is good to follow an introductory probability book by Sheldon Ross, such as A First Course in Probability. For beginner statistics, read Introductory Statistics by Sheldon Ross once; its descriptive statistics part is good.
For the topics in point (2), Statistical Inference by Casella and Berger will be enough. For the other topics, you can follow many books and online courses. For regression, time series, forecasting and non-parametric tests, please also go through their R and/or Python implementations if possible.

Hope you enjoy the journey in statistics.
I have started compiling some of the building blocks needed for a statistician or data scientist job. Please follow the links below to get started with me on:
(1) time series analysis
(2) using pandas in data science
(3) regression
(4) non-parametric tests
(5) a basic understanding of Python
(6) an introduction to Keras
and many other posts to come.
