Introduction:
Today I came across a question on Reddit asking what one has to read to become a statistician. As I started writing an answer, I realized it could serve as a checklist for anyone on a self-teaching journey. One of the problems with teaching yourself is that you do not know exactly what to read or where to stop unless you have a clear checklist. So here is my answer, from which you can build your own checklist.
For statistics, make sure you have a good probability background, i.e. you understand random variables, expectation, variance, pdfs, cdfs, moment generating functions, techniques for solving probability problems, modes of convergence, and the other basics of probability. You will also need a good linear algebra background, as much of statistics relies on matrices and vector spaces. Once you have that, you can balance taking MOOCs with reading the topics taught in those courses from standard statistics books. I am assuming that you want to be a statistician, so you will want to tick off the following list:
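As a quick self-check on the probability basics, here is a minimal sketch that computes the expectation, variance, and cdf of a fair six-sided die exactly and compares the mean with a simulation. The die example and the function names are my own choices for illustration, not part of any particular course.

```python
import random
from fractions import Fraction

# A fair six-sided die as a discrete random variable: value -> probability.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

def cdf(pmf, t):
    """P(X <= t)."""
    return sum(p for x, p in pmf.items() if x <= t)

exact_mean = expectation(pmf)   # 7/2
exact_var = variance(pmf)       # 35/12

# Law of large numbers in action: the simulated mean should be close to 3.5.
rng = random.Random(0)          # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sim_mean = sum(rolls) / len(rolls)
```

If you can derive 7/2 and 35/12 by hand and explain why the simulated mean lands near 3.5, you are in good shape for the rest of the list.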
(1) descriptive statistics:
Histograms, bar charts, box plots, stem-and-leaf plots, etc., along with describing and understanding basic real-world problems in statistical terms. This is the descriptive level.
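To make the descriptive level concrete, here is a small sketch using only Python's standard library. It computes the five-number summary behind a box plot and builds a text stem-and-leaf display. The data and the helper names are made up for illustration, and the quartiles use the "inclusive" convention, which is only one of several in common use.

```python
import statistics
from collections import defaultdict

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the numbers behind a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def stem_and_leaf(data):
    """Stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

# Hypothetical exam scores, used only as example data.
scores = [12, 15, 21, 23, 23, 27, 31, 34, 38, 45, 47]

summary = five_number_summary(scores)   # (12, 22.0, 27.0, 36.0, 47)
for row in stem_and_leaf(scores):
    print(row)
```

The stem-and-leaf rows are just a histogram turned on its side that keeps the raw digits, which is why it is a nice first plot to draw by hand.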
(2) diagnostic and predictive statistics:
This requires knowing sufficient and ancillary statistics, maximum likelihood (MLE) and method-of-moments (MoM) estimation, hypothesis testing with t, z, F, chi-square, goodness-of-fit and other tests, univariate and bivariate relationships, and the correlation and dependence of variables and their effects. These help a statistician understand a problem situation. Predictive statistics means knowing the different types of regression, e.g. linear, logistic, and multiple linear regression, and their details. Prediction basically teaches a statistician to fit the data to some specified pattern and then predict future outcomes.
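Two of the ideas above can be sketched in a few lines of standard-library Python: a two-sided one-sample z-test (assuming the population standard deviation is known, which is rarely true in practice) and an ordinary least squares fit of a straight line. The function names and numbers are my own illustrative choices. For real work you would normally reach for a statistics package instead.

```python
import math
from statistics import NormalDist, fmean

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test with known population standard deviation."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Diagnostic: is the mean different from 5, assuming sigma = 0.5 is known?
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
z, p_value = z_test(sample, mu0=5.0, sigma=0.5)

# Predictive: fit a line, then predict the response at a new x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
prediction_at_6 = intercept + slope * 6
```

The first half is the diagnostic side (is there evidence against the hypothesized mean?) and the second half is the predictive side (fit a pattern, then extrapolate).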
(3) forecasting and time series analysis:
These are further branches of statistics. Forecasting and time series analysis are used to understand, model, and predict quantities that depend on time, which makes them far more interesting. There are a number of models under both, so they take a good amount of time to learn.
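As a taste of forecasting, here is a minimal sketch of two of the simplest models: simple exponential smoothing and a moving-average forecast. The series is invented, and initializing the level with the first observation is one common convention among several.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last one is the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]            # assumption: start the level at the first value
    levels = [level]
    for value in series[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window

# Hypothetical monthly demand figures.
demand = [10, 12, 14, 13, 15]

levels = exponential_smoothing(demand, alpha=0.5)   # [10, 11.0, 12.5, 12.75, 13.875]
next_by_ses = levels[-1]
next_by_ma = moving_average_forecast(demand, window=3)   # 14.0
```

A larger alpha makes the forecast react faster to recent values, while a smaller alpha smooths more. Models such as ARIMA build on the same idea of letting past values explain future ones.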
(4) Bayesian statistics and non-parametric based studies:
Although these fall under predictive and diagnostic statistics, many books and courses do not cover them when treating regression and other parametric material. Bayesian statistics may need a good amount of probability, but once learned it introduces you to a big area of modern statistics. Also, since data do not always satisfy our assumptions, in practice a lot of work is done with non-parametric methods.
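Here is a small sketch of one idea from each side, using only the standard library: a conjugate Beta-Binomial update as a first Bayesian example, and an exact two-sided sign test as a first non-parametric one. The prior, the data, and the function names are illustrative assumptions of mine.

```python
import math

def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after binomial data under a Beta(alpha, beta) prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def sign_test_p_value(differences):
    """Exact two-sided sign test of 'median difference is zero'; zeros are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    positives = sum(d > 0 for d in nonzero)
    k = min(positives, n - positives)
    # Under the null, the number of positive signs is Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Bayesian: uniform Beta(1, 1) prior, then observe 7 successes and 3 failures.
post_alpha, post_beta = beta_binomial_update(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(post_alpha, post_beta)   # 8 / 12

# Non-parametric: paired before/after differences, no normality assumed.
diffs = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, 0.9, 1.1]
p_value = sign_test_p_value(diffs)                  # 18 / 256
```

Notice that the sign test uses only the signs of the differences, so it makes no assumption about the shape of their distribution. That is the trade: fewer assumptions in exchange for less power.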
(5) Sample surveying: This is not as important, but a statistician may be asked to handle surveys and similar work in a company or in academia, and samples for research often have to be collected by the researchers themselves, so a good understanding of the underlying techniques of sample surveying is also good to have.
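To give a flavor of the sampling techniques, here is a minimal sketch of simple random sampling without replacement and stratified sampling with proportional allocation. The population, strata names, and helper functions are hypothetical. Note that the rounded stratum sizes may not always add up exactly to the requested total.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each subset equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: each stratum is sampled in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounding can make the sizes sum to slightly more or less than total_n.
        n_h = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: unit ids 0..999, split into two strata of unequal size.
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}

srs = simple_random_sample(list(range(1000)), 50, seed=1)
strat = stratified_sample(strata, total_n=50, seed=1)
sizes = {name: len(units) for name, units in strat.items()}   # urban 30, rural 20
```

Stratifying guarantees that each group shows up in the sample in the right proportion, which a simple random sample only achieves on average.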
So, I think you now have a sense of the things you need to go through. The topics are listed roughly in order of increasing difficulty, and the later ones are also less mandatory for a statistician to already know. But then again, if you are teaching yourself, why be a bad teacher and leave out part of the syllabus!
For linear algebra, you may follow the linear algebra chapters of Michael Artin's Algebra. For basic probability, Introduction to Probability by Sheldon Ross is a good choice. For beginner-level statistics, read through Introductory Statistics by Sheldon Ross once; its descriptive statistics part is good.
For the topics in point (2), Casella and Berger will be enough. For the other topics, you can follow any of a number of books and online courses. For regression, time series, forecasting, and non-parametric tests, please also go through their R and/or Python implementations if possible.
Hope you enjoy the journey in statistics.
I have started compiling some of the necessary building blocks for doing a statistician's or data scientist's job. Please follow the links below to get started with me in:
(1) time series analysis
(2) pandas use in data science
(3) Regression
(4) non-parametric tests
(5) a basic understanding of python
(6) keras introduction
and many other posts to come.