Below are three of the most important steps in becoming a data engineer.
If you are interested in data science, artificial intelligence, machine learning, big data, data mining, and the other fields built on working with data, the steps I have written below are a way to get there.
Everything listed here is the result of several months of my own research, and I am still working through each item myself, so the lists may be incomplete, but the overall path should not differ much from what is described.
Of course, there are more steps to becoming a professional data scientist, which I will discuss in detail in the next post.
STEP 1:
Probability and Statistics
Conditional probability and Bayes' rule
The concept of a random variable
Expected value, variance, covariance, and correlation coefficient
Distributions and their usage and properties
Weak and strong laws of large numbers
Central limit theorem
Sampling distributions
Simple linear regression and the ordinary least squares method
Analysis of variance
Estimation, the concept of bias, and maximum likelihood estimation (MLE)
Confidence intervals and hypothesis tests
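
To make a few of the Step 1 topics concrete, here is a minimal sketch of covariance, the correlation coefficient, and simple linear regression fitted by ordinary least squares, written with only the Python standard library. The data and the names hours, scores, and ols_fit are made up for illustration and are not from any real dataset or library.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    """Sample variance, i.e. the covariance of a variable with itself."""
    return covariance(xs, xs)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


def ols_fit(xs, ys):
    """Simple linear regression y = intercept + slope * x by ordinary least squares."""
    slope = covariance(xs, ys) / variance(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Made-up example data (an assumption for illustration only).
hours = [1, 2, 3, 4, 5]
scores = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(hours, scores)
print("slope:", round(slope, 4))
print("intercept:", round(intercept, 4))
print("correlation:", round(correlation(hours, scores), 4))
```

The slope formula used here, covariance divided by variance, is the closed-form least squares solution for a single predictor, which is why the earlier topics in the list come before regression.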
STEP 2:
Programming
Python or R
Vectors, matrices, lists, and data frames
Proficiency in working with data frames
The concept of libraries in R or packages in Python
Learn to work with the dplyr library in R or pandas in Python to manipulate data frames
Functions, for and while loops, and logical expressions
For linear algebra operations, you can use NumPy in Python or base R
Learn to work with regular expressions
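
As a small illustration of several Step 2 items together (functions, loops, logical expressions, and regular expressions), here is a sketch that cleans a few raw text rows into structured records using only the standard library. The rows, the pattern, and the names RAW_ROWS, ROW_PATTERN, and parse_rows are hypothetical and only meant as an example.

```python
import re

# Hypothetical raw rows, as they might arrive from a messy text export.
RAW_ROWS = [
    "2023-01-15, Alice, score: 87.5",
    "2023-01-16, bob, score: 92",
    "not a valid row",
    "2023-01-17, Carol, score: 78.25",
]

# Named groups make the extracted fields easy to read back.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}),\s*"
    r"(?P<name>[A-Za-z]+),\s*"
    r"score:\s*(?P<score>\d+(?:\.\d+)?)$"
)


def parse_rows(rows):
    """Return one dict per row that matches the pattern, skipping the rest."""
    parsed = []
    for row in rows:
        match = ROW_PATTERN.match(row.strip())
        if match is None:
            continue  # skip rows that do not fit the expected shape
        parsed.append(
            {
                "date": match.group("date"),
                "name": match.group("name").capitalize(),
                "score": float(match.group("score")),
            }
        )
    return parsed


records = parse_rows(RAW_ROWS)

# A logical expression combining two conditions to filter the records.
high_scores = [r["name"] for r in records if r["score"] >= 80 and r["date"] >= "2023-01-15"]

print(records)
print(high_scores)
```

A list of dicts like records can be passed directly to pandas.DataFrame if you want to continue the same work with data frames.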
STEP 3:
Machine learning
Linear algebra
Multivariate normal distribution
Regression
Generalized linear models
Stepwise regression
LDA, QDA and Naïve Bayes methods
KNN, decision trees, bagging, boosting, random forests, and gradient boosting
Support vector machines
Regularized regression
Deep learning
K-means, hierarchical, and DBSCAN (density-based) clustering techniques
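
To give a feel for one of the simpler Step 3 methods, here is a minimal sketch of a k-nearest neighbors (KNN) classifier using only the standard library (math.dist needs Python 3.8 or later). The points, labels, and the function name knn_predict are made up for illustration; in practice you would more likely use an existing machine learning package.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted(
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Return the most common label among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-dimensional training data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.6)))
```

KNN has no training phase at all: the whole method is the distance computation and the vote, which makes it a good first algorithm to write by hand before moving on to tree-based methods or support vector machines.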