Syllabus
Introduction to Data Science (Professor Zambom)
- Supervised vs. Unsupervised Learning
- Review of Linear Algebra
- Random Variables
- Introduction to R
- Review of Elementary Statistics
- Time Series Analysis
- Test, Training, and Cross-Validation
- Linear, Multiple Linear, and Polynomial Regression
- Unsupervised Learning
- Introduction to Graphs
- Graph concepts
- PageRank algorithm
- Markov processes
- Graph based clustering
- Installing Python; understanding Jupyter notebooks; command shells
- Rapid introduction to Python
- Least squares and multiple linear regression; train/test split; using scikit-learn
- Polynomial regression; plotting with matplotlib; histograms and boxplots
- Anscombe's quartet; plotting multiple axes; splines; nonlinear regression; gradient descent
- Binary classification; logistic regression; classification metrics; ROC curves
- Probabilistic methods: KNN, LDA, QDA, Naive Bayes; one-hot encoding
- Classification and Regression Trees
- Neural Networks
- Principal Component Analysis
- Support Vector Machines
- Tree Ensembles including Boosting, Bagging and Random Forests
- K-means, hierarchical, and DBSCAN clustering algorithms
- Programming challenges on Kaggle (two days)
- Relational Databases
- Installing and Using MySQL
- Structured Query Language (SQL)
- Big Data
- Data pre-processing
- Feature Engineering
- Frequent pattern mining
- Introduction to Cloud Computing using AWS
- Projects (three days)
- Data Analysis at Kaiser Permanente, Brianna Amador
- Introduction to LaTeX and Beamer, Miriam Ramirez
- Graph Clustering in the Game of Thrones, Richard Wolff
- Visualizing Student Success, Jorge Martinez
- Q-Learning, Jorge Martinez
- Overview of RL, CV, NLP, Andrew Miller and Seyed Sajjadi