1
Introduction to modeling
- Introduction to the Python language.
- Introduction to Jupyter Notebook.
- Steps for building a model.
- Supervised and unsupervised algorithms.
- Choosing between regression and classification.
Hands-on work
Installing Python 3, Anaconda, and Jupyter Notebook.
2
Model evaluation procedures
- Resampling techniques for building training, validation, and test sets.
- Testing whether the training data is representative.
- Performance metrics for predictive models.
- Confusion matrix, cost matrix, and the ROC curve with its AUC.
Hands-on work
Setting up dataset sampling and running evaluation tests on several provided models.
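As an illustration of the evaluation topics in this module, here is a minimal sketch using only the Python standard library. The function names (`split_indices`, `confusion_matrix`, `roc_auc`), the 60/20/20 split, and the toy labels and scores are assumptions made for this example, not course material; in practice a library such as scikit-learn would typically be used.

```python
import random

def split_indices(n, train=0.6, validation=0.2, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train)
    n_val = int(n * validation)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(confusion_matrix(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The confusion matrix depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.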
3
Supervised algorithms
- The principle of univariate linear regression.
- Multivariate regression.
- Polynomial regression.
- Regularized regression.
- Naive Bayes.
- Logistic regression.
Hands-on work
Implementing regression and classification models on several types of data.
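To make the univariate and regularized regression topics concrete, the following sketch fits a straight line by least squares in closed form, with an optional L2 (ridge) penalty on the slope. The function name `fit_line`, the parameter `l2`, and the sample data are assumptions for illustration; the intercept is left unpenalized, which is one common convention among several.

```python
def fit_line(xs, ys, l2=0.0):
    """Fit y = slope * x + intercept by least squares.

    With l2 > 0 the slope is shrunk toward zero (ridge penalty on the
    slope only; the intercept is not penalized).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-product and sum of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + l2)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(fit_line(xs, ys))          # ordinary least squares
print(fit_line(xs, ys, l2=5.0))  # same data with a ridge penalty
```

Increasing `l2` trades a worse fit on the training data for a smaller slope, which is the basic mechanism behind regularization.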
4
Unsupervised algorithms
- Hierarchical clustering.
- Non-hierarchical clustering.
- Mixed approaches.
Hands-on work
Applying unsupervised clustering to several datasets.
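As one example of non-hierarchical clustering, here is a minimal k-means sketch in plain Python. K-means is chosen here for illustration; the syllabus does not name a specific algorithm. The helper names (`nearest`, `kmeans`), the fixed iteration count, and the caller-supplied starting centroids are assumptions that keep the example deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, centroids=[(0, 0), (10, 10)])

print(labels)
print(centroids)
```

The result of k-means depends on the starting centroids, so real uses typically run several random initializations and keep the best one.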
5
Component analysis
- Principal component analysis.
- Correspondence analysis.
- Multiple correspondence analysis.
- Factor analysis for mixed data.
- Hierarchical clustering on principal components.
Hands-on work
Reducing the number of variables and identifying the underlying factors, that is, the dimensions that account for most of the variability.
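The idea behind principal component analysis can be sketched without external libraries: compute the covariance matrix of the centered data, then find its dominant eigenvector. The sketch below uses power iteration for that step, which is a simple substitute for the full eigendecomposition a library would perform. The function names, the starting vector, and the sample data are assumptions for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def first_component(cov, iterations=100):
    """Dominant eigenvector and eigenvalue of cov by power iteration.

    Caveat: the starting vector must not be orthogonal to the dominant
    eigenvector, and cov must not be all zeros.
    """
    dims = len(cov)
    v = [1.0 / (i + 1) for i in range(dims)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dims))
                     for i in range(dims))
    return v, eigenvalue

# Hypothetical data: two perfectly correlated variables.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
cov = covariance_matrix(rows)
component, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

print(component)
print(explained_ratio)
```

Because the two variables move together exactly, a single component captures essentially all of the variance, so the two variables could be replaced by one.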
6
Text data analysis
- Collecting and preprocessing text data.
- Extracting primary entities and named entities; coreference resolution.
- Part-of-speech tagging, syntactic analysis, semantic analysis.
- Lemmatization.
- Text vectorization.
- TF-IDF weighting.
- Word2Vec.
Hands-on work
Exploring the contents of a text corpus using latent semantic analysis.
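The TF-IDF weighting listed in this module can be sketched in a few lines. This version assumes term frequency is the term count divided by document length and inverse document frequency is log(N / df); libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` and the tiny pre-tokenized corpus are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of term in document / number of tokens in document
    idf = log(number of documents / number of documents containing term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical corpus, already tokenized and lowercased.
documents = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
weights = tf_idf(documents)

for doc_weights in weights:
    print(doc_weights)
```

A term that appears in every document, such as "the" here, receives a weight of zero, while a term unique to one document receives the highest weight. The resulting document-term matrix is the usual input to latent semantic analysis.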