Computational Data Analysis (ISYE 6740)
These class notes were uploaded by Maryse Thiel on Monday, November 2, 2015. They belong to ISYE 6740 at Georgia Institute of Technology - Main Campus, taught by Alexander Gray in Fall.
Date Created: 11/02/15
CSE 6740 Lecture 1
What is Machine Learning? (Overview)
Alexander Gray
agray@cc.gatech.edu
Georgia Institute of Technology

Welcome to This Course
- CSE 6740 A (88977), ISYE 6740 A (89574)
- Distance learning: CSE 6740 Q (89194)
- Computational Data Analysis: Foundations of Machine Learning and Data Mining
- TuTh 10:05-10:55am, 2447 Klaus
- Office hours: grab me right after a lecture
- Course page: www.cc.gatech.edu/~agray/fall08.html
- Mailing list: www2.isye.gatech.edu/mailman/listinfo/isye6740a
- TA: Nishant Mehta, niche@cc.gatech.edu

What This Course Is
- This is GT's advanced machine learning course, though previous ML knowledge is not required.
- All of machine learning: a crash course from 0 to 60 (beginner to professional) in one semester.

Goals of This Course
- Give you the foundations needed for:
  - Competent analysis of data (application of ML)
  - Design of new methods (ML research)
- Give you the big picture, including context for other courses
- Put forth a new version of that picture

Taking This Course
- Yes, you can get into the class: if you can't register online, email me for a permit.
- Yes, you should take this class if you have the background.
- Background: you will need basic calculus, basic linear algebra, and basic probability.

Today
1. What is machine learning?
   a. Datasets
   b. Tasks of machine learning
   c. Parts of machine learning
   d. Relationship to other fields
2. What is this course?
   a. Overview of this course
   b. Main messages of the course
   c. Logistics
I'll stick around for questions.

A Dataset
[Figure: a data table with rows and columns]
- Features / attributes / dimensions: the columns
- Data points: the rows
- Target / outcome / response / label / dependent variable: the special feature to be predicted
- Independent variables / covariates / predictors / regressors: the other features

Types of Data
[Figure: a data table; for time series data, the rows are ordered]
- i.i.d. (independent, identically distributed) vectors
- Time series: dependent vectors
- Images: matrices
- Variable-size non-vector data, e.g. strings, trees, graphs, text
- Objects, e.g. within a relational schema

Main Goal of Learning: Prediction
The setup:
1. You obtain some kind of model of some training data through a process called learning (also: estimation).
2. Then you use that model to predict something about data you haven't seen before, but that comes from the same distribution as the training data (called test data).

Main Learning Tasks
- Density estimation: predict the density
- Regression: predict a continuous target variable
- Classification: predict a discrete target variable
- Others: clustering, dimensionality reduction
- Supervised learning: we're predicting a target variable for which we get to see examples (regression, classification)
- Unsupervised learning: we're predicting a target variable for which we never get to see examples (density estimation, clustering, dimensionality reduction)

Density Estimation
[Figure: training dataset -> learn -> density model]

Density Estimation
[Figure: density; training dataset and test dataset]
We never see the true value of the target. Unsupervised.

Density Estimation (example: observed matter in the sky)
- Should I fit a Gaussian? What would the right parameters be? Why?
- Would a histogram be better? What would the right bin width be?
- Is it better than a Gaussian in general? Or worse?

Regression
[Figure: training dataset and test dataset]
We're predicting a continuous target variable. Supervised.

Regression (example: stock price prediction)
- Should I just predict the value of the last observation?
- How about a combination of the last k values (linear regression)? What's more general than that?
- What should I expect my maximum error to be?

Classification
[Figure: target class; training dataset and test dataset]
We're predicting a discrete target variable. Supervised.
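
The density estimation questions above (fit a Gaussian, or a histogram, and with which bin width?) can be explored with a small experiment. The following is a minimal sketch, not the course's own code: it assumes one-dimensional data, uses a synthetic two-bump sample as a hypothetical stand-in for real observations, and compares models by their average log-density on a held-out test set. All function names are illustrative.

```python
import math
import random


def fit_gaussian(train):
    """Maximum-likelihood Gaussian fit: returns (mean, variance)."""
    n = len(train)
    mu = sum(train) / n
    var = sum((x - mu) ** 2 for x in train) / n
    return mu, var


def gaussian_density(x, mu, var):
    """Gaussian density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_histogram(train, bin_width):
    """Histogram density estimate; returns a function x -> estimated density."""
    origin = min(train)
    counts = {}
    for x in train:
        b = math.floor((x - origin) / bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(train)

    def density(x):
        b = math.floor((x - origin) / bin_width)
        # Bins that saw no training points get density 0.
        return counts.get(b, 0) / (n * bin_width)

    return density


def mean_log_likelihood(density, test, floor=1e-12):
    """Average log-density on held-out points; the floor avoids log(0)."""
    return sum(math.log(max(density(x), floor)) for x in test) / len(test)


def demo(seed=0):
    rng = random.Random(seed)

    # Hypothetical stand-in for real observations: a two-bump mixture.
    def draw():
        return rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 0.5)

    train = [draw() for _ in range(500)]
    test = [draw() for _ in range(500)]

    mu, var = fit_gaussian(train)
    gauss = lambda x: gaussian_density(x, mu, var)
    print("Gaussian:", mean_log_likelihood(gauss, test))
    for width in (0.05, 0.5, 5.0):
        hist = fit_histogram(train, width)
        print("histogram, width", width, ":", mean_log_likelihood(hist, test))


if __name__ == "__main__":
    demo()
```

Held-out log-likelihood is only one possible yardstick. Very narrow bins tend to leave test points in empty bins, and very wide bins tend to blur structure, which is one way to see why the bin width question matters.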
Classification (example: automatic zip-code digit recognition)
- Should I model each digit's images with a Gaussian (naive Bayes)?
- How about taking the class label of the nearest k training points (k-nearest-neighbor classifier)? Where does this method come from?
- How about if we focus on finding the widest decision boundary (support vector machine)? Where does this method come from?
- For this dataset, how can I definitely know one method is better than another?
- How do these methods scale with dimensionality, statistically and computationally? With the number of data?
- How do we estimate our future error accurately?

Main Parts of Machine Learning
- Model class
- Loss (error) function
- Generalization mechanism
- Optimizer
Also sometimes important: evaluation algorithm.

Model Classes
First we must pick a model class, or function class, $\mathcal{F}$.
Parametric example: the class of all Gaussians:

(1)  $\mathcal{F} = \left\{ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\},\ \mu \in \mathbb{R},\ \sigma > 0 \right\}$

Model Classes
The class of all parametric models in general:

(2)  $\mathcal{F} = \left\{ f(x; \theta) : \theta \in \Theta \right\}$

Nonparametric example: the class of all functions with a certain smoothness:

(3)  $\mathcal{F} = \left\{ f : \int (f''(x))^2 \, dx < \infty \right\}$

Parameters
A model is an instance of the model class, corresponding to a particular setting of the parameters $\theta$. Confusingly, sometimes when we say "model" we mean the model class.
- $\theta^*$: true or best parameter value
- $\hat{\theta}$: estimated parameter value
- $\hat{f}(X) = f(X; \hat{\theta})$: estimated function

Loss (Error) Function
Define a loss (error) function.
Regression example:

(4)  $L(Y, \hat{f}(X)) = (Y - \hat{f}(X))^2$

Classification example:

(5)  $L(Y, \hat{f}(X)) = I(Y \neq \hat{f}(X))$

Density estimation example:

(6)  $L(Y, \hat{f}(X)) = -2 \log \Pr_{\hat{f}(X)}(Y)$

Learning and Prediction
Generalization/test/prediction error: the expected error on a new test data point:

(7)  $\mathcal{E} = \mathbb{E}\left[ L(Y, \hat{f}(X)) \right]$

- Learning/estimation/training/design: try to find $\hat{f}$ such that $\mathcal{E}$ is minimized (requires an optimizer and a generalization mechanism).
- Prediction/testing: apply $\hat{f}$ to predict $Y$ for a new test set.
Note that both of these are done on a computer. They may be significant computations, sometimes requiring an efficient algorithm just to evaluate the models.

Some Questions
- Which notion of error should we use? (loss functions)
- How do we ensure that the error on future data is minimized? (generalization)
- Which model/method should we use for our data? (model selection, hypothesis testing)
- What will the error of our method be on future data? (error estimation, confidence bands, learning theory)
- Are there methods that are optimal under various assumptions? (asymptotic statistics)
- What will our method do when its assumptions don't hold? (robustness)

What is Machine Learning?
Answer, logically speaking:
- Statistics ≈ the science of inference from data
- Machine learning ≈ multivariate statistics + computational statistics
- Multivariate statistics ≈ prediction of values of a function assumed to underlie a multivariate dataset
- Computational statistics ≈ computational methods for statistical problems (aka statistical computation) + statistical methods which happen to be computationally intensive
- Data mining ≈ exploratory data analysis, particularly with massive/complex datasets

Inference
The process of using data to infer the distribution (or some aspect of it) that generated the data.
Main types of inference problems:
- Point estimation
- Confidence sets
- Hypothesis testing
Machine learning is mostly about point estimation.

What is Machine Learning?
Answer, culturally speaking:
- Statistics: theory of inference, asymptotics, not just point estimation
- Machine learning: within point estimation, more emphasis on classification; implicitly nonparametric and computational
- Data mining: practical interpretation and discovery; application-oriented, sometimes naive
You can ask me about the main conferences and journals in each of these areas.

History of Machine Learning
- Multivariate statistics
- Pattern recognition (statistical): EE; classification, high-dimensional, vision, speech, information theory
- Pattern recognition (syntactic): EE; non-vector data
- AI: CS; decision trees
- Cognitive scientists: neural nets
- Physicists: statistical physics and dynamical systems analogies
- CS theorists: learning theory

History of Machine Learning
Lately:
- Return to parametric statistics and AI: graphical models; graph computations
- Return to pattern recognition: kernel machines; convex optimization computations
- Return to asymptotic statistics: ensemble methods
- Return to multivariate statistics: manifolds, kriging; linear algebra computations
- (I hope) Return to nonparametric statistics and EE: estimation theory; physics-based and geometric computations

Growth of Machine Learning
Last 10 years:
- Applications in industry: data mining
- Applications in science: computational biology
- Fast-growing presence in AI, statistics, applied math
Reasons:
- Data is everywhere. This phenomenon is growing.
- Many modeling problems are more easily cast as data problems.
- Both widely useful and intellectually rich (mathematics + computation).

Review of Syllabus
Basic concepts of ML, illustrated by 12 ML methods:
2. How do I learn a simple Gaussian? Probability, random variables, distributions, estimation, convergence and asymptotics, confidence intervals
3. How do I learn a mixture of Gaussians (MoG)? Likelihood, the EM algorithm for MoG (i), generalization, model selection, cross-validation, k-means (ii), hidden Markov model (HMM) (iii)
4. How do I learn any density? Parametric vs. nonparametric estimation, Sobolev and other spaces, L2 error, kernel density estimation (KDE) (iv), optimal kernels, KDE theory

Review of Syllabus
Basic concepts of ML, illustrated by 12 ML methods:
5. How do I predict a continuous variable (regression)? Linear regression (v), regularization, ridge regression and LASSO (vi), local linear regression (vii), conditional density estimation
6. How do I predict a discrete variable (classification)? Bayes classifier, naive Bayes (ix), generative vs. discriminative, perceptron (x), weight decay, linear support vector machine (SVM) (xi), nearest-neighbor classifier (xii), and theory

Review of Syllabus
General theory and model frameworks of ML:
7. Which loss function should I use? Maximum likelihood estimation theory, L2 estimation, L2 MoG, Bayesian estimation, Bayesian MoG, minimax and decision theory, Bayesianism vs. frequentism
8. Which model should I use? AIC and BIC, Vapnik-Chervonenkis theory, cross-validation theory, the bootstrap

Review of Syllabus
General theory and model frameworks of ML:
9. How can I learn fancier (combined) models? Bagging, stacking, boosting, sieve theory
10. How can I learn fancier (nonlinear) models? Generalized linear models, logistic regression, Kolmogorov's theorem, generalized additive models, kernelization, reproducing kernel Hilbert spaces, nonlinear SVM, Gaussian process regression
11. How can I learn fancier (compositional) models? Recursive models, decision trees, hierarchical clustering, neural networks, backpropagation, deep belief networks, graphical models, mixtures of HMMs, conditional random field, max-margin Markov network, log-linear models, grammars

Review of Syllabus
Further common ML problems and solutions:
12. How do I reduce or relate the features? Feature selection vs. dimensionality reduction, wrapper methods for feature selection, causality vs. correlation, partial correlation, Bayes net structure learning
13. How do I create new features? Principal component analysis (PCA), ICA, multidimensional scaling, manifold learning, supervised dimensionality reduction, metric learning
14. How do I reduce or relate the data? Clustering, biclustering, constrained clustering, association rules and market basket analysis, ranking/ordinal regression, link analysis, relational data

Review of Syllabus
Further common ML problems and solutions:
15. How do I treat time series? ARMA, Kalman filters and state-space models, particle filters, functional data analysis, change-point detection, cross-validation for time series
16. How do I treat non-ideal data? Covariate shift, class imbalance, missing data, irregularly sampled data, measurement errors, anomaly detection, robustness

Review of Syllabus
General computational frameworks for ML:
17. How do I optimize the parameters? Unconstrained vs. constrained/convex optimization, derivative-free methods, first- and second-order methods, backfitting, natural gradient, bound optimization and EM
18. How do I optimize linear functions? Computational linear algebra, matrix inversion for regression, singular value decomposition (SVD) for dimensionality reduction
19. How do I optimize with constraints? Convexity, Lagrange multipliers, the KKT conditions, interior point method, SMO algorithm for SVMs

Review of Syllabus
General computational frameworks for ML:
20. How do I evaluate deeply nested sums? Exact graphical model inference, variational bounds on sums, approximate graphical model inference, expectation propagation
21. How do I evaluate large sums and searches? Generalized N-body problems (GNPs), hierarchical data structures, nearest-neighbor search, fast multipole methods, Monte Carlo integration, Markov chain Monte Carlo, Monte Carlo SVD
22. How do I treat even larger problems? Parallel/distributed EM, parallel/distributed GNPs, stochastic gradient descent, online learning

Review of Syllabus
Real-world application of ML:
23. How do I apply all this in the real world? Overview of the parts of ML, choosing between the methods to use for each task, prior knowledge and assumptions, exploratory data analysis and information visualization, evaluation and interpretation, using confidence intervals and hypothesis tests, ROC curves, where the research problems in ML are

Books
Required:
- All of Statistics (Wasserman)
- The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman)
- Pattern Recognition and Machine Learning (Bishop)

Hidden Messages of This Course
I will emphasize the distinctions between:
- logical and cultural
- statistics and computation
- principles and methods
- theoretical and practical
I will also:
- blur cultural lines
- avoid current trends and dogma
- focus on theory that affects practice

Grading
- 75%: assignments. Implement and test ML methods in C on real data (text/images, stock market, astronomy); creative components; contribute to MLPACK
- 25%: final, on the entire class

How Hard Will This Course Be?
- Pace: fast. But hopefully clear.
- Mathematical, but few proofs to be written
- Lots of implementation/experimentation work: roughly equivalent to writing one paper

Main Things You Should Know
- Main goal of ML
- Tasks of ML
- Parts of ML
- Whether this course is for you
- If you're taking it, expectations of the course
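
To make the "main parts of machine learning" from this lecture concrete, here is a minimal sketch that wires them together for one-dimensional regression. It rests on several assumptions that are not in the slides: the model class is the set of straight lines, the optimizer is closed-form least squares, the data are synthetic, and the generalization error is approximated by the average loss on a held-out test set. No explicit generalization mechanism (such as regularization) is included. The three loss functions mirror the squared-error, zero-one, and minus-two-log-likelihood losses defined in the lecture; all names are illustrative.

```python
import math
import random


# Loss functions: squared error, zero-one, and -2 log-likelihood.
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2


def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1


def neg2_log_likelihood(prob_of_observed):
    """-2 log of the probability the model assigned to the observed outcome."""
    return -2.0 * math.log(prob_of_observed)


# Model class (assumed here): lines f(x; theta) = theta[0] + theta[1] * x.
def predict(theta, x):
    return theta[0] + theta[1] * x


# Optimizer (assumed here): closed-form least squares on the training set.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


# Empirical stand-in for the expected loss: average loss over a dataset.
def average_loss(theta, xs, ys, loss=squared_loss):
    return sum(loss(y, predict(theta, x)) for x, y in zip(xs, ys)) / len(xs)


def demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical synthetic data: y = 1 + 2x + Gaussian noise.
    xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    train_x, train_y = xs[:100], ys[:100]
    test_x, test_y = xs[100:], ys[100:]

    theta_hat = fit_line(train_x, train_y)          # learning
    print("estimated parameters:", theta_hat)
    print("training error:", average_loss(theta_hat, train_x, train_y))
    print("test error:", average_loss(theta_hat, test_x, test_y))  # prediction


if __name__ == "__main__":
    demo()
```

Swapping in a different model class, loss, or optimizer changes one function at a time, which is roughly the decomposition the lecture argues for.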