## DATA ANALYSIS TECHNIQUES STUDY GUIDE QUESTION AND ANSWERS

by: DERRICK SHIJOSO

This guide highlights some of the questions expected in the next exam.

Course: CS 201 intro into computer science

Prof.: Mr. Burns

This 1-page Study Guide was uploaded by DERRICK SHIJOSO on Wednesday, July 27, 2016. The Study Guide belongs to CS 201 intro into computer science at Massachusetts Institute of Technology (MIT), taught by Mr. Burns in Summer 2016.

Date Created: 07/27/16
## BIT323: Data Analysis Techniques

Answer all Questions. Time allowed: 1 hour

1. Define and show the various facets of data analysis (3 marks)

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

2. Write EDA and CDA in full and differentiate between them (4 marks)

EDA: exploratory data analysis, which focuses on discovering new features in the data.

CDA: confirmatory data analysis, which focuses on confirming or falsifying existing hypotheses.

3. List any two types of data and give two typical examples for each type (4 marks)

Quantitative data: data that is a number.

a. Often this is a continuous decimal number to a specified number of significant digits, for example a height of 1.72 m.

b. Sometimes it is a whole counting number, for example the number of students in a class.

Categorical data: data that is one of several categories, for example blood group or eye colour.

Qualitative data: data that is a pass/fail or the presence or lack of a characteristic, for example a pass/fail grade.

4. Identify and illustrate any four quality tests done in the initial data analysis given the following student ages in years: 7, 15, 3, 21, 18, 22, 23, 16, 37, 20, 18, 17, 21, 19, 18, 22, 18, 19, 23, and 20. (9 marks)

Quality of data: the candidate may illustrate one test from each of any four of the following areas of quality tests.

• Frequency counts

• Descriptive statistics (mean, standard deviation, and median)

• Normality (skewness, kurtosis, frequency histograms, normal probability plots)

• Associations (correlations, scatter plots)

• Checks on data cleaning

• Analysis of missing observations

• Analysis of extreme observations

• Comparison and correction of differences in coding schemes
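One possible way to illustrate four of these tests on the ages from question 4 is sketched below in Python, using only the standard library. This is not a model answer from the exam. The `skewness` helper and the choice of the 1.5 × IQR rule for flagging extreme observations are assumptions made for illustration, and a candidate could equally use other tests from the list.

```python
from collections import Counter
import statistics

# Student ages in years, copied from question 4.
ages = [7, 15, 3, 21, 18, 22, 23, 16, 37, 20,
        18, 17, 21, 19, 18, 22, 18, 19, 23, 20]

# 1. Frequency counts: how often each age occurs.
frequency = Counter(ages)

# 2. Descriptive statistics.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
std_age = statistics.stdev(ages)  # sample standard deviation (divides by n - 1)


# 3. Normality: moment-based skewness (hypothetical helper, not from the guide).
#    A value near 0 suggests a roughly symmetric distribution.
def skewness(values):
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# 4. Extreme observations: flag ages outside 1.5 * IQR of the quartiles
#    (an assumed rule; the guide does not prescribe one).
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
extreme_ages = sorted(a for a in ages if a < low or a > high)

print("frequency counts:", sorted(frequency.items()))
print("mean:", mean_age, "median:", median_age, "std:", round(std_age, 2))
print("skewness:", round(skewness(ages), 2))
print("extreme ages:", extreme_ages)
```

Under these assumptions the mean is 18.85, the median is 19, and the ages 3, 7, and 37 are flagged as extreme observations, which is the kind of finding the initial data analysis is meant to surface before any further modeling.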

