Fraud detection. A credit card bank is investigating the incidence of fraudulent card use. The bank suspects that the type of product bought may provide clues to the fraud. To examine this situation, the bank looks at the Standard Industrial Code (SIC) of the business related to the transaction. This is a code that was used by the U.S. Census Bureau and Statistics Canada to identify the type of every registered business in North America. For example, 1011 designates Meat and Meat Products (except Poultry), 1012 is Poultry Products, 1021 is Fish Products, 1031 is Canned and Preserved Fruits and Vegetables, and 1032 is Frozen Fruits and Vegetables.
A company intern produces the following histogram of the SIC codes for 1536 transactions:
He also reports that the mean SIC is 5823.13 with a standard deviation of 488.17.
a) Comment on any problems you see with the use of the mean and standard deviation as summary statistics.
b) How well do you think the Normal model will work on these data? Explain.
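One way to explore part a) is to compute the same kind of summary on a handful of codes and check what the result refers to. The sketch below is only an illustration: the code labels are the ones quoted in the exercise, but the `transactions` list and the `summarize` helper are made up here and are not the bank's data or method.

```python
from collections import Counter
from statistics import mean

# SIC labels quoted in the exercise text.
SIC_LABELS = {
    1011: "Meat and Meat Products (except Poultry)",
    1012: "Poultry Products",
    1021: "Fish Products",
    1031: "Canned and Preserved Fruits and Vegetables",
    1032: "Frozen Fruits and Vegetables",
}

# Hypothetical transactions, invented purely for illustration.
transactions = [1011, 1011, 1012, 1021, 1032, 1032, 1032]


def summarize(codes):
    """Return the arithmetic mean and the per-category counts of the codes."""
    return mean(codes), Counter(codes)


avg, counts = summarize(transactions)

# The mean is a number, but it need not match any business category.
print(f"mean code: {avg:.2f}")
print("mean is a listed category:", avg in SIC_LABELS)

# Counts per category treat the codes as labels rather than quantities.
for code, n in counts.most_common():
    print(code, SIC_LABELS[code], n)
```

Comparing the two outputs suggests a question to ask of the intern's report: do the codes measure an amount of anything, or do they only name categories?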
Psy 1003 Lec. 10
Regression/Prediction Ch. 12

Regression or Prediction
● Psychologists and other researchers seek to construct mathematical rules to predict an individual’s score on one variable from their score on another variable
◦ Using MCAT scores to predict successful graduation from medical school
◦ Or SAT scores to predict college GPA
● So regression uses a correlation in reverse to predict a score on a variable

Regression Predictor and Criterion
● Predictor variables are X
● Criterion variables are Y

Linear regression
● Predicted raw score (criterion) = regression constant + raw score regression coefficient × predictor raw score

Linear regression cont.
● Regression constant (a)
◦ Predicted raw score on c