# STATISTICS FOR SOCIOLOGISTS SOCI 708

UNC

GPA 3.62

These 11 pages of class notes were uploaded by Rebeka Bruen on Sunday, October 25, 2015. The notes belong to SOCI 708 at University of North Carolina - Chapel Hill, taught by Francois Nielsen in Fall. Since their upload, they have received 37 views. For similar materials see /class/228701/soci-708-university-of-north-carolina-chapel-hill in Sociology at University of North Carolina - Chapel Hill.

Date Created: 10/25/15

## Stata Handout for Module 4

### Helpful tips

1. Creating a variable involves the generate (or gen) command. For example, `gen educ = .` tells Stata to create a variable called educ and to make it numeric (the `.` tells Stata this). See the extra material at the end of this module for more about generating variables. If you don't want to keep the variable you created, you can delete it by typing drop and then the variable name. For example, `drop educ` tells Stata to drop the variable above.
2. If you have a really long, tedious command, like a graphic with 6 overlaid plots, you will probably NOT want to use the graphics interface, because typing the commands is faster once you know what they are. Even better, you will probably want to copy and paste commands so you don't have to keep retyping the same ones. Sometimes it will help to put the command in a text file using WordPad or NotePad and use the search and replace function.
3. Some of the graphics commands are complicated in Stata. Don't worry about this, because graphics are really not the point of the class.
4. Tables similar to those created in the slides can be made easily in Stata.

To take an example that is not in the lecture notes: if you open the dataset Chile, you can make a table using two categorical variables, education and vote. I'm choosing these simply because they are categorical variables; this is not theory-driven. You can simply write `tab educ vote` to get a simple table. You can choose to get the row percentages, column percentages, and/or cell percentages by writing row, col, cell, or all three (row col cell) after a comma. For example, to get row percentages:

From the dataset Chile:

`tab educ vote, row`

Key: the first line for each education value is the frequency; the second line is the row percentage.

| education | A | N | NA | U | Y | Total |
|---|---|---|---|---|---|---|
| | 0 | 0 | 0 | 1 | 0 | 1 |
| row % | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 100.00 |
| NA | 0 | 2 | 1 | 3 | 5 | 11 |
| row % | 0.00 | 18.18 | 9.09 | 27.27 | 45.45 | 100.00 |
| P | 52 | 266 | 71 | 295 | 422 | 1106 |
| row % | 4.70 | 24.05 | 6.42 | 26.67 | 38.16 | 100.00 |
| PS | 32 | 224 | 24 | 52 | 130 | 462 |
| row % | 6.93 | 48.48 | 5.19 | 11.26 | 28.14 | 100.00 |
| S | 103 | 397 | 72 | 237 | 311 | 1120 |
| row % | 9.20 | 35.45 | 6.43 | 21.16 | 27.77 | 100.00 |
| Total | 187 | 889 | 168 | 588 | 868 | 2700 |
| row % | 6.93 | 32.93 | 6.22 | 21.78 | 32.15 | 100.00 |

If there is another categorical variable, you may want to see what the same table looks like for each value of that variable. For example, `by sex:` tells Stata to make two tables, one for each value of sex. But first I sorted the data according to sex with `sort sex`:

`sort sex`

`by sex: tab educ vote, row`

-> sex = F

| education | A | N | NA | U | Y | Total |
|---|---|---|---|---|---|---|
| NA | 0 | 2 | 0 | 1 | 4 | 7 |
| row % | 0.00 | 28.57 | 0.00 | 14.29 | 57.14 | 100.00 |
| P | 32 | 112 | 36 | 177 | 250 | 607 |
| row % | 5.27 | 18.45 | 5.93 | 29.16 | 41.19 | 100.00 |
| PS | 15 | 86 | 7 | 31 | 60 | 199 |
| row % | 7.54 | 43.22 | 3.52 | 15.58 | 30.15 | 100.00 |
| S | 57 | 163 | 27 | 153 | 166 | 566 |
| row % | 10.07 | 28.80 | 4.77 | 27.03 | 29.33 | 100.00 |
| Total | 104 | 363 | 70 | 362 | 480 | 1379 |
| row % | 7.54 | 26.32 | 5.08 | 26.25 | 34.81 | 100.00 |

-> sex = M

| education | A | N | NA | U | Y | Total |
|---|---|---|---|---|---|---|
| | 0 | 0 | 0 | 1 | 0 | 1 |
| row % | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 100.00 |
| NA | 0 | 0 | 1 | 2 | 1 | 4 |
| row % | 0.00 | 0.00 | 25.00 | 50.00 | 25.00 | 100.00 |
| P | 20 | 154 | 35 | 118 | 172 | 499 |
| row % | 4.01 | 30.86 | 7.01 | 23.65 | 34.47 | 100.00 |
| PS | 17 | 138 | 17 | 21 | 70 | 263 |
| row % | 6.46 | 52.47 | 6.46 | 7.98 | 26.62 | 100.00 |
| S | 46 | 234 | 45 | 84 | 145 | 554 |
| row % | 8.30 | 42.24 | 8.12 | 15.16 | 26.17 | 100.00 |
| Total | 83 | 526 | 98 | 226 | 388 | 1321 |
| row % | 6.28 | 39.82 | 7.42 | 17.11 | 29.37 | 100.00 |

From the dataset Prestige:

Doing the regression of prestige on income and type requires a little preparation in Stata. In module 2 we found out that we could not make a bar chart of vote in Stata unless we created dummy variables for every value of vote. We need to do that again in this module in order to analyze the relationship of prestige to income and type of occupation. And again, it turns out that R does not require this step. Like vote, type is a categorical variable.

`tab type`

| type | Freq. | Percent | Cum. |
|---|---|---|---|
| NA | 4 | 3.92 | 3.92 |
| bc | 44 | 43.14 | 47.06 |
| prof | 31 | 30.39 | 77.45 |
| wc | 23 | 22.55 | 100.00 |
| Total | 102 | 100.00 | |

The table shows there is an NA value, which we will want to turn into a missing value:

`replace type = "" if type == "NA"`

Generate new categorical variables using the 3 remaining values:

`tab type_num, gen(type)`

You will see that 3 new variables, type1, type2, and type3, have just been created. These are dummy variables and correspond to the three categories of type. Dummy variables have values of either 0 or 1 and are also called indicator variables or dichotomous variables.

To see the variables we just created automatically, you can get tables of each variable. To save a step, you can just write tab1 and list all the variables you would like a table for in a single command.

`tab1 type1 type2 type3`

| Variable (label) | Value | Freq. | Percent | Cum. |
|---|---|---|---|---|
| type1 (type_num==Blue Collar) | 0 | 58 | 56.86 | 56.86 |
| | 1 | 44 | 43.14 | 100.00 |
| | Total | 102 | 100.00 | |
| type2 (type_num==Professional) | 0 | 71 | 69.61 | 69.61 |
| | 1 | 31 | 30.39 | 100.00 |
| | Total | 102 | 100.00 | |
| type3 (type_num==White Collar) | 0 | 79 | 77.45 | 77.45 |
| | 1 | 23 | 22.55 | 100.00 |
| | Total | 102 | 100.00 | |

To make it easier on ourselves, let's rename the variables so we can see at a glance what they refer to. Do this with the command rename: `rename oldvar newvar` (var stands for variable).

`rename type1 type_BC`

`rename type2 type_PROF`

`rename type3 type_WC`

Now you can regress prestige on income and type. Because type is a categorical variable rather than an ordinal or ratio variable, you have to put in the separate dummy variables. You can't just put in the original variable type, because the categories are not numbers; they are categorical values (blue collar, professional, white collar). In this example only type_PROF and type_WC are listed as independent variables, which tells Stata that type_BC is to be understood as the reference category.

`reg prestige income type_PROF type_WC`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 23016.1084 | 3 | 7672.03612 |
| Residual | 6879.31774 | 98 | 70.1971198 |
| Total | 29895.4261 | 101 | 295.994318 |

Number of obs = 102, F(3, 98) = 109.29, Prob > F = 0.0000, R-squared = 0.7699, Adj R-squared = 0.7628, Root MSE = 8.3784

| prestige | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| income | .0014871 | .0002428 | 6.12 | 0.000 | .0010052 | .0019691 |
| type_PROF | 24.42513 | 2.327585 | 10.49 | 0.000 | 19.80611 | 29.04414 |
| type_WC | 7.010142 | 2.125056 | 3.30 | 0.001 | 2.793037 | 11.22725 |
| _cons | 27.71983 | 1.749331 | 15.85 | 0.000 | 24.24834 | 31.19132 |

The following method is based on getting predictions from the regression equation just obtained. You can try this out if you're feeling adventurous, but don't worry too much about graphics at this stage.

Do the same regression:

`reg prestige income type_PROF type_WC`

Then run these as separate commands. The first command obtains predicted values based on the regression equation, the next command creates new variables for each category of type, and the graph command tries to put this all together.

`predict yhat`

`separate yhat, by(type)`

The predict command predicts the y-values, which are called yhat. The separate command creates 3 new yhat variables corresponding to the 3 values of type. This graph lets you see the different lines for different values of type by graphing the 3 yhats and then laying a regression line over the whole thing (lfit):

`graph twoway scatter prestige yhat1 yhat2 yhat3 income, connect(i l l l) msymbol(o i i i) sort || lfit prestige income`

[Figure: scatter plot of prestige against income (0 to 25,000), with lines for yhat, type == bc; yhat, type == prof; yhat, type == wc; and the overall fitted values.]

To use an interaction in your regression equation, you can create a new variable for each interaction. You use the gen command and create a new variable that is the product of the two interacting variables. For this example, we want the interaction of income with two levels of the categorical variable type.

`gen PROF_income = type_PROF * income`

`gen WC_income = type_WC * income`

When you put these interaction terms in an equation, you have to be sure to also include the regular variables that were multiplied to get the interaction: income, type_PROF, and type_WC.

`reg prestige income type_PROF type_WC PROF_income WC_income`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 24667.8187 | 5 | 4933.56373 |
| Residual | 5227.60743 | 96 | 54.454244 |
| Total | 29895.4261 | 101 | 295.994318 |

Number of obs = 102, F(5, 96) = 90.60, Prob > F = 0.0000, R-squared = 0.8251, Adj R-squared = 0.8160, Root MSE = 7.3793

| prestige | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| income | .0038699 | .000492 | 7.87 | 0.000 | .0028932 | .0048466 |
| type_PROF | 43.60598 | 4.041315 | 10.79 | 0.000 | 35.58404 | 51.62793 |
| type_WC | 17.5677 | 5.174322 | 3.40 | 0.001 | 7.296751 | 27.83865 |
| PROF_income | -.0030247 | .0005512 | -5.49 | 0.000 | -.0041188 | -.0019306 |
| WC_income | -.0020176 | .000947 | -2.13 | 0.036 | -.0038974 | -.0001378 |
| _cons | 15.31756 | 2.77366 | 5.52 | 0.000 | 9.811886 | 20.82323 |

To see this as three lines conditional on type, I had to predict yhat again. This time I labeled it yhat_inter to indicate that these yhats were predicted from the regression equation with an interaction in it.

`predict yhat_inter`

`separate yhat_inter, by(type)`

`graph twoway scatter prestige yhat_inter1 yhat_inter2 yhat_inter3 income, connect(i l l l) msymbol(o i i i) sort`

[Figure: scatter plot of prestige against income (0 to 25,000), with lines for yhat_inter, type == bc; yhat_inter, type == prof; and yhat_inter, type == wc.]

Dot plots don't seem that great in Stata. Maybe one of you can improve on this.

Prestige data:

`graph dot (sum) type_BC type_PROF type_WC`

This creates a not very useful graph showing that there are about 23 white collar workers, 31 professionals, and 44 blue collar workers.

[Figure: dot plot of sum of type_BC, sum of type_PROF, and sum of type_WC on an axis from 0 to 40.]

### EXTRA: Data Management Exercise

#### Turning a string variable into a numeric variable

As we saw earlier, having data in a string variable can sometimes make things a little harder in Stata. We encountered this with the variable type in the Prestige dataset. This variable has the values bc, prof, and wc. We can make this variable into a numeric variable, though it doesn't actually solve the problems with making a bar chart or using it in regression; those things still appear to be easier in R. Here is one way to do it.

This is our old variable:

`tab type`

| type | Freq. | Percent | Cum. |
|---|---|---|---|
| NA | 4 | 3.92 | 3.92 |
| bc | 44 | 43.14 | 47.06 |
| prof | 31 | 30.39 | 77.45 |
| wc | 23 | 22.55 | 100.00 |
| Total | 102 | 100.00 | |

First, generate a new variable. Let's call it type_num. We want to make sure Stata knows it's numeric, so we tell Stata this as follows:

`gen type_num = .`

If we had written `gen type_num = ""`, Stata would have thought it was a string variable. Two quotation marks indicate a string variable. This was also true before, when we had to put quotation marks around the value of a string variable in the if clause (from the Module 3 handout).

Now we need to make sure the correct values are assigned. We do this by telling Stata to replace a value in the new variable depending on the value of the old variable. Here we are using an if clause, and using the quotation marks because the old variable is a string variable.

`replace type_num = 0 if type == "bc"`

(44 real changes made)

The above command tells Stata that every time there is a bc value in the variable type, there should now be a zero value in the variable type_num.

`replace type_num = 1 if type == "prof"`

(31 real changes made)

`replace type_num = 2 if type == "wc"`

(23 real changes made)

There was an NA value as well, so we might as well change that too. Above we turned it into missing data, but maybe for some reason we want to represent it as a numeric value.

`replace type_num = 3 if type == "NA"`

(4 real changes made)

Now let's see what we have by making a table of our new variable.

`tab type_num`

| type_num | Freq. | Percent | Cum. |
|---|---|---|---|
| 0 | 44 | 43.14 | 43.14 |
| 1 | 31 | 30.39 | 73.53 |
| 2 | 23 | 22.55 | 96.08 |
| 3 | 4 | 3.92 | 100.00 |
| Total | 102 | 100.00 | |

It looks fine: all 102 observations are there, in 4 categories. But we need to label the numbers now using value labels, so that people who read this know that 0 stands for blue collar. Here is how you define a label and then apply the label to the values of the variable. See me if you have questions.

`label def type_num 0 "Blue Collar" 1 "Professional" 2 "White Collar" 3 "NA"`

`lab values type_num type_num`

`tab type_num`

| type_num | Freq. | Percent | Cum. |
|---|---|---|---|
| Blue Collar | 44 | 43.14 | 43.14 |
| Professional | 31 | 30.39 | 73.53 |
| White Collar | 23 | 22.55 | 96.08 |
| NA | 4 | 3.92 | 100.00 |
| Total | 102 | 100.00 | |

### Alternative ways of graphing

The graph below is actually 7 graphs, one on top of the other: 3 scatter plots for the different values of type, 3 plots with a fitted line for the different values of type, and 1 plot showing the fitted line for all values of type. I turned the legend off (see the end of the command) because the legend was not informative for this graph. This graph is not as good as the one that uses yhat.

`twoway (scatter prestige income if type=="bc", mcolor(red) msymbol(circle_hollow)) (lfit prestige income if type=="bc", lcolor(red)) (scatter prestige income if type=="prof", mcolor(midgreen) msymbol(triangle_hollow)) (lfit prestige income if type=="prof", lcolor(midgreen)) (scatter prestige income if type=="wc", mcolor(blue) msymbol(plus)) (lfit prestige income if type=="wc", lcolor(blue)) (lfit prestige income, lcolor(black)), legend(off)`

[Figure: scatter plots and fitted lines of prestige (20 to 100) against income (up to 25,000) for each value of type, plus an overall fitted line; legend off.]
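
As a worked check on the interaction regression in this handout, the coefficients can be combined into one fitted line per category of type. This is only a sketch. It assumes the flattened output reads as _cons = 15.31756, income = .0038699, type_PROF = 43.60598, type_WC = 17.5677, PROF_income = -.0030247, and WC_income = -.0020176, with type_BC as the reference category. The sums are hand arithmetic, not Stata output.

$$
\begin{aligned}
\widehat{\text{prestige}}_{\text{bc}} &= 15.31756 + 0.0038699 \cdot \text{income} \\
\widehat{\text{prestige}}_{\text{prof}} &= (15.31756 + 43.60598) + (0.0038699 - 0.0030247) \cdot \text{income} = 58.92354 + 0.0008452 \cdot \text{income} \\
\widehat{\text{prestige}}_{\text{wc}} &= (15.31756 + 17.5677) + (0.0038699 - 0.0020176) \cdot \text{income} = 32.88526 + 0.0018523 \cdot \text{income}
\end{aligned}
$$

Under these assumptions, each dummy coefficient shifts the intercept relative to blue collar and each interaction coefficient shifts the income slope. That is why the three yhat_inter lines have different slopes, while the lines from the regression without interactions are parallel.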
