Data analysis


POLI 380 (002) 2017 Assignment 2

Due Monday, March 5th at 7pm on the course site


There are FOUR questions for a total of 15 points – points per question are indicated in parentheses before each question.


Nowhere in your answers should you paste in raw Stata results to give us a string of numbers to look at. Even if it looks OK when you paste it, it doesn’t look good when we grade it. Work numbers into sentences when asked to do so.


You may work on the assignment with other students, but all answers must be written up individually.  Answers that are substantially identical to those of another student will be treated as plagiarism.


This assignment is going to use TWO different datasets!

The first is the same as last time: your individual sample of the Census Microdata File.

For questions 2-4 you will use the American National Election Study from 2016. It is posted on Canvas under /modules/course materials under ‘course materials’. (Make sure you use the 2016 file, not 2012.)


After you’ve done the answers in a document that you save for yourself, submit the answers in the appropriate question box on Canvas.


Q1: Proportion mean & CI census data. [2pts]


(1 point) a. Use your census dataset sample to estimate the NUMBER (not the percentage) of people living in Canada who are visible minorities. Assume the total population of Canada is exactly 35 million. (You have no other source to help you estimate the number, just your sample.) Use the variable ‘vismin’.


(1 point) b. Indicate how far away from the true number of visible minority people (i.e. not the percentage) you would expect to be, 19 times out of 20.
Say: ± ______ people (NOT ± %). You can calculate this using the formula for the standard error of a proportion, and then convert that percentage into a number of Canadians, as you will have done in part a.
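As a rough illustration of the arithmetic in parts a and b, here is a sketch with made-up numbers: a hypothetical sample of 1,000 people, 20% of whom are visible minorities. Neither figure comes from the census file, so substitute the proportion and sample size from your own sample. The sketch also assumes that "19 times out of 20" is read as a 95% interval, i.e. about 1.96 standard errors; the scalar names are just illustrative.

```stata
* Hypothetical inputs (assumptions, NOT census results)
scalar phat = 0.20
scalar nobs = 1000
* Total population as given in the question
scalar npop = 35000000

* Part a: scale the sample proportion up to the population
scalar est_people = phat * npop

* Standard error of a proportion: sqrt(p(1-p)/n)
scalar se = sqrt(phat * (1 - phat) / nobs)

* Part b: 95% margin of error, converted from a proportion to people
scalar moe_people = 1.96 * se * npop

display "Estimated number of people: " %12.0fc est_people
display "Margin of error (people):   " %12.0fc moe_people
```

The point of the last step is that the margin of error starts out as a proportion and only becomes a number of people once it is multiplied by the population.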


In Canvas, just enter one number for the number of visible minority people and another number for the amount you expect to be away from the true number (±). Separate them with a semicolon.





Questions 2-4: American National Election Study.


Now switch to the 2016 American National Election Study. This survey was conducted during the 2016 American election.



Draw a random sample of 3,500 cases from the dataset. That way you will all get different samples that I can have my computer replicate.

First, set the random number seed by typing: set seed courseidnumber (where you replace ‘courseidnumber’ with your course id number, which is also the number from your census dataset, NOT your real student number).

Next, use the command sample by typing: sample 3500, count

If you do not include “count” in your command, Stata thinks you want 3500% of your sample and won’t be able to do anything.

Now use the separate command count to double-check that you have 3500 cases to work with:

count

Stata should simply report: 3500. (If it’s close, it’s ok).
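Put together, the sampling steps look roughly like the sketch below. So that it runs on its own, it builds a toy dataset of 4,000 made-up rows instead of loading the ANES file, and it uses 12345 as a placeholder seed; with the real data you would open the ANES file first and use your own course id number.

```stata
* Toy data standing in for the ANES file (4,000 made-up rows)
clear
set obs 4000
generate id = _n

* 12345 is a placeholder; use your own course id number here
set seed 12345

* Keep a random 3,500 observations ("count" = a number, not a percent)
sample 3500, count

* Report how many observations remain
count
```

Because the seed is set before sampling, rerunning these lines from the top gives the same 3,500 cases each time.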



Q2: Generational differences on gun control? [5pts]


With gun control in the news, let’s look at support for gun control across different generations. To start, recode the age variable (V161267) into four generations: Baby Boomers (52-70), Gen X (35-51), Millennials (21-34), and ‘next’ (18-20). To create this new variable, which will have four values, you’ll need to use an ‘if statement’. Have a look at section 3.1 of the workbook for one way to do it. (Note that that section deals with a dummy variable, which can only have the values 0 or 1. You are making a variable with four values, but the section has examples of how to use ‘if statements’.)
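One possible way to build a four-value variable with ‘if statements’ is sketched below. It runs on six made-up ages rather than the real file, and the numbering of the categories (1 to 4), the new variable name and the label name are my own choices, not something the assignment requires. Ages outside the four ranges, including any negative missing-data codes, are simply left as missing.

```stata
* Toy ages standing in for V161267 (made-up values)
clear
input V161267
19
25
40
60
85
-9
end

* Start with everyone missing, then fill in one generation at a time
generate generation = .
replace generation = 1 if inrange(V161267, 18, 20)
replace generation = 2 if inrange(V161267, 21, 34)
replace generation = 3 if inrange(V161267, 35, 51)
replace generation = 4 if inrange(V161267, 52, 70)

* Attach readable labels (label name "genlbl" is arbitrary)
label define genlbl 1 "Next" 2 "Millennial" 3 "Gen X" 4 "Baby boomer"
label values generation genlbl

* Check the result, showing missing cases too
tabulate generation, missing
```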


The dependent variable is V161187, which measures individuals’ positions on ease of access to guns. Recode it so that 2 = “easier”, 1 = “about the same”, and 0 = “more difficult”.
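A sketch of the recode on toy data follows. It ASSUMES the original codes are 1 = more difficult, 2 = easier, 3 = about the same, with negative numbers for nonresponse; check the codebook before relying on that, because the mapping is wrong if the original codes differ. The new variable name "guns" is hypothetical.

```stata
* Toy values standing in for V161187 (made-up)
clear
input V161187
1
2
3
-9
end

* ASSUMED original codes: 1 = more difficult, 2 = easier, 3 = about the same
* Everything else (e.g. negative nonresponse codes) becomes missing
recode V161187 (1 = 0) (3 = 1) (2 = 2) (else = .), generate(guns)

label define gunslbl 0 "More difficult" 1 "About the same" 2 "Easier"
label values guns gunslbl

* Cross-check old codes against new ones
tabulate V161187 guns, missing
```

Writing the recoded values into a new variable keeps the original intact, so a mistake in the mapping can be undone.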


In a single, concise, and engaging paragraph, report the mean value of the gun control variable across the four generations (so 4 means). Your audience is someone reading a newspaper article or op-ed. You’ll need to explain, in general terms, what the variables measure and what different values/scores mean.
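To get the four means, one option is tabstat with a by() variable, sketched here on eight made-up cases. The variable names "generation" and "guns" are hypothetical stand-ins for whatever you called your recoded variables, and the numbers are invented, so the means printed here say nothing about the real data.

```stata
* Toy data: generation (1-4) and gun-access score (0-2), all made-up
clear
input generation guns
1 0
1 2
2 0
2 1
3 1
3 1
4 2
4 1
end

* Mean score and number of cases within each generation
tabstat guns, by(generation) statistics(mean n)
```

Since the score runs from 0 (more difficult) to 2 (easier), a higher group mean indicates more support for easier access to guns.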




Q3: Economic outlook and vote choice in the 2016 election [5pts]


Next let’s look at the relationship between individuals’ concern about the economy and vote choice. First find V161141x, which measures respondents’ beliefs about what the economy will be like over the year following the election. Recode appropriate cases to be ‘missing’.


Next use V162034a to create a binary measure of whether people voted for Donald Trump (=0) or Hillary Clinton (=1).


Now run a crosstab telling us the percentage of people who voted for Clinton or Trump within each level of the economy variable. Report the results of this crosstab in a clear and compelling paragraph. Use some of the results from the table, but not all, when explaining what you found (there should be 10 cells in the table).
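The three steps for this question can be sketched as follows, again on made-up rows. Two things here are ASSUMPTIONS to verify against the codebook: that negative values of V161141x are the nonresponse codes to set to missing, and that V162034a uses 1 for Clinton and 2 for Trump. The variable name "clinton" is hypothetical.

```stata
* Toy data standing in for the two ANES variables (made-up)
clear
input V161141x V162034a
1 1
1 1
1 2
2 1
3 2
3 1
4 2
5 2
5 2
-9 1
3 3
end

* ASSUMPTION: negative codes are nonresponse, so set them to missing
replace V161141x = . if V161141x < 0

* ASSUMPTION: 1 = Clinton, 2 = Trump; all other candidates stay missing
generate clinton = .
replace clinton = 1 if V162034a == 1
replace clinton = 0 if V162034a == 2

* Column percentages: share voting Trump (0) or Clinton (1)
* within each level of the economy variable
tabulate clinton V161141x, column nofreq
```

With vote choice in the rows and the economy variable in the columns, the column option makes each column sum to 100%, which is the "within each level" comparison the question asks for.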



Q4: Interpreting a p-value [3pts]


Create a variable that indicates whether people live in Washington State or Oregon (everyone else will be missing). Use V161010e to start and look for the states “WA” and “OR”. Note that this is a ‘string’ variable not a numeric one.


Run a crosstab using the vote choice variable you created in Question 3 and the Washington/Oregon variable. When you run the crosstab, be sure to add the “, chi” option to get a p-value. In a single sentence, report and interpret the p-value for an intelligent reader who knows little about statistics. Your answer should only talk about the p-value and what it means. Don’t discuss the percentages present in your crosstab.
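Because the state variable is a string, the comparison values need quotation marks. The sketch below shows that, plus the chi-squared option, on ten made-up respondents. The 1/0 coding of the new variable and the names "wa_or" and "clinton" are hypothetical, and the toy p-value has nothing to do with what you will find in the real data.

```stata
* Toy data: state abbreviation (string) and vote choice (made-up)
clear
input str2 V161010e byte clinton
"WA" 1
"WA" 1
"WA" 0
"WA" 1
"OR" 0
"OR" 1
"OR" 0
"OR" 0
"CA" 1
"TX" 0
end

* String comparisons need quotes; all other states stay missing
generate wa_or = .
replace wa_or = 1 if V161010e == "WA"
replace wa_or = 0 if V161010e == "OR"

* Crosstab with a Pearson chi-squared test
tabulate clinton wa_or, chi2

* The p-value is also stored in r(p) after the test
display "p-value = " %5.3f r(p)
```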