Econ 378 Data Analysis

Econ 378 Data Analysis Project

Overview

This project gives you hands-on experience summarizing and analyzing data of your own interest. You are welcome to use spreadsheet or statistical software such as Excel or Stata. Some major statistical databases are listed on Learning Suite, and numerous data sources are freely available on the internet. It will be easy to find something that fits the parameters of the project, but I encourage you to find something that is important to you personally, to make the project much more meaningful.[1] Feel free to consult with me or with other professors if you need help finding a specific type of data. Examples that you might find interesting include:

• Price data (e.g. wages, interest rates, stock returns, home values, insurance premiums)
• National statistics (e.g. GDP, employment, inflation, crime, tax rates) over time or across countries or states
• Sales data from a business (with permission/confidentiality, as appropriate)
• Health/sports/political statistics, opinion polls, or your own experimental research[2]
• Your own personal finances, time use, grades, etc.

This class prepares you to answer questions such as:
(1) On average, how big is variable \(X\)?
(2) How widely does \(X\) vary across observations?
(3) Is variable \(X\) positively or negatively correlated with variable \(Y\), and how strong is this relationship?
(4) How can I use variable \(X\) to predict variable \(Y\)?

To answer these questions, you will need at least two variables, but this will not be difficult. Additional variables may make the analysis more interesting, but you will only analyze two at a time. You can also analyze multiple variables using Econometrics (Econ 388), so keep your data.

[1] If you lack research ideas, imagine that you have a magic crystal ball that can answer any one question of your choice. What do you wish to ask? That question is your research topic. Next, suppose that you have to answer that question on your own, but that you can ask the crystal ball for any secondary facts that will aid you in answering your big question for yourself. What more specific questions will lead you eventually to the answers you had wanted? Continue this procedure until you reach a question that is sufficiently specific (albeit several steps removed from your original interest) that it becomes feasible to collect the relevant data and get to work.

[2] If you collect data from human subjects, you must take care to preserve their safety and privacy, and ensure that participation is voluntary. If you wish to publish your data or results beyond this class, you will need advance approval from the BYU Institutional Review Board, which monitors compliance with federal regulations (see for more details). Start early in that case, to leave time for the approval process, and request additional time if necessary.

Part 1 – Data Collection & Summary (+35)

1. (+15) Collect data of interest

You do not need to submit your data files; just describe the data: If it is not obvious already, what exactly do the variables measure (e.g., what units)?[3] How were they collected? Do you have data for the entire population of interest? Or just a sample?

The first column of data should list the unit of observation (e.g. individual, firm, country, or time period).[4] For each observation, you need at least one quantitative variable (e.g. price, number of sales, age, GDP) and one binary variable (e.g. gender, race, industry, political party, sport position).[5]

While not required, it is often interesting to pull data from multiple sources, or to construct new variables from existing data.[6] In the spreadsheet below, for example, government finance variables come from one source and a binary political variable comes from another.
Per capita variables are then computed simply as ratios; growth variables are computed simply as differences (as a ratio of the original level); and additional binary variables are constructed either by reducing a quantitative variable into “high” and “low” categories (e.g. GDP growth above or below 1.5%) or by comparing two existing variables (e.g. Gov. growth > GDP growth?).

Original variables (unit of observation: year)

Year   GDP ($ bil.)   Population (mil.)   Gov. Spending ($ bil.)   Republican House?
2008   14,834         304                 4,665                    0
2009   14,418         307                 5,179                    0
2010   14,779         309                 5,057                    0
2011   15,052         312                 5,116                    1
2012   15,471         314                 5,042                    1
2013   15,759         316                 4,955                    1
2014   16,077         319                 4,957                    1

Constructed variables

Year   Per capita GDP   Per capita GDP   GDP growth   Per capita Gov.       Per capita Gov.   Gov. growth >
       ($ thous.)       growth (%)       > 1.5%?      spending ($ thous.)   growth (%)        GDP growth?
2008   48.8             –                –            15.3                  –                 –
2009   47.0             -3.7%            0            16.9                  10.1%             1
2010   47.8             1.7%             1            16.3                  -3.2%             0
2011   48.3             1.1%             0            16.4                  0.4%              0
2012   49.3             2.0%             1            16.1                  -2.2%             0
2013   49.8             1.1%             0            15.7                  -2.5%             0
2014   50.4             1.3%             0            15.5                  -0.7%             0

[3] For example, a humanitarian agency might rate sovereign governments as “corrupt” or not, and designate individuals as “in poverty” or not, but how are these categories assigned? What exactly do they mean?

[4] You need at least three observations; larger samples increase precision. If you have trouble identifying the unit of observation, it may be that your data are actually a summary of more primitive raw data. If so, this may be unusable, as the number of observations is effectively reduced to one.

[5] You can make a categorical variable binary simply by combining categories. For example, a “race” variable might have several codes for different races, but can be reduced simply to “white” and “minority”. You can also construct binary variables from quantitative variables (see the example table).

[6] When the unit of observation is a time period (e.g. year or week), it can also double as a quantitative variable.

2. (+3) Identify your audience

Identify some audience that might find this data interesting: a policy maker, a business leader, a consumer, etc. In Part 2, you will report your findings to this individual. List any questions (at least two) that this audience might have, that you believe your data can shed (at least partial) light on.

3. (+6) Summarize individual variables
a. Summarize at least one binary variable by reporting the total fraction in each category.
b. Summarize at least one quantitative variable by reporting the minimum, maximum, mean, and standard deviation.
c. Use one binary variable to divide your data into subgroups, and report the conditional minimum, conditional maximum, conditional mean, and conditional standard deviation for one of these subgroups (e.g. average wages among female workers). Note: for all subsequent analysis of this project, you may use the full sample or this restricted sample, as you wish.
d. Represent at least one quantitative variable graphically, using a histogram.[7]

4. (+6) Correlation and causation

Choose two variables, and do the following:
a. Identify reasons why the variables might be positively or negatively correlated. Might one cause the other to increase or decrease? Is reverse causation possible? Are there outside factors that might cause both variables to move? Predict the sign and magnitude of the correlation coefficient \(\rho\) between these variables.
b. For any outside factors that you identify in part a, tell what additional data could be collected and examined, to control for these outside factors.
c. Compute the actual correlation coefficient, and compare it with your prediction above.

5. (+5) Graphical Summary

Compare two variables graphically, using something like the following. Include labels (e.g. color coding, axis labels, legend, etc.) so that your graphic is clear.
• Scatter chart (two quantitative variables)
• Double pie chart (two categorical variables)
• Color-coded scatter chart (two quantitative and one categorical variable)
• Bar or column chart (one categorical and one or more quantitative or categorical variables)
• Line graph (one quantitative variable and time)
• Bubble chart (three quantitative variables)

Briefly describe some facet of the relationship between the two variables that is apparent in the type of graphic you chose.

[7] In MS Excel 2010, load the Analysis ToolPak add-in, which provides the “Data Analysis” tools (File>Options>Add-ins for PC or Tools>Add-ins for Mac), and then select Data>Data Analysis>Histogram. Select the Input Range and Bin Range, and be sure to select the box for “Chart Output”. Note that a bar chart is not the same as a histogram.

Part 2 – Statistical Inference (+28)

Do the following, stating any important assumptions that your answers rely on.[8] You do not need to write out all of your computations, but should make clear how you arrived at your answers.

1. Mean
a. (+2) For at least one quantitative variable, find a point estimate of the underlying population mean \(\mu\).[9] Compute a confidence interval for \(\mu\), at a confidence level of your choice.[10]
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(\mu\) is equal to a specific value of your choice. State the associated p-value.

2. Standard Deviation (OPTIONAL; must do 2 or 5 or 6)
a. (+2) For at least one quantitative variable, find a point estimate of the underlying population standard deviation \(\sigma\). Compute a confidence interval for \(\sigma\), at a confidence level of your choice.[11]
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(\sigma\) is equal to a specific value of your choice. State the associated p-value.

3. Proportion
a. (+2) For at least one binary variable, find a point estimate of the underlying proportion \(p\) in a particular category.
Compute a confidence interval for \(p\), at a confidence level of your choice.[12]
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(p\) is equal to a specific value of your choice. State the associated p-value.

[8] This section presumes that your data represent a sample from a larger population of interest. If your data represent an entire population (e.g. all 50 of the United States), merely perform the following analysis as if it were only a sample from a much larger population (e.g. a huge pool of states, from which you have randomly drawn 50). In your write-up, make clear that you understand this distinction.

[9] If you wish to type equations or mathematical symbols in MS Word 2010, hold the “alt” key and type the = sign to open the equation editor. For Greek variables, type \ and then spell the name of the letter (e.g. mu for \(\mu\), sigma for \(\sigma\)). \(\bar{x}\) can be typed as xbar.

[10] In MS Excel 2010, the formula for the cdf of a t-distribution is T.DIST(x, df, TRUE), and \(p\)-percentiles can be obtained by T.INV(p, df).

[11] The formula for sample standard deviation in MS Excel 2010 is STDEV.S([data range]). The formula for a Chi-square cdf is CHISQ.DIST(x, df, TRUE), and percentiles can be obtained by CHISQ.INV(p, df).

[12] In MS Excel 2010, the formula for a normal cdf is NORM.DIST(x, mu, sigma, TRUE) and percentiles can be obtained by NORM.INV(p, mu, sigma).

Next, divide your data into two subgroups. Then do the following, stating any important assumptions that your answers rely on.

4. Difference of Means
a. (+2) For at least one quantitative variable, find a point estimate of the difference \(\mu_1 - \mu_2\) between the means of the underlying subpopulations. Compute a confidence interval for \(\mu_1 - \mu_2\), at a confidence level of your choice.
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(\mu_1 - \mu_2\) equals a specific value of your choice. State the associated p-value.

5. Ratio of Standard Deviations (OPTIONAL; must do 2 or 5 or 6)
a. (+2) For at least one quantitative variable, find a point estimate of the ratio \(\sigma_1^2 / \sigma_2^2\) of the variances of the underlying subpopulation distributions. Compute a confidence interval for \(\sigma_1^2 / \sigma_2^2\), at a confidence level of your choice.[13]
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(\sigma_1^2 / \sigma_2^2\) equals a specific value of your choice. State the associated p-value.

6. Difference of Proportions (OPTIONAL; must do 2 or 5 or 6)
a. (+2) For at least one binary variable, find a point estimate of the difference \(p_1 - p_2\) between the underlying subpopulation proportions. Compute a confidence interval for \(p_1 - p_2\), at a confidence level of your choice.
b. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(p_1 - p_2\) is equal to a specific value of your choice. State the associated p-value.

7. Regression
a. (+3) Regress one variable \(Y\) on another variable \(X\). Report point estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) of the intercept and slope coefficients and the coefficient of determination \(\rho^2\).[14]
b. (+1) Give a confidence interval for \(\beta_1\), at the level of your choice.
c. (+2) Perform a one- or two-sided test, at the level of your choice, of the hypothesis that \(\beta_1\) equals a specific value of your choice. State the associated p-value.
d. (+2) Identify a particular value \(x^*\) of the independent (explanatory) variable that is interesting. Use the regression of part (a) to predict \(Y^*\) for \(X = x^*\). Report a confidence interval for \(Y^*\) at the level of your choice.

[13] In MS Excel 2010, the formula for the cdf of an F-distribution is F.DIST(x, df1, df2, TRUE) and percentiles can be obtained by F.INV(p, df1, df2).

[14] In MS Excel 2010, regression estimates can be easily computed using Data > Data Analysis > Regression.
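As a rough illustration of the regression step in item 7a, the sketch below computes the intercept, slope, and coefficient of determination from sample sums of squares. It uses two columns of the example table in Part 1 (per capita GDP as X, per capita government spending as Y). The function name `simple_regression`, the choice of these two columns, and the value `x_star = 49.0` are assumptions made here for illustration only; Excel, Stata, or any comparable tool should give the same estimates.

```python
# Minimal sketch: simple linear regression of Y on X from sample moments.
# Data are the per capita columns of the example table in Part 1
# (2008-2014, $ thousands); the choice of columns is illustrative only.

def simple_regression(x, y):
    """Return (intercept, slope, r_squared) for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared

gdp_pc = [48.8, 47.0, 47.8, 48.3, 49.3, 49.8, 50.4]
gov_pc = [15.3, 16.9, 16.3, 16.4, 16.1, 15.7, 15.5]

b0, b1, r2 = simple_regression(gdp_pc, gov_pc)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, R^2 = {r2:.3f}")

# Point prediction at a hypothetical value x* (not taken from the handout).
x_star = 49.0
print(f"predicted Y at x* = {x_star}: {b0 + b1 * x_star:.3f}")
```

The confidence intervals and p-values requested in items 7b through 7d additionally require t-distribution percentiles, which this sketch does not compute; a spreadsheet function such as T.INV or a printed t table can supply them.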
Part 3 – Executive Summary (+20+4)

Write a short[15] report or memo, addressed to the audience identified in Part 1, summarizing your most interesting findings from above. Feel free also to supplement the above with additional graphics or analysis. In your report, you should do the following:

1. (+1) Clearly state the question or issue that this analysis addresses.
2. (+1) Make sure the nature of the data, including key variables, is clear.
3. (+6) Clearly explain key findings. (Graphical representations may be helpful here.)
4. (+3) Emphasize and explain the significance (i.e. practical relevance, and perhaps statistical significance) of any key results. Include any policy recommendations (e.g. shopping strategies, legal regulations, etc.) that your analysis favors.
5. (+4) Be clear and forthright about any caveats, assumptions, or limitations of your data, your analysis, or your policy recommendations, including questions of causation. Indicate what additional data or analysis would be necessary in order to provide more complete answers to the questions of interest.
6. (+5) Write cleanly (i.e. error-free) and effectively. Write as if your audience has only a limited knowledge of statistics, and avoid overly technical jargon. (For example, units of dollars are easier to understand than standard deviations or correlation coefficients.)
7. (Bonus +4) To improve your paper’s exposition, attend a consultation at the FHSS Writing Center (1175 JFSB) or the BYU Writing Center (4026 JKB). Attach a note from the writing center to verify your attendance.

[15] There is no required length. Your goal is to be as clear, informative, and concise as possible.

Economics 378 Homework

Many of the homework problems below are written out fully. Starting with HW 2, some refer you to the WMS (Wackerly, Mendenhall, and Schaeffer) textbook. You may submit homework in groups of up to four students.
HW 0 (+0) FERPA and Homework Groups

This assignment is not awarded any points but must be completed before points will be awarded for subsequent homework.

1. Please write and sign one of the following sentences.
a. “I give my permission for my corrected homework to be distributed in class.”
b. “I will arrange with my designated TA to collect my corrected homework separately.”
2. Please do one of the following.
a. List 1 to 5 names, including your own. This will be your designated homework group.
b. Write a number between 2 and 5 (inclusive), indicating the size of group to which you wish to be assigned. (Note: group size preferences will be accommodated if possible but cannot be guaranteed. Students who do not request a group size will be assigned into groups of four.)

HW 1 (+30) Math Preview

Note: This first homework assignment is meant to be representative of the hardest math problems that you will face later in the course. If this is easy for you, that means that your math preparation is sufficient, and you can focus on learning statistics. If this is beyond your capabilities, you should consider delaying this course and taking additional math first, or be prepared to devote extra time to future homework, so as to master both the underlying mathematics and the concepts specific to statistical analysis.

Syllabus review
1. (+3) To make sure that you reviewed the syllabus, answer the following:
a. True or False: not every homework question will be graded.
b. True or False: the midterm and final together comprise 70% of your final grade.
c. Which of the “tips for success” do you expect will be most important for you?

Fractions
2. (+1) True or False: if \(w\), \(x\), \(y\), and \(z\) are real numbers then \(\frac{w/x}{y/z} = \frac{w/y}{x/z}\).

Factoring polynomials
3. (+1) Factor the following polynomials, or state that they cannot be factored.
a. \(9x^2 - 4y^2\)
b. \(3x^2 + 6xy + y^2\)
c. \(2x^2 - 4xy + 2y^2\)

Solving systems of equations
4. (+1) Solve the following system of equations for \(a \le b\):
\[ \frac{a+b}{2} = 30, \qquad \frac{(b-a)^2}{12} = 12 \]

Factorials
5. (+2) Answer the following:
d. Simplify \(\frac{100!}{98!}\). Then evaluate.
e. Simplify \(\frac{25!}{22!\,0!\,3!}\). Then evaluate.
f. Write the following in factorial notation:
i. \(5 \cdot 6 \cdot 7\)
ii. \(\frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1}\)

Exponentials, Logarithms
6. (+2) Simplify the following, or state that the expression cannot be simplified:
a. \(\ln(e^2 \cdot e^{-3} \cdot e^4) + \ln(x^2) - \ln(x)\)
b. \(e^{2\ln(x^2) - 3\ln(x+1) + 1}\)
7. (+1) Solve \(1 - e^{x/2} = .75\) for \(x\).

Summations, Products
8. (+1) Evaluate \(\sum_{k=3}^{4} \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}\), where \(n = 4\) and \(p = 0.25\).
9. (+2) Simplify \(\ln\left(\prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}\right)\) to be a function of \(\sum_{i=1}^{n} x_i\) and \(\sum_{i=1}^{n} x_i^2\) (with no other \(x_i\)), where \(n\), \(\pi\), \(\mu\), and \(\sigma\) are constants.
10. (+2) Evaluate \(m\) and \(v\), defined as follows, for \(n = 4\) and \(x_1 = 100\), \(x_2 = 120\), \(x_3 = 80\), and \(x_4 = 100\).
a. \(m = \frac{1}{n} \sum_{i=1}^{n} x_i\)
b. \(v = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2\)
11. (+2) True (T) or not always true (F):
g. \(\sum_{i=1}^{n} x_i y_i = \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)\); that is, a summation symbol can be distributed through a product.
h. \(\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2\); that is, an exponent can be distributed through a sum.
i. \(\prod_{i=1}^{n} 3x_i = 3 \prod_{i=1}^{n} x_i\); that is, a coefficient can be pulled outside of a product symbol.
j. \(\prod_{i=1}^{n} (x_i + y_i) = \left(\prod_{i=1}^{n} x_i\right)\left(\prod_{i=1}^{n} y_i\right)\); that is, a product symbol can be distributed through a sum.
k. If \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\) then \(\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) = 0\); that is, on average, the numbers in a list are no higher and no lower than the average of the numbers in the list.

Limits
12. (+1) Answer the following:
l. Find the limit of \(\frac{1}{n}\) as \(n \to \infty\).
m. Find the limit of \(\frac{n-1}{n} f(n)\) as \(n \to \infty\), where \(f(n) \to 100\).

Derivatives
13. (+2) Find \(f'(x)\) for each of the following functions of \(x\), where \(a\) and \(a_i\) are constants.
a. \(f(x) = [1 + \ln(x)]^2\)
b. \(f(x) = \sum_{i=1}^{n} a_i x\)
14. (+2) Find \(x\) to maximize \(f(x) = \ln\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\right)\), where \(\pi\), \(\mu\), and \(\sigma\) are positive constants.

Integrals
15. (+2) Evaluate the following definite integrals, where \(a\), \(b\), and \(c\) are constants.
c. \(\int_{-1}^{1} (ax^2 + bx + c)\, dx\)
d. \(\int_{0}^{1} \left(\frac{1}{2} y^2\right)\left(\frac{3}{2} y^2 + 4\right) dy\)
16. (+2) Evaluate \(\int_{0}^{1} \int_{0}^{10} xy \, dx \, dy\).
17. (+3) Differentiate \(f(x) = -e^{-x}\). Then use this to find \(\int_{0}^{\infty} e^{-x}\, dx\).

HW 2 (+19) Probability, Combinatorics

Set Notation
1. (+2) WMS 2.8 (pg. 26)
2. (+3) Let \(A = \{1,2,3,4,5\}\), \(B = \{2,4,6,8,10\}\), and \(S = \{1,2,\ldots,10\}\). Find the following:
a. \(A \cap B\)
b. \(A \cup B\)
c. \(A \cap \bar{B}\)
d. \(A \cup \bar{B}\)
e. \(\overline{A \cup B}\)
f. \(\overline{A \cap B}\)
3. (+2) State whether each of the following is always true (T) or not always true (F), where \(A\) and \(B\) are sets:
a. \(\overline{A \cap B} = A \cap \bar{B}\)
b. \(\overline{A \cap B} = A \cup \bar{B}\)
c. \(\overline{A \cup B} = A \cap \bar{B}\)
d. \(\overline{A \cup B} = A \cup \bar{B}\)

Probability
4. (+2) Three students try independently to solve a difficult math problem. Individually, each is successful with probability .6. What is the probability that at least one is successful?
5. (+1) State whether the following are true (T) or not always true (F), where \(A\) and \(B\) are events:
a. If \(A\) and \(B\) are mutually exclusive, then they are independent.
b. If event \(A\) and event \(B\) both occur with positive probability and \(A \subseteq B\) then \(A\) and \(B\) are not independent.

Combinatorics
6. (+3) Tasks A and B each require ten workers. Suppose that 20 workers, including five minority workers, are divided randomly into two groups of ten. What is the probability that all five minority workers are assigned to task B?
7. (+3) WMS 2.55
8. (+3) WMS 2.56

[For additional practice see WMS 2.2, 4*, 6, 7, 11, 15, 17, 25, 26, 29*, 31*, 34, 35, 38*, 39, 50*, 53, 60, 63*, 74, 86, 96* and WMS examples 2.7*, 10*, 11, 12]

HW 3 (+23) Conditional Probability and Bayes’ Rule

Conditional Probability and Independence
1. (+2) Let \(A\) and \(B\), respectively, denote the events that a worker is employed, and that a worker is a minority. In words, interpret the following:
a. \(P(A \cap B)\)
b. \(P(A|B)\)
c. \(P(B|A)\)
2. (+1) For events \(A\), \(B\), and \(C\), state whether each of the following is always true (T) or not always true (F):
a. \(P(A|B) + P(AB) = 1\)
b. If \(A\) and \(B\) are independent events then \(A\) is independent of \(B\).
c. If \(P(A) > P(B)\) then \(P(A|C) > P(B|C)\).
3. (+1) Use the definition of conditional probability to prove that the following two definitions of independence are equivalent: if \(P(A|B) = P(A)\) then \(P(B|A) = P(B)\).
4. (+2) WMS 2.71
5. (+2) WMS 2.95
6. (+2) WMS 2.96
7. (+3) WMS 2.156

Event-Decomposition, Bayes’ Rule
8. (+2) WMS 2.124
9. (+4) The purchase website for a smart phone includes an advertisement for a protective case. If phone customers click on this advertisement they receive more information about the case, and the option to purchase the case along with the phone. The website host records the following data about visitors to the website: 40% click on the advertisement, and 80% of these make some purchase: half purchase just the phone and not the case, while the other half purchase both (no one purchases the case without the phone). The remaining 20% of these customers leave the website without making any purchase at all. Of the customers who do not click on the advertisement, 70% purchase a phone (but do not even have the option of purchasing the case) while 30% leave the website without making a purchase. Use this information to answer the following, or state that there is not enough information provided, and specify what additional information would be needed in order to determine the answer:
a. What fraction of visitors to the phone website ultimately purchase (at least) a phone?
b. Of those who do ultimately purchase a phone, what fraction at least click on the advertisement to learn about the protective case?
What fraction actually purchase the protective case?
10. (+4) A certain disease afflicts 2% of a certain population. A diagnostic test for the disease is quite accurate, producing positive results for 95% of patients who actually have the disease and negative results for 90% of patients who do not have the disease. If a person chosen at random from the population receives a positive test result, what is the conditional probability that she has the disease? (Many students are surprised at the answer.)

[See also WMS 2.77, 102, 103*, 110, 111*, 114*, 115*, 121*, 125*, 128, 130, 133*, 137*, 142, 151*, 154a*, 155, 157, 172, 178* and WMS examples 2.17*, 22*, 23*]

HW 4 (+17) Distributions

Mean and Variance
1. (+1) The distribution of wages among a certain group of workers has mean \(\mu\) = $24 and standard deviation \(\sigma\) = $3. Interpret this in non-technical terms.
2. (+3) WMS 3.12
3. (+2) Find the standard deviation of the following data: $4, $2, -$1, $2, $2
4. (+1) What is the modal value of \(Y\) in exercise WMS 3.12?
5. (+3) WMS 3.19
6. (+3) Prove the equivalence of the two formulas for variance: \(V(X) = E[(X - \mu)^2] = E(X^2) - \mu^2\)
7. (+2) Suppose that the happiness, or “utility”, associated with additional wealth \(w\) is \(u(w) = \sqrt{w}\), and consider two investments: investment A produces \(W\) = $0 with probability .5 and \(W\) = $1000 with probability .5. Investment B produces \(W\) = $500 with probability 1. Which investment provides greater expected utility \(E[u(W)]\)?
8. (+1) The cost \(C\) of producing a quantity \(Q\) of widgets to satisfy demand is \(C = 4000 + 20Q\), but the quantity demanded is random. If the mean and standard deviation of demand are 500 and 200, respectively, then what are the mean and standard deviation of costs?
9. (+1) True or False: If a random variable \(X\) has mean \(E(X) = \mu\) and variance \(V(X) = \sigma^2\) and a random variable \(Z\) is defined by \(Z = \frac{X - \mu}{\sigma}\) then \(E(Z) = 0\) and \(V(Z) = 1\).
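The two variance formulas in HW 4 question 6 can be checked numerically before attempting the proof. The sketch below does this for a small discrete distribution whose values and probabilities are made up for illustration (they are not taken from any homework problem). Agreement on one example is only a sanity check, not a proof.

```python
# Numerical check that E[(X - mu)^2] equals E(X^2) - mu^2
# for a hypothetical discrete distribution (values and probs are made up).
import math

values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]
assert math.isclose(sum(probs), 1.0)  # probabilities must sum to one

# Mean: probability-weighted average of the values.
mu = sum(p * x for x, p in zip(values, probs))

# Variance by the definition: expected squared deviation from the mean.
var_definition = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

# Variance by the shortcut: expected square minus squared mean.
var_shortcut = sum(p * x ** 2 for x, p in zip(values, probs)) - mu ** 2

print(f"mean = {mu:.4f}")
print(f"variance (definition) = {var_definition:.4f}")
print(f"variance (shortcut)   = {var_shortcut:.4f}")
```

Changing `values` and `probs` to any other valid distribution should leave the two variance figures equal up to floating-point rounding.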
[See also WMS 3.1*, 5, 6, 10, 14*, 22, 24, 27*, 34*, 211, and WMS examples 3.1, 2*, 3*, 4]

HW 5 (+24) Correlation

Joint Distributions
1. (+1) A survey counts the numbers of adults and children in each of 55 households in a particular apartment complex, with results displayed in the table below. Based on this information, write down the joint distribution function for the numbers \(X\) and \(Y\) of adults and children in each household, respectively.

           no children   1 child   2 children   3 children
1 adult    5             6         8            4
2 adults   6             11        9            6

Marginal Distributions and Independence
2. (+3) Find the marginal distributions of \(X\) and \(Y\) for the distribution in question 1. Use these to find the average numbers of adults and children in a household.
3. (+1) Are \(X\) and \(Y\) independent, for the distribution in question 1? How can you tell?
4. (+1) Fill in the following joint distribution to make \(Y\) and \(Z\) independent.

          Z = 0   Z = 1
Y = 0     ?       .2
Y = 1     ?       .6

5. (+2) Let
\[ P(x, y, z) = \begin{cases} .1 & \text{if } (x, y, z) = (0,0,0) \\ .2 & \text{if } (x, y, z) = (0,0,1) \\ .1 & \text{if } (x, y, z) = (0,1,0) \\ .1 & \text{if } (x, y, z) = (0,1,1) \\ 0 & \text{if } (x, y, z) = (1,0,0) \\ 0 & \text{if } (x, y, z) = (1,0,1) \\ .2 & \text{if } (x, y, z) = (1,1,0) \\ .3 & \text{if } (x, y, z) = (1,1,1) \end{cases} \]
denote the joint probability of binary random variables \(X\), \(Y\), and \(Z\). Find the marginal distribution of \(X\). That is, find \(P_x(X = 0)\) and \(P_x(X = 1)\).

Expectations
6. (+2) Suppose that a food aid program provides every household in the apartment complex described in question 1 with an extra $30 of food benefits per week for each adult in a household, and $10 per week per child. Find the average aid amount per household.

Covariance and Correlation
7. (+3) Find \(Cov(X, Y)\) and \(Corr(X, Y)\) for the distribution in question 1.
8. (+1) Find \(Cov(2X, -3Y)\) and \(Corr(2X, -3Y)\) for the distribution in question 1.
9. (+4) Applications of the concepts of covariance and correlation have been extremely important in the field of finance. The purpose of this question is to illustrate one such application, which is the value of maintaining a diversified portfolio. To that end, let \(X\) and \(Y\) denote the (unknown) future returns associated with two stocks. A stock is most attractive to an investor if its payoffs are expected to be high and have low risk, that is, a high mean and low variance. Suppose that the two stocks are equally attractive, with the same mean \(\mu_x = \mu_y = \mu\) and standard deviation \(\sigma_x = \sigma_y = \sigma\), implying that \(\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma^2}\), or \(\sigma_{xy} = \rho\sigma^2\).
a. In terms of \(\mu\) and \(\sigma\), find the mean and variance of the returns associated with a portfolio \(2X\) consisting of two shares of stock \(X\).
b. In terms of \(\mu\), \(\sigma\), and \(\rho\), find the mean and variance of the returns associated with a diversified portfolio \((X + Y)\), consisting of one share of stock \(X\) and one share of stock \(Y\).
c. For what values of \(\rho\) is the diversified portfolio better than two shares of the same stock? That is, when is \(E(X + Y) > E(2X)\) or \(V(X + Y) < V(2X)\)? For two stocks in the real world, how likely is this condition to be satisfied?
d. What is the lowest possible variance for the portfolio \(X + Y\)? What would be required to achieve this variance?
10. (+3) WMS 5.103
11. (+3) Prove the equivalence of the two formulas for covariance: \(Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y\)

[See also WMS 5.4, 19a]

HW 6 (+17) Continuous Distributions

Probability Density Functions, Cumulative Distribution Functions, and Percentiles
1. (+3) The pdf \(f(x)\) of a random variable \(X\) is given by
\[ f(x) = \begin{cases} k\sqrt{x} & \text{if } 0 \le x \le 1 \\ 0 & \text{else} \end{cases} \]
a. Find \(k\) to make this a legitimate pdf.
b. Find the cumulative distribution function \(F(x)\).
2. (+2) The pdf \(f(x)\) of a random variable \(X\) is given by
\[ f(x) = \begin{cases} e^{-x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \]
Find the cumulative distribution function \(F(x)\).
(Hint: first find the derivative of \(-e^{-x}\), and use this to find the anti-derivative of \(e^{-x}\).)
3. (+2) Find the median of \(X\) for the distribution in question 1, along with the 10th and 95th percentiles \(\phi_{.10}\) and \(\phi_{.95}\).
4. (+3) WMS 4.12, parts c-e
5. (+1) A random variable \(X\) has cdf
\[ F(x) = \begin{cases} 0 & \text{if } x < -\frac{1}{2} \\ x^2 + 4x + 1 & \text{if } -\frac{1}{2} \le x \le 0 \\ 1 & \text{if } x > 0 \end{cases} \]
Find its density function \(f(x)\).

Mean and Variance
6. (+3) WMS 4.30, parts a-b

Review
7. (+2) A pizza restaurant sells pizzas in three sizes, and a customer can choose up to three toppings, from a list of ten. A customer may not repeat toppings (e.g. triple pepperoni, or pepperoni + pepperoni + sausage). How many pizza configurations are possible? (Hint: separately derive the numbers of 1-, 2-, and 3-topping pizzas.)

[See also WMS 4.8*, 9, 11*, 13, 17, 21*, 22, 24, 27*, 28*, and WMS examples 4.4*, 5*, 6*]

HW 7 (+15) Continuous Joint Distributions

Joint Density Functions
1. (+2) Each month, a manufacturer stocks one warehouse with good \(x\) and two warehouses with good \(y\). Monthly sales for \(x\) and \(y\) (as fractions of a full warehouse) are then random, described by the following joint density:
\[ f(x, y) = \begin{cases} 1 - k(x + y) & \text{if } x \in [0,1],\ y \in [0,2] \\ 0 & \text{else} \end{cases} \]
Find \(k\) to make \(f(x, y)\) a legitimate density.

Marginal Density Functions
2. (+2) Using the joint density given in question 1, find the marginal densities of sales for goods \(x\) and \(y\).

Independence
3. (+1) Using the joint density given in question 1 and your answer to question 2, are \(x\) and \(y\) independent? How can you tell?

Expectations
4. (+2) Suppose that goods \(x\) and \(y\) sell for $50,000 and $40,000 (per warehouse-full), respectively. For the joint density given in question 1, find average total monthly revenue.

Covariance and Correlation
5. (+5) For the joint density given in question 1, find \(Cov(X, Y)\) and \(Corr(X, Y)\).
6. (+1) WMS 5.99

Review
7. (+2) A website gets 75% of its traffic from link A and 25% from link B.
20% of the customers who enter from link A and 10% of the customers who enter from link B eventually make a purchase. Of those who make a purchase, what fraction entered from link B?

[See also WMS 5.16, 36, 60*, 64*, 80*, 82, 94, 96*, 109, 110*, 145, 148, 149*, and WMS examples 5.4, 5, 6*, 7, 11*, 13, 15*, 25*, 28]

HW 8 (+17 +1) Conditional Distributions

Correlation and Causation
1. (+2) A particular institution rates nations on a ten-point scale, on the basis of political corruption (where a 10 rating reflects rampant corruption). A researcher finds a −0.8 correlation between this corruption index and per-capita income. On the basis of this evidence, a policy maker who wishes to raise per-capita income decides to apply his efforts toward reducing corruption.
a. Give one reason why the evidence described might not actually warrant the policy maker’s efforts.
b. What types of additional evidence would strengthen the policy maker’s interpretation of the evidence described above?
2. (+2) A researcher observes that the rate of a country’s economic growth is correlated with its current level of debt. One interpretation of this is that debt inhibits economic growth.
a. Give one alternative explanation for the observed correlation.
b. List one type of additional evidence that could help the researcher distinguish between the explanation given above and the alternative explanation proposed in part a.
3. (+1) A study finds that women, on average, live longer than men. Is it appropriate on the basis of such a study for a life insurance firm to predict that future payouts to the beneficiaries of male policy holders will be higher than payouts to the beneficiaries of female policy holders? Why or why not?
4. (Bonus +1) Describe a claim that you have encountered recently (e.g. in a newspaper, in conversation, etc.) that you believe erroneously interprets correlation as causation. Explain your reasoning.
Conditional Distribution Functions
Questions 5-7 refer to a population with the following demographics.

| | no children | 1 child | 2 children | 3 children |
|---|---|---|---|---|
| 1 adult | 5 | 6 | 8 | 4 |
| 2 adults | 6 | 11 | 9 | 6 |

5. (+3) Let $X$ denote the number of adults and $Y$ denote the number of children in a family. Find the conditional distribution of $Y$ (that is, the conditional distribution function $P_y(Y = y \mid X = x)$) for each of the following:
   a. $X = 1$.
   b. $X = 2$.

Conditional Mean and Variance
6. (+3) Find the average number of children in households with 1 adult, and the average number of children in households with 2 adults.
7. (+1) Interpret $E(X \mid Y = 3)$ in words (you do not need to compute this value).

Conditional Densities
8. Find the following for the joint density function analyzed in class:
$$f(x, y) = \begin{cases} \tfrac{1}{4}x + \tfrac{1}{2}y & \text{if } x \in [0,2],\ y \in [0,1] \\ 0 & \text{else} \end{cases}$$
   a. (+2) The conditional density $f_y(y \mid X = 1)$.
   b. (+1) The conditional mean $E(Y \mid X = 1)$.
   c. (+2) The conditional standard deviation $\sigma_{y \mid X=1}$.
[See also WMS 5.22]

HW 9 (+22) Regressions

Regressions
1. (+2) Show that choosing the slope coefficient $\beta_1 = \frac{\sigma_{xy}}{\sigma_x^2}$ to minimize $V(\varepsilon)$ has the side effect of also ensuring that error terms are uncorrelated with the explanatory variable, $Cov(\varepsilon, X) = 0$. (Hint: By definition, $\varepsilon = Y - \beta_0 - \beta_1 X$, so it suffices to show that $Cov(Y - \beta_0 - \beta_1 X, X) = 0$.)
2. (+4) In a certain demographic group, the average height for men is 68 inches, with standard deviation 4 inches. The average weight is 185 lbs., with standard deviation 25 lbs. The correlation between height and weight is $\rho = .4$.
   a. How heavy would you expect a 6-foot-tall (i.e. 72 inches) man to be?
   b. How tall would you expect a man who weighs 200 lbs. to be?
   c. If a man is one standard deviation lighter than average, how many standard deviations taller or shorter than average do you expect him to be?
   d. What fraction of the variation in weight is associated with variation in height?
3.
According to the Bureau of Labor Statistics website, the average price per gallon of regular unleaded gasoline in July over four years was \$2.74, \$3.65, \$3.45, and \$3.63.
   a. (+2) Compute intercept and slope parameters for a linear regression describing the relationship between year and price during this period. (Hint: Let each year-price pair occur with 25% probability.)
   b. (+2) Assuming that the same relationship continues to hold, predict the price of gasoline for the following year. (Note: your intercept parameter above will depend on how you label years, but your prediction should not.)
   c. (+2) In which of the four years was the price of gasoline furthest from its trend?
4. (+6) We have seen in class lectures that linear regressions shed light on the interpretation of the correlation coefficient $\rho$ between two variables (in that $\rho^2$ corresponds to the fraction of variation in one variable related in a linear way to variation in the other). The purpose of this question is to show that, in fact, the correlation coefficient formulated earlier is fundamentally only a measure of the linear relationship between two variables. To see this, let $X$ be a random variable that equals 0, 1, or 2 with equal probability, and consider another random variable $Y = X^2$, so that the joint distribution of $X$ and $Y$ is given by the following.

| | $Y = 0$ | $Y = 1$ | $Y = 4$ |
|---|---|---|---|
| $X = 0$ | 1/3 | 0 | 0 |
| $X = 1$ | 0 | 1/3 | 0 |
| $X = 2$ | 0 | 0 | 1/3 |

Clearly, $X$ and $Y$ are perfectly correlated, in the sense that if we know the realization of either one of the two variables then we can perfectly forecast the other. A linear regression cannot provide these perfect forecasts, however, because the relationship between the two variables is not linear (it’s quadratic). Consistent with this, the correlation coefficient $\rho$ is less than one.
   a. Determine $\rho$. What fraction of the variation in $Y$ is related to variation in $X$?
   b.
Determine coefficients $\beta_0$ and $\beta_1$ for a linear regression of $Y$ on $X$, and use this regression to predict $Y$ for $X = 0$, $X = 1$, and $X = 2$.
5. (+4) A crime study finds that, in a certain population of international cities, those with higher population density also have higher rates of crime. Suppose that the average population density among these cities is 5,612 people per square mile (with standard deviation 1,384), the average number of property crimes is 4,028 (per 100k people) per year (with standard deviation 2,266), and the correlation between property crime and population density is 0.18. Which has lower crime, for a city of its population density: a city with 4,000 people per square mile and 3,000 property crimes per 100k people per year, or a city with 5,000 people per square mile and 5,000 property crimes per 100k people per year?

HW 10 (+13) Common Distributions 1

Bernoulli and Binomial Distributions
1. (+3) WMS 3.44
2. (+1) WMS 3.56
3. (+2) What are some situations that could be appropriately modeled using the following distributions? Give two situations for each, and interpret the parameters for each situation. (For example, a coin toss is a Bernoulli experiment with $p = \tfrac{1}{2}$; flipping 20 coins and counting the number of heads is a Binomial experiment with $p = \tfrac{1}{2}$ and $n = 20$.)
   a. Bernoulli
   b. Binomial

Uniform Distribution
4. (+2) WMS 4.45
5. (+1) WMS 4.46

Review
6. (+4) Given the following joint distribution,

| | $Y = 0$ | $Y = 5$ | $Y = 10$ |
|---|---|---|---|
| $X = 0$ | .20 | .15 | .10 |
| $X = 1$ | .15 | .15 | .25 |

derive the correlation coefficient $\rho$.
[See also WMS 3.37*, 39, 40, 41*, 51, 57*, 184*, 206*, 208, 214 and WMS examples 8*, 9]

HW 11 (+14) Common Distributions 2

Exponential Distribution
1. (+1) WMS 4.97
2. (+2) WMS 4.98

Normal Distribution
3. (+3) WMS 4.58, parts a through e.
4. (+2) WMS 4.64, part a
5. (+2) WMS 4.65
6. (+2) Suppose that $X \sim N(15, 20)$ and $Y \sim N(10, 30)$ are mutually independent.
Find the distributions (including parameters, if any) of $X + Y$, $X - Y$, and $3X + 2Y$.
7. (+2) WMS 4.161
[See also WMS 4.58*, 59, 61, 62*, 64b, 66b, 67, 70, 72*, 74, 77*, 78, and WMS example 4.9]

HW 12 (+14) Common Distributions 3

Chi-Square
1. (+3) Suppose that $X$ has a Chi-square distribution with variance 40. Find a constant $b$ such that $P(X > b) = 0.1$. Find a constant $a$ such that $P(X < a) = 0.1$.
2. (+1) Suppose that $W_1 \sim \chi^2(20)$ and $W_2 \sim \chi^2(20)$ are mutually independent. Find the distribution (including parameters, if any) of $W_1 + W_2$.

$t$-distribution
3. (+2) Suppose that $T$ has a $t$-distribution with $\nu = 15$ degrees of freedom.
   a. Find constants $a$ and $b$ such that $P(a < T < b) = .99$.
   b. Find a constant $c$ such that $P(T < c) = .99$.

$F$-distribution
4. (+3) Suppose that $Y$ has an $F$-distribution with $\nu_1 = 30$ numerator degrees of freedom and $\nu_2 = 40$ denominator degrees of freedom. Find constants $a$ and $b$ such that $P(a < Y < b) = .95$.
5. (+1) True or False: if $Y \sim t(\nu)$ then $Y^2 \sim F(1, \nu)$

Review
6. (+4) Let $X$ and $Y$ be random variables with joint density
$$f(x, y) = \begin{cases} \tfrac{2}{3}(x - y) & \text{if } x \in [0,1],\ y \in [-1,0] \\ 0 & \text{else} \end{cases}$$
and find $\Pr\left(X > \tfrac{1}{2} \,\middle|\, Y = -\tfrac{1}{2}\right)$.
[See also WMS 4.89, 92*, 4.160, 169, 175*, 176, 182*, 190a, and WMS example 4.10]

HW 13 (+11) Method of Moments

Data Collection
1. (+1) Give two examples of data collection methods that might yield samples that are not i.i.d. (Unfortunately, as a practical matter, some of these pitfalls are hard to avoid!)

Method of Moments
2. (+4) Using the method of moments and the data in the following table, find point estimates of the population parameters below.

| Individual | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Education (years) | 10 | 15 | 13 | 9 | 16 | 17 |
| Income (\$ thousands) | 44 | 51 | 65 | 52 | 75 | 74 |

   a. The mean and standard deviation of education
   b. The mean and standard deviation of income
   c. The correlation between education and income
   d. The proportion of workers who have more than 12 years of education
3.
(+1) A Poisson distribution, with parameter $\lambda$, has mean and variance both equal to $E(X) = V(X) = \lambda$. Use the method of moments to derive two estimates, $\hat{\lambda}_1$ and $\hat{\lambda}_2$, from a simple random sample $x_1, x_2, \ldots, x_n$ that has mean $\bar{x} = 153$ and standard deviation $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = 14$.
4. (+3) WMS 9.74, part a
5. (+1) Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ such that each $Y_i$ follows an $F(\nu_1, \nu_2)$ distribution where $\nu_1$ and $\nu_2$ are the numerator and denominator degrees of freedom, respectively. It can be shown that the mean $E(Y_i) = \frac{\nu_2}{\nu_2 - 2}$ of $Y_i$ only depends on the denominator degrees of freedom $\nu_2$; use this fact to derive an estimator of $\nu_2$ using the method of moments.
6. (+1) The skewness of a distribution is given by $E[(X - \mu)^3]$. If $X_1, X_2, \ldots, X_n$ represents a random sample from an unknown distribution, use the method of moments to find an estimator of the skewness of the population distribution.
[See also WMS 9.71, 72*, 73*, and WMS examples 9.11*]

HW 14 (+14) Maximum Likelihood Estimation

Maximum Likelihood Estimation
1. (+3) Suppose that $X_1, X_2, \ldots, X_n$ is an i.i.d. random sample, where $X_i$ has a Bernoulli distribution with parameter $p$. Find a maximum likelihood estimator $\hat{p}_{ML}$. (Hint: write the Bernoulli distribution function as $p^x(1 - p)^{1-x}$.)
2. (+3) Suppose that $X_1, X_2, \ldots, X_n$ is an i.i.d. random sample, where $X_i$ has an Exponential distribution, with parameter $\beta$. Find a maximum likelihood estimator $\hat{\beta}_{ML}$. (Recall that, using the Method of Moments, we derived two estimators of $\beta$, using the first and second moment equations: $\hat{\beta}_{MOM1} = \bar{x}$ and $\hat{\beta}_{MOM2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.)
3. (+4) Suppose that $X_1, X_2, \ldots, X_n$ is an i.i.d. random sample, where $X_i$ follows a normal distribution, with mean zero and unknown standard deviation $\sigma$. Find the maximum likelihood estimator $\hat{\sigma}_{ML}$ of $\sigma$.

Review
4.
As an experiment in a large prison population, convicted drug criminals are admitted to a drug rehabilitation program for varied numbers of weeks. The average duration of rehabilitation is $\mu_x = 6$ weeks, with standard deviation $\sigma_x = 2$ weeks. After the test period, program participants are examined by a health professional who rates their mental health with a score ranging from 0 to 10. The average score is $\mu_y = 4$ points, with a standard deviation of 1 point. The correlation coefficient between the duration of drug rehabilitation treatment $X$ and mental health score $Y$ is $\rho = .2$.
   a. (+3) Use a linear regression to predict the mental health score of a criminal who receives 10 weeks of treatment.
   b. (+1) If treatment duration is randomly assigned, does the information above constitute evidence that drug rehabilitation treatment improves mental health? Why or why not?
[See also WMS 9.81*, 86*, 89, 96*, and WMS examples 9.14*, 15*, 17]

HW 15 (+13) Properties of Estimators
1. (+3) WMS 8.3 (Hint: what function of the original estimator $\hat{\theta}$ yields an expected value of $\theta$?)
2. (+3) Consider a random sample $X_1, X_2, X_3$ of three observations from an unknown distribution with finite mean $\mu$ and variance $\sigma^2$. The standard estimator for $\mu$, of course, is the sample average $\bar{X} = \tfrac{1}{3}X_1 + \tfrac{1}{3}X_2 + \tfrac{1}{3}X_3$. Another possibility, however, is to put more weight on the first observation: $\hat{\mu}_2 = \tfrac{1}{2}X_1 + \tfrac{1}{4}X_2 + \tfrac{1}{4}X_3$.
   a. Derive the bias of $\hat{\mu}_2$.
   b. Which estimator is more efficient, $\bar{X}$ or $\hat{\mu}_2$? Justify your answer.
3. (+2) If $X_1, X_2, \ldots, X_n$ represents a random sample from an unknown distribution with finite mean $\mu$ and variance $\sigma^2$, show that $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a consistent estimator of $\mu$.
4. (+3) It can be shown that the “sample covariance” $S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$ is an unbiased and consistent estimator of population covariance $\sigma_{xy}$. Use these facts to answer the following.
(Hint: note that $\hat{\sigma}_{xy} = \frac{n-1}{n} S_{xy}$, and derive the mean and variance of $\hat{\sigma}_{xy}$ in terms of the mean and variance of $S_{xy}$, which are implied by its zero bias and consistency.)
   a. What is the bias of the method of moments covariance estimator $\hat{\sigma}_{xy} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$?
   b. Is $\hat{\sigma}_{xy}$ consistent? Justify your answer.
5. (+1) Explain in non-technical words why a researcher might wish to use $S^2$ rather than the method-of-moments variance estimator $\hat{\sigma}^2_{MOM} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$.
[See also WMS 8.6, 7, 8, 9*, 10, 13*, 17*; 9.1*, 2]

HW 16 (+13) Confidence Intervals

Central Limit Theorem
1. (+2) WMS 7.43

Margin of Error
2. (+2) An exam is administered to 20 high school students, chosen randomly from within a large school district. Suppose that, if everyone in the school district were to take the exam, scores would follow a normal distribution, with mean .69 and standard deviation .18. What is the probability that the average score for these 20 students is more than 5 percentage points away from the population average (i.e. less than .64 or greater than .74)?
3. (+1) A newspaper reports that the average consumer in a certain neighborhood spends \$565 per month on groceries, but doesn’t state the margin of error associated with this statistic. If the sample size is 40 individuals and the standard deviation of grocery expenditures is \$60, calculate the omitted margin of error.

Confidence Intervals
4. (+3) WMS 8.58
5. (+3) WMS 8.82
6. (+2) Data Analysis Project Part 2, Question 1a
[See also WMS 7.9, 12*, 45*, 48, 55, examples 7.8*, 9]

HW 17 (+17) Hypothesis Tests

Hypothesis Tests, statistical significance
1. (+3) WMS 10.18
2. (+3) WMS 10.50
3. (+3) WMS 10.67
4. (+1) WMS 10.115, parts a and b
5. (+3) Data Analysis Project Part 2, Question 1b

Probabilities
6. (+4) WMS 7.96
[See also WMS 10.2a,b, 10.3a, 9*, 24, 61, 62, 64*, 122*, and WMS examples 10.
5, 7, 12*, 13*]

HW 18 (+20 +2) Difference in Means, Proportions

Difference in Means
1. (+3) WMS 10.121
2. (+3) Data Analysis Project Part 2, Question 4

Proportions
3. (+2) WMS 8.63, part a
4. (+1) WMS 8.104
5. (+3) Data Analysis Project Part 2, Question 3

Differences in Proportions
6. (+3) WMS 8.65
7. (+3) In a survey of $n = 200$ (randomly chosen) working-age adults, 176 are employed and 24 are unemployed.
   a. (+1) Estimate the overall unemployment rate for this population, and state the margin of error associated with your estimate.
   b. (+2) At the $\alpha = .05$ level, test whether the population unemployment rate exceeds 10%. Report the p-value associated with this test.
8. (+2) WMS 10.35 [Note: the WMS answer key for this question is wrong.]
9. (Bonus +2) Data Analysis Project Part 2, Question 6
[See also WMS 7.15*, 92; 8.22*, 24, 31*, 34, 40, 41*, 45, 57*, 60*, 63, 66*, 67, 85, 86*, 87*, 90*, 122*; 10.17*, 25*, 30, 32, 33*, 35, 54, 57*, 58, 59*, 60*, 119 and WMS examples 8.2, 3*, 5*, 6, 7*, 8*, 11*, 10.1, 6*]

HW 19 (+13 +4) Variance Estimation

Confidence Intervals
1. (+3) A restaurant chain needs to decide whether to open a location in a particular town or not. To gauge the town’s demand for a new restaurant, a market analyst asks 51 families how many times per year they eat out, obtaining a sample average of 45 times, with a sample variance of 225. If the true variance for demand is too high, the restaurant may find the new market too risky to enter. Assuming that the numbers of times that families in the town eat out actually follow a normal distribution with unknown standard deviation $\sigma$, find a 95% upper confidence limit $b$ such that $\Pr(\sigma < b) = .95$.

Hypothesis Tests
2. (+3) WMS 10.86
3. (Bonus +2) Data Analysis Project Part 2, Question 2

Variance Ratio Estimation and Inference
4. (+6) To simplify decision-making, a university changes its admissions policy one year from a holistic approach (i.e.
considering many factors, such as SAT scores, GPA, extracurricular involvement, etc.) to a simple SAT cutoff rule. One fear of doing this is that when ability is measured less precisely it may vary more among incoming students. To test this possibility, the university administers an aptitude test to 41 students chosen randomly from the year before the policy change and to 61 students from the year after the change. The resulting sample variance increases from $s_1^2 = 85.2$ to $s_2^2 = 142.8$ between policy regimes.
   a. Provide a 95% (two-sided) confidence interval for the percentage increase $\frac{\sigma_2^2}{\sigma_1^2}$ of the variance in test scores between policy regimes.
   b. Is there sufficient evidence to conclude that the variance of ability increased with the policy change? Test at the $\alpha = .05$ level.
   c. Is there sufficient evidence to conclude that the variance of ability increased by at least 25% with the policy change? Test at the $\alpha = .05$ level.
5. (+1) WMS 10.83
6. (Bonus +2) Data Analysis Project Part 2, Question 5
[See also Examples 10.16*, 17*, WMS 10.78]

HW 20 (+8) Regression Estimation

Least Squares Estimation
1. (+4) A young entrepreneur starts a lemonade stand. Each week, she operates her business for a different length of time and earns a different amount of money. After five weeks of operation, she wishes to figure out how much money she makes per hour (to evaluate whether this enterprise is more lucrative than doing chores for her parents). Below is a record of her hours and earnings over the period.

| Hours | 1 | 5 | 2 | 4 | 3 |
|---|---|---|---|---|---|
| Earnings (\$) | 2 | 6 | 3 | 5 | 9 |

From this data, use the ordinary least squares technique to estimate $\beta_0$ and $\beta_1$. As a check on these calculations, plot the data and sketch the fitted line implied by your estimates. [Note: when you submit your homework, save your answers, since you will need to refer to them on HW 21]
2. (+2) For the regression of question 1, calculate the coefficient of determination $r^2$, and interpret this value.
3.
(+2) For the regression of question 1, give a point estimate $s_\varepsilon^2$ of the error variance.

HW 21 (+20) Regression Inference

Regression Inference
1. For the hours-earnings regression of question 1 on HW 20, answer the following.
   a. (+2) Find a 95% confidence interval for $\beta_1$.
   b. (+2) Do the data present sufficient evidence to indicate that the slope $\beta_1$ differs from zero? Assume that sales are normal, and test at the 5% significance level.
   c. (+3) How much would the entrepreneur earn if she worked full time (i.e. 40 hours a week)? Give a 95% confidence interval for this prediction.
2. Consider the following summary statistics of study times $X$ and exam scores $Y$ for a sample of $n = 52$ students.

| Statistic | Value |
|---|---|
| $\bar{x}$ | 7 hours |
| $s_x$ | 2 hours |
| $\bar{y}$ | 73% |
| $s_y$ | 14% |
| $r$ | 0.3 |

   a. (+3) Using these summary statistics, estimate the slope and intercept parameters of a linear regression of exam scores on study times, and predict the exam score of a student who studies for exactly ten hours.
   b. (+1) Use the equation $S_{\varepsilon\varepsilon} = (1 - r^2)S_{yy}$ to derive an estimate $s_\varepsilon$ of the standard deviation of errors off the regression line.
   c. (+6) Use your answer to part b to find 95% confidence intervals for the three point estimates reported in part a.
3. (+3) Data Analysis Project, Part 2 Question 7
[See also Examples 11.1*, 2*, 3, 4*, 7*, WMS 11.3, 5, 10, 17, 23, 43]

HW 22 (+5) Project Discussion Groups (In Class)

HW 23 (+7) Matrix Multiplication

Definitions
Use the following matrices to answer questions 1-3 below.
$$\mathbf{A} = \begin{pmatrix} 0 & -1 \\ 0 & 3 \end{pmatrix} \qquad \mathbf{B} = \begin{pmatrix} 4 & -5 \\ -5 & -3 \end{pmatrix} \qquad \mathbf{F} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -4 \end{pmatrix}$$
𝑪𝑪 = � 2 −3 4 � 𝑫𝑫 = � 5 6 −2 � 𝑬𝑬 = � 1 −1 2 −2 −3 3 �
1. (+1) Which (if any) of the above matrices are idempotent?

Basic Operations
2. (+1) Verify that $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$ for the specific matrices above.
3. (+4) Find the following sums and differences, or state that the matrices are not conformable for addition or multiplication.
   a. $3\mathbf{A} - .5\mathbf{B}$
   b. $\mathbf{A} + \mathbf{E}'$
   c. $2(\mathbf{C} + \mathbf{D})^T$
   d. $[(\mathbf{B}^T)^T]^T$
   e. 𝑨𝑨𝑨𝑨𝑨𝑨′
   f. 𝑫𝑫𝑫𝑫
   g. (𝑪𝑪𝑪𝑪)𝑬𝑬
4.
(+1) Simplify (𝑨𝑨𝑨𝑨′)′(𝑩𝑩𝑩𝑩)′, where $\mathbf{A}$ and $\mathbf{B}$ are $n \times n$ matrices that are symmetric and idempotent.

HW 24 (+22) Matrix Inversion
1. (+3) Consider the following system of two equations with two unknowns,
$$\begin{aligned} x + 3y &= 7 \\ 4x + 2y &= 8 \end{aligned}$$
Rewrite these equations in matrix notation. Solve for $x$ and $y$ using matrix inversion.
2. (+3) Consider the following market model for a commodity,
$$\begin{aligned} Q_d &= 24 - 2P \\ Q_s &= -5 + 7P \end{aligned}$$
Rewrite this market in matrix notation. Indicate the dimensions of each matrix. Solve for the equilibrium price and quantity in this market (remember that in equilibrium the quantity demanded equals the quantity supplied, $Q_s = Q_d = Q$).
3. (+3) Rewrite the following system of equations in matrix notation, and solve using matrix inversion:
$$\begin{aligned} -2x - y &= 1 \\ x - y &= -1 \end{aligned}$$
4. (+2) Suppose that $\mathbf{X}$ is an $n \times k$ matrix with $n > k$, while $\sigma^2$ is a scalar. Simplify the following, or state that it cannot be simplified further. Assume that $\mathbf{X}'\mathbf{X}$ is nonsingular, and keep in mind that $\mathbf{X}$ is not square, so $\mathbf{X}^{-1}$ does not exist.
$$[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\,\sigma^2\,[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']'$$
5. (+3) Simplify the following, where $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are nonsingular $n \times n$ matrices, or state that no simplification is possible.
   a. (𝑩𝑩𝑩𝑩)−𝟏𝟏(𝑨𝑨 + 𝑩𝑩)
   b. �(𝑨𝑨−𝟏𝟏𝑨𝑨′ )′𝑨𝑨−𝟏𝟏�
   c. $\mathbf{B}(\mathbf{A}'\mathbf{B})^{-1}\mathbf{A}'$
6. (+4) Suppose that $\mathbf{X}$ is an $n \times k$ matrix with $n > k$, and that $\mathbf{X}'\mathbf{X}$ is nonsingular. Of interest in econometrics are a “projection” or “hat” matrix $\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and an “annihilator” matrix $\mathbf{M} = \mathbf{I}_n - \mathbf{P}$.
   a. What are the dimensions of $\mathbf{P}$?
   b. Show that $\mathbf{P}$ is symmetric and idempotent.
   c. Show that $\mathbf{M}$ is idempotent and symmetric.
   d. Show that $\mathbf{P}\mathbf{X} = \mathbf{X}$, $\mathbf{M}\mathbf{X} = \mathbf{0}$, and $\mathbf{M}\mathbf{P} = \mathbf{0}$.
7. (+4) Determine whether the following are positive or negative definite or semidefinite, or indefinite.
   a. $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$
   b. $\begin{pmatrix} -2 & 2 \\ -2 & -1 \end{pmatrix}$
   c. $\begin{pmatrix} 1 & 0 & 0 \\ 2 & 2 & 2 \\ 0 & 0 & 1 \end{pmatrix}$

HW 25 (+20) Matrix Calculus
8. (+2) Let $f(\mathbf{x}) = x_1 x_2^2 + 2x_3$.
   a. Evaluate $f(\mathbf{x})$ at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
   b. Differentiate $f(\mathbf{x})$. Evaluate at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
9.
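(+3) Let $f(\mathbf{x}) = \mathbf{a}'\mathbf{x}$, where $\mathbf{a} = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$.
   d. Rewrite $f(\mathbf{x})$ algebraically, in terms of the elements $x_1$, $x_2$, and $x_3$ of $\mathbf{x}$. Evaluate $f(\mathbf{x})$ at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
   e. Differentiate $f(\mathbf{x})$. Evaluate the derivative at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
10. (+3) Let $f(\mathbf{x}) = \mathbf{x}'\mathbf{M}\mathbf{x}$, where $\mathbf{M} = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$.
   f. Rewrite $f(\mathbf{x})$ algebraically, in terms of the elements $x_1$, $x_2$, and $x_3$ of $\mathbf{x}$. Evaluate $f(\mathbf{x})$ at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
   g. Differentiate $f(\mathbf{x})$ with respect to $\mathbf{x}$.
11. (+3) Let 𝒇𝒇(𝒙𝒙) = � 𝑥𝑥1 3𝑥𝑥2 + 2𝑥𝑥3 𝑥𝑥1 − 2𝑥𝑥3 �.
   h. Evaluate $\mathbf{f}(\mathbf{x})$ at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
   i. Differentiate $\mathbf{f}(\mathbf{x})$. Evaluate the derivative at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
12. (+3) Let $\mathbf{f}(\mathbf{x}) = \mathbf{M}\mathbf{x}$, where $\mathbf{M} = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$.
   j. Rewrite $\mathbf{f}(\mathbf{x})$ algebraically, in terms of the elements $x_1$, $x_2$, and $x_3$ of $\mathbf{x}$. Evaluate $\mathbf{f}(\mathbf{x})$ at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
   k. Differentiate $\mathbf{f}(\mathbf{x})$ with respect to $\mathbf{x}$. Evaluate at $\mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}$.
13. (+2) For the cases of $n = 2$ and $n = 3$, determine whether the matrix $\mathbf{M} = \mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'$ is singular or non-singular, where $\mathbf{I}$ denotes an $n \times n$ identity matrix and $\mathbf{1}$ denotes an $n \times 1$ vector of ones. (Note: the same turns out to be true for all $n$.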