# Interpreting or misinterpreting correlation

Question 1 – Interpreting or Misinterpreting Correlation

a)      Various factors are associated with the gross domestic product (GDP) of nations. State whether each of the following statements is reasonable or not. If not, explain the blunder.

(i)                 A correlation of –0.722 shows that there is almost no association between GDP and Infant Mortality Rate.

(ii)               There is a correlation of 0.44 between GDP and Continent.

(iii)             There is a very strong correlation of 1.22 between Life Expectancy and GDP.

(iv)             The correlation between Literacy Rate and GDP was 0.83. This shows that countries wanting to increase their standard of living should invest heavily in education.

b)     An article in a business magazine reported that Internet E-commerce has doubled nearly every three years. It then stated that there was a high correlation between sales made on the Internet and year. Do you think this is an appropriate summary? Explain in one sentence.

c)      Simpson’s Paradox can occur in regression, when a relationship between variables within groups of observations is reversed if all the data are combined. Here is an example.

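| Group | X | Y | Group | X | Y |
|-------|---|------|-------|----|------|
| 1 | 1 | 10.1 | 2 | 6 | 18.3 |
| 1 | 2 | 8.9 | 2 | 7 | 17.1 |
| 1 | 3 | 8.9 | 2 | 8 | 16.2 |
| 1 | 4 | 6.9 | 2 | 9 | 15.1 |
| 1 | 5 | 6.1 | 2 | 10 | 14.3 |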

(i)                 Make a scatterplot of the data for Group 1 and add the least squares line. Describe the relationship between Y and X for Group 1. Find the correlation (using Excel).

(ii)               Do the same for Group 2.

(iii)             Make a scatterplot using all 10 observations and add the least squares line. Find the correlation (using Excel).

(iv)             Summarize your findings in one or two sentences.
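The question asks for Excel, but the same calculations can be sketched in Python as a cross-check. The snippet below uses only the ten observations from the table; the helper names `pearson_r` and `least_squares` are illustrative, not part of the assignment.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Data copied from the table in part c)
x1, y1 = [1, 2, 3, 4, 5], [10.1, 8.9, 8.9, 6.9, 6.1]
x2, y2 = [6, 7, 8, 9, 10], [18.3, 17.1, 16.2, 15.1, 14.3]

r_group1 = pearson_r(x1, y1)
r_group2 = pearson_r(x2, y2)
r_all = pearson_r(x1 + x2, y1 + y2)

intercept1, slope1 = least_squares(x1, y1)
intercept2, slope2 = least_squares(x2, y2)
intercept_all, slope_all = least_squares(x1 + x2, y1 + y2)

print(f"Group 1: r = {r_group1:.3f}, slope = {slope1:.3f}")
print(f"Group 2: r = {r_group2:.3f}, slope = {slope2:.3f}")
print(f"All 10:  r = {r_all:.3f}, slope = {slope_all:.3f}")
```

Comparing the signs of the within-group slopes with the sign of the combined slope is the point of the exercise.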

d)     Since 1980, average mortgage interest rates in the U.S. have fluctuated from a low of under 6% to a high of over 14%. Is there a relationship between the amount of money people borrow and the interest rate that’s offered? Here is a scatterplot of Total Mortgages in the U.S. (in millions of 2005 dollars) vs. Interest Rates at various times over the past 26 years. The correlation is -0.84.

(i)                 Describe the relationship between Total Mortgages and Interest Rate.

(ii)               If we standardized both variables, what would the correlation coefficient between the standardized variables be?

(iii) If we were to measure Total Mortgages in thousands of dollars instead of millions of dollars, how would the correlation coefficient change?

(iv)             Suppose in another year, interest rates were 11%, and mortgages totalled \$250 million. How would including that year with these data affect the correlation coefficient?

(v) Do these data provide proof that if mortgage rates are lowered, people will take out more mortgages? Explain.
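Parts (ii) and (iii) turn on how the correlation coefficient responds to standardizing or changing units. A minimal sketch, assuming made-up numbers (the actual scatterplot data are not reproduced here, so `rates` and `mortgages_millions` below are hypothetical placeholders), lets you check the behaviour numerically:

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical illustration data, NOT the data behind the scatterplot
rates = [6.0, 7.5, 9.0, 10.5, 12.0, 14.0]                         # percent
mortgages_millions = [210.0, 190.0, 170.0, 160.0, 140.0, 120.0]   # millions of dollars

r_original = pearson_r(rates, mortgages_millions)
# Same amounts expressed in thousands of dollars
r_thousands = pearson_r(rates, [m * 1000 for m in mortgages_millions])
# Both variables converted to z-scores
r_standardized = pearson_r(standardize(rates), standardize(mortgages_millions))

print(r_original, r_thousands, r_standardized)
```

The three printed values agree up to floating-point rounding, because correlation is computed from standardized deviations and so does not depend on the units of either variable.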

Question 2 – Regression and the Market Model (Calculations from Summary Statistics)

It is usual in finance to describe the returns from investing in a single stock by regressing the stock’s returns on the returns from the stock market as a whole. This helps us see how closely the stock follows the market. We analyzed the monthly percent total return y on Research in Motion (RIM), now called BlackBerry, stock and the monthly return x on the NASDAQ index, which represents the market, for the period between January 2005 and December 2009. Here are the results.

A scatterplot shows no very influential observations.

a)      Find the equation of the least-squares line. What percent of the variation in RIM stock is explained by the linear relationship with the market as a whole?

b)     Interpret what the slope and the y-intercept of the regression line indicate. The slope is called “beta” in investment theory.

c)      Based on the statistics above, how effective do you think the monthly return on the NASDAQ index would be in predicting the monthly percent total return on RIM stock? Explain.

d)     Returns on most individual stocks have a positive correlation with returns on the entire market. Explain why an investor should prefer stocks with beta > 1 when the market is rising and stocks with beta < 1 when the market is falling.
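The results table referred to in this question is not reproduced here. Assuming it reports the sample means $\bar{x}$ and $\bar{y}$, the standard deviations $s_x$ and $s_y$, and the correlation $r$ (these symbols are assumptions, not taken from the original), the least-squares line and the fraction of variation explained follow from the standard formulas:

$$
b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x}, \qquad \hat{y} = b_0 + b_1 x, \qquad R^2 = r^2
$$

Here $b_1$ is the "beta" of part b): a change of $\Delta x$ percentage points in the market return changes the predicted stock return by $b_1\,\Delta x$, which is the relationship part d) asks you to reason about.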

Question 3 – Residual Plots – Halifax Real Estate Listings (yes, again)

In Assignment 2, Question 3, you provided scatterplots and regression calculations. One more step in the analysis is a residual plot. For your convenience, I have extracted the data needed from the original spreadsheet. The worksheet RealEstate has the List Price and Total Area data for all 98 listings of properties, sorted from lowest to highest total area. And, to make your life a little easier, here is the regression equation that you should have computed in Assignment 2: Price = 42424.75 + 307.06 × Area.

a) Compute the residuals, and construct a residual plot. Your solution should only show the plot; do not include the listing of the 98 residuals!

b) Does the plot show that the linear regression model is appropriate here? Explain in one sentence.

c) Compute the standard deviation of the residuals (se).
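The residual calculation in parts a) and c) can be sketched as follows. The intercept and slope come from the regression equation given above; the five listings are hypothetical placeholders, since the RealEstate worksheet is not reproduced here. The $s_e$ formula divides by $n - 2$, which assumes the line was fitted to the same data by least squares.

```python
from math import sqrt

# Coefficients from the regression equation stated in the question
INTERCEPT = 42424.75
SLOPE = 307.06

def predict(area):
    """Predicted list price for a given total area."""
    return INTERCEPT + SLOPE * area

def residuals(areas, prices):
    """Residual = observed price minus predicted price."""
    return [price - predict(area) for area, price in zip(areas, prices)]

def residual_sd(res):
    """Standard deviation of residuals, s_e, using n - 2 degrees of freedom."""
    return sqrt(sum(e * e for e in res) / (len(res) - 2))

# Hypothetical listings for illustration only, NOT the worksheet data
areas = [800, 1200, 1500, 2000, 2600]
prices = [290000, 410000, 500000, 660000, 840000]

res = residuals(areas, prices)
for area, e in zip(areas, res):
    print(f"area = {area:5d}  residual = {e:10.2f}")
print(f"s_e = {residual_sd(res):.2f}")
```

Plotting `res` against `areas` (or against the predicted values) gives the residual plot. With the real 98 listings the residuals should sum to approximately zero; the placeholder data above will not, because the line was not fitted to them.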

Question 4 – Project Management and Random Variables

PERT (Project Evaluation and Review Technique) and CPM (Critical Path Method) are related management science techniques that help operations managers control the activities and amount of time it takes to complete a project. The longest path from starting point to completion is called the critical path because any delay along this path will result in a project delay.

The operations manager of a large plant wishes to overhaul a machine. His critical path has five activities. The mean (i.e. expected value) and the variance of completion time for each activity is listed below. Assume the activities are independent of one another.

| Activity | Mean (mins.) | Variance |
|----------|--------------|----------|
| 1. Disassemble machine | 35 | 10 |
| 2. Determine parts that need replacing | 20 | 6 |
| 3. Find needed parts in inventory | 20 | 4 |
| 4. Reassemble machine | 50 | 13 |
| 5. Test machine | 20 | 3 |

a) What are the mean, variance and standard deviation of the project total completion time?

b)     Assuming that the total completion time is approximately normally distributed:

i) Find the probability that the project will take more than 165 minutes to complete.

ii) Find the probability that the project will take less than 141 minutes to complete.

iii) Find the probability that the project will take between 141 and 165 minutes to complete.

iv) If this project were repeated many times, what total completion time would be exceeded by at most 5% of such projects? Report your answer rounded to the nearest whole minute.
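Because the activities are independent, their means add and their variances add. A minimal sketch of the calculation, using the values from the table and Python's `statistics.NormalDist` for the normal approximation in part b):

```python
from math import sqrt
from statistics import NormalDist

# (mean in minutes, variance) for each activity, copied from the table
activities = {
    "Disassemble machine": (35, 10),
    "Determine parts that need replacing": (20, 6),
    "Find needed parts in inventory": (20, 4),
    "Reassemble machine": (50, 13),
    "Test machine": (20, 3),
}

# Means always add; variances add because the activities are independent
total_mean = sum(m for m, _ in activities.values())
total_var = sum(v for _, v in activities.values())
total_sd = sqrt(total_var)

# Normal approximation to the total completion time
total_time = NormalDist(mu=total_mean, sigma=total_sd)

p_more_than_165 = 1 - total_time.cdf(165)
p_less_than_141 = total_time.cdf(141)
p_between = total_time.cdf(165) - total_time.cdf(141)
# Completion time exceeded by only 5% of projects (95th percentile)
t_95 = total_time.inv_cdf(0.95)

print(total_mean, total_var, total_sd)
print(p_more_than_165, p_less_than_141, p_between, round(t_95))
```

Note that standard deviations do not add; only the variances do, which is why the standard deviation is taken after summing.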

Question 5 – Sampling Distribution of Proportions

A university bookstore claims that 50% of its customers are satisfied with the service and prices.

a) What is the general shape of the sampling distribution of the sample proportion of customers who are satisfied? Why can we assume that? Then give the mean and standard deviation of the sample proportion.

b)     If we assume that the bookstore’s claim is true (i.e. that the true proportion is indeed 0.50), what is the probability that in a simple random sample of 600 customers less than 45% are satisfied?

c)      Repeat part b) using a simple random sample of 1200.

d) Suppose that in a random sample of 600 customers, 270 express satisfaction with the bookstore. What does this tell you about the bookstore’s claim? Hint: Refer to part b).
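Parts b) and c) can be checked with the normal approximation to the sampling distribution of a sample proportion, which has mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. The sketch below assumes the usual rule of thumb that $np$ and $n(1-p)$ are both at least 10; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_below(p, n, cutoff):
    """P(sample proportion < cutoff) under the normal approximation."""
    # Rule-of-thumb check that the normal approximation is reasonable
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("sample too small for the normal approximation")
    standard_error = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=standard_error).cdf(cutoff)

p_claimed = 0.50  # the bookstore's claim

prob_n600 = prob_sample_proportion_below(p_claimed, 600, 0.45)
prob_n1200 = prob_sample_proportion_below(p_claimed, 1200, 0.45)

print(f"n = 600:  P(p-hat < 0.45) = {prob_n600:.5f}")
print(f"n = 1200: P(p-hat < 0.45) = {prob_n1200:.6f}")
```

For part d), 270 out of 600 is a sample proportion of 0.45, so the probability computed for n = 600 indicates how surprising that sample would be if the claim were true.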

Question 6 – Sampling Distribution of Means

A scientist claims that people who eat high-fibre cereal for breakfast will consume, on average, fewer calories for lunch than people who don’t eat high-fibre cereal for breakfast. If this is true, high-fibre cereal manufacturers will be able to claim potential weight reduction for dieters as an advantage of eating their product. As a preliminary test of the claim, 150 people were randomly selected and asked what they regularly eat for breakfast and lunch. Each person was identified as either a Consumer or a Non-consumer of high-fibre cereal, and the number of calories eaten at lunch was measured and recorded. The data are provided in the accompanying spreadsheet under the tab Cereal.

a)      For the 43 Consumers, compute the mean and standard deviation of the number of lunchtime calories.

b) Assume that the mean and standard deviation computed in (a) are true for the general population of Consumers. What is the general shape of the sampling distribution of the sample mean? What are the mean and standard deviation of the sample mean?

c)      What is the probability that the sample mean number of lunchtime calories of Consumers will be less than 600?

d) Repeat parts a) and b) for the 107 Non-consumers.

e)      This part is harder: What is the probability that the mean number of lunchtime calories of Non-consumers exceeds the mean number of lunchtime calories of Consumers? Assume that the means and standard deviations computed for the samples are true of the general population.
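One way to approach part e) is through the sampling distribution of the difference between two independent sample means, which is approximately normal with mean $\mu_N - \mu_C$ and standard deviation $\sqrt{\sigma_N^2/n_N + \sigma_C^2/n_C}$. The sketch below assumes the two groups are independent. The sample sizes (107 and 43) come from the question, but the means and standard deviations are hypothetical placeholders, because the Cereal worksheet is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

def prob_first_mean_exceeds_second(mu1, sd1, n1, mu2, sd2, n2):
    """P(sample mean 1 > sample mean 2) for two independent samples."""
    # Difference of independent sample means: means subtract, variances add
    diff_sd = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    difference = NormalDist(mu=mu1 - mu2, sigma=diff_sd)
    return 1 - difference.cdf(0)

# Sample sizes from the question
N_NONCONSUMERS, N_CONSUMERS = 107, 43

# Hypothetical placeholders; replace with the values computed in parts a) and d)
mean_non, sd_non = 630.0, 100.0
mean_con, sd_con = 600.0, 65.0

p = prob_first_mean_exceeds_second(
    mean_non, sd_non, N_NONCONSUMERS,
    mean_con, sd_con, N_CONSUMERS,
)
print(f"P(Non-consumer mean > Consumer mean) = {p:.4f}")
```

Substituting the means and standard deviations from the spreadsheet for the placeholder values gives the probability asked for.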