Chaoxing (超星) Categorical Data Analysis Answers (Xuexitong 2023 Homework Answers)

Category: Minghua MOOC Question Bank (名华慕课题库) | Posted: 2024-06-02 17:00:10

Chapter 1 Introduction

1.1 Preface to Categorical Data Analysis: In-Class Quiz

1、Is the variable nominal or ordinal? Political party affiliation (Democrat, Republican, unaffiliated).

2、Is the variable nominal or ordinal? Highest degree obtained (none, high school, bachelor’s, master’s, doctorate).

3、Is the variable nominal or ordinal? How often feel depressed (never, occasionally, often, always).

1.2 Probability Distributions for Categorical Data + 1.3 Statistical Inference for a Proportion: In-Class Quiz

1、For the binomial outcome of y successes in n trials, the maximum likelihood estimate of π equals p = y / n.

2、In the formula of a large-sample 100(1-α)% confidence interval for π, α is the right-tail probability.

3、In the binomial distribution, when n is large, the distribution of y can be approximated by a normal distribution with

homework 1

1、A coin is flipped twice. Let Y = number of heads obtained, when the probability of a head for a flip equals π. a. Assuming π = 0.50, specify the probabilities for the possible values for Y, and find the distribution’s mean and standard deviation. b. Find the binomial probabilities for Y when π equals (i) 0.60, (ii) 0.40. c. Suppose you observe y = 1 and do not know π. Calculate and sketch the likelihood function. d. Using the plotted likelihood function from (c), show that the ML estimate of π equals 0.50.
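The quantities asked for in this exercise can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_pmf` and the grid of 101 values of π are illustrative assumptions, not part of the course material.

```python
from math import comb

def binom_pmf(y, n, pi):
    """Binomial probability of y successes in n independent trials."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

n = 2  # two flips
for pi in (0.50, 0.60, 0.40):
    probs = [binom_pmf(y, n, pi) for y in range(n + 1)]
    mean = n * pi
    sd = (n * pi * (1 - pi)) ** 0.5
    print(f"pi={pi:.2f}: P(Y=0,1,2)={[round(p, 2) for p in probs]}, "
          f"mean={mean:.2f}, sd={sd:.3f}")

# Likelihood for the observation y = 1: L(pi) = 2 * pi * (1 - pi),
# evaluated on an evenly spaced grid of candidate values for pi.
grid = [i / 100 for i in range(101)]
likelihood = [binom_pmf(1, n, pi) for pi in grid]
ml_estimate = grid[likelihood.index(max(likelihood))]
print("grid value that maximizes the likelihood:", ml_estimate)
```

Plotting `likelihood` against `grid` gives the sketch requested in part (c); the grid search only locates the maximum approximately, to the resolution of the grid.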

2、Genotypes AA, Aa, and aa occur with probabilities (π1, π2, π3). For n = 3 independent observations, the observed frequencies are (n1, n2, n3). a. Explain how you can determine n3 from knowing n1 and n2. Thus, the multinomial distribution of (n1, n2, n3) is actually two-dimensional. b. Show the set of all possible observations, (n1, n2, n3) with n = 3. c. Suppose (π1, π2, π3) = (0.25, 0.50, 0.25). Find the multinomial probability that (n1, n2, n3) = (1, 2, 0). d. Refer to (c). What probability distribution does n1 alone have? Specify the values of the sample size index and parameter for that distribution.
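Parts (b) and (c) can be verified by direct enumeration. This is a small sketch under the exercise's stated values; the function name `multinomial_pmf` is a hypothetical helper introduced here for illustration.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial probability n! / (n1! n2! ...) * p1^n1 * p2^n2 * ..."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = float(coef)
    for c, q in zip(counts, probs):
        value *= q ** c
    return value

n = 3
# n3 is determined by n1 and n2, so only (n1, n2) needs to be enumerated.
outcomes = [(n1, n2, n - n1 - n2)
            for n1 in range(n + 1)
            for n2 in range(n - n1 + 1)]
print(len(outcomes), "possible observations:", outcomes)

prob = multinomial_pmf((1, 2, 0), (0.25, 0.50, 0.25))
print("P(n1=1, n2=2, n3=0) =", prob)
```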

3、To collect data in an introductory statistics course, recently I gave the students a questionnaire. One question asked whether the student was a vegetarian. Of 25 students, 0 answered “yes.” They were not a random sample, but let us use these data to illustrate inference for a proportion. (You may wish to refer to Section 1.4.1 on methods of inference.) Let π denote the population proportion who would say “yes.” Consider H0: π = 0.50 and Ha: π ≠ 0.50. a. What happens when you try to conduct the “Wald test,” which uses the estimated standard error? b. Find the 95% “Wald confidence interval” for π. Is it believable? (When the observation falls at the boundary of the sample space, often Wald methods do not provide sensible answers.) c. Conduct the “score test,” which uses the null standard error. Report the P-value. d. Verify that the 95% score confidence interval (i.e., the set of π0 for which |z| < 1.96 in the score test) equals (0.0, 0.133). (Hint: What do the z test statistic and P-value equal when you test H0: π = 0.133 against Ha: π ≠ 0.133?)
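The contrast between the Wald and score approaches at the boundary y = 0 can be seen numerically. This is a hedged sketch using the standard library only; `two_sided_p` and `score_z` are illustrative helper names, and the normal tail probability is computed from `math.erfc`.

```python
from math import sqrt, erfc

def two_sided_p(z):
    """Two-sided standard normal P-value, 2 * P(Z > |z|)."""
    return erfc(abs(z) / sqrt(2))

def score_z(y, n, pi0):
    """Score statistic: uses the standard error evaluated under H0."""
    p = y / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

y, n = 0, 25
p = y / n

# Wald: the estimated standard error sqrt(p(1-p)/n) is 0 when p = 0,
# so the Wald z statistic would divide by zero and the interval is (0, 0).
wald_se = sqrt(p * (1 - p) / n)
wald_ci = (p - 1.96 * wald_se, p + 1.96 * wald_se)
print("Wald SE:", wald_se, "Wald 95% CI:", wald_ci)

# Score test of H0: pi = 0.50.
z = score_z(y, n, 0.50)
print("score z:", z, "P-value:", two_sided_p(z))

# At pi0 = 0.133, |z| should be close to 1.96 (edge of the score interval).
z_edge = score_z(y, n, 0.133)
print("score z at pi0=0.133:", round(z_edge, 3))
```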

Chapter 2 Contingency Tables(1)

2.1 Probability Structure for Contingency Tables: In-Class Quiz

1、In a two-by-two contingency table, the row totals of the joint probabilities equal 1.

2.2 Comparing Proportions in Two-by-Two Tables: In-Class Quiz

1、A large-sample (Wald) confidence interval for is

2.3 The Odds Ratio: In-Class Quiz

1、

2、

2.4 Chi-Square Test of Independence: In-Class Quiz

1、For testing independence in 2×2 contingency tables, the Pearson statistic’s chi-squared distribution has df = ____?

Chapter 2 Contingency Tables(2)

2.5 Testing Independence for Ordinal Data: In-Class Quiz

1、In testing independence, the statistic has approximately a chi-squared distribution with .

2.7 Association in Three-Way Tables: In-Class Quiz

1、The conditional independence of X and Y can imply marginal independence of X and Y.

homework 2

1、A newspaper article preceding the 1994 World Cup semifinal match between Italy and Bulgaria stated that “Italy is favored 10–11 to beat Bulgaria, which is rated at 10–3 to reach the final.” Suppose this means that the odds that Italy wins are 11/10 and the odds that Bulgaria wins are 3/10. Find the probability that each team wins, and comment.
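Converting odds to a probability uses probability = odds / (1 + odds). A minimal sketch with the odds stated in the exercise; the function name `odds_to_prob` is an illustrative choice.

```python
def odds_to_prob(odds):
    """Convert odds of an event into its probability."""
    return odds / (1 + odds)

p_italy = odds_to_prob(11 / 10)
p_bulgaria = odds_to_prob(3 / 10)

print(f"P(Italy wins)    = {p_italy:.3f}")
print(f"P(Bulgaria wins) = {p_bulgaria:.3f}")
print(f"Sum              = {p_italy + p_bulgaria:.3f}")
```

Comparing the sum with 1 is the starting point for the "comment" part of the exercise, since exactly one team advances from a semifinal.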

2、Data posted at the FBI website (www.fbi.gov) stated that of all blacks slain in 2005, 91% were slain by blacks, and of all whites slain in 2005, 83% were slain by whites. Let Y denote race of victim and X denote race of murderer. a. Which conditional distribution do these statistics refer to, Y given X, or X given Y? b. Calculate and interpret the odds ratio between X and Y. c. Given that a murderer was white, can you estimate the probability that the victim was white? What additional information would you need to do this? (Hint: How could you use Bayes’s Theorem?)
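For part (b), the odds ratio can be formed from the two conditional proportions given in the exercise. This is a sketch only; the variable names are illustrative, and the choice to compare the odds of a black murderer across victim races is one of two equivalent orientations.

```python
def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)

# Proportions conditional on the victim's race, as stated in the exercise.
p_black_murderer_given_black_victim = 0.91
p_white_murderer_given_white_victim = 0.83
p_black_murderer_given_white_victim = 1 - p_white_murderer_given_white_victim

odds_ratio = (odds(p_black_murderer_given_black_victim)
              / odds(p_black_murderer_given_white_victim))
print(f"estimated odds ratio = {odds_ratio:.1f}")
```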

3、A statistical analysis that combines information from several studies is called a meta analysis. A meta analysis compared aspirin with placebo on incidence of heart attack and of stroke, separately for men and for women (J. Am. Med. Assoc., 295: 306–313, 2006). For the Women’s Health Study, heart attacks were reported for 198 of 19,934 taking aspirin and for 193 of 19,942 taking placebo. a. Construct the 2×2 table that cross classifies the treatment (aspirin, placebo) with whether a heart attack was reported (yes, no). b. Estimate the odds ratio. Interpret. c. Find a 95% confidence interval for the population odds ratio for women. Interpret. (As of 2006, results suggested that for women, aspirin was helpful for reducing risk of stroke but not necessarily risk of heart attack.)
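The sample odds ratio and a large-sample (Wald) interval on the log scale can be computed from the four cell counts. A minimal sketch, assuming the usual standard error of the log odds ratio, sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22), and the normal quantile 1.96; variable names are illustrative.

```python
from math import log, exp, sqrt

# 2x2 table: rows = (aspirin, placebo), columns = (heart attack yes, no).
n11, n12 = 198, 19934 - 198
n21, n22 = 193, 19942 - 193

odds_ratio = (n11 * n22) / (n12 * n21)

# Wald interval: build it for log(odds ratio), then exponentiate.
se_log_or = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
lower = exp(log(odds_ratio) - 1.96 * se_log_or)
upper = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"95% CI     = ({lower:.3f}, {upper:.3f})")
```

Whether the interval contains 1 is the key point for the interpretation asked for in part (c).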

Chapter 3 Generalized Linear Model(1)

3.1 Components of a Generalized Linear Model: In-Class Quiz

1、The random component of a GLM model identifies the explanatory variables.

2、The systematic component of a GLM specifies the response variable.

3、The GLM choice of link function is separate from the choice of random component.

3.2 Generalized Linear Models for Binary Data: In-Class Quiz

1、The parameter β determines the rate of increase or decrease of the curve. As |β| decreases, the curve has a steeper rate of change.

2、probit(0.975)=?
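The probit link is the inverse of the standard normal cumulative distribution function, so the value can be looked up with the standard library. A short sketch; the variable name `probit_value` is illustrative.

```python
from statistics import NormalDist

# probit(p) is the standard normal quantile at probability p.
probit_value = NormalDist().inv_cdf(0.975)
print(round(probit_value, 2))
```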

Chapter 3 Generalized Linear Model(2)

3.3 Generalized Linear Models for Count Data: In-Class Quiz

1、A one-unit increase in x has a multiplicative impact of on Y in a Poisson regression model.

3.4 Statistical Inference and Model Checking: In-Class Quiz

1、A large-sample 100(1-α)% (Wald) confidence interval for is

homework 3

1、Refer to Table 2.7 on x = mother’s alcohol consumption and Y = whether a baby has sex organ malformation. With scores (0, 0.5, 1.5, 4.0, 7.0) for alcohol consumption, ML fitting of the linear probability model has the output: a. State the prediction equation, and interpret the intercept and slope. b. Use the model fit to estimate the (i) probabilities of malformation for alcohol levels 0 and 7.0, (ii) relative risk comparing those levels.

2、Refer to the previous exercise 1 and the solution to (b). a. The sample proportion of malformations is much higher in the highest alcohol category than the others because, although it has only one malformation, its sample size is only 38. Is the result sensitive to this single malformation observation? Re-fit the model without it (using 0 malformations in 37 observations at that level), and re-evaluate estimated probabilities of malformation at alcohol levels 0 and 7 and the relative risk. b. Is the result sensitive to the choice of scores? Re-fit the model using scores (0, 1, 2, 3, 4), and re-evaluate estimated probabilities of malformation at the lowest and highest alcohol levels and the relative risk. c. Fit a logistic regression or probit model. Report the prediction equation. Interpret the sign of the estimated effect.

Chapter 4 Logistic Regression(2)

homework 4

1、A study used logistic regression to determine characteristics associated with Y = whether a cancer patient achieved remission (1 = yes). The most important explanatory variable was a labeling index (LI) that measures proliferative activity of cells after a patient receives an injection of tritiated thymidine. It represents the percentage of cells that are “labeled.” The first table shows the grouped data. Software reports the second table for a logistic regression model using LI as the predictor. a. Conduct a Wald test for the LI effect. Interpret. b. Construct a Wald confidence interval for the odds ratio corresponding to a 1-unit increase in LI. Interpret. c. Conduct a likelihood-ratio test for the LI effect. Interpret. d. Construct the likelihood-ratio confidence interval for the odds ratio. Interpret.

2、For the horseshoe crab data, fit the logistic regression model for π = probability of a satellite, using weight as the predictor. a. Report the ML prediction equation. b. Find the estimated probability π̂ at the weight values 1.20, 2.44, and 5.20 kg, which are the sample minimum, mean, and maximum. c. Find the weight at which . d. At the weight value found in (c), give a linear approximation for the estimated effect of (i) a 1 kg increase in weight. This represents a relatively large increase, so convert this to the effect of (ii) a 0.10 kg increase, and (iii) a standard deviation increase in weight (0.58 kg). e. Construct a 95% confidence interval to describe the effect of weight on the odds of a satellite. Interpret. f. Conduct the Wald or likelihood-ratio test of the hypothesis that weight has no effect. Report the P-value, and interpret. Note: the data are available in the R package "icda" as the dataset "horseshoecrabs".

3、A study in Florida stated that the death penalty was given in 19 out of 151 cases in which a white killed a white, in 0 out of 9 cases in which a white killed a black, in 11 out of 63 cases in which a black killed a white, and in 6 out of 103 cases in which a black killed a black. The table shows results of fitting a logit model for death penalty as the response (1 = yes), with defendant’s race (1 = white) and victim’s race (1 = white) as indicator predictors. a. Based on the parameter estimates, which group is most likely to have the “yes” response? Estimate the probability in that case. b. Interpret the parameter estimate for victim’s race. c. Using information shown, construct and interpret a 95% likelihood-ratio confidence interval for the conditional odds ratio between the death penalty verdict and victim’s race. d. Test the effect of victim’s race, controlling for defendant’s race, using a Wald test or likelihood-ratio test. Interpret.

Chapter 5 Building and Applying Logistic Regression Models

homework 5

1、Table 4.13 shows the result of cross classifying a sample of people from the MBTI Step II National Sample (collected and compiled by CPP, Inc.) on whether they report drinking alcohol frequently (1 = yes, 0 = no) and on the four binary scales of the Myers–Briggs personality test: Extroversion/Introversion (E/I), Sensing/iNtuitive (S/N), Thinking/Feeling (T/F) and Judging/Perceiving (J/P). The 16 predictor combinations correspond to the 16 personality types: ESTJ, ESTP, ESFJ, ESFP, ENTJ, ENTP, ENFJ, ENFP, ISTJ, ISTP, ISFJ, ISFP, INTJ, INTP, INFJ, INFP. Table 5.10 shows the result of fitting a model using the four scales as predictors of whether a subject drinks alcohol frequently. a. Conduct a model goodness-of-fit test, and interpret. b. If you were to simplify the model by removing a predictor, which would you remove? Why? c. When six interaction terms are added, the deviance decreases to 3.74. Show how to test the hypothesis that none of the interaction terms are needed, and interpret.

2、From the same survey referred to in Problem 1, Table 5.11 cross-classifies whether a person smokes frequently with the four scales of the MBTI personality test. SAS reports model −2 log likelihood values of 1130.23 with only an intercept term, 1124.86 with also the main effect predictors, 1119.87 with also all the two-factor interactions, and 1116.47 with also all the three-factor interactions. a. Write the model for each case, and show that the numbers of parameters are 1, 5, 11, and 15. b. According to AIC, which of these four models is preferable? c. When a classification table for the model containing the four main effect terms uses the sample proportion of frequent smokers of 0.23 as the cutoff, sensitivity = 0.48 and specificity = 0.55. The area under the ROC curve is c = 0.55. Does knowledge of personality type help you predict well whether someone is a frequent smoker? Explain.
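Part (b) reduces to arithmetic on the values quoted in the exercise, since AIC = −2 log likelihood + 2 × (number of parameters). A minimal sketch; the dictionary labels are illustrative names for the four models.

```python
# (-2 log likelihood, number of parameters) for each model, from the exercise.
models = {
    "intercept only": (1130.23, 1),
    "main effects": (1124.86, 5),
    "two-factor interactions": (1119.87, 11),
    "three-factor interactions": (1116.47, 15),
}

# AIC = -2 log L + 2 * (number of parameters); smaller is preferred.
aic = {name: neg2ll + 2 * k for name, (neg2ll, k) in models.items()}
for name, value in aic.items():
    print(f"{name:27s} AIC = {value:.2f}")

best = min(aic, key=aic.get)
print("smallest AIC:", best)
```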

3、Table 5.12 summarizes eight studies in China about smoking and lung cancer. a. Fit a logistic model with smoking and study as predictors. Interpret the smoking effect. b. Conduct a Pearson test of goodness of fit. Interpret. c. Check residuals to analyze further the quality of fit. Interpret.

Chapter 6 Multicategory Logit Models(2)

homework 6

1、Table 6.14 displays primary food choice for a sample of alligators, classified by length (≤2.3 meters, >2.3 meters) and by the lake in Florida in which they were caught. a. Fit a model to describe effects of length and lake on primary food choice. Report the prediction equations. b. Using the fit of your model, estimate the probability that the primary food choice is “fish,” for each length in Lake Oklawaha. Interpret the effect of length.

2、For the 2002 General Social Survey, counts in the happiness categories (not, pretty, very) were (6, 43, 75) for below average income, (6, 113, 178) for average income, and (6, 57, 117) for above average income. Table shows output for a cumulative logit model with scores {1, 2, 3} for the income categories. a. Explain why the output reports two intercepts but one income effect. b. Interpret the income effect. c. Report a test statistic and P-value for testing that marital happiness is independent of family income. Interpret. d. Does the model fit adequately? Justify your answer. e. Estimate the probability that a person with average family income reports a very happy marriage.

Mid-term Exam

1、To analyze whether smoking is associated with lung cancer, one organization collected 9925 samples from a town. Table 1 shows the survey result. A smoker was defined as a person who had smoked at least one cigarette a day for at least a year.

Table 1
Status                            Cases
Nonsmokers without lung cancer    7765
Nonsmokers with lung cancer       32
Smokers without lung cancer       2089
Smokers with lung cancer          39

Please answer the questions below: a. (6s) Construct a contingency table based on the collected samples. b. (9s) Regard smoking status as the explanatory variable and lung cancer status as the response variable. Assume that the smoking group and the non-smoking group are independent binomial samples. Let and denote the probability of lung cancer status in the smoking group and the non-smoking group. For a desired significance level of 0.05, construct the score test of vs .

2、Refer to the previous exercise. a. (21s) Treat the two rows of the contingency table as independent binomial samples. i. (11s) Find the sample proportions of lung cancer status in the two groups, the sample difference of proportions, and the Wald standard error of the sample difference of proportions. ii. (10s) For a desired significance level of 0.05, test vs using the sample difference of proportions and the Wald method. b. (8s) Calculate the estimated expected frequencies for each cell. c. (6s) For a desired significance level of 0.05, use the chi-squared test to test independence in the contingency table vs , j=1,2.
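The Wald quantities in part (a), the expected frequencies in part (b), and the Pearson statistic in part (c) can be sketched with the standard library. This sketch assumes the last row of Table 1 counts smokers with lung cancer (so that the four counts sum to 9925), and it uses the identity that the upper tail of a chi-squared distribution with df = 1 at x equals erfc(sqrt(x / 2)); all variable names are illustrative.

```python
from math import sqrt, erfc

# Rows: smokers, nonsmokers. Columns: lung cancer yes, lung cancer no.
# Assumption: the row printed as 39 cases is "smokers with lung cancer".
table = [[39, 2089], [32, 7765]]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
n = sum(row_tot)

# Sample proportions, their difference, and the Wald standard error.
p1 = table[0][0] / row_tot[0]
p2 = table[1][0] / row_tot[1]
diff = p1 - p2
wald_se = sqrt(p1 * (1 - p1) / row_tot[0] + p2 * (1 - p2) / row_tot[1])
wald_z = diff / wald_se
print(f"p1={p1:.4f}, p2={p2:.4f}, diff={diff:.4f}, SE={wald_se:.4f}, z={wald_z:.2f}")

# Estimated expected frequencies under independence: row total * column total / n.
expected = [[r * c / n for c in col_tot] for r in row_tot]

# Pearson chi-squared statistic with df = (2 - 1)(2 - 1) = 1.
x2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))
p_value = erfc(sqrt(x2 / 2))
print(f"X^2 = {x2:.2f}, P-value = {p_value:.2e}")
```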

3、(10s) Table 2 contains results of a study comparing radiation therapy with surgery in treating cancer of the larynx. Use Fisher’s exact test to test vs . Interpret results.

Table 2
                     Cancer Controlled    Cancer Not Controlled
Surgery              15                   1
Radiation therapy    10                   2
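Fisher's exact test conditions on both sets of margins, so the count in the first cell follows a hypergeometric distribution under the null hypothesis. Because the hypotheses are missing from the question as extracted, the sketch below reports both a one-sided P-value (larger first-cell counts, favoring surgery) and a two-sided P-value (sum of table probabilities no larger than the observed one); which one applies is an assumption to be settled by the intended alternative. It uses only the standard library.

```python
from math import comb

# Table 2: rows = (surgery, radiation), columns = (controlled, not controlled).
table = [[15, 1], [10, 2]]
r1, r2 = sum(table[0]), sum(table[1])
c1 = table[0][0] + table[1][0]
n = r1 + r2

def hyper_pmf(k):
    """P(n11 = k) with all margins fixed (hypergeometric distribution)."""
    return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

# Range of first-cell counts compatible with the fixed margins.
k_min, k_max = max(0, c1 - r2), min(r1, c1)
probs = {k: hyper_pmf(k) for k in range(k_min, k_max + 1)}

observed = table[0][0]
p_one_sided = sum(p for k, p in probs.items() if k >= observed)
# Small relative tolerance guards against floating-point ties.
p_two_sided = sum(p for p in probs.values() if p <= probs[observed] * (1 + 1e-9))

print(f"one-sided P-value = {p_one_sided:.4f}")
print(f"two-sided P-value = {p_two_sided:.4f}")
```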

4、(20s) Table 3 classifies a sample of 800 boys according to their socioeconomic status (S), their boy scouts status (B) and whether they had committed juvenile delinquency (D).

Table 3
Socioeconomic Status    Boy Scouts    Juvenile Delinquency: Yes    Juvenile Delinquency: No
Low                     Yes           11                           43
Low                     No            42                           169
Middle                  Yes           14                           104
Middle                  No            20                           132
High                    Yes           8                            196
High                    No            2                            59

a. (6s) Construct the marginal table and sample odds ratio of B and D. b. (8s) Test the independence between B and D vs . (at the 0.05 significance level) c. (6s) Find the sample conditional odds ratios , , for the different levels (Low, Middle, High) of S.
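The marginal B-D table is obtained by summing the three partial tables over S, and an odds ratio can then be computed for the collapsed table and for each level of S. A minimal sketch using the counts in Table 3; the names `strata` and `odds_ratio` are illustrative.

```python
# Partial tables by socioeconomic status.
# Rows: boy scout yes, no. Columns: delinquent yes, no.
strata = {
    "Low": [[11, 43], [42, 169]],
    "Middle": [[14, 104], [20, 132]],
    "High": [[8, 196], [2, 59]],
}

def odds_ratio(t):
    """Sample odds ratio (n11 * n22) / (n12 * n21) of a 2x2 table."""
    return t[0][0] * t[1][1] / (t[0][1] * t[1][0])

# Marginal B-D table: sum the partial tables cell by cell.
marginal = [[sum(t[i][j] for t in strata.values()) for j in range(2)]
            for i in range(2)]
print("marginal table:", marginal)
print(f"marginal odds ratio = {odds_ratio(marginal):.3f}")

# Odds ratio within each level of S.
by_level = {level: odds_ratio(t) for level, t in strata.items()}
for level, value in by_level.items():
    print(f"odds ratio at S = {level}: {value:.3f}")
```

Comparing the collapsed odds ratio with the within-level values shows how controlling for S changes the apparent B-D association.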

5、As shown in Table 4, new data are collected for a study of snoring and heart disease. The study is based on a survey of 2390 subjects to investigate snoring as a possible risk factor for heart disease. The subjects were classified according to their snoring level, as reported by their spouses. We treat the rows of the table as independent binomial samples with probabilities , and use scores (0, 2, 4, 5) for snoring level . A probit regression model and a logistic regression model are fitted separately.

Table 4
Snoring               Heart Disease: Yes    Heart Disease: No    Logit Fit    Probit Fit
Never                 20                    1300                 ( )          ( )
Occasional            30                    600                  0.047        0.064
Nearly Every Night    20                    190                  0.083        0.115
Every Night           30                    200                  ( )          ( )

a. (4s) Suppose the fitted logistic regression model is and the fitted probit regression model is . Report the prediction equation of in the logistic regression model and the probit regression model, where snoring level (in forms of ). b. (16s) Refer to (a), fill in the blanks in Table 4 with snoring level 0, 6. Please write down the calculation in detail.


