The Pearson chi-square test (χ2) compares differences between groups on variables measured at the nominal level. The χ2 compares the frequencies that are observed with the frequencies that are expected. When a study requires that researchers compare proportions (percentages) in one category versus another category, the χ2 is a statistic that will reveal if the difference in proportion is statistically improbable.
A one-way χ2 is a statistic that compares different levels of one variable only. For example, a researcher may collect information on gender and compare the proportions of males to females. If the one-way χ2 is statistically significant, it would indicate that the proportion of one gender is significantly higher than the proportion of the other gender, beyond what would be expected by chance (Daniel, 2000). If more than two groups are being examined, the χ2 does not determine where the differences lie; it only determines that a significant difference exists. Further testing on pairs of groups with the χ2 would then be warranted to identify the significant differences.
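The one-way comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the counts (60 females, 40 males) are hypothetical and do not come from any study cited here, and the null hypothesis is assumed to be equal proportions in the two categories.

```python
# Hypothetical counts for illustration only; not data from the text.
observed = {"female": 60, "male": 40}

n = sum(observed.values())

# Assumed null hypothesis: both categories are equally likely,
# so each expected frequency is n divided by the number of categories.
expected = {category: n / len(observed) for category in observed}

# One-way chi-square: sum over categories of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)

# Degrees of freedom for a one-way test: number of categories minus one.
df = len(observed) - 1

print(f"chi-square({df}) = {chi_square:.2f}")
```

With more than two categories the same sum applies, but as noted above, a significant result would not show which categories differ.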
A two-way χ2 is a statistic that tests whether the proportions in the levels of one nominal variable differ significantly across the levels of a second nominal variable. For example, the presence of advanced colon polyps was studied in three groups of patients: those having a normal body mass index (BMI), those who were overweight, and those who were obese (Siddiqui, Mahgoub, Pandove, Cipher, & Spechler, 2009). The research question tested was: “Is there a difference between the three groups (normal weight, overweight, and obese) on the presence of advanced colon polyps?” The results of the χ2 test indicated that a larger proportion of obese patients fell into the category of having advanced colon polyps compared to normal weight and overweight patients, suggesting that obesity may be a risk factor for developing advanced colon polyps. Further examples of two-way χ2 tests are reviewed in Exercise 19.
Research Designs Appropriate for the Pearson χ2
Research designs that may utilize the Pearson χ2 include the randomized experimental, quasi-experimental, and comparative designs (Gliner, Morgan, & Leech, 2009). The variables may be active, attributional, or a combination of both. An active variable refers to an intervention, treatment, or program. An attributional variable refers to a characteristic of the participant, such as gender, diagnosis, or ethnicity. Regardless of whether the variables are active or attributional, all variables submitted to χ2 calculations must be measured at the nominal level.
Statistical Formula and Assumptions
Use of the Pearson χ2 involves the following assumptions (Daniel, 2000):
- Only one datum entry is made for each subject in the sample. Therefore, if repeated measures from the same subject are being used for analysis, such as pretests and posttests, χ2 is not an appropriate test.
- The variables must be categorical (nominal), either inherently or transformed to categorical from quantitative values.
- For each variable, the categories are mutually exclusive and exhaustive. No cells may have an expected frequency of zero. In the actual data, the observed cell frequency may be zero. However, the Pearson χ2 test is sensitive to small sample sizes, and other tests, such as Fisher’s exact test, are more appropriate when testing very small samples (Daniel, 2000; Yates, 1934).
The test is distribution-free, or nonparametric, which means that no assumption has been made for a normal distribution of values in the population from which the sample was taken (Daniel, 2000).
The formula for a two-way χ2 is:
$$\chi^2 = \frac{n[(A)(D) - (B)(C)]^2}{(A+B)(C+D)(A+C)(B+D)}$$
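A contingency table is a table that displays the relationship between two or more categorical variables (Daniel, 2000). For the formula above, the cells of the 2 × 2 contingency table are labeled as follows, where n is the total sample size:
            Column 1    Column 2    Totals
Row 1       A           B           A + B
Row 2       C           D           C + D
Totals      A + C       B + D       n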
With any χ2 analysis, the degrees of freedom (df) must be calculated to determine the significance of the value of the statistic. The following formula is used for this calculation:
$$df = (R - 1)(C - 1)$$
where
R = Number of rows
C = Number of columns
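The 2 × 2 formula and the degrees-of-freedom calculation can be expressed as two small Python functions. This is a sketch rather than a reference implementation: the function names `chi_square_2x2` and `degrees_of_freedom` are illustrative, the cells are assumed to be laid out with A and B in the first row and C and D in the second, and df is assumed to follow the usual rule (R − 1)(C − 1).

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut Pearson chi-square for a 2 x 2 table.

    Cells are assumed to be laid out as:
        A  B
        C  D
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A zero row or column total leaves the statistic undefined.
        raise ValueError("A row or column total is zero.")
    return numerator / denominator


def degrees_of_freedom(rows, columns):
    """df = (R - 1)(C - 1) for a table with R rows and C columns."""
    return (rows - 1) * (columns - 1)


# A 2 x 2 table has one degree of freedom.
print(degrees_of_freedom(2, 2))

# A 3 x 2 table, such as three BMI groups by polyps present/absent, has two.
print(degrees_of_freedom(3, 2))
```

Note that the shortcut formula applies only to 2 × 2 tables; larger tables need the general observed-versus-expected calculation.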
A retrospective comparative study examined whether longer antibiotic treatment courses were associated with increased antimicrobial resistance in patients with spinal cord injury (Lee et al., 2014). Using urine cultures from a sample of spinal cord–injured veterans, two groups were created: those with evidence of antibiotic resistance and those with no evidence of antibiotic resistance. The veterans were also divided into two groups based on whether they had a history of recent (in the past 6 months) antibiotic use for more than 2 weeks or no history of recent antibiotic use.
The data are presented in Table 35-1. The null hypothesis is: “There is no difference between antibiotic users and non-users on the presence of antibiotic resistance.”
TABLE 35-1 ANTIBIOTIC RESISTANCE BY ANTIBIOTIC USE
                 Antibiotic Use    No Recent Use
Resistant        8                 7
Not resistant    6                 21
The computations for the Pearson χ2 test are as follows:
Step 1: Create a contingency table of the two nominal variables:
                 Used Antibiotics    No Recent Use    Totals
Resistant        8                   7                15
Not resistant    6                   21               27
Totals           14                  28               42 (total n)
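Before the formula is applied, the marginal totals in this table can be used to check the assumption that no cell has an expected frequency of zero. The sketch below assumes the standard definition of an expected frequency (row total × column total ÷ n); the variable names are illustrative.

```python
# Observed counts from the contingency table in Step 1.
observed = [
    [8, 7],    # resistant: used antibiotics, no recent use
    [6, 21],   # not resistant: used antibiotics, no recent use
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under the null hypothesis:
# (row total x column total) / n.
expected = [
    [row_total * column_total / n for column_total in column_totals]
    for row_total in row_totals
]

for label, row in zip(["Resistant", "Not resistant"], expected):
    print(label, row)

# The assumption listed earlier requires that no expected frequency be zero.
smallest_expected = min(min(row) for row in expected)
print("Smallest expected frequency:", smallest_expected)
```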
Step 2: Fit the cells into the formula:
$$\chi^2 = \frac{n[(A)(D) - (B)(C)]^2}{(A+B)(C+D)(A+C)(B+D)}$$
$$\chi^2 = \frac{42[(8)(21) - (7)(6)]^2}{(8+7)(6+21)(8+6)(7+21)}$$
$$\chi^2 = \frac{666{,}792}{158{,}760}$$
$$\chi^2 = 4.20$$
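The arithmetic in Step 2 can be reproduced and cross-checked in Python. The sketch below computes the shortcut formula term by term and then, as an independent check, recomputes the statistic from the general observed-versus-expected form, Σ(O − E)² / E, which is the standard definition of the Pearson χ2 and should agree with the shortcut for any 2 × 2 table.

```python
# Cell counts from the contingency table in Step 1.
a, b, c, d = 8, 7, 6, 21
n = a + b + c + d

# Shortcut formula for a 2 x 2 table.
numerator = n * (a * d - b * c) ** 2
denominator = (a + b) * (c + d) * (a + c) * (b + d)
chi_square_shortcut = numerator / denominator

# General form: sum of (observed - expected)^2 / expected over all cells.
# Each tuple holds (observed count, its row total, its column total).
cells = [
    (a, a + b, a + c),
    (b, a + b, b + d),
    (c, c + d, a + c),
    (d, c + d, b + d),
]
chi_square_general = 0.0
for observed, row_total, column_total in cells:
    expected = row_total * column_total / n
    chi_square_general += (observed - expected) ** 2 / expected

print("numerator:", numerator)
print("denominator:", denominator)
print("shortcut:", round(chi_square_shortcut, 2))
print("general form:", round(chi_square_general, 2))
```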
Step 3: Compute the degrees of freedom:
$$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$$
Step 4: Locate the critical χ2 value in the χ2 distribution table (Appendix D) and compare it to the obtained χ2 value.
The obtained χ2 value is compared with the tabled χ2 values in Appendix D. The table includes the critical values of χ2 for specific degrees of freedom at selected levels of significance. If the value of the statistic is equal to or greater than the value identified in the χ2 table, the difference between the two variables is statistically significant. At α = 0.05, the critical χ2 for df = 1 is 3.84. Our obtained χ2 of 4.20 exceeds the critical value, indicating a significant difference between antibiotic users and non-users on the presence of antibiotic resistance.
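The table lookup can be complemented by computing the p value directly. The sketch below relies on a property that holds for df = 1 only: a χ2 variable with one degree of freedom is the square of a standard normal deviate, so its upper-tail probability equals erfc(√(χ2 / 2)). It does not generalize to larger tables, and the critical value of 3.84 is assumed to correspond to α = 0.05.

```python
import math

chi_square = 4.20
critical_value = 3.84  # tabled critical value for df = 1, assuming alpha = 0.05

# Valid for df = 1 only: upper-tail probability of the chi-square distribution
# expressed through the complementary error function.
p_value = math.erfc(math.sqrt(chi_square / 2))

# Decision rule from the text: significant if the obtained value is
# equal to or greater than the critical value.
is_significant = chi_square >= critical_value

print(f"p = {p_value:.3f}")
print("Significant at the assumed alpha of 0.05:", is_significant)
```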
Furthermore, we can compute the rates of antibiotic resistance among antibiotic users and non-users by using the numbers in the contingency table from Step 1. The antibiotic resistance rate among the antibiotic users can be calculated as 8 ÷ 14 = 0.571 × 100% = 57.1%. The antibiotic resistance rate among the non-antibiotic users can be calculated as 7 ÷ 28 = 0.25 × 100% = 25%.
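The same rates can be computed from the column totals of the contingency table. This is a minimal sketch; the dictionary keys are illustrative labels for the two antibiotic-use groups.

```python
# Counts of resistant veterans and column totals from the Step 1 table.
resistant = {"used antibiotics": 8, "no recent use": 7}
group_totals = {"used antibiotics": 14, "no recent use": 28}

# Resistance rate within each antibiotic-use group, as a percentage.
rates = {
    group: 100 * resistant[group] / group_totals[group]
    for group in resistant
}

for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
```

Dividing by the column totals gives rates within each antibiotic-use group; dividing by the row totals instead would answer a different question, namely what share of resistant veterans fell into each use group.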
The following screenshot is a replica of what your SPSS window will look like. The data for subjects 24 through 42 are viewable by scrolling down in the SPSS screen.
Step 1: From the “Analyze” menu, choose “Descriptive Statistics” and “Crosstabs.” Move the two variables to the right, where either variable can be in the “Row” or “Column” space.
Step 2: Click “Statistics” and check the box next to “Chi-square.” Click “Continue” and “OK.”
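For readers working outside SPSS, the Crosstabs step can be approximated in plain Python. The sketch below is not SPSS syntax and does not reproduce SPSS output: it rebuilds one record per veteran from the cell counts in Table 35-1 (the order of subjects in the actual data file is unknown), tallies the records into a contingency table, and computes the Pearson χ2 from observed and expected frequencies.

```python
from collections import Counter

# One (antibiotic use, resistance) record per veteran, rebuilt from the
# cell counts in Table 35-1. The true subject order is not known.
records = (
    [("used", "resistant")] * 8
    + [("no recent use", "resistant")] * 7
    + [("used", "not resistant")] * 6
    + [("no recent use", "not resistant")] * 21
)

use_levels = ["used", "no recent use"]
resistance_levels = ["resistant", "not resistant"]

# Crosstabulation: rows are resistance levels, columns are use levels.
counts = Counter(records)
table = [[counts[(use, res)] for use in use_levels] for res in resistance_levels]

row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]
n = len(records)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * column_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)

print(table)
print(f"chi-square({df}) = {chi_square:.2f}, n = {n}")
```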
Interpretation of SPSS Output
The following tables are generated from SPSS. The first table contains the contingency table, similar to Table 35-1 above. The second table contains the χ2 results.
The last table contains the χ2 value in addition to other statistics that test associations between nominal variables. The Pearson χ2 test is located in the first row of the table, which contains the χ2 value, df, and p value.
Final Interpretation in American Psychological Association (APA) Format
The following interpretation is written as it might appear in a research article, formatted according to APA guidelines (APA, 2010). A Pearson χ2 analysis indicated that antibiotic users had significantly higher rates of antibiotic resistance than those who did not use antibiotics, χ2(1) = 4.20, p = 0.04 (57.1% versus 25%, respectively). This finding suggests that extended antibiotic use may be a risk factor for developing resistance, and further research is needed to investigate resistance as a direct effect of antibiotics.
- Do the example data meet the assumptions for the Pearson χ2 test? Provide a rationale for your answer.
- What is the null hypothesis in the example?
- What was the exact likelihood of obtaining a χ2 value at least as extreme as the one that was actually observed, assuming that the null hypothesis is true?
- Using the numbers in the contingency table, calculate the percentage of antibiotic users who were resistant.
- Using the numbers in the contingency table, calculate the percentage of non-antibiotic users who were resistant.
- Using the numbers in the contingency table, calculate the percentage of resistant veterans who used antibiotics for more than 2 weeks.
- Using the numbers in the contingency table, calculate the percentage of resistant veterans who had no history of antibiotic use.
- What kind of design was used in the example?
- What result would have been obtained if the variables in the SPSS Crosstabs window had been switched, with Antibiotic Use being placed in the “Row” and Resistance being placed in the “Column”?
- Was the sample size adequate to detect differences between the two groups in this example? Provide a rationale for your answer.