Kuder Richardson Formula 20

The Kuder Richardson Formula 20 (KR-20) is a statistical measure used to assess the internal consistency reliability of tests or scales composed of dichotomous items, that is, items scored in one of two ways, such as right/wrong. It belongs to psychometrics, the field concerned with the measurement of psychological and educational attributes. The formula estimates how consistently the items within a test measure the same underlying construct. When a test has high internal consistency, its individual items are positively correlated with one another, suggesting they are measuring a single, unified concept.¹²

History and Origin

The Kuder Richardson Formula 20 was developed by G. Frederic Kuder and M. W. Richardson and first published in their seminal 1937 paper, "The Theory of the Estimation of Test Reliability."¹¹ Their work provided a way to estimate the reliability of a test from a single administration. This was a significant advance over methods that require more than one administration and over split-half methods, whose result depends on how the items are divided into halves. The KR-20 formula, the twentieth formula presented in their paper, became a standard tool for evaluating the internal consistency of tests where items are scored dichotomously, such as true/false questions or multiple-choice questions with a single correct answer.

Key Takeaways

The Kuder Richardson Formula 20 (KR-20) is a measure of internal consistency reliability for assessments with dichotomous (e.g., right/wrong, yes/no) items.
It quantifies how consistently all items within a test measure the same underlying construct.
KR-20 values typically range from 0 to 1, with higher values indicating greater reliability; negative values are possible when items are negatively related to one another.
A high KR-20 suggests that the test items are homogeneous and closely related.
It is most appropriate for tests where answers are scored as either correct (1) or incorrect (0).

Formula and Calculation

The Kuder Richardson Formula 20 is calculated using the following formula:

\[ KR_{20} = \left( \frac{k}{k-1} \right) \left( 1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2} \right) \]

Where:

\( k \) = the total number of items on the test.
\( p_i \) = the proportion of individuals who answered item \( i \) correctly.
\( q_i \) = the proportion of individuals who answered item \( i \) incorrectly (which is \( 1 - p_i \)).
\( \sum_{i=1}^{k} p_i q_i \) = the sum of the products of \( p_i \) and \( q_i \) for all items. Each product \( p_i q_i \) is the variance of a single 0/1 item, so this term is the sum of the item variances.
\( \sigma_X^2 \) = the variance of the total scores for all individuals on the test. For consistency with \( p_i q_i \), which is a population variance, this is computed by dividing by the number of individuals rather than by one less than that number.

This formula provides a coefficient that reflects the overall internal consistency across all test items. Because the variance of the total scores sits in the denominator, the coefficient is undefined when every individual earns the same total score.
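The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `kr20`, the layout of the input (one row per person, one 0/1 column per item), and the sample data are all assumptions made for the example. The total-score variance is divided by the number of people so that it matches the item variances p_i q_i.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 for a matrix of 0/1 scores: one row per person, one column per item."""
    n_people = len(responses)
    k = len(responses[0])
    if k < 2 or n_people < 2:
        raise ValueError("need at least 2 items and 2 respondents")

    # p_i: proportion of respondents who answered item i correctly.
    p = [sum(row[i] for row in responses) / n_people for i in range(k)]
    # Sum of item variances, p_i * q_i with q_i = 1 - p_i.
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)

    # Population variance (divide by N) of the total scores, to match p_i * q_i.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 respondents, 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))  # 0.8
```

For this sample the item variances sum to 0.8 and the total-score variance is 2.0, so the coefficient is (4/3) × (1 − 0.4) = 0.8.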

Interpreting the Kuder Richardson Formula 20

The value obtained from the Kuder Richardson Formula 20 (KR-20) is a reliability coefficient that, in practice, usually falls between 0.00 and 1.00; negative values can occur when items tend to be negatively related to one another. A higher KR-20 value indicates greater internal consistency and, consequently, higher reliability for the assessment.¹⁰ The cutoffs below are common rules of thumb rather than fixed standards.

Values close to 1.00 suggest that the test items are highly interrelated and consistently measure the same construct. For instance, a KR-20 of 0.85 or higher is generally considered excellent for standardized tests in educational or psychological settings, indicating strong reliability.
Values between 0.70 and 0.80 are often considered acceptable for many research and practical applications, suggesting a reasonable degree of internal consistency.
Values below 0.70 may indicate questionable or low reliability, implying that the items might not be consistently measuring the same construct or that there might be significant error variance in the scores.⁹

A high KR-20 indicates consistency, but it does not guarantee the validity of the test, that is, whether the test actually measures what it is intended to measure. Reliability is a necessary, but not sufficient, condition for validity.

Hypothetical Example

Consider a financial literacy quiz designed for new investors, consisting of 10 true/false questions (dichotomous variables). A financial institution wants to assess the internal consistency of this quiz before widely distributing it. They administer the quiz to a pilot group of 50 participants.

For each of the 10 questions, they calculate the proportion of correct answers (\(p_i\)) and incorrect answers (\(q_i\)). For example, if 40 out of 50 participants answered Question 1 correctly, \(p_1 = 0.80\) and \(q_1 = 0.20\), so \(p_1 q_1 = 0.80 \times 0.20 = 0.16\). They repeat this for all 10 questions and sum the \(p_i q_i\) values to get \( \sum p_i q_i \).

Next, they calculate the total score for each of the 50 participants (number of correct answers out of 10) and then compute the variance of these total scores (\(\sigma_X^2\)).

Suppose the sum of \(p_i q_i\) for all 10 questions is 1.5, and the variance of the total scores (\(\sigma_X^2\)) is 2.5.
Using the KR-20 formula:

\[ KR_{20} = \left( \frac{10}{10-1} \right) \left( 1 - \frac{1.5}{2.5} \right) \]

\[ KR_{20} = \left( \frac{10}{9} \right) \left( 1 - 0.6 \right) \]

\[ KR_{20} = 1.111 \times 0.4 \]

\[ KR_{20} \approx 0.444 \]

In this hypothetical example, a KR-20 of approximately 0.444 suggests low internal consistency for the financial literacy quiz. This indicates that the questions may not be consistently measuring the same aspect of financial knowledge, and the test developers might need to revise the questions to improve the reliability of test scores.
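The arithmetic for the quiz can be checked with a short Python sketch that works directly from the two summary figures in the example. The helper name `kr20_from_summary` and its parameter names are made up for this illustration.

```python
def kr20_from_summary(k: int, sum_pq: float, var_total: float) -> float:
    """KR-20 from the number of items, the sum of p_i * q_i, and the total-score variance."""
    if k < 2:
        raise ValueError("KR-20 needs at least 2 items")
    if var_total <= 0:
        raise ValueError("total-score variance must be positive")
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Figures from the financial literacy quiz example.
quiz_kr20 = kr20_from_summary(k=10, sum_pq=1.5, var_total=2.5)
print(f"{quiz_kr20:.3f}")  # 0.444
```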

Practical Applications

While primarily rooted in psychometrics and educational assessment, the principles of Kuder Richardson Formula 20 (KR-20) can extend to other fields, including certain aspects of data analysis relevant to finance. In a broader sense, whenever a survey or assessment with dichotomous responses is used to gauge a consistent trait or understanding, KR-20 can be a useful tool.

Some practical applications include:

Educational Testing: Its most common application is in evaluating the reliability of standardized tests, classroom quizzes, and other assessments with multiple-choice or true/false questions. This helps ensure that scores reflect a student's knowledge or skill consistently across items.⁸
Survey Design: In research methodology, if a survey uses a series of "yes/no" or "agree/disagree" questions to measure a specific attitude, belief, or consumer behavior, KR-20 can assess the internal consistency of those survey items.
Certification Exams: For professional certifications where candidates must answer a series of questions correctly to demonstrate competency, KR-20 can help establish the reliability of the examination.
Financial Literacy Assessments: As seen in the hypothetical example, a financial institution might use KR-20 to check the reliability of an assessment designed to measure an individual's financial literacy. Such assessments often use dichotomous items to evaluate understanding of concepts like diversification or portfolio performance.
Quantitative Analysis of Risk Perception: If a questionnaire uses binary responses to gauge an individual's risk tolerance or perception of specific financial risks, KR-20 could be applied to check the consistency of those items. An article indexed in the Education Resources Information Center (ERIC) discusses its use for assessing test reliability.⁷

Limitations and Criticisms

Despite its utility, the Kuder Richardson Formula 20 has several limitations and criticisms:

Dichotomous Items Only: KR-20 is strictly applicable only to tests with dichotomous items (e.g., correct/incorrect, true/false). It cannot be used for scales with polytomous scoring (e.g., Likert scales, partial credit), where Cronbach's Alpha is the appropriate measure.⁶
Assumption of Homogeneity (Unidimensionality): While a high KR-20 value is often interpreted as an indicator of a homogeneous test, homogeneity is actually an assumption, not a conclusion. It assumes that all items measure a single, underlying construct (unidimensionality). If a test measures multiple different constructs, the KR-20 may underestimate or inaccurately represent the true internal consistency.
Sensitivity to Item Difficulty: The KR-20 can be affected by the difficulty level of the test items and the spread of scores. If all items are either extremely easy or extremely difficult, leading to little variance in responses, the KR-20 may be misleadingly low even if the items are internally consistent.
Not Suitable for Speeded Tests: If a test is highly speeded (i.e., test takers are unable to complete all items due to time limits), KR-20 estimates may be inflated because items not reached are often scored as incorrect, artificially increasing consistency.⁵
Comparison with Cronbach's Alpha: As discussed in academic literature, while KR-20 is a special case of Cronbach's Alpha for dichotomous data, there are nuances in their application and interpretation, particularly when considering modern measurement theory.⁴

Kuder Richardson Formula 20 vs. Cronbach's Alpha

The Kuder Richardson Formula 20 (KR-20) and Cronbach's Alpha are both widely used measures of internal consistency reliability, but they are applied in different contexts.

| Feature | Kuder Richardson Formula 20 (KR-20) | Cronbach's Alpha |
| --- | --- | --- |
| Item Scoring | Dichotomous items only (e.g., 0 or 1, yes/no). | Dichotomous or polytomous items (e.g., Likert scales, partial credit scores). |
| Relationship | A special case of Cronbach's Alpha. If items are dichotomous, KR-20 and Cronbach's Alpha yield the same result.³ | A generalization of KR-20, applicable to a wider range of data types. |
| Primary Use Case | Educational tests with right/wrong answers, binary surveys. | Psychological scales, surveys with graded responses, general questionnaires. |
| Underlying Assumption | Items measure a single construct; item difficulties may differ across items. | Items measure a single construct; item variances may differ across items. |

The key distinction lies in the type of data they are designed to handle. KR-20 is specifically for binary responses, where each item has only two possible outcomes. Cronbach's Alpha is a more general coefficient that also handles items with multiple ordered response categories, such as rating scales; applied to binary items, it reduces to KR-20. Because of this close theoretical relationship, the two coefficients are sometimes confused.
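The equivalence on dichotomous data can be checked numerically. The sketch below is illustrative only: the function names and the sample data are assumptions made for the example. It computes Cronbach's Alpha from item variances and KR-20 from item proportions, using population variances in both so the two are directly comparable.

```python
import math
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's Alpha from per-item variances (population variances throughout)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(responses):
    """KR-20 from per-item proportions correct; valid only for 0/1 scores."""
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical binary data: 6 respondents, 3 items.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
same = math.isclose(kr20(data), cronbach_alpha(data))
print(same)  # True
```

The two agree because the variance of a 0/1 item with proportion correct p is exactly p(1 − p), so the sum of item variances in Alpha equals the sum of p_i q_i in KR-20.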

FAQs

What does a high KR-20 value mean?

A high KR-20 value, typically above 0.70, indicates acceptable to strong internal consistency. This means that the individual items within a test are highly correlated with each other and are likely measuring the same underlying concept or trait reliably.²

Can KR-20 be used for a survey with "strongly agree" to "strongly disagree" options?

No, the Kuder Richardson Formula 20 is only suitable for dichotomous items, meaning questions scored in one of two ways, such as correct/incorrect or yes/no. For surveys with a range of ordered responses, like "strongly agree" to "strongly disagree" (a Likert scale), Cronbach's Alpha would be the appropriate measure of internal consistency.

What is the difference between reliability and validity in the context of KR-20?

Reliability, as measured by KR-20, refers to the consistency of a test's results. A reliable test will produce similar scores under similar conditions. Validity, however, refers to whether the test actually measures what it claims to measure. A test can be reliable without being valid, but a valid test must generally be reliable.¹

How does the number of items affect KR-20?

Generally, increasing the number of items in a test can lead to a higher KR-20 value, assuming the new items are of similar quality and measure the same construct. Longer tests tend to be more reliable as random errors on individual items tend to cancel each other out over more items. However, simply adding items without consideration for their quality will not necessarily improve reliability.
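
One way to put a rough number on this effect is the Spearman-Brown prediction formula, a classical result that is not part of KR-20 itself and is brought in here only as an illustration. It assumes the added items are comparable to the existing ones, so its output is a projection, not a guarantee. The function and parameter names below are made up for the sketch.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by length_factor.

    Assumes the added items behave like the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a test whose current reliability is 0.444 (10 items -> 20 comparable items).
projected = spearman_brown(0.444, 2)
print(round(projected, 3))  # 0.615
```

Under that assumption, doubling the length of a test with a coefficient of 0.444 would be projected to raise it to roughly 0.615, still below the 0.70 level often treated as acceptable.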