Validity: What does it mean for the TOEIC tests?

What are the characteristics that guarantee the quality of a test?

November 2019

In today’s global workplace, the ability to speak English is essential. Consequently, many employers must make critical decisions about the English language skills of their employees. The TOEIC® tests are designed specifically to facilitate these decisions. As with any product, we may ask what makes us decide to buy it. Most of the time, we look for great service and a great product.

In this article, based on ETS research conducted by Donald E. Powers, we try to answer the question: what makes a test a great product? The TOEIC® tests are widely available and their scores are recognized worldwide, but one element really defines their value: the meaningful, or valid, scores that the tests yield.

But what does validity mean?

It is both a very simple concept and a very complex one. In simple terms, and according to ETS researcher Donald E. Powers, “validity is the degree to which a test is doing the job it was intended to do, whether that job is certifying that someone has mastered a given body of knowledge, whether it is to facilitate admissions decisions to colleges or universities by measuring a person’s readiness for further study, or whether it is to attest to the fact that someone possesses the knowledge, skills and abilities needed to practice medicine or law or to begin teaching”. In other words, do the scores mean what we think they mean, and does the test fulfil the purpose for which it was designed? Validation is considered by some to be never-ending, and so there is usually no clear agreement on just how much evidence is necessary. However, the more evidence, the better.

What is done to ensure validity as the tests are being developed?

First, TOEIC developers are specialists in language learning and language testing and have access to specialists in other areas of testing. But that is not all. Very thorough and detailed test specifications (called test blueprints) are followed to ensure that each test form is highly similar to every other one and that the right content is covered in the same proportion in each form. The reviewing process is exceedingly thorough. For the TOEIC items, some 20 reviewers inspect each and every test question before it is used. Some questions don’t survive this scrutiny, and others may undergo extensive revision before they meet the required standards of quality. Finally, each distinct type of question is thoroughly pilot-tested in the design phase to make sure that it performs properly, that test takers know how to deal with each kind of question format and that questions are appropriately difficult. Once tests have been administered, routine statistical analyses check that test items are working properly. Ongoing research of various kinds is also conducted to support the test.

Validity evidence can take many forms, but there are only a few major kinds: logical analyses; examination of how test takers approach the test questions; examination of differences in test structure, processes or language use across groups (professional writers vs. novice writers, for instance); and examination of how test scores relate to other variables or criteria.

One kind of evidence collected for the TOEIC tests involves self-assessments provided by test takers themselves: test takers are asked how well they can perform each of a variety of language tasks in English. We have found that when these questions are asked properly, people are reasonably accurate at reporting how well they can perform these tasks. The extent to which these self-reports agree with the TOEIC scores provides good evidence for the validity of the scores (i.e., that the scores mean what the test makers say they mean).

In 2007, ETS researchers asked about 5,000 test takers who took the TOEIC® Listening and Reading test to complete a self-assessment inventory when they took the test. The inventory asked test takers to consider each of 25 reading tasks and 24 listening tasks and indicate how easily they could perform each task. The same kind of inventory was then administered for each of the TOEIC tests. Results showed that, for each test, there is evidence linking scores to the likelihood that someone can perform, either easily or with little difficulty, a wide variety of everyday or workplace language tasks in English. This information gives meaning to the TOEIC scores in very practical terms and demonstrates their value to test score users.
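
To make the idea of linking scores to self-assessments more concrete, here is a minimal Python sketch of how such a comparison could be computed. It is only an illustration under stated assumptions and not the method used in the ETS study: the records are invented, the 1 to 5 rating scale, the 100-point score bands and the function names `pearson` and `share_able_by_band` are all hypothetical.

```python
from math import sqrt

# Invented records for illustration only (NOT ETS data):
# (section score, self-rating for one task on an assumed 1-5 scale,
#  where 4 or 5 stands for "easily or with little difficulty").
records = [
    (150, 1), (200, 2), (250, 2), (300, 3),
    (350, 4), (400, 4), (450, 5), (480, 5),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def share_able_by_band(records, band_width=100, threshold=4):
    """Share of test takers per score band whose rating is >= threshold."""
    counts = {}
    for score, rating in records:
        band = (score // band_width) * band_width  # e.g. 350 -> 300
        able, total = counts.get(band, (0, 0))
        counts[band] = (able + (rating >= threshold), total + 1)
    return {band: able / total for band, (able, total) in sorted(counts.items())}

scores = [score for score, _ in records]
ratings = [rating for _, rating in records]

r = pearson(scores, ratings)          # agreement between scores and self-reports
shares = share_able_by_band(records)  # {100: 0.0, 200: 0.0, 300: 0.5, 400: 1.0}

print(f"correlation: {r:.2f}")
for band, share in shares.items():
    print(f"scores {band}-{band + 99}: {share:.0%} report doing the task easily")
```

In this made-up sample, the share of test takers who report being able to perform the task rises with the score band, which is the kind of pattern that would support the claim that scores mean what the test makers say they mean.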

To conclude, one of the key added values of the TOEIC tests lies in their validity, that is, the extent to which the tests do what we claim they can do. The very careful way in which the tests are designed contributes greatly to their validity, and further evidence comes from special studies. More evidence will continue to be generated to make an ever stronger case for the value and validity of the TOEIC assessments.