VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF ENGLISH LANGUAGE TEACHER EDUCATION
GRADUATION PAPER
AN EVALUATION OF SOME ASPECTS OF THE
VALIDITY OF A READING ACHIEVEMENT
TEST (THE 3B END-OF-TERM READING TEST)
FOR SECOND YEAR MAINSTREAM STUDENTS
IN THE SCHOOL YEAR 2013 – 2014 AT FELTE,
ULIS, VNU
Supervisor: Dr. Dương Thu Mai
Student: Vũ Thị Hương
Group: QH2010
HA NOI – 2014
difficulties to complete this study.
ABSTRACT
Test evaluation is a complicated undertaking that has received much
attention from researchers since the importance of language tests in
assessing students’ achievement was first raised. When evaluating a test,
evaluators should focus on the criteria of a good test, of which validity and
reliability are two important factors. In the current study, the researcher chose
the English 3B end-of-term reading test for second year mainstream students at
FELTE, ULIS, VNU in the school year 2013 - 2014, with the aim of checking
its content validity and construct validity as well as estimating its internal
reliability. From the interpretation of the data obtained from the test scores,
survey questionnaires, and analysis of the test specifications, the researcher
found that the English 3B end-of-term reading test is reliable in terms of
internal reliability. The content validity was checked as well, and the test is
concluded to demonstrate a relatively high level of content relevance and to
show some evidence of representativeness. It is also shown to have structure
validity to some extent. However, the study has limitations, which inform the
researcher’s directions for future studies.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS i
ABSTRACT…………………………………………………………………… ii
TABLE OF CONTENTS iii
LIST OF FIGURES AND TABLES ……………………… vii
LIST OF ABBREVIATIONS … …………………………………………… x
PART I: INTRODUCTION…………………………………… 1
1. Statement of research problem and rationale for the study……………… 2
2. Goals and objectives of the study………………………………………… 4
1.5.3.2. Types of reliability………………………………… 26
2. Review of related studies on validity of reading test…………………… 29
2.1. Studies on the validity of reading tests worldwide……………… 29
2.2. Studies on the validity of reading test in Vietnam…………………30
CHAPTER 2: METHODOLOGY …………………………………………… 32
1. The reading assessment context for second year mainstream students at
FELTE, ULIS, VNU……………………………………………………….32
1.1. Test administration procedure…………………………………… 32
1.2. Test specifications………………………………………………… 32
2. Research questions…………………………………………………………38
3. Research Participants and the selection of participants ………………… 39
4. Data collection method…………………………………………………….39
4.1. Survey………………………………………………………………39
4.2. Document observation…………………………………………… 40
5. Data collection procedure………………………………………………….40
5.1. Survey questionnaire……………………………………………….40
5.2. Document observation…………………………………………… 41
6. Data analysis and procedure……………………………………………….42
CHAPTER 3: FINDINGS AND DISCUSSION……………………………… 44
1. Data analysis and results………………………………………………… 44
1.1. Research question 1: The content validity of the test as perceived by
teachers ……………………………………………………………….44
1.2. Research question 2: The structure validity of the test……………61
1.3. Research question 3: The internal reliability of the test ………… 64
2. Findings and discussion…………………………………………………66
2.1. Major findings………………………………………………………66
2.2. Content validity of the test……………………………………….66
2.3. Structure validity of the test…………………………………… 67
2.4. Reliability of the test…………………………………………… 68
Table 3.7: Teachers’ opinions about the tested skills in questions 86-89…………49
Table 3.8: Teachers’ opinions about the tested skills in questions 90-93…………49
Table 3.9: Teachers’ opinions about the tested skills in questions 94-99……… 50
Table 3.10: Teachers’ opinions about the tested skills in questions 100-101……51
Table 3.11: Teachers’ opinions about the tested skills in question 102………….51
Table 3.12: Teachers’ opinions about the tested skills in question 103………….52
Table 3.13: Teachers’ opinions about the tested skills in questions 104-105……52
Table 3.14: Teachers’ opinions about the tested skills in question 106………… 53
Table 3.15: Teachers’ opinions about the tested skills in questions 107-110……54
Table 3.16: Teachers’ opinions about the difficulty level of questions 71-75……54
Table 3.17: Teachers’ opinions about the difficulty level of questions 76-80……55
Table 3.18: Teachers’ opinions about the difficulty level of question 81……… 55
Table 3.19: Teachers’ opinions about the difficulty level of question 82……… 56
Table 3.20: Teachers’ opinions about the difficulty level of questions 83-84……56
Table 3.21: Teachers’ opinions about the difficulty level of question 85……… 56
Table 3.22: Teachers’ opinions about the difficulty level of questions 86-89……57
Table 3.23: Teachers’ opinions about the difficulty level of questions 90-93……57
Table 3.24: Teachers’ opinions about the difficulty level of questions 94-99……58
Table 3.25: Teachers’ opinions about the difficulty level of questions 100-101…58
Table 3.26: Teachers’ opinions about the difficulty level of question 102………59
Table 3.27: Teachers’ opinions about the difficulty level of question 103………59
Table 3.28: Teachers’ opinions about the difficulty level of questions 104-105…60
Table 3.29: Teachers’ opinions about the difficulty level of question 106………60
Table 3.30: Teachers’ opinions about the difficulty level of questions 107-110…60
Table 3.31: The distribution of the skill tested in the test specifications and the
course guide……………………………………………………………………… 62
Table 3.32: Internal reliability statistics of the test……………………………… 65
LIST OF ABBREVIATIONS
self-evaluate their ability through testing. As Read (1997) states, “a test can
help both teachers and learners to clarify what the learners really need to know.”
Clearly, both teachers and learners can benefit from testing. That is why testing
is implemented in schools at different levels in general and at the University of
Languages and International Studies, Vietnam National University, Hanoi
(ULIS, VNU) in particular.
Despite its crucial importance, designing a test is not easy. The content of
a test may be suitable for one type of learner at one level but unsuitable for
other learners at different levels. As a result, learners’ abilities may be
evaluated inappropriately. In fact, some universities in Vietnam, a non-native
English speaking environment, have to buy a test format from a prestigious
university in order to have a standardized test. Nonetheless, it is still not fair
enough if the original tests are applied to Vietnamese students without any
changes in the test content.
In the context of ULIS, VNU, tests for students are sometimes adapted from
Cambridge University tests at various levels such as PET, FCE and CAE, or from
the International English Language Testing System (IELTS) and the Test of
English as a Foreign Language (TOEFL). However, what is tested may not be
exactly what is taught in the course, because these tests are designed at faraway
universities and the levels of students at those universities differ from the
expected standards of ULIS, VNU. Besides, some tests are made by teachers
themselves. In this case, the tests may not be verified and thus their quality
cannot be assured. In addition, “what test writers are concerned with seems to
be the reliability of the test and its validity” (Le, 2010). That is to say,
reliability and validity are the two most essential qualities of a good language
test. They are also the main considerations of test writers when designing a test.
However, a test may be reliable but not valid. For example, a reading test with
many multiple-choice questions about vocabulary and grammar used in the passage
is reliable; nonetheless, it is not valid because it tests not only students’ reading
VNU valid in terms of content validity as perceived by teachers?
To what extent is the 3B end-of-term reading test for second year
mainstream students in the school year 2013 - 2014 at FELTE, ULIS,
VNU valid in terms of structure validity?
What is the internal reliability of the 3B end-of-term reading test for
second year mainstream students in the school year 2013 - 2014 at
FELTE, ULIS, VNU?
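To make the third question more concrete, the sketch below shows one common way of estimating internal reliability from item-level scores, Cronbach's alpha. This excerpt does not state which coefficient the study actually uses, so the choice of alpha, the function name `cronbach_alpha` and the sample scores are assumptions made purely for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate Cronbach's alpha (illustrative sketch, not the study's method).

    scores: one row per test taker, each row a list of item scores
            (for example 1 for a correct answer, 0 for an incorrect one).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("at least two items are needed")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every test taker must have a score for every item")

    # Variance of each item across test takers.
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of the total score across test takers.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four test takers, two items that always agree.
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]
print(cronbach_alpha(consistent))
```

Values closer to 1 suggest that the items rank test takers consistently, while values near 0 suggest that the items behave as if unrelated. For dichotomously scored items such as multiple-choice questions, alpha computed this way is equivalent to the KR-20 coefficient.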
4. Significance of the study
Once completed, the study would bring certain benefits. More specifically,
the findings would provide test makers and teachers in the targeted context
with useful information about whether their inferences about the students
are accurate and how truly the reading results reflect students’ ability. Hence,
test makers, teachers and testing experts would learn more about the real
situation of testing at FELTE, ULIS, VNU and thus find solutions for any
inadequacies in the test’s content and structure. In addition, the research would
also be a source of reference for further research in the same field.
5. Scope of the study
Firstly, this paper focuses on reading achievement tests rather than other
kinds of tests such as placement tests or diagnostic tests. Secondly, other
language skills such as writing, listening and speaking are not explored in this
study. Furthermore, within the scope of a graduation paper, only two aspects of
validity, namely content and structure validity, together with the internal
reliability of the test are studied. Content and structure are said to be two of
the most important aspects of the validity of a test; moreover, internal
reliability is also a crucial factor in evaluating the quality of a test, since
consistency among test items contributes greatly to test quality. Thirdly, due to
limitations of time and experience, the study is carried out only with a reading
achievement test in
Part II: Development – This part consists of three chapters:
Chapter 1: Literature Review – reviews the literature related to
language testing and test evaluation.
Chapter 2: Methodology – describes the methods of the study, the
selection of participants, the materials, and the methods of data
collection and analysis, as well as the results of the data analysis process.
Chapter 3: Results and Findings – presents and analyzes the results of
the study and reports the findings.
Part III: Conclusion – summarizes the study and presents its
limitations as well as recommendations for further studies.
PART II: DEVELOPMENT
CHAPTER 1: LITERATURE REVIEW
This chapter attempts to establish the theoretical background for the study.
The key concepts of language testing, including measurement, tests and evaluation,
validity and reliability, are reviewed, together with some related studies
worldwide and in Vietnam.
1. Key concepts
1.1. Assessment, measurement, test and evaluation
The terms “assessment”, “measurement”, “test” and “evaluation” are sometimes
used as synonyms, and in practice they can refer to the same activity (Bachman,
1990). For example, when someone is asked to evaluate a student, he or she often
cites the student’s test score and bases the evaluation on it. However, the terms
still have some distinctive features.
1.1.1. Assessment
According to Nitko (1996), assessment is “a broad term defined as a process
for obtaining information that is used for making decisions about students,
curricula and programs, and educational policy.” For example, on the basis of
assessment, teachers can make decisions about managing classroom instruction,
Nitko (1996) defines evaluation as “the process of making a value judgment
about the worth of a student’s product or performance.” Here he emphasizes the
relationship between students’ behaviors and the judgment made on them.
Similarly, Genesee and Upshur (1996) claim that evaluation is basically about
making decisions. This is also the view of Weiss (1972, cited in Bachman, 1990),
for whom evaluation is “the systematic gathering of information for the purpose
of making decisions.” However, evaluation may be separated from tests and
measurements: it might be carried out without any test or measurement because
“evaluation may or may not be based on measurements or test results” (Nitko,
1996). As such, evaluation “does not necessarily entail testing” (Bachman, 1990).
The relationship between evaluation, tests and measurement is represented in
the figure below:
Figure 1.1: Relation between evaluation, tests and measurements
(Bachman, 1990)
As can be seen from the figure, evaluation and measurement are two different
notions, but they share some features. Furthermore, testing is a method of
measurement, and hence tests also share some characteristics with evaluation.
In a nutshell, evaluation, measurement and tests are closely related to one
another, and they are essential components of language testing.
1.2. Test purposes
There are various ways to classify test purposes. Wiersma and Jurs (1990)
suggest a list of test purposes based on the tasks that a test is expected to
perform:
Description: Many tests are developed to describe the current status of
individuals on a wide range of variables.
Prediction: Some tests are used to predict examinees’ performance in the
future.
Assessing individual differences: Some tests are used to differentiate
between people in order to identify those who are the highest and those who
are the lowest on some measures.