An evaluation of the validity of English tests for grade 10 students at some upper secondary schools in Central and Northern Viet Nam, from Ha Tinh to Ha Nam

Vietnam National University, Ha Noi
College of Foreign Languages
Nguyễn Thị Hoàng Lân

An Evaluation on the Validity of English Tests
Used for English 10 at Some Higher Secondary
Schools in the Middle and North of Viet Nam,
from Ha Tinh to Ha Nam

Đánh giá tính hiệu lực của các bài kiểm tra tiếng Anh dành cho
học sinh lớp 10 ở một số trường THPT miền Trung và miền Bắc Việt
Nam, từ Hà Tĩnh đến Hà Nam

======== ========

MA thesis
Field: Methodology
Code:
Supervisor: Dr. Hà Cẩm Tâm

Ha Noi, 2009
Table of contents

Introduction

1.3.4.6. Sources of invalidity
1.4. Test items for phonetics, structures and vocabulary
1.4.1. Test items
1.4.2. Language components
1.4.3. The test item types used to evaluate phonetics, structures and vocabulary
1.5. Syllabus objectives on language components

Chapter 2: The study

2.1. Research Questions
2.2. Data Description
2.3. Analytical framework for data analysis
2.3.1. Content validity
2.3.2. Construct validity
2.3.3. Face validity
2.3. Data Analysis and Discussion
2.3.1. Content validity of the tests used
2.3.1.1. Content validity of 45 minute written tests
2.3.1.2. Content validity of 15 minute written tests
2.3.1.3. Content validity of final written tests
2.3.2. Construct validity of the tests used
2.3.3. Face validity of the tests used

Conclusion

3.1. Conclusion
3.2. Implications
2. Limitations and Suggestions for further studies

Introduction

1. Rationale of the study

English has played an integral role in advancing the development of science,
technology, culture and international relations. This fact has resulted in a growing
demand for English language learning and teaching in many parts of the world. In addition,
the world-wide globalization process has confirmed English as the most widely used means
of international communication. The need to master English in order to access information
and interact with others is growing steadily in many parts of the world. English
teaching is undoubtedly the ultimate capacity-building tool.

Fully recognizing the importance of this global language, the Vietnamese Ministry of
Education has encouraged and required pupils of Secondary Schools to learn it as a
compulsory subject for at least seven years. English is also a compulsory subject in the
Higher Secondary graduation examination.

students' understanding is testing. Besides, thanks to testing, teachers can evaluate the
effectiveness of the syllabus in use, its contents, objectives and methods, and can
identify and locate the difficult areas that their pupils are confronted with in the
learning process.

For the past ten years or so, there have been a number of changes in the practice of
English teaching in tertiary education in Viet Nam. Some concern methodology, moving from
the Grammar-Translation Method to the Communicative Approach. Some involve course books.
Some are concerned with technology, from traditional tape recorders to modern LCD
projectors. Some are related to testing. For example, at Higher Secondary Schools in recent
years there has been a shift in testing from subjective tests to objective tests, which has
had great effects on the teaching and learning process. Therefore, in testing pupils'
progress, teachers tend to design more objective tests, and many mid-term or final tests
consist of multiple choice questions. This is considered good preparation for students to
perform well in the university entrance tests, which take the form of multiple choice
questions. However, the problem is that English 10 is one of the three new course books of
the Ministry of Education which focus on improving the four skills of reading, writing,
speaking and listening and help students to consolidate their grammar in the Language
Focus part. Thus, multiple choice questions seem to fail to test pupils' progress
accurately. The question that arises is whether the tests used at High Schools test what
students are supposed to acquire according to the objectives of the textbook. This is also
one of the major reasons why I carried out the research on validity.

In addition, test researchers and developers have acknowledged that validity is critical
for tests and refer to it as an integral measurement quality, because this quality provides
the major justification for using test scores as a basis for making inferences or
decisions (Bachman and Palmer, 1996:19). From an educational perspective, both teachers
and students should have their voices heard about instructional content, mode of syllabus

help us remove weak items even before we record the results of the tests.

Another reason for the selection of this research topic lies in the fact that language
testing at Higher Secondary Schools has not received enough attention. As a teacher, I
have been involved in designing, administering and marking many kinds of English tests.
Yet I have witnessed neither comprehensive or systematic evaluation of, nor research on,
the effectiveness and appropriateness of these tests. No formal discussions or seminars on
test construction or test methods have been carried out. There is no language test item
bank and no professional testing committee to judge the quality of the tests and take
responsibility for them.

For the above-mentioned reasons, as a learner, a teacher, and a beginning
researcher of English, the author has been encouraged to conduct the study entitled "An
evaluation on the validity of English tests used for English 10 at Higher Secondary Schools
in the middle and north of Viet Nam, from Ha Tinh to Ha Nam" with a view to evaluating
the validity of the tests used for pupils at Higher Secondary Schools. It is hoped that the
study will benefit the author as well as teachers at Higher Secondary Schools and those
who are concerned with language testing in general and English testing techniques at
Higher Secondary Schools in particular.

2. Scope of the study.

In this study the author intends to focus mainly on the content validity, construct
validity and face validity of progress achievement tests, including 15 minute tests and
mid-term tests, and of final achievement tests, consisting of final-term tests and final
tests, in the school years 2007-2008 and 2008-2009 at 12 high schools in 6 provinces from
Ha Tinh to Ha Nam. The results can be seen as the basis for providing some suggestions for test

to investigate their evaluative comments on the face validity of the tests they designed.

Besides the use of critical reading, analysis and questionnaires for data collection,
the study made use of other supporting methods such as interviews, informal discussions,
opinion exchanges with teachers and students to gather necessary information about the
learning, teaching and testing situations at High Schools.

The methods used in the study are both quantitative and qualitative.

5. Research questions

This study is implemented to find the answer to the following research question:
- Do the achievement tests for Higher Secondary School pupils of grade 10 meet the
following criteria: content validity, construct validity and face validity?

6. Organization of the thesis

This thesis comprises three parts:
Part one introduces the rationale of the study, the scope, the aims, the methods and the
research questions.
Part two is the development of the thesis, which is divided into three chapters.
Chapter one reviews the literature related to language testing (basic concepts, roles,
types of testing, criteria of a good test and test items for reading, writing, grammar and
vocabulary).
Chapter two presents the methodology, including the curricula of English 10, Data,
Participants and Analytical framework for data analysis (Construct validity, Content

Development

Chapter 1. Literature Review

This chapter reviews the theories and literature relevant to the topic under
investigation in the present study. The chapter starts with the basic concepts of testing,
and then the definition and types of achievement tests are reviewed. A brief review of the
major characteristics of a good language test is presented with a major focus on test
validity, especially construct, content and face validity. Next, test items for phonetics,
structures and vocabulary are discussed. Finally, the curricula of English 10 are presented
together with the objectives and the content of English 10.

1.1. Basic concepts of testing

Testing is an essential part of every teaching and learning experience and has become
one of the main aspects of methodology. Many researchers have offered definitions
of testing from different points of view.

Allen (1974:313) emphasizes testing as an instrument to ensure that students have
a sense of competition rather than to know how good their performance is and in which
condition a test can take place. He contends that "test is a measuring device which we use
when we want to compare an individual with other individuals who belong to the same
group."

Carroll (1968:46) holds that a psychological or educational test is a procedure
designed to elicit certain behavior from which one can make inferences about certain

Brown (1994:252) states that "A test, in plain or ordinary words, is a method of
measuring a person's ability or knowledge in a given area". Moore (1992:138) proposes
that evaluation is an essential tool for teachers because it gives them feedback concerning
what the students have learned and indicates what should be done next in the learning
process. Evaluation helps us to better understand students, their abilities, interests,
attitudes, and needs so as to teach more effectively and motivate them. However, Brown
(1994:373) stresses that tests are seen by learners as dark clouds hanging over their
heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of
questions they do not know and, worst of all, a flood of disappointment if they do not
make the grade.

From the above descriptions, although different researchers hold different points of
view on testing, in short, testing is an effective means of measuring and assessing
students' language knowledge and skills. It is of great use to both language teaching and
learning.

1.2. Achievement tests

Just as there are many purposes for which language tests are developed, so there are
many types of language tests. Some types of tests serve a variety of purposes while others
are more restricted in their applicability. The tests collected were designed on the basis
of the textbook English 10 and were intended to assess pupils' progress; therefore, in this
part the definition as well as the kinds of achievement tests are presented.

1.2.1. Definition

Achievement tests are defined differently depending on researchers' points of view.
Hughes (1990:10) held that "achievement tests are directly related to language

Final achievement tests are administered at the end of a course and their purpose is to
measure the achievement of the course as a whole. These tests may be written and
administered by ministries of education, official examining boards, or by members of
teaching institutions. Obviously, the content of these tests must be related to the courses
with which they are concerned, but the nature of this relationship is a matter of
disagreement amongst language testers.

According to some testing experts, the content of a final achievement test should be
based directly on a detailed course syllabus or on the books and other materials used. This
is known as the syllabus-content approach. This approach has an obvious appeal, for the
test contains only what the pupils are thought to have actually encountered and can
therefore be considered, in this respect at least, a fair test. However, the approach has
the disadvantage that if the syllabus is badly designed, or the books and other materials
are badly chosen, then the results of the test can be very misleading. Successful
performance on the test may not truly indicate successful achievement of course objectives.

The alternative approach is to base the test content directly on the objectives of
the course, which has a variety of advantages. First, it forces course designers to be
explicit about course objectives. This in turn puts pressure on those who are responsible
for the syllabus and the selection of books and materials to ensure that these are
consistent with the course objectives. Tests based on course objectives work against the
perpetuation of poor teaching practice, something that course-content-based tests, almost
as if part of a conspiracy, fail to do. I strongly believe that test content based on
course objectives is much preferable: it provides more accurate information about
individual and group achievement, and is likely to promote a more beneficial backwash
effect on teaching.

1.2.2.2. Progress achievement tests

- Discrimination
Further details of each characteristic are given below.

1.3.1. Reliability

Reliability is a necessary characteristic of any good test. It is of primary importance
in the use of both public achievement and proficiency tests and classroom tests. An
appreciation of the various factors affecting reliability is important for the teacher at
the very outset, since many teachers tend to regard tests as infallible measuring
instruments and fail to realize that even the best test is indeed a somewhat imprecise
instrument with which to measure language skills.

A fundamental criterion against which any language test has to be judged is its
reliability. The concern here is with how far we can depend on the results that a test
produces. Three aspects of reliability are usually taken into account. The first concerns
the consistency of scoring among different markers. The second concerns how the tester can
enhance the agreement between markers by establishing, and maintaining adherence to,
explicit guidelines for the conduct of the marking. The third aspect of reliability is that
of parallel-forms reliability, the requirements of which have to be borne in mind when
future alternative forms of a test have to be devised.
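
The first aspect, consistency of scoring among different markers, can be illustrated with a small calculation. The following is only a minimal sketch and not a procedure used in this study: it assumes that two markers have independently scored the same set of scripts and uses the Pearson correlation coefficient as one common index of their agreement. The function name `pearson` and all the marks are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("each marker's scores must vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical marks (out of 10) given by two markers to the same six scripts.
marker_a = [6.0, 7.5, 5.0, 9.0, 8.0, 4.5]
marker_b = [6.5, 7.0, 5.5, 9.0, 7.5, 5.0]

r = pearson(marker_a, marker_b)
print(f"inter-marker correlation: {r:.2f}")
```

A coefficient close to 1 suggests that the two markers rank and space the scripts in much the same way; it does not show that they award the same absolute marks, which would need a separate comparison of mean scores.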
The concept of reliability is particularly important when considering language tests
within the communicative paradigm. Moreover, Davies (1968) stresses that reliability is
the first essential for any test, but that for certain kinds of language test it may be
very difficult to achieve.

1.3.2. Discrimination

administers. The most obvious practical considerations concerning the tests are often
overlooked. Firstly, the length of time available for the administration of the test is
frequently misjudged even by experienced test writers, especially if the complete test
consists of a number of sub-tests. Another practical consideration concerns the answer
sheets and the stationery used. The use of answer sheets greatly facilitates marking and is
strongly recommended when large numbers of pupils are being tested. The question of
practicability is not confined solely to oral tests: such written tests as situational
composition and controlled writing tests depend not only on the availability of qualified
markers who can make valid judgements concerning the use of language, etc., but also on
the length of time available for the scoring of the test. A final point concerns the
presentation of the test paper itself: where possible, it should be printed or typewritten
and appear neat, tidy and aesthetically pleasing.

1.3.4. Validity

According to Hughes (1989:22), "A test is said to be valid if it measures
accurately what it is intended to measure". The test must aim to provide a true measure of
the particular skill which it is supposed to measure. When closely examined, however, the
concept of validity reveals a number of aspects, each of which deserves our attention.

1.3.4.1. Content validity

"A test is said to have content validity if its content constitutes a representative
sample of the language skills, structures, etc. with which it is meant to be concerned"
(Hughes, 1989:22). This kind of validity depends on careful analysis of the language
being tested and of the particular course objectives. It is obvious that a grammar test, for
being defined after the test has been prepared.

- Content validity depends on the relevance of the content of the individual test
items.
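
The idea of a "representative sample" can be made concrete by comparing the share of test items devoted to each language component with the share that component is given in the syllabus. The sketch below is illustrative only: the component weights, the item labels and the helper `coverage_gaps` are assumptions introduced for the example and are not taken from the English 10 syllabus or from the tests collected.

```python
from collections import Counter


def coverage_gaps(item_components, syllabus_weights):
    """Return {component: test share minus syllabus share}.

    A positive value means the component is over-represented in the test,
    a negative value means it is under-represented.
    """
    total = len(item_components)
    if total == 0:
        raise ValueError("the test must contain at least one item")
    counts = Counter(item_components)
    components = set(syllabus_weights) | set(counts)
    return {
        c: counts.get(c, 0) / total - syllabus_weights.get(c, 0.0)
        for c in components
    }


# Hypothetical syllabus weighting and a hypothetical 20-item test.
syllabus = {"phonetics": 0.2, "structures": 0.4, "vocabulary": 0.4}
items = ["phonetics"] * 2 + ["structures"] * 12 + ["vocabulary"] * 6

gaps = coverage_gaps(items, syllabus)
for component in sorted(gaps):
    print(f"{component:<11} {gaps[component]:+.2f}")
```

Large gaps in either direction would suggest that the test samples the syllabus unevenly. How large a gap is acceptable remains a matter of judgement for the test writer, and a balanced count of items does not by itself show that each item is relevant.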

The more a test simulates the dimensions of observable performance and accords
with what is known about that performance, the more likely it is to have content and
construct validity. According to Kelly (1978:8), content validity seems "an almost
completely overlapping concept" with construct validity, and for Moller (1982:68), "the
distinction between construct and content validity language proficiency."

1.3.4.2. Construct validity

Construct validity is defined by Anastasi (1982:144) as "the extent to which the
test may be said to measure a theoretical construct or trait. Each construct is developed
to explain and organize observed response consistencies. It derives from established
inter-relationships among behavioral measures. Focusing on a broader, more enduring and
more abstract kind of behavioral description ... construct validation requires the gradual
accumulation of information from a variety of sources. Any data throwing light on the
nature of the trait under consideration and the conditions affecting its development and
manifestations are grist for this validity mill."

Construct validity is viewed from a purely statistical perspective in much of the
recent American literature (Bachman and Palmer, 1981a). It is seen principally as a matter
of the a posteriori statistical validation of whether a test has measured a construct that
has a reality independent of other constructs.
According to Hughes (1989:26), a test, part of a test, or a testing technique is
said to have construct validity if it can be demonstrated that it measures just the ability
which it is supposed to measure. The word "construct" refers to any underlying ability (or

is being considered. A reading test cannot be used to test the concurrent validity of a
grammar test. In addition, if teachers' rankings are being used, it is essential to make
sure that they understand on what basis they are expected to rank the students. If the test
being considered is a grammar test, then the teachers should be asked to rank the students
according to their grammar proficiency, not their overall English language ability.
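
When a teacher's ranking is used as the external measure, one common way to express how closely it agrees with the test ranking is Spearman's rank correlation coefficient. The sketch below is an illustration under stated assumptions rather than part of this study: it assumes that neither ranking contains tied ranks, the function name `spearman_rho` is hypothetical, and the rankings are invented.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same students (no tied ranks)."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two equal-length rankings of at least 2 students")
    # Sum of squared differences between the two ranks of each student.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data: six students ranked by their grammar test scores and by
# their teacher's judgement of their grammar proficiency.
test_rank = [1, 2, 3, 4, 5, 6]
teacher_rank = [2, 1, 3, 4, 6, 5]

rho = spearman_rho(test_rank, teacher_rank)
print(f"rank correlation: {rho:.2f}")
```

The coefficient is only meaningful if both rankings refer to the same ability, which is why the teacher in this example is assumed to rank grammar proficiency rather than overall English ability.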

It is said that predictive validity is different from concurrent validity in that
"instead of collecting the external measures at the same time as the administration of
experimental test, the external measure will only be gathered some time after the test has
been given" (Alderson et al., 1995). To put it simply, predictive validity is the
extent to which the test in question can be used to make predictions about future
performance. For example, does a test of English ability accurately predict how well
students will get along in a university in an English-speaking country? There are
numerous problems with attempting to answer such questions. Measures of how well a student
does at university are sometimes employed to assess predictive validity, but the problem is
that there are many factors other than English proficiency involved in academic success.
Furthermore, it is not possible to know how the students who scored low on the tests and
therefore did not get to go to university would have done if they had been allowed to go.
However, it is undeniable that prediction is an important and justifiable use of language
tests, and evidence that indicates a relationship between test performance and the
behaviour that is to be predicted provides support for the validity of this use of test
results. At the same time, there is a wide range of situations in which we are not
interested in prediction at all, but in determining the levels of ability of language
learners.

In short, information about criterion relatedness, concurrent or predictive, is by
itself insufficient evidence for validation (Bachman, 1990:253). That is one of the reasons
why in this thesis the author does not evaluate the criterion-related validity of the tests.

1.3.4.5. Backwash validity

Language teachers operating in a communicative framework normally attempt to
equip students with skills that are judged relevant to present or future needs, and to the
extent that tests are designed to reflect these, the closer the relationship between the
test and the teaching that precedes it, the more the test is likely to enhance construct
validity. A suitable criterion for judging communicative tests in the future might well be
the degree to which they satisfy pupils, teachers and future users of test results, as
judged by some systematic attempt to gather data on the perceived validity of the test. If
the first stage, with its emphasis on construct, content, face and backwash validity, is
bypassed, the resulting test may not suit the purpose for which it was intended.

On balance, special attention must be paid to the validity of a test when one
constructs it. Although there are many kinds of validity, according to Harrison's
conclusion, face validity and content validity are the most vital for the teacher setting
his own tests. This view of validity provides a specific and useful framework for language
test evaluation and is also adopted in this thesis.

1.3.4.6. Sources of invalidity
