A study on designing written tests to assess the learning outcomes of 12th-form students at Ngo Quyen High School, Hai Phong

CHAPTER 1: INTRODUCTION
1.1. RATIONALE
A good test can be used as a valuable teaching device. Heaton (1991:5) states that
“tests may be constructed primarily as devices to reinforce learning and to motivate the
student or primarily as a means of assessing the students’ performance in the language.”
According to this linguist, testing and teaching are “so closely
interrelated that it is virtually impossible to work in either field without being constantly
concerned with the other”. For proper evaluation and assessment of the English language
learning and teaching process, testing, an important tool in educational research and
program evaluation (Lauwerys and Scanlon, 1969:2), is employed as an indispensable part
of the training program at Ngo Quyen High School (NQHS) in Hai Phong city.
However, designing a good test is not simple. Having been a teacher of English
for many years, I have been involved in designing, administering and marking many kinds
of English tests, such as progress and end-of-term tests, and have often heard teachers
and test-takers at NQHS complain that some of the final achievement tests for 12th-form
students do not faithfully reflect the real linguistic competence of the test-takers. What is
tested is not really what is taught, and the tests measure neither the achievement of the course
objectives nor the expected linguistic skills and knowledge of the students. This is probably
because the test writers use tests which were designed elsewhere and are not suitable
for the students. What test writers are concerned with seems to be the reliability of the test
rather than its validity. The situation coincides with the comments made by test
researchers such as Brown (1994:373) and Hughes (1989:1) on recent language testing: “a great
deal of language testing is of very poor quality. Too often language tests have a harmful
effect on teaching and learning; and too often they fail to measure accurately whatever it is
they are intended to measure”. Another reason is that language testing here has not been
given enough attention. I have not witnessed any comprehensive or systematic
evaluation of the effectiveness and appropriateness of these tests.
For the above-mentioned reasons, the author was encouraged to undertake this minor
thesis with the aim of investigating the design of final written achievement tests for the 12

current final achievement test in terms of its validity.
- To find out the differences and similarities (if there are any) in teachers’ and test takers’
evaluation of the test and to suggest reasons why there are such similarities and
differences.
- To provide some practical recommendations for the improvement of the final
achievement tests so as to achieve more accurate measures of students’ English
competence.
1.4. RESEARCH QUESTIONS
The research questions of the study are as follows:
- How is the current final achievement test for the 12th-form students at NQHS evaluated by
both students and teachers in terms of its validity?
- What improvements are recommended by the teachers and students with regard to the
validity of the test?
1.5. METHODS OF THE STUDY
In order to achieve the above aims, the study has been carried out with the following
methods.
First, the author drew on the theory and principles of language testing, the
major characteristics of a good test (with a special focus on test validity), and the nature of
achievement tests and practical tips for writing them. Through critical reading, many reference
materials were gathered, analyzed and synthesized to establish a theoretical basis for evaluating
the current final achievement test for the 12th-form students at NQHS.
Second, qualitative methods involving data collected through survey
questionnaires were employed. Two questionnaires were administered to the 12th-form

“things” they are asked to do are specified at each level and represent authentic tasks of the
sort which confront language users in real life.
Genesee and Upshur (1996) look at a test as a task that measures one’s
ability to perform a particular task. They argue that a test is, first of all, about something.
That is, it is about intelligence, or European history, or second language proficiency. In
educational terms, tests have subject matter or content. Second, a test is a task or set of
tasks that elicits observable behavior from the test taker. The test may consist of only one
task, such as writing a composition, or a set of tasks, such as in a lengthy multiple-choice
examination in which each question can be thought of as a separate task. Different test
tasks represent different methods of eliciting performance. Third, tests yield scores that
represent attributes or characteristics of individuals. In order to be meaningful, test scores
must have a frame of reference. Test scores, along with the frame of reference used to
interpret them, are referred to as measurement. Thus, tests are a form of measurement
(p.141). In other words, content, methods and measurement are three aspects of tests. The
quality of the end-of-year tests depends on whether the content of the test is a good sample
of the relevant subject matter. If the content of a test is a poor reflection of what has been
taught or what is supposed to be learned, then performance on the test will not provide a
good indication of achievement in that subject area. What a test is measuring is a reflection
of not only its content but also the method it employs. Tests that employ different methods
are measuring somewhat different skills, no matter how similar their content might be.
Tests in education measure differences in degree. They describe how proficiently students
can read a second language or how appropriately they speak in particular social situations,
for example.
In the foreign language teaching context, a test can be defined as an educational
instrument which is designed to measure what someone can do with the foreign language
to serve a particular purpose (McNamara:11). As an instrument, a test may be responded to
differently by testees and test-users. Understanding testees’ and test-users’ responses to, and
perceptions of, tests has been a critical issue in foreign language testing. Such
understanding is even more important where learner-centredness is promoted as a

words, we cannot expect testing only to follow teaching; rather a good test is an obedient
servant since it follows and apes the teaching (Davies, 1968:5). What we should demand
of it, however, is that it should be supportive of good teaching and, where necessary, exert
a corrective influence on bad teaching. If testing always had a beneficial backwash on
teaching, it would have a much better reputation amongst teachers (Hughes:2).
Cohen (1994) discusses the effects of backwash more broadly, in terms of “how
assessment instruments affect educational practices and beliefs” (p.41). Wall and Alderson
(1993) go a little further, arguing convincingly on the basis of extensive empirical
research that backwash has the potential to affect not only individuals but the educational
system as well.
Read (1983:2) points out: “A test can help both teachers and learners to clarify
what the learners really need to know assuming that it is unrealistic to expect them to
master everything they are presented with during a particular course.” Test results
show teachers part, though not all, of learners’ ability, which helps teachers to improve their
teaching or to revise what has been taught.
According to Heaton (1988:7), “a well-constructed classroom test will provide the
students with an opportunity to demonstrate their ability to perform certain tasks in the
language and the students should be able to learn from their weakness”. Under
the influence of tests, the students are motivated to use what they have learned and to avoid
the mistakes and errors that they have made. The learners find out how far they have
achieved the objectives of the course, and hence whether they can move up a level or need to learn
more. “A good test can sustain or enhance class morale and aid learning” (Madsen,
1983:3).
Because of the important role a test plays in either supporting or impeding teaching
and learning, it is critical that a test be supportive of good teaching. This makes it
necessary to investigate the opinions of the test users, specifically the learners and the
teachers.
2.3. TYPES OF ACHIEVEMENT TESTS
An achievement test is one of the means available to teachers and students alike of

success of his or her teaching and help to find out how much students have gained from
what has been taught. Accordingly, the teachers can identify the weaknesses of the learners
or diagnose the areas not properly mastered during the course of study.
In short, progress achievement tests can be regarded as a useful device that provides
the students with a good chance to use the target language in a positive and effective
manner and to gain additional confidence in doing so. They can also be a good
preparatory and supportive step towards the final achievement test, because the students
become familiar with the tests and with the strategies for doing them.
2.3.2. Final achievement tests
A final achievement test, as the name suggests, is usually a formal examination
given at the end of the school year or at the end of the course to measure how far students
have achieved the teaching goals (Hughes, 1990:10). Such tests may be written and
administered by ministries of education, official examining boards, or by members of
teaching institutions. The content of these tests must be related to the courses with which
they are concerned. Hughes (1990:11) suggests two approaches towards designing
achievement tests: the syllabus-content approach and the syllabus-objective approach.
The syllabus-content approach means that the content of a final achievement test
should be based on a detailed course syllabus or on the books and other materials used.
Tests designed on the basis of what the students have already learnt in the course books can be
considered fair tests. However, a badly designed syllabus or badly chosen materials
that diverge from the course objectives may bring about misleading results, which are
unlikely to show what students have actually achieved. When this occurs, the test results
will lack validity in terms of the course objectives.
The syllabus-objective approach bases the test content directly on the
objectives of the course. This approach has some good points. Firstly, it forces course
designers to be explicit about course objectives. Secondly, it can help to work
against poor teaching practice, which syllabus-content-based tests fail to do. However, this
approach has to cope with the problem of testing what the students have neither learned
nor prepared for.

According to Harrison (1983:7), it is necessary for test writers to draw up a test
specification before writing a test. A test specification results from the process of
designing test content and test method (McNamara, 2000:31). The specifications include
information on the length, the structure of each part of the test, the type of materials with
which the candidates will have to engage, the source of materials, the extent to
which authentic materials may be altered, the response format, the test rubrics and how
responses are to be scored. They are usually written before the test, and the test is then
written on the basis of the specifications. After the test is written, the specifications should
be consulted again to see whether the test matches the objectives set in them.
Therefore, writing specifications is an important step because it ensures that item
writers can write test items that appropriately measure whatever the test developers
intend to measure, and that the range of conditions suitable for the test objectives will not be
exceeded. When writing specifications, teachers should use an index card, at the top of
which they write the test objectives, with the table of specifications below. They
should try not to repeat the wording of the objectives, and should remember to increase the level of
detail in preparation for writing test items. The final step is writing the items themselves and
entering them on the back of the index card.
Harrison (1983:16) indicates the following factors to be taken into consideration
when one sets up the table of specifications for a test.
- Time: The first factor teachers should attend to is how
much can be tested in the time available for the test. They should allow a reasonable
amount of time for the majority of the test takers to be able to complete the test. If not,
the test will be counterproductive, as the students will be too panicked and fearful to work
under time pressure. Students who are not given enough time will not be able to
demonstrate their full achievement. On the other hand, students who are given too
much time to do a test can treat it like a puzzle rather than an actual language test.
- Coverage: The next important factor to be taken into account is determining the test
content in terms of grammatical and functional items and skills so that it accurately
reflects the syllabus and objectives. It also involves determining whether the test

some guidelines should be given to the testees when determining the test format.
Lastly, the test writers have to decide at this point whether to use an objective or
subjective format for each part of the test. This choice has important implications for the
marking of the test.
- Difficulty: Difficulty is another area that calls for teachers’ attention when constructing a test. It
involves choosing an appropriate level for each item or part of the test. The level of
difficulty of items included in the test should parallel that of the practice activities done
by the students during the course. The kind of variation in the level of difficulty of test
items that is appropriate to placement or proficiency tests is not necessary in an achievement
test, as it is not the primary aim of such a test to discriminate between strong and weak students.
- Rubrics: The test instructions should be clear and unambiguous; otherwise they will
invalidate the test by misleading the students and turning the instructions into an additional,
though unintended, test item (Dangerfield, 1985:150). The students may complete the
items wrongly because they have misinterpreted the instructions. It is also advisable to
provide an example of an answered test item where the format permits (e.g. in the case
of multiple-choice or sentence-transformation items but not, of course, in the case of
compositions).
- Marking: Marking is an important but complicated part of the testing process. Preparing the
marking scheme is usually the last step of the whole test-designing process, and it enables the
tester to arrive at an exact and true evaluation of the testees’ performance in the test. The
scheme contains the keys, marking instructions, marking scale, etc., needed for each item and
for the whole test.
The most important point to be noted here is that the weighting of the different parts of the test
should reflect the balance of the syllabus. Second, the weighting of marks should take
into consideration the difficulty of a test item and, to an extent, the proportion of the
overall test time that students are likely to need to complete those items. A final point in
relation to marks is that, if the test includes an element which has to be marked
subjectively, the teachers should give careful consideration not only to the proportion of the
total marks for the test allotted to that element, but also to the criteria to be used for
assessing it. Even when only one
person is marking a set of test papers, it is important for reliability and consistency that

is intended to measure. This seems simple enough. However, it is not simple to say
whether or not a test is valid, because validity has several different sub-kinds, such as face,
content and construct validity, each of which deserves our attention. In this part I will present each
aspect in turn.
2.4.2.1. Face validity
According to McNamara (2000:105), “face validity is a type of validity
referring to the degree to which a test appears to measure the knowledge or abilities it
claims to measure, as judged by an untrained observer (such as the candidate taking the test,
or the institution which plans to administer it)”. Face validity is concerned with what
teachers and students think of the test. Does it appear to them a reasonable way of
assessing the students, or does it seem trivial, or too difficult, or unrealistic? A test which
pretended to measure pronunciation ability but which did not require the candidate to
speak might be thought to lack face validity. That is, face validity concerns the appeal
of the test to popular judgement, typically that of other testers, teachers, moderators,
and test takers.
Alderson, Clapham and Wall (1995:173) recognize face validity as an
influential factor in testing. According to them, while students’ opinions about tests are
not expert opinions, they can be important because they represent the kind of response
that can be expected from the people who are taking the test. If a test does not appear to be
valid to the test takers, they may not do their best, so the perceptions of non-experts are
useful.
2.4.2.2. Content validity
Content validity and face validity are considered two types of internal
validity, that is, validity in terms of the test itself.
Harrison (1983:11) states that “content validity is concerned with
what goes into the test. The content of the test should be decided by considering the
purpose of the assessment, and then drawing up a list known as a content specification”.
This means that the test content should constitute a representative sample of the language skills,
structures or even the course to be measured. In this case, the relationship between the test

ability in a particular test then that part of the test would have construct validity only if we
were able to demonstrate that we were indeed measuring just that ability.
2.4.3. Relationship between reliability and validity.
Test researchers and developers agree that reliability and validity are
essential measurement qualities. This is because these are the qualities that provide the
major justification for using test scores (numbers) as a basis for making inferences or
decisions (Bachman and Palmer, 1996:19).
We often think of reliability and validity as two distinct but related characteristics
of test scores. Although validity is the most important characteristic, reliability is a
necessary condition for validity. The two measurement qualities, reliability and construct
validity, are thus essential to the usefulness of any language test. Reliability is a necessary
condition for construct validity, and hence for usefulness. To be valid, a test must provide
consistently accurate measurements. It must therefore be reliable.
Heaton (1990:6) considers reliability and validity two basic principles
in writing useful tests. A reliable test, however, may not be valid at all. In other words,
reliability is not a sufficient condition for either construct validity or usefulness. Suppose, for
example, that we needed a test for placing individuals into different levels in an academic
writing course. A multiple-choice test of grammatical knowledge might yield very
consistent or reliable scores, but this would not be sufficient to justify using the test as a
placement test for a writing course. This is because grammatical knowledge is only one
aspect of the ability to use language to perform academic writing tasks.
It should be noted that a test could be reliable without possessing validity.
However, reliability is clearly inadequate by itself if a test does not succeed in measuring
what it is supposed to measure. Test writers may try in vain to increase the
validity of an already reliable test, because its validity is constrained by the features of the
items that make it up. From the outset
of test construction, therefore, validity should be the primary focus. A reliable test may, in
fact, not be valid: a multiple-choice test may be very reliable, yet
its validity is poor if it fails to measure what it is intended to measure.
Furthermore, the emphasis on test validity is recognized by Hughes (1989:22) that,

Therefore, the test items must cover a wide range of difficulty, from “extremely easy
items” to “extremely difficult items”. Below is how the items in the test should be spread
over this range:
- extremely easy items
- very easy items
- easy items
- fairly easy items
- items below average difficulty level
- items of average difficulty level
- items above average difficulty level
- fairly difficult items
- difficult items
- very difficult items
- extremely difficult items
Similarly, Harrison (1994:14) defines discrimination as “the extent to which a test
separates the students from each other”. The extent of the need to discriminate will vary
depending on the purpose of the test. For example, if a placement test is able to
discriminate efficiently among students, it will be much easier to divide them into suitable groups;
similarly, in an achievement or a diagnostic test, the level of each individual will
be clearly shown.
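
To make the notions of item difficulty and discrimination concrete, the sketch below computes two statistics commonly used in item analysis: the facility value (the proportion of test takers who answer an item correctly) and an upper-lower discrimination index. Neither formula is given in the sources cited above; the function names and the scores of the ten students are hypothetical and serve only as an illustration.

```python
from typing import List


def facility_value(item_responses: List[int]) -> float:
    """Proportion of test takers who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses: List[int], total_scores: List[int]) -> float:
    """Upper-lower index: (correct in top half - correct in bottom half) / group size."""
    # Rank students by their whole-test score, strongest first.
    ranked = sorted(zip(total_scores, item_responses), key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2  # with an odd number of students the middle one is left out
    upper_correct = sum(response for _, response in ranked[:half])
    lower_correct = sum(response for _, response in ranked[-half:])
    return (upper_correct - lower_correct) / half


# Hypothetical data: whole-test scores of ten students and their answers to one item.
total_scores = [48, 45, 44, 40, 38, 30, 27, 25, 20, 15]
item_responses = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

print(facility_value(item_responses))
print(discrimination_index(item_responses, total_scores))
```

With these hypothetical data the item has a facility value of 0.5 and a discrimination index of 0.6: it is of average difficulty and is answered correctly far more often by the stronger half of the group than by the weaker half.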
Conclusion: In this chapter, I have reviewed the literature on important issues related to
language testing. These include the relationship between testing and teaching, which is
often referred to as the "backwash effect", the types of achievement tests, and the characteristics of a
good test, with an emphasis on four important constructs, i.e. reliability, validity,
practicality and discrimination. Of these constructs, validity, particularly content validity,
seems to be the most important determinant, as it gives the test the power to test what is
to be measured.
The next chapter will present the study which includes the participants, the methods
of data collection and the data analysis.

Vietnam and none of them has ever been trained abroad. They are well trained and fairly
experienced professionally, with at least 3 years of teaching. About a quarter of the teachers
have completed or are taking an M.A. course. Therefore, most teachers are qualified enough to
conduct communicative activities in a foreign language lesson. They can use English in
class quite well. They also continuously broaden their knowledge of
general English and specialized subjects through self-study and in-country training
programs.
3.1.3. English teaching and learning at NQHS
Being one of the six core subjects which are compulsory in the national
examination at the end of upper secondary school, English is given much attention at every
school in general and at NQHS in particular.
With the renovation in education, the English program at upper secondary schools
has recently been redesigned. The seven-year English program is used nationwide in
place of the previous three-year one. The purpose of the new program appears to be to narrow
the gap between classroom English and real English. The course book focuses on the four
skills and provides appropriate grammar and vocabulary. According to the content of the
course book for 12th-form students, after studying 16 units students are expected to have the
following abilities:
Listening: Students are able to understand passages or dialogues related to 16 topics in the
textbooks.
Speaking: Students are able to carry out conversations about culture, future life, sports
events
Reading: Students are expected to understand the general and detailed contents of
reading passages of 280-320 words on the topics of the 16 units.
Writing: Students are able to write a letter of request, describe the world in the future, give
instructions
However, due to the lack of materials and equipment, the shortage of time, the

because there are not any available to them. They themselves design most tests by a
cut-and-paste method, by which I mean they assemble tests from commercially available ones
without following any rules of testing. Testing techniques have not been given proper
attention and the role of testing in teaching and learning has not been fully
recognized. Tests, therefore, may lack some major criteria of a good test
concerning validity, reliability, format and practicality.
The first and foremost characteristic is that, apart from the carefully designed
tests, some are of poor quality, contain misspellings, or are too difficult. This is because, on
the one hand, test makers only try to fulfil their duty without considering the effectiveness of
the test and, on the other, most of them may not be aware of testing theories. Those tests often fail to
measure accurately whatever they are intended to measure.
Moreover, test content is sometimes found to be unrelated to the objectives of the
course. The tests are likely to fail to measure some language skills, such as speaking and
listening. There is no listening part in the final achievement tests. Also, teachers have no
chance to test learners’ speaking ability; instead, the students’ pronunciation is
measured by the phonetics section of the test, which does not seem to be an accurate measure of
the students’ speaking skill.
More importantly, students often know the test formats in advance. As a result,
most teaching practice and class activities are test-oriented, and what will not
be tested might be left uncovered. Students apparently develop an attitude of learning for
testing and for grading only.
Despite these problems, the final achievement test at NQHS has never been empirically
evaluated by test users. This study is the first attempt to explore test users' opinions of the
test. To provide background information, Table 1 shows the structure of the test.
Table 1: The English final achievement tests have been constructed as follows:
Time allowance: 60 minutes
(50 multiple choice questions in total)
Part Questions Total Marking scale
A. Phonetics I. Multiple choice

tool. The overall purpose of the survey is to investigate the perceptions that teachers (test
makers in the English group) and the 12th-form students (test takers) at NQHS have of the
existing final achievement test, based on the criteria of a good test mentioned in (2.3).
However, the major focus is on validity.
It is expected that the results of the survey will help to:
(1) find out the differences and similarities between teachers’ and students’ evaluations
of the test validity.
(2) achieve more accurate measures of students’ achievement with reference to the
training objectives.
As far as data collection is concerned, the key methods used to obtain reliable information are survey
questionnaires and interviews. These methods help to collect and confirm different kinds
of data. However, each has its own advantages and disadvantages.
3.2.2. Survey questionnaires
Two sets of survey questionnaires were administered to 15
teachers and 200 12th-form students at NQHS.
The first objective is to find out how these subjects evaluate the current final
achievement test for 12th-form students based on the criteria of a good test, and to compare
their responses in order to identify what is similar and what is different and to
recommend ways to narrow the gaps. The survey also aims at collecting teachers’ attitudes
and suggestions regarding the improvement and design of the final achievement test for
12th-form students.

research purposes are of limited utility in getting at the causes of problems or possible
solutions. Accordingly, this method needs the support of other methods.
3.2.3. Interviews
Interviews were conducted with the teachers of the English group and with 12th-form students
who had just finished the English final achievement test administered by the school. The
interviews were primarily based on the initial analysis of the valid questionnaires and were used to clarify
any vague information from the questionnaires. I took notes during the interviews, which were
analyzed in triangulation with the questionnaire data. The strong point of this method is
that experiences, opinions and ideas are exchanged much more openly and directly.
Nevertheless, some informants are shy and afraid of expressing their own ideas. That may
make it difficult to collect some sensitive information.
3.3. DATA ANALYSIS
This section deals with the data collected from the survey of both the teachers and the
students concerning their evaluation of the current final achievement test for 12th-form
students, given at the end of the school year.
3.3.1. Data analysis of students’ survey questionnaires and interviews.
Two hundred questionnaires (see the questionnaire in Appendix 1) were
administered to two hundred 12th-form students of NQHS.
The data were collected by stratified sampling in order to identify the
differences in perceptions of the test among the students themselves. The student
population was therefore divided into two separate groups. Group 1 consists of 100 students whose
main subjects are maths, physics and chemistry. They learn basic English only and are
referred to as the A stream (A). The other group with 100 students who learn advanced

                                              Strongly    Disagree    No idea     Agree       Strongly
                                              disagree                                        agree
                                              A%    D%    A%    D%    A%    D%    A%    D%    A%    D%
1. The test measures what the students        0     0     73    14    12    5     10    48    5     33
   have been taught.
2. The task types given in the test are       0     0     0     0     15    3     56    68    29    29
   familiar to the students.
3. Time allowance for this test is enough.    15    2     53    11    9     4     10    57    13    26
4. The weighting demonstrated on the          24    44    37    24    13    7     15    13    11    12
   marking scale of the test is appropriate.
5. In terms of test item format, this         0     0     0     0     25    7     63    72    12    21
   English test mainly intends to measure
   the students’ grammar and vocabulary
   knowledge.
(A = A stream, D = D stream)
Table 2 shows the information collected with 6 questions in which students are
asked to state their views on whether or not the test relates to what they have been
taught (Q1), whether the task types given in the test are familiar to them (Q2), their opinions on
the time allowance (Q3) and the marking scale of the test (Q4), the main knowledge measured
by the test (Q5), and whether the result of the test can encourage them to learn better (Q6).
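
The comparison between the two streams can be reproduced directly from the percentages in Table 2. The following sketch is only an illustration: it assumes that the five response columns run from strongly disagree to strongly agree, as the discussion of the first statement indicates, and the names TABLE_2 and agreement are hypothetical.

```python
# Percentages from Table 2 for each stream, in the assumed column order:
# strongly disagree, disagree, no idea, agree, strongly agree.
TABLE_2 = {
    "Q1": {"A": [0, 73, 12, 10, 5], "D": [0, 14, 5, 48, 33]},
    "Q2": {"A": [0, 0, 15, 56, 29], "D": [0, 0, 3, 68, 29]},
    "Q3": {"A": [15, 53, 9, 10, 13], "D": [2, 11, 4, 57, 26]},
    "Q4": {"A": [24, 37, 13, 15, 11], "D": [44, 24, 7, 13, 12]},
    "Q5": {"A": [0, 0, 25, 63, 12], "D": [0, 0, 7, 72, 21]},
}


def agreement(percentages):
    """Combined share of 'agree' and 'strongly agree' responses."""
    return percentages[3] + percentages[4]


# Sanity check: each stream's percentages should add up to 100 for every question.
for question, streams in TABLE_2.items():
    for stream, percentages in streams.items():
        assert sum(percentages) == 100, (question, stream)

summary = {
    question: {stream: agreement(p) for stream, p in streams.items()}
    for question, streams in TABLE_2.items()
}

for question, result in summary.items():
    print(question, "A stream:", result["A"], "% agree;", "D stream:", result["D"], "% agree")
```

For the first statement this gives 15% agreement in the A stream against 81% in the D stream.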
As shown in Table 2, 48% and 33% of the students from the D stream agree and
strongly agree, respectively, that the test measures what they have been taught, while 14%
disagree and 5% have no idea. In contrast, only 15% of the students from the A stream
agree with this statement, and 73% hold the opposite view. This suggests that the test
is quite easy for students of the D stream and does not evaluate their real ability. However,

                                              A%    D%    A%    D%    A%    D%    A%    D%    A%    D%
1. The Grammar and Vocabulary part is long    4     23    5     29    7     4     68    24    16    20
   enough and related to what students have
   been taught.
2. A student who is given a high score in     5     13    9     20    8     33    55    21    23    13
   the grammar and vocabulary section of the
   final achievement test is the one who has
   good grammar and lexical