Designing a test to assess learning outcomes in English

DECLARATION
I certify that this minor thesis of the Study Project, entitled
Designing an English achievement test for the first-year non-English major students
in Son La Teachers’ Training College,
is submitted in total fulfillment of the requirements for the degree of Master of Arts.
Son La, February 2009
Nguyễn Thị Ngọc Thuý
ACKNOWLEDGEMENTS
In carrying out this MA coursework, I am indebted to many people for their
encouragement, cooperation, and advice.
First and foremost, I would like to express my deepest gratitude to Mr. Vũ Văn
Phúc, my supervisor, for his useful advice, insightful ideas, and dutiful supervision.
I would also like to take this opportunity to thank all the colleagues
in the English department at STTC (Son La Teachers’ Training College) for their help in
answering survey and interview questions and for their constructive suggestions about
this research.
I would like to give my special thanks to the students of the first year at STTC who
have actively participated in doing the sample test, the surveys and responding to my
interviews.
Last but not least, my sincere thanks go to my family, my classmates, and my friends,
and especially to my husband, who encouraged and helped me to carry out the thesis.
ABSTRACT
Testing plays a very important role in the teaching process, helping teachers to
assess their teaching as well as their students’ learning. Evaluating a test in terms of
qualities such as reliability and validity is necessary to ensure the usefulness of this
assessment instrument. However, this issue receives little consideration from teachers at
Son La Teachers’ Training College. Thus, this minor study was designed to evaluate two
qualities, reliability and validity, of the achievement test for the first-year non-English
major students at Son La Teachers’ Training College.

Table 6 The standard deviations of 5 scales in the test
Table 7 The interpretation of the item difficulty of the test
Table 8 The result of item discrimination of the test
Table 9 The coefficient alphas of the 5 scales in the test
FIGURES
Figure 3.3.4.1 Histogram of score distribution
TABLE OF CONTENTS
Declaration
Acknowledgements
Abstract
List of abbreviations
List of tables and figures
Table of contents
CHAPTER 1: INTRODUCTION
1.1 Rationale
1.2 Scope of the study
1.3 Aims of the study
1.4 Methods of the study
1.5 Research questions
1.6 Design of the study
CHAPTER 2: LITERATURE REVIEW
2.1 Basic concepts of testing
2.2 Types of tests
2.2.1 Proficiency tests
2.2.2 Achievement tests
2.2.3 Diagnostic tests
2.2.4 Placement tests
2.2.5 Progress tests

3.2 The current testing situation at STTC
3.3 The proposed construction of the achievement test for the first-year students at STTC
3.3.1 Test objectives
3.3.2 The Paper Specification Grids for the 2nd Term Achievement Test
3.3.3 Data collection
3.3.4 Interpretation and test score analysis
3.3.4.1 The frequency distribution
3.3.4.2 The central tendency
3.3.4.3 The dispersion
3.3.5 Test items evaluation
3.3.5.1 The item difficulty
3.3.5.2 The item discrimination
3.3.6 Estimating the reliability of the test
3.4 Teachers’ and students’ comments
CHAPTER 4: CONCLUSION
4.1 Summary of the study
4.2 Limitation
4.3 Suggestion for further study
REFERENCES
APPENDICES
Appendix 1: The Sample of the 2nd Term Achievement Test
Appendix 2: The result of test analysis using ITEMAN software
CHAPTER 1: INTRODUCTION

effect on teaching and learning and too often they fail to measure accurately whatever it is
they are intended to measure.” This is coupled with the fact that teachers frequently lack
formal training in educational measurement techniques and tend to be alienated from
the testing process. They regard it as a necessary evil, an intrusion on their regular
instructional activities.
At present, English testing at Son La Teachers’ Training College (STTC) has the following
characteristics:
- Testing has not been given appropriate attention or careful study.
- Its role in teaching and learning has not been fully recognized.
- Almost all language teachers think that teachers should be responsible for making tests
because testing is one part of the teaching and learning activities that students have to pass.
- There has been a tendency to use commercial (ready-made) tests rather than teacher-made
tests, since commercial tests are very convenient and do not take much time to
construct. As a result, the selected tests may not be relevant to the objectives of the course.
- Test content is sometimes found to be unrelated to the objectives of the course, and very
often many test items in some tests have not been dealt with in class.
- Students have complained that there is still a big gap between what is taught and what is
tested. An instance of this is when tests designed for pre-intermediate level
are given to students of elementary level. They are so difficult that only
a few students can complete them. Therefore, such tests are neither valid nor reliable.
- Tests are used exclusively for grading; there is no feedback about the tests.
- There has been no discarding of bad tests or bad items. Some items are found to be so
difficult that few testees can answer them, whereas other items are so easy that all
testees can obtain the correct answers. Such items should be discarded or replaced.
- Moreover, because the writing and reading comprehension tests at the
university are designed entirely with multiple-choice techniques, students can easily cheat
by asking for and copying answers from their classmates.
- Apart from those carefully designed tests, some others are still of poor quality
and do not accurately measure the students' real ability. Perhaps the test writer only

2. To investigate the teachers’ and students’ suggestions for improving
the testing situation and language tests at STTC.
3. To propose an achievement test construction for the first-year students at STTC,
on which a sample test will be based.
4. To offer some practical recommendations for improving the testing situation at STTC.
1.4 METHODS OF THE STUDY
In order to achieve the above aims, a study has been carried out with the following
approach. Based on the theory and principles of language testing and the major characteristics of a
good test, especially of achievement tests, the author analyzes the results of the sample test
and of the survey questionnaire administered to 10 English teachers of the English major students at
STTC. Other methods, such as interviews, informal discussions with students and
teachers, and classroom testing observation, are also employed to gather further
information.
1.5 RESEARCH QUESTIONS
The research questions of the study are as follows:
1. What should be done to improve the English testing situation for the first-year
students at STTC?
2. Which test components are considered appropriate for the English Achievement
test construction at STTC?
1.6 DESIGN OF THE STUDY
The minor thesis is organized into four chapters.
Chapter one is the introduction, consisting of the rationale, the aims, the methods, the
research questions and the design of the study.
Chapter two presents the literature review on the basic concepts of testing, types of tests,
characteristics of good tests, test items, and test item types for language components and
language skills.
Chapter three, which is the main part of the study, presents the analysis of the findings of the test
design and some brief comments from teachers and testees.
Chapter four deals with some suggestions to improve the test and the summary of the

course test takers may have followed. It is rather based on a specification of what they
have to be able to do in the language to meet the requirement of their future aims.
Other test specialists, such as Carroll and Hall (1985), Harrison (1986) and Henning (1987),
share the view that proficiency tests help both teachers and learners know whether
the learners are able to follow a particular course or have to take some pre-departure
training. Popular examples are TOEFL and IELTS, which are used to
test students’ proficiency for study in English-speaking countries. In Vietnam,
proficiency tests are set at different levels, namely A, B, and C, for workers, engineers, teachers,
architects, etc.
2.2.2 Achievement Tests
As mentioned above, not many teachers are interested in proficiency tests, since
these are not based on any particular course book. Hughes (1990:10) states: “In contrast to
proficiency tests, achievement tests are directly related to language courses, their purpose
being to establish how successful individual students, groups of students, or the courses
themselves have been in achieving objectives”. Achievement tests are usually given
at the end of a course to the group of learners who took it. Sharing Hughes’s view of
achievement tests, Brown (1994:259) suggests: “An achievement test is
related directly to classroom lessons, units or even total curriculum”. Achievement tests, in
his opinion, “are limited to a particular material covered in a curriculum within a particular
time frame.” Another useful comment, offered by Finocchiaro and
Sako (1983:15), is that achievement or attainment tests are widely employed in
language teaching institutions. They are used to measure “the amount and degree of control
of discrete language and cultural items and of integrated language skills acquired by the
students within a specific period of instruction in a specific course”. In his book, Harrison
(1983:7) states: “an achievement test looks back over a longer period of learning than the
diagnostic test, for example, a year’s work, or even a variety of different courses.” He also
points out that achievement tests are intended to show the standard which the students
have reached in relation to other students at the same level.
There are two kinds of achievement tests: final achievement tests and progress
achievement tests.

is the tester’s responsibility to make clear that it is there, that change is needed, not in the tests.
In addition to more formal achievement tests, which require careful preparation, teachers can feel
free to set their own informal tests to make a rough check on students’ progress and to keep learners on
their toes. Since such tests will not form part of formal assessment procedures, their
construction and scoring need not be geared purely towards the intermediate objectives on which the
more formal progress achievement tests are based. However, they can reflect the particular
‘route’ that an individual teacher is taking towards the achievement of objectives.
2.2.3 Diagnostic Tests
According to Hughes (1990:13), “Diagnostic tests are used to identify students’ strengths
and weaknesses. They are intended primarily to ascertain what further teaching is
necessary”. Brown (1994:259) proposes, “A diagnostic test is designed to diagnose a
particular aspect of a particular language.” Harrison (1983) remarks that this kind of test
is used at the end of a unit in the course book or after a lesson designed to teach one
particular point. With this kind of test it is reasonably straightforward to find out which skills
the learners apply well or badly. On the other hand, a disadvantage is that it is not so
easy to obtain a detailed analysis of a learner’s command of grammatical structures. In
order to be sure of this, we would need a number of examples of the choice the student
made between two structures in every different context which we thought was
significantly different and important enough to warrant obtaining information on. Tests of this
kind still need a tremendous amount of work to produce. Whether or not they become
generally available will depend on the willingness of individuals to write them and of
publishers to distribute them.
2.2.4 Placement tests
According to Hughes (1990:14), “Placement tests are intended to provide information
which will help to place students at the stage of the teaching programme most appropriate
to their abilities. Typically, they are used to assign students to classes at different levels.”
In other words, we use placement tests to place pupils into classes according to their ability
so that they can start a course approximately at the same level as the other students in the group.
2.2.5 Progress Tests

precisely the skills that we wish to measure. If we want to know how well candidates
can write compositions, we ask them to write compositions. If we want to know how well
they pronounce words, we ask them to speak. The tasks, and the texts which are used,
should be as authentic as possible. In fact, the tasks cannot be fully authentic.
Nevertheless, the effort is to make them as realistic as possible. Direct testing is easier to
design when it is intended to measure the productive skills of speaking and writing, since
the very acts of speaking and writing provide us with information about the candidate’s
ability. With listening and reading it is necessary to get candidates not only to listen or read
but also to demonstrate that they have done this successfully. He also indicates several
attractions of direct testing. Firstly, if teachers want to assess pupils’ ability, it is relatively
straightforward to create the conditions which will elicit the behavior on which judgments are based.
Secondly, in his opinion, at least in the case of the productive skills, the assessment and
interpretation of students’ performance is quite straightforward. Thirdly, there is likely to
be a helpful backwash effect, since practice for the test involves practice of the skills
that we want to encourage.
By contrast, indirect testing tries to measure the abilities that “underlie” the skills in which
we are interested (Hughes, 1990:15). One section of the TOEFL is considered an indirect
measure of writing ability: the candidate has to identify which of the underlined
elements is erroneous or inappropriate in formal Standard English. Another example of
indirect testing is Lado’s (1961) proposed method of testing pronunciation ability by a
paper-and-pencil test in which the candidate has to identify pairs of words which rhyme
with each other. The main problem with indirect tests is that the relationship between
performance on them and performance of the skills in which we are usually interested tends to be
rather weak in strength and uncertain in nature. We do not know enough about the
component parts of composition writing to predict composition writing ability accurately
from scores on tests that measure the abilities which we believe underlie it. We may
construct tests of grammar, vocabulary, discourse markers, handwriting, and punctuation.
Still, we will not be able to predict scores on compositions accurately, even if we make sure
of the representativeness of the composition scores by taking many samples.

purpose of criterion-referenced tests is to classify people according to
whether or not they are able to perform some task or set of tasks satisfactorily. Moreover,
the test must match teaching objectives perfectly, so that any tendency of the field of
language measurement, criterion tests possess two positive virtues: they are helpful in clarifying
objectives and they motivate students to reach a set standard in terms of what they can do.
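The classification idea behind criterion-referenced testing can be illustrated with a minimal Python sketch. The task names, the criterion of 0.8 and the function name are hypothetical, chosen only for illustration; the thesis does not specify them.

```python
def classify_by_criterion(task_scores, criterion=0.8):
    """Label a candidate by whether every task score meets a fixed criterion.

    task_scores maps a task name to a score between 0 and 1.
    The criterion value is an assumption, not taken from the thesis.
    """
    meets_all = all(score >= criterion for score in task_scores.values())
    return "satisfactory" if meets_all else "not yet satisfactory"


# Invented example data: two candidates, two tasks each.
candidates = {
    "A": {"fill_in_form": 0.9, "write_short_note": 0.85},
    "B": {"fill_in_form": 0.95, "write_short_note": 0.6},
}
results = {name: classify_by_criterion(scores) for name, scores in candidates.items()}
```

Each candidate is judged against the fixed criterion rather than against the other candidates, which is what separates this kind of interpretation from a ranking of scores.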
2.2.9 Objective Testing versus Subjective Testing
The difference between objective testing and subjective testing is that of scoring. If no
judgment is required on the part of the scorer, then the scoring is objective. A
multiple-choice test, with the correct responses unambiguously identified, would be a case in
point. If judgment is called for, the scoring is said to be subjective. There are different
degrees of subjectivity in testing. The impressionistic scoring of a composition may be
considered more subjective than the scoring of short answers in response to questions on a
reading task. In Oller’s view (1979), many tests, such as cloze tests, “lie
somewhere between subjectivity and objectivity”. As a result, many testers seek
objectivity in scoring, not only for its own sake but also for the greater
reliability it brings.
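Objective scoring can be sketched in a few lines of Python: the scorer only compares each response with an unambiguous key, so no judgment is involved. The answer key, the responses and the function name below are invented for illustration and are not taken from the thesis.

```python
def score_objectively(answer_key, responses):
    """Count the responses that match the key exactly.

    Items that are missing from the responses (left blank) score zero.
    """
    return sum(1 for item, correct in answer_key.items() if responses.get(item) == correct)


# Invented data: a four-item multiple-choice test and one candidate's answers.
answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}
responses = {1: "B", 2: "A", 3: "A"}  # item 4 left blank
total = score_objectively(answer_key, responses)
```

Any two scorers applying this key would reach the same total, which is why objective scoring tends to bring greater reliability.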
2.2.10 Communicative Language Testing
In recent years, in parallel with the development of communicative language teaching
(CLT), communicative language testing has been the focus of a great deal of research
on language testing. Discussions have centered on the desirability of measuring the
ability to take part in acts of communication. In sum, it is assumed that the main function
of language is to enable people to communicate with each other in society. As a result,
testing language ability amounts to testing communicative ability, including reading and
listening, the two receptive skills necessary for communication, which is a two-way
process (Khoa, 1999). Communicative language testing may embrace a number of testing
approaches, such as direct versus indirect testing, objective versus subjective testing, etc.
Based upon the theory that language ability is a complex and multifaceted construct, Bachman
(1991, p.678) proposes the following characteristics of communicative tests: “First, such
tests create an ‘information gap,’ requiring test takers to process complementary

posterior statistical validation of whether a test has measured a construct that has a reality
independent of other constructs.
2.3.1.2 Content validity
The more a test simulates the dimensions of observable performance and accords with
what is known about that performance, the more likely it is to have content and construct
validity. According to Kelly (1978:8), content validity seems “an almost completely
overlapping concept” with construct validity and for Moller (1982:68), “the distinction
between construct and content validity language proficiency.” Anastasi (1982:131) defines
content validity as “essentially the systematic examination of the test content to determine
whether it covers a representative sample of the behavior domain to be measured.” She
offers a set of useful guidelines for establishing content validity:
- The behavior domain to be tested must be systematically analyzed to make certain that
all major aspects are covered by the test items, in the correct proportions.
- The domain under consideration should be fully described in advance, rather than being
defined after the test has been prepared.
- Content validity depends on the relevance of the individual’s test responses, rather than on the apparent relevance of item content.
2.3.1.3 Face validity
Anastasi (1982:136) points out that face validity is not validity in the technical sense; it
refers not to what the test actually measures, but to what it appears to measure to those who take it, the
administrative personnel who decide on its use, and other technically untrained observers.
Fundamentally, the question of face validity concerns rapport and public relations. Lado
(1961), Davies (1968), Ingram (1977), Palmer (1981), and Bachman and Palmer (1981)
have all discounted the value of face validity. If a test does not have face validity though, it
may not be acceptable to the students taking it, or the teachers using it. If the students do
not accept it as valid, their adverse reaction to it may mean that they do not perform in a
way that truly reflects their ability. Anastasi (1982:136) takes a similar line: “Certainly if
test content appears irrelevant, inappropriate, silly or childish, the result will be poor co-
operation, regardless of the actual validity of the test. Especially in adult testing, it is not
sufficient for a test to be objectively valid. It also needs face validity to function effectively

the agreement between markers by establishing, and maintaining adherence to, explicit
guidelines for the conduct of this marking. The third aspect is parallel-forms reliability,
which applies when alternative forms of a test are to be devised. The concept of reliability is particularly important when
language tests within the communicative paradigm are considered. Moreover, Davies
(1968) stresses that reliability is the first essential for any test, but that for certain kinds of
language test it may be very difficult to achieve.
2.3.3 Discrimination
Another important feature of a test is its capacity to discriminate among the different
candidates and to reflect the differences in the performances of the individuals in the
group. The extent of the need to discriminate will vary depending on the purpose of the
test. In many classroom tests, for example, the teacher will be much more concerned with
finding out how well the pupils have mastered the syllabus and will hope for a cluster of
marks around the 80 per cent and 90 per cent brackets. Nevertheless, there may be
occasions on which the teacher requires a test to discriminate to some degree in order
to assess relative abilities and locate areas of difficulty. In that case, the items in the test
should be spread over a wide range of difficulty levels, as follows:
- Extremely easy items
- Very easy items
- Easy items
- Fairly easy items
- Items below average difficulty level
- Items of average difficulty level
- Items above average difficulty level
- Fairly difficult items
- Difficult items
- Very difficult items
- Extremely difficult items.
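Where an item falls on this scale, and how well it separates stronger from weaker candidates, can be estimated from test results. The Python sketch below assumes the standard facility value (the proportion of testees answering correctly) and a simple upper-lower discrimination index; the data are invented, the function names are hypothetical, and splitting the group into halves is only one common choice. It is not the ITEMAN procedure listed in the appendices.

```python
def facility_value(item_responses):
    """Proportion of testees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, group_fraction=0.5):
    """(correct in upper group - correct in lower group) / group size.

    Testees are ranked by total test score; group_fraction sets the share of
    testees placed in each group and is an assumption of this sketch.
    """
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    n = max(1, int(len(ranked) * group_fraction))
    upper, lower = ranked[:n], ranked[-n:]
    correct_upper = sum(item_responses[i] for i in upper)
    correct_lower = sum(item_responses[i] for i in lower)
    return (correct_upper - correct_lower) / n


# Invented data: one item answered by eight testees, with their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [38, 35, 33, 30, 27, 22, 18, 15]
fv = facility_value(item)
d = discrimination_index(item, totals)
```

A facility value near 1 marks an extremely easy item and a value near 0 an extremely difficult one, while a discrimination index close to zero or negative suggests that the item does not distinguish stronger from weaker testees.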
2.3.4 Practicability.
A test must be practicable; in other words, it must be fairly straightforward to administer.
The most obvious practical considerations concerning the test are often overlooked. Firstly, the

writing skills are multiple-choice items, matching items, editing, dictation, short-answer
items, summary writing, sentence transformation, free writing, compositions and essays,
error-recognition items, and ‘broken-sentence’ items. Item types for testing grammar are
multiple-choice items, completion items, matching items, and word transformation.
2.4.2 Language components and language skills.
Linguistics is the study of phonology, syntax, and semantics. The first, phonology, is
concerned with the sounds of a language and the way in which these are structured into
segments such as syllables and words. The second, syntax, is concerned with the way we string words
together in phrases, clauses, and sentences to build well-formed sentences. The
third, semantics, is concerned with the way we assign meaning to a unit of a language in order to
communicate. Each of these has an additional level: phonology is supplemented by
phonetics, the study of the physical characteristics of sound; syntax by morphology, the
study of the structure of words; and semantics by pragmatics, the study of the situational
constraints on meaning. The language components we focus on in this minor thesis are
grammar, vocabulary, and phonology. Grammar belongs to syntax, vocabulary to
semantics, and phonology to phonetics. In addition, the language skills which we
want to test are reading and writing.
2.4.3 The test item types used to evaluate language components and language skills.
Table 1: Test item types for Reading and Writing skills and Grammar, Vocabulary
Reading Writing Gram. and Usage Vocabulary
- Multiple-choice items
- Short-answer items
- Cloze items
- Word and sentence matching
- Picture and sentence matching

items
- Completion items
- Transformation items
- Error-recognition multiple-choice items
- ‘Broken sentence’ items
- Pairing and matching items
- Multiple-choice items
- Matching
- Word formation
- Items involving synonyms
- Reordering
- Definitions (explaining the meaning of each word)
- Sentence completion
- Gap filling
