Evaluating the validity of the final achievement test for second-year non-English-major students at the Electronic – Electrical Engineering Department, Nam Dinh University of Technology Education

VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST – GRADUATE STUDIES
SUBMITTED BY: TRẦN THỊ THU HƯƠNG
A thesis submitted in partial fulfillment of the requirements
for the degree of Master of Arts
EVALUATING THE VALIDITY OF THE FINAL ACHIEVEMENT TEST
FOR SECOND – YEAR NON – MAJOR STUDENTS AT ELECTRONIC –
ELECTRICAL ENGINEERING DEPARTMENT, NAM DINH
UNIVERSITY OF TECHNOLOGY EDUCATION
(Đánh giá độ giá trị của bài kiểm tra cuối kỳ cho sinh viên không
chuyên tiếng Anh năm thứ hai tại khoa Điện – Điện tử, Trường
Đại học Sư phạm Kỹ thuật Nam Định)
M.A. MINOR THESIS

Field: Language Teaching Methodology
Code: 60 14 10
Supervisor: Phạm Lan Anh, M.A
HANOI, 2011

CHAPTER 3: THE STUDY
3.1. English learning and teaching at Nam Dinh University of Technology Education
3.1.1. Students’ backgrounds
3.1.2. The English teaching staff
3.1.3. Objectives of the English course
3.1.4. Checklist of the course book
3.1.5. Objectives of the final test
3.1.6. Difficulty level and discrimination of the final test
3.2. English testing at Nam Dinh University of Technology Education
3.2.1. Testing situation
3.2.2. The current final achievement test
3.3. Research methods
3.3.1. Survey questionnaire
3.3.2. Interview and informal discussion
3.4. Data analysis of survey questionnaires and interviews
3.4.1. Data analysis of the administration of the test
3.4.1.1. Data analysis of the format of the test
3.4.1.2. Data analysis of the logistics of the test
3.4.2. Data analysis of face validity of the test
3.4.2.1. Data analysis of general opinion about the test
3.4.2.2. Data analysis of reading comprehension task
3.4.2.3. Data analysis of grammar knowledge task
3.4.2.4. Data analysis of translation task
3.5. Discussion and findings
3.5.1. Similarities in teachers and students’ perception
3.5.1.1. Test administration
3.5.1.2. Face validity
3.5.1.2.1. General opinion about the test

Table 6: Teachers and students’ comment on students’ reading comprehension ability,
theme and instruction of the reading comprehension task.
Table 7: Teachers and students’ comment on the grammar task.
Table 8: Teachers and students’ comment on the translation task.
Charts
Chart 1: Percentage of teachers and students’ comment on what language ability the test
mainly intends to measure.
Chart 2: Percentage of students and teachers’ opinions on what test items can measure their
true ability.
Chart 3: Percentage of the results which students get in the test.
Chart 4: Percentage of what test tasks students cannot do.
Chart 5: Percentage of teachers and students’ comment on the length of the reading text.
Chart 6: Students’ comment on whether or not the reading text is difficult.
Chart 7: Teachers’ comment on which types of reading skills are expressed in the reading
comprehension task.
Chart 8: Teachers and students’ comment on which students’ ability this translation task
requires.

LIST OF ABBREVIATIONS

NUTE: Nam Dinh University of Technology Education
EGP: English for General Purposes
ESP: English for Specific Purposes
CHAPTER 1: INTRODUCTION

Moreover, evaluating a test’s validity can be seen as an attempt to improve the test’s quality.
As a measure of students’ achievement of the learning objectives, final examinations
must be valid; validity is one of the characteristics of a good test. Therefore,
“Evaluating the validity of the final achievement test for second – year non – major
students at Electronic – Electrical Engineering Department, Nam Dinh University of
Technology Education” was chosen as the topic, in the hope that the study will be helpful to the
author, the teachers, the test-takers and everyone concerned with language testing in
general and with the validity of an achievement test in particular. Because of the limited time
available for collecting students’ scores, this study differs from the previous study: the author
focuses only on the face validity of the test. The author hopes that the results of the study can
then be applied to improve the current test and to create a new, reliable item bank. The study is
also intended to encourage both teachers and learners in their teaching and learning.
1.2. Scope of the study
The scope of this thesis is limited to research on teachers’ and test-takers’
evaluation of the existing achievement test, in terms of its face validity, for second-year
non-English major students at the Electronic – Electrical Engineering Department, NUTE, owing
to limitations in time, ability and availability of data. Moreover, it is impossible for the
author to cover all the final achievement tests in use or to design a sample achievement test
for second-year students. Instead, only a test specification for test 12 in semester 3 is
presented.
1.3. Aims of the study
Following the scope of the research above, the aims of this research are:
1. To identify the English teachers’ and students’ evaluation of the existing final
achievement test (test 12) at NUTE in terms of face validity.
2. To provide suggestions for test designers.
1.4. Methods of the study
In order to achieve the above aims, the study has been carried out as follows:
First, the author goes to the library to read the theory of assessment and testing,
of achievement tests and the characteristics of a good achievement test, and of test validity with a

questions. Then, the author gives some solutions to improve the final achievement test.
Chapter 4: Conclusion offers conclusions and proposes some suggestions for further
research on the topic.

CHAPTER 2: LITERATURE REVIEW

This chapter provides an overview of the theoretical background of the study. It
includes five main sections. Section 2.1 discusses the relationship between teaching,
learning and assessment. Section 2.2 focuses on the purposes of formative assessment and
summative assessment. Section 2.3 gives a brief description of achievement tests and the
characteristics of a good EGP test and a good ESP test. It is followed by section 2.4, which
focuses on face validity. Finally, section 2.5 suggests some measures to increase face
validity.
2.1. Relationship between teaching, learning and assessment
In the relationship between teaching, learning and assessment, curriculum and
content standards also play an important role. Curriculum is best characterized as what
should take place in the classroom. It describes the topics, themes, units and questions
contained within the content standards. Content standards are the framework for
curriculum. Curriculum can vary from program to program, as well as from instructor to
instructor. Unlike content standards, curriculum focuses on delivering the “big” ideas and
concepts that the content standards identify as necessary for the learner to understand and
apply. Curriculum serves as a guide for instructors, addressing teaching techniques,
recommending activities, scope and sequence, and the modes of presentation considered most
effective. In addition, curriculum indicates the textbooks, materials, activities and
equipment that best help learners achieve the content standards. In the teaching and
learning process, assessment is a tool that specifies the nature of the evidence required to
demonstrate that the content standards have been met. To ensure valid and reliable
accountability, the assessment selected should test the state standards. Clearly, assessment,
curriculum and content standards are closely related; assessment is the basis to give

and general of tests. Oller (1979: 1) defines a language test as an instrument that attempts to
measure how much students have learned in a foreign language course. From the
two definitions, this research takes the view that a language test is a set of instruments in the
form of questions and problems whose function is to measure an individual student’s language
abilities and knowledge in relation to a foreign language that he or she has learned.
A language test is a useful instrument with which educators can obtain reliable and
valid information on their students’ language abilities. Teachers can monitor and evaluate
student learning and identify students’ strengths and weaknesses to clarify what they
really need to know. Students’ test results can provide important feedback on how well
an English course has been taught or learned, and necessary feedforward for students
at the beginning of an English course. Feedback and feedforward are very important in
the teaching and learning process. The author illustrates the relationship between feedback
and feedforward through the example of catching a ball. When we move to catch a ball, we
must interpret our view of the ball’s movement to estimate its future trajectory. Our
attempt to catch the ball incorporates this anticipation of the ball’s movement in
determining our own movement. As the ball gets closer, or exhibits spin, we may find it
departing from the expected trajectory, and we must adjust our movement accordingly. In
other words, feedforward helps teachers to point out, at the beginning of the course, the
problems that students are likely to meet in the learning process, so that students can feel more
confident, avoid those problems and study more effectively. Feedback, in turn, helps
teachers to adjust their teaching methods so that students can get the best results.
Feedback also helps the teacher to evaluate the effectiveness of the syllabus as well as the
methods and materials he or she is using. Test results thus become feedback on the curriculum
that has been developed and implemented.
In addition, testing may have a considerable impact on teaching and learning. Hughes (1989:
1) calls the effect of testing on teaching and learning “backwash”. He stresses the role of
backwash in the teaching-learning process. Backwash can be harmful if the test content does not
match the objectives of the course. It leads to the problem of teaching in one way and testing in
Figure 2: The Scope of Impact of Language Tests
Obviously, the importance of testing cannot be denied. In particular, this research
focuses on testing English for Specific Purposes (ESP). ESP now plays an
important role in English teaching and learning at universities. Since the early 1960s, ESP
has grown to become one of the most prominent areas of English as a foreign language
teaching. This development is reflected in an increasing number of publications,
conferences and journals dedicated to ESP. Similarly, more traditional general
English courses have given way to courses aimed at specific areas, for example, English for
Business Purposes. Along with the emergence of ESP came a strong need for testing
specific groups of learners. As a result, the ESP testing movement has shown
slow but definite growth over the past few years. Clearly, ESP testing and EFL testing
are indispensable in the teaching and learning process.
[Figure 2 labels: On an individual student; On student and teachers; On student, ...]

skills, attitudes and beliefs. Many kinds of assessment can be used in a course, such as
continuous assessment, formative assessment, summative assessment, peer-assessment,
self-assessment and so on. However, in this research the author focuses on the
relationship between two main kinds of assessment: formative assessment and summative
assessment. "As coach and facilitator, the teacher uses formative assessment to help
support and enhance student learning. As judge and jury, the teacher makes summative
judgments about a student's achievement" (Atkin, Black & Coffey, 2001).
Formative assessment is designed to provide feedback and feedforward to students
and instructors in order to develop teaching and learning. From a
student's perspective, formative assessment provides information on the student's
performance, on how he or she is progressing with the skills and knowledge required by a
particular course, and on the problems he or she is likely to have in the course. Generally, the
results of formative assessment do not contribute to a student's final grade but serve purely
to help students understand their strengths and weaknesses in order to work
towards improving their overall performance. From an instructor's perspective, formative
assessment is a diagnostic tool that can be used to evaluate the effectiveness of course and
curriculum design. Formative assessment has the potential to highlight areas in which
teaching and curriculum design need to be improved, as well as areas where teaching
methods have been very effective in improving student learning. Examples of tests of this kind
are the diagnostic test and the placement test. A placement test is used at the beginning of a
course to identify a student’s level of language and find the best class for him or her. A
diagnostic test is used to identify the problems that students have with the language. With it,
the teacher diagnoses the language problems students have, which helps the teacher to plan
what to teach in future and to prepare students for anticipated problems and their solutions.
The purpose of summative assessment is to provide "a sampling of student
achievements which lead to a meaningful statement of what they know, understand and can
do" (Brown & Knight, 1999: 37). Generally summative assessment occurs at the end of a
topic or the end of a course in order to evaluate how well students have acquired the

There are two kinds of achievement tests: final achievement tests and progress
achievement tests.
Progress achievement tests (short-term achievement tests) are administered
during the course, after a chapter or a term, and are often written by the teacher. These tests
are based on the teaching program. Hughes (1989: 12) claims that “these tests are
intended to measure the progress that students are making”. In other words, progress
achievement tests are supposed to help teachers judge the degree of success of their
teaching and find out how much students have gained from what has been
taught. Accordingly, teachers can identify the weaknesses of the learners or diagnose the
areas not properly mastered during the course of study. For students, on the other hand, such a
test can be regarded as a useful device that gives them a good chance to
perform in the target language in a positive and effective manner and to gain additional
confidence in doing so. It can be a good preparatory and supportive step towards
the final achievement test, because students become familiar with the tests and
the strategies for doing them.
Final achievement tests (longer-term achievement tests) are those administered at
the end of a course of study. They may be written and administered by ministries of
education, official examining boards, or members of teaching institutions. They are
used to check how well learners have done after a whole course in terms of the objectives and
content of the course. According to Hughes (1990:11), there are two kinds of
final achievement tests: those based on the syllabus-content approach and those based on the
syllabus-objective approach.
The syllabus-content approach is based directly on a detailed course syllabus or on
the books and other materials used. The test contains only what it is thought that the
students have actually encountered, and thus can be considered, in this respect at least, a
fair test. The disadvantage of this type is that if the syllabus is badly designed, or the books
and other materials are badly chosen, then the results of a test can be very misleading.
Successful performance on the test may not truly indicate successful achievement of course
objectives.
The syllabus-objective approach refers to the one in which the test contents are
based directly on the objectives of the course. This approach has some benefits. First, it

This is a non-statistical type of validity that involves “the systematic examination of
the test content to determine whether it covers a representative sample of the behavior
domain to be measured” (Anastasi & Urbina, 1997: 114). A test has content validity built
into it by careful selection of which items to include. Items are chosen so that they comply
with the test specification, which is drawn up through a thorough examination of the subject
domain. Foxcraft et al. (2004: 49) note that the content validity of a test can be improved by
using a panel of experts to review the test specifications and the selection of items. The
experts will be able to review the items and comment on whether the items cover a
representative sample of the behavior domain.
Construct validity
A test has construct validity if it accurately measures a theoretical, non-observable
construct or trait. The construct validity of a test is worked out over a period of time on the
basis of an accumulation of evidence. There are a number of ways to establish construct
validity. Two methods of establishing a test’s construct validity are convergent/divergent
validation and factor analysis.
A test has convergent validity if it has a high correlation with another test that
measures the same construct. By contrast, a test’s divergent validity is demonstrated
through a low correlation with a test that measures a different construct.
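The convergent/divergent check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score lists are invented, the helper name pearson_r is hypothetical, and the Pearson coefficient is one common choice of correlation measure rather than a method prescribed by the sources cited here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six students (invented for illustration only).
reading_test_a = [4, 5, 6, 7, 8, 9]
reading_test_b = [5, 5, 7, 7, 9, 9]  # assumed to measure the same construct
maths_test = [7, 4, 8, 5, 6, 6]      # assumed to measure a different construct

convergent_r = pearson_r(reading_test_a, reading_test_b)
divergent_r = pearson_r(reading_test_a, maths_test)

print(f"convergent evidence: r = {convergent_r:.2f}")
print(f"divergent evidence:  r = {divergent_r:.2f}")
```

With these invented scores the first coefficient is high and the second is close to zero, which is the pattern the two definitions describe. Real validation would need far more test-takers and a justified choice of comparison tests.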
Factor analysis is a complex statistical procedure which is conducted for a variety
of purposes, one of which is to assess the construct validity of a test or a number of tests.
Face validity
Hughes (1989) states that “a test is said to have face validity if it looks as if it
measures what it is supposed to measure”. Anastasi (1982: 136) pointed out that face validity
is not validity in the technical sense; it refers not to what the test actually measures, but to
what it appears superficially to measure.
Face validity is very closely related to content validity. While content validity
depends on a theoretical basis for judging whether a test assesses all domains of a certain
criterion, face validity relates to whether a test appears to be a good measure or not.

positively beneficial. If an English test for first-year undergraduate students is designed on
the basis of an analysis of the English language needs of these students, includes
tasks as similar as possible to those which they would have to perform as undergraduates
(reading textbooks, taking notes during lectures, etc.) and is administered instead of one which
was entirely multiple choice, then beneficial washback can be achieved. There will be an
immediate effect on teaching and learning: the syllabus will be redesigned, new books will
be selected, classes will be conducted differently and students’ way of learning will change
to reflect the demands of the new test.
In a nutshell, the author has given a general overview of achievement tests
and the characteristics of a good EGP achievement test so that readers can understand how to
evaluate a final EGP achievement test.

2.3.3. Characteristics of a good ESP test
Nowadays, ESP teaching and research have made tremendous progress
at home and abroad. In teaching, ESP has developed into a system comprising Vocational English
(VE: Business English, Tourism English, Hotel English, Medical English…) and English
for Academic Purposes.
“ESP is not a matter of teaching specialized varieties of English. The fact that
language is used for a specific purpose does not imply that it is a special form of language,
different in kind from other forms. Though the content of learning may vary, there is no
reason to suppose that processes of learning should be any different for the
ESP learner than for the general English learner” (Hutchinson, 1987).
From the above view, we draw two points. The first is that ESP is one kind of English with
its own specific language characteristics, yet it is not a matter of teaching particular
language items, and the similarities between ESP and EGP are more notable than their
differences. The second is that there is no essential difference in teaching principles and
procedures between ESP and EGP. In other words, EGP is the preliminary stage for ESP, and ESP
is the advanced stage of EGP teaching. The testing and evaluation of ESP should be carried out in
accordance with the teaching contents and objectives. Therefore, only with the efficient

candidate, the candidate’s family and members of the public. The test is what students and
parents want and it looks familiar to them. For example, for the past 8 years the Grade 9
exam has used passages, comprehension questions and grammar exercises taken directly
from English 9. Students have prepared for the exam by memorizing the book. This year,
without warning anyone, the Foreign Language Specialist writes the exam using parallel texts
and exercises that are not taken directly from the book. This test lacks face validity. Face
validity is hardly a scientific concept, yet it is very important. A test which does not have
face validity may not be accepted by candidates, teachers, education authorities or
employers. In line with this view, McNamara (2000: 133) defines face validity as the degree
to which a language test is acceptable to those who are involved in its design and use. A
language test is said to be face valid only if it satisfies their expectations. Ingram (1977: 18),
as cited by Anderson et al. (1995: 289), also agrees that face validity is “surface credibility
or public acceptability”.
Ensuring the face validity of a language test is important, given that this validation
procedure is one of the major aspects of validity. The procedure of face validation
“involves an intuitive judgment about the test’s content by people whose judgment is not
necessarily expert”, as mentioned by Anderson et al. (1995: 289). Anderson et al. (1995:
172) mention that the process of face validation simply deals with how those people
comment on the appearance of the language test, although there may be little attention paid
to the content of test items. Analyzing the face validity of an English test is thus an attempt to
gather people’s opinions on whether or not the test looks valid as an English test.
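Gathering such opinions usually ends in a simple tally of questionnaire answers. The sketch below shows one way to turn raw answers into percentages; the function name percentage_breakdown and the sample answers are hypothetical and are not data from this study.

```python
from collections import Counter


def percentage_breakdown(responses):
    """Return the share of each answer option as a percentage of all responses."""
    if not responses:
        raise ValueError("no responses to summarise")
    counts = Counter(responses)
    total = len(responses)
    return {option: round(100 * n / total, 1) for option, n in counts.items()}


# Hypothetical answers to "Does the test look like a valid English test?"
# (invented for illustration only).
student_answers = ["yes"] * 6 + ["no"] * 3 + ["not sure"] * 1

student_pct = percentage_breakdown(student_answers)
for option, pct in student_pct.items():
    print(f"{option}: {pct}%")
```

Reporting teachers and students separately, as the list of charts suggests, would simply mean calling the function once per group.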

2.4.2. Relationship between reliability and validity
We often think of reliability and validity as separate ideas but, in fact, they're
related to each other. Reliability and validity are the two vital characteristics that constitute
a good test. However, validity and reliability have a complicated relationship.
If the test is not reliable, it cannot be valid at all. To be valid, according to Hughes
(1988:42), “a test must provide consistently accurate measurements. It must therefore be

As the relationship between reliability and validity described above shows, validity is an
indispensable quality of all good tests. Hughes (1982: 22) says that “the greater a test’s
content validity is, the more likely it is to be an accurate measure of what it is to measure”.
Therefore, from the outset of test construction, test validity should be the most essential
consideration.
Validity of a language test has four facets, namely face validity, content validity,
construct validity and criterion-referenced validity. However, the author focuses on face
validity for the following reasons.
Firstly, the latter three facets of validity (content validity, construct validity and
criterion-referenced validity) are excluded from this research because of limitations of
time and resources. Anastasi (1982: 136), as cited by Weir (1990: 26), stated that “face validity
is not validity in the technical sense”. Nevertheless, face validation is significant in that it
concerns whether or not the test “looks valid” to those who deal with the test, so the researcher
performs an analysis of face validity. Heaton (1988: 60) added that “face validity
can provide not only a quick and reasonable guide but also a balance to too great a
concern with statistical analysis.” He points out that students’ motivation is maintained if a
test has good face validity. Face validity therefore plays a certain role in any test, and it is of
great concern in this thesis. According to Anastasi & Urbina (1997: 114), content validity is a
non-statistical type of validity that involves “the systematic examination of the test content to
determine whether it covers a representative sample of the behavior domain to be measured”.
Content validity evidence involves the degree to which the content of the test matches a content
domain associated with the construct. Obviously, establishing content validity requires a
representative sample test to analyze and compare. According to Bachman and Cohen
(1998: 50), construct validation deals with the “judgmental and empirical justifications
supporting the inferences made from test scores”. Bachman and Palmer (1996: 21) also
mention that construct validation is related to the “meaningfulness and appropriateness” of
the researcher’s interpretations relevant to the actual test scores. Bachman (1990: 248)
mentions that criterion-referenced validity deals with demonstrating “a relationship
