Nguyen Thi Phuong Thu August 2005
Vietnam National University, Hanoi
College of Foreign Languages
---------------
Designing & evaluating an English reading
test for the non-majors of Civil Engineering
at Haiphong Private University
(Vietnamese title: Designing and evaluating a specialized English test
for Civil Engineering students at
Haiphong Private University)
M.A. minor thesis
Field: methodology
Code: 50702
Course: K11
By: Nguyen Thi Phuong Thu
Supervisor: Tran Hoai Phuong, MEd.
Hanoi - August 2005
Acknowledgements
During my further study and the conduct of this research, I was honored to receive
guidance, assistance, and encouragement from many lecturers and supervisors. I would like
to express my sincere thanks to the leaders of the College of Foreign Languages, who gave me
permission and created favorable conditions for study and research.
I would also like to thank my supervisor, Mrs. Tran Hoai Phuong, MEd, who was very
understanding and gave me great help as well as invaluable guidance and encouragement
from the very start to the end of my research.
It is also my pleasure to give special thanks to the students of classes XD 501, XD
502 and XD 503 at Hai Phong Private University, who enthusiastically took the test and
helped me collect its results.
I also benefited greatly from talks and discussions with my colleagues, so let me thank

23. CU The number of correct answers in the upper half
24. CL The number of correct answers in the lower half
25. gd good discrimination
26. md bad discrimination
27. bi bad item
28. ρ Spearman rho correlation coefficient
29. SU Score on the upper half
30. SL Score on the lower half
Table of contents
Acknowledgements
List of abbreviations
Part I: Introduction
1.Rationale
2.Aims of the study
3.Scope of the study
4.Methods of the study
5.Design of the study
Part II: Development
Chapter one: Literature review
1.1.Language testing
1.2.Communicative language tests
1.3.Testing reading skills
1.3.1.Multiple choice questions
1.3.2.Short answer questions
1.3.3.Cloze
1.3.4.Selective deletion gap filling
1.3.5.C-tests
1.3.6.Cloze elide

3.5-Marking the test
3.6-Test score interpretation and evaluation
3.6.1.The frequency distribution
3.6.2.The central tendency
3.6.2.1.The mode
3.6.2.2.The median
3.6.2.3.The mean
3.6.3.The dispersion
3.6.3.1.The low-high
3.6.3.2.The range
3.6.3.3.The standard deviation
3.7-Test item analysis and evaluation
3.7.1.Item difficulty
3.7.2.Item discrimination
3.8.Estimating reliability
Summary
Part III: Conclusion and recommendations
References
Appendices

Part I: Introduction
1.Rationale
Testing is a matter of concern to all teachers, whether we are in the classroom or
engaged in syllabus/materials design, administration or research. We know quite well that good
tests can improve our teaching and stimulate student learning. Although we may not want to
become measurement experts, we may have to evaluate student performance periodically and
prepare reports on student progress.
Haiphong Private University (HPU) is a university in which there are a number of

The test takers are non-English majors.
The specific aims of the research are:
- to assess the learners’ achievement in reading Civil Engineering English after a
120-period reading course;
- to measure their aptitude for reading;
- to diagnose their strengths and weaknesses in reading the subject matter;
- to find out whether or not the test satisfies the qualities of a good test. From there,
the test results will also reflect the effectiveness of the teaching. If the test is not a
good one, suggestions will be made for a better test form.
3.Scope of the study
“Not all language tests are of the same kinds. They differ with respect to how they are
designed, and what they are for; in other words, in respect to test method and test purpose.”
(McNamara, 2000: 5). For example, in terms of method, there are paper-and-pencil language
tests, performance tests, etc.; in terms of purpose, there are achievement tests, proficiency
tests, and so on. In fact, the same form of test may be used for different purposes, although in
other cases the purpose may affect the form.
Due to limitations of time and ability, it is impossible for the author to design tests
of all these types or for all four language skills (speaking, writing, listening and reading).
Therefore, this minor thesis is limited to designing and evaluating an achievement test of ESP
reading for the non-majors at HPU, and the reading tested was for communicative purposes.
4.Methods of the study
In this minor thesis the author designed an achievement test of reading, administered it
and then evaluated it, so the method adopted is quantitative. The data were collected by
testing the students’ reading ability in Civil Engineering English.
5.Design of the study
The study is composed of three parts:
*Part I is the presentation of basic information such as the rationale, the scope of the study,
the aims of the study, the methods of the study and finally the design of the study.
*Part II includes three chapters:

students’ performance in the subjects. Tests help us to put them in the right places; therefore,
language tests, if used properly, can be considered a valuable teaching device for any teacher,
and they contribute positively to the development of both teachers and learners. Last but
not least, any researcher who needs to measure the language proficiency of subjects
cannot do so without using an existing test or designing his or her own.
According to Carroll (1968), a test will always tell us something about a testee’s
characteristics. From the results of a test, a teacher can judge whether a student is good or
bad at the subject tested. Carroll provides the following definition of a
test: “a psychological or educational test is a procedure designed to elicit certain behavior
from which one can make inferences about certain characteristics of an individual.” (Carroll,
1968: 46)
According to Hughes (1989: 9), tests can be classified as follows:
- Proficiency tests
- Achievement tests
  • Class progress tests
  • Final achievement tests
- Diagnostic tests
- Placement tests
- Aptitude or prognostic tests
- Direct tests versus indirect tests
- Discrete-point tests versus integrative tests
- Norm-referenced tests versus criterion-referenced tests
- Objective tests versus subjective tests
- Communicative tests
There are several approaches to language testing, for example the essay-translation
approach, the structuralist approach, the integrative approach and the communicative approach.
In this minor thesis, however, I have chosen only the communicative approach to testing.
This approach focuses on how language is used in communication (‘meaning’ rather than
‘form’). It attempts to obtain different profiles of a learner’s performance in the

knowledge of relevant systemic features of language (pronunciation, grammar, vocabulary)
with an understanding of context is deployed. Yet these tests are regarded as time-consuming
and difficult to score. For example, an oral interview involves comprehension of extended
discourse (both spoken and written); as a result, besides the disadvantages mentioned above,
it also requires trained raters.
Because of these disadvantages, another type of test, the pragmatic test, replaced the older
ones. It focuses less on knowledge of language and more on the psycholinguistic processing
involved in language use. Within this type, the cloze test was seen as the most suitable and was
once believed to be easy to construct and relatively easy to score. However, it soon turned out
to measure the same kinds of things as discrete-point tests of grammar and vocabulary. It also
failed to test communicative skills.
In the early 1970s, thanks to Hymes’s theory of communicative competence (an
understanding of language and the ability to use language in context, particularly in terms of
the social demands of performance, i.e. knowing a language is more than knowing its rules of
grammar), communicative language tests developed. They have the following two features:
‘They are performance tests which require assessment to be carried out when
the candidate is engaged in communication, either receptive or productive, or both.
They see language as a sociological phenomenon, focusing on the external,
social functions of language while integrative and pragmatic tests see language as an internal
phenomenon. With this test, the use of authentic texts and real world tasks may be developed.’
(McNamara, 2000: 16).
One distinguishing feature that sets it above other types of tests is that, besides the systemic
features of language, it requires students’ careful study of communicative roles and tasks.
All the reasons discussed above provided a strong impetus for this minor thesis to design an
ESP reading test for communicative purposes, i.e. a communicative language test.
1.3-Testing reading skills
In a reading test, test items are often based on the text itself. Often, within the
same test, more than one type of item, maybe two, three or more of the following types

type of test is an objective method for testing the test takers’ understanding of the texts.
1.3.8.Jumbled sentences
This type of test is intended to test the student’s understanding of a sequence of stages
in a process or of events in a narrative. A successful student is one who can correctly
reorder the jumbled (scrambled) sentences of a story.
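Marking a jumbled-sentences item involves a decision about partial credit. The Python sketch below illustrates two possible policies under stated assumptions: sentences are labelled with letters, `score_jumbled` is a hypothetical helper (not part of the test described in this thesis), and partial credit is one mark per sentence placed in its correct position.

```python
def score_jumbled(key, response, partial=True):
    """Score one jumbled-sentences item (hypothetical marking scheme).

    key and response are sequences of sentence labels, e.g. "CAEBD".
    With partial=True, one mark is given for each sentence in its correct
    position; with partial=False, the item is scored all-or-nothing.
    """
    if sorted(key) != sorted(response):
        raise ValueError("response must use each sentence label exactly once")
    if not partial:
        return 1 if list(key) == list(response) else 0
    return sum(k == r for k, r in zip(key, response))


key = "CAEBD"        # correct order of five scrambled sentences
response = "CABED"   # one candidate's answer

print(score_jumbled(key, response))                 # 3 (C, A and D in place)
print(score_jumbled(key, response, partial=False))  # 0
```

Position-based credit penalizes a single early misplacement heavily, because it shifts every later sentence; crediting correctly ordered adjacent pairs is one alternative a tester might consider.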
1.3.9.Matching
Like MCQ tests, matching is a familiar way of testing reading comprehension. In this
test, candidates are required to identify the relationships between a list of entries in one
column and a list of responses in another column. Candidates may have to match word with
word, sentence with sentence, picture with sentence, etc.
1.3.10.Jumbled paragraphs
Similar to tasks involving jumbled sentences, test tasks with jumbled paragraphs
require students to rearrange the given paragraphs in the correct order. To do this, students
have to read through the paragraphs to get the main idea of the whole text.
In short, different methods have been recommended for testing reading abilities, and a
teacher may choose among them depending on the purpose of the test. For example, to enhance
the communicative nature of tests, short answer questions, selective gap filling, C-tests,
information transfer techniques or other restricted response formats are often preferred.
1.4. Major characteristics of language tests
Tests can certainly serve pedagogical purposes. The most important consideration in
designing a language test is its usefulness, which can be defined in terms of qualities such
as reliability, validity, practicality, interactiveness, impact and authenticity. Among these,
the four qualities discussed below are the most critical for good tests.
1.4.1. Reliability
Reliability is clearly an essential quality of test scores; if the scores of a test are not
relatively consistent, they fail to provide us with information about the ability we want to
measure. Reliability is considered a fundamental criterion against which any language test has
to be judged.
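One common way of putting a number on this consistency from a single administration is the split-half method: the test is divided into two comparable halves, the two sets of half-test scores are correlated, and the Spearman-Brown formula steps that correlation up to an estimate for the full-length test. The Python sketch below uses Spearman's rank-order coefficient (rho) for the correlation. The helper names and the six students' scores are invented for illustration, and the shortcut rho formula is exact only when there are no tied ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


# Hypothetical half-test scores for six students (not data from this thesis).
first_half = [12, 15, 9, 18, 11, 14]
second_half = [11, 16, 10, 17, 9, 15]

rho = spearman_rho(first_half, second_half)
print(f"half-test rho = {rho:.3f}")                           # 0.943
print(f"estimated reliability = {spearman_brown(rho):.3f}")   # 0.971
```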

[Figure: scores on test tasks with characteristics A']
- provide a detailed scoring key,
- train scorers,
- agree on acceptable responses and appropriate scores at the outset of scoring,
- identify candidates by number, not name, and
- employ multiple, independent scoring. (Hughes, 1989: 36-42)
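The last two recommendations can be checked in practice by comparing the marks that two independent scorers give to the same scripts. The Python sketch below is a hypothetical illustration, not a procedure taken from Hughes: candidates are keyed by number rather than name, exact agreement is reported as a simple proportion, and scripts on which the scorers disagree are listed for a third scorer.

```python
def compare_scorers(scores_a, scores_b):
    """Compare two independent scorers' marks, keyed by candidate number.

    Returns the proportion of candidates given identical marks and a sorted
    list of the candidate numbers on which the two scorers disagree.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both scorers must mark the same candidates")
    disagreements = sorted(c for c in scores_a if scores_a[c] != scores_b[c])
    agreement = 1 - len(disagreements) / len(scores_a)
    return agreement, disagreements


# Hypothetical marks out of 10; candidates are identified by number, not name.
scorer_1 = {101: 7, 102: 5, 103: 9, 104: 6, 105: 8}
scorer_2 = {101: 7, 102: 6, 103: 9, 104: 6, 105: 7}

rate, to_review = compare_scorers(scorer_1, scorer_2)
print(f"exact agreement: {rate:.0%}")         # exact agreement: 60%
print("refer to a third scorer:", to_review)  # refer to a third scorer: [102, 105]
```

Simple percent agreement does not correct for agreement that would occur by chance; a statistic such as Cohen's kappa would be a stricter alternative.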
The concept of reliability is particularly important when considering language tests
within the communicative paradigm (Porter, 1983). Davies (1965: 14) shares this view but also
admits that ‘reliability is the first essential for any test; but for certain kinds
of language test may be very difficult to achieve.’
1.4.2. Validity
The second quality that affects test usefulness is validity. A test is said to be valid if it
measures what it is intended to measure. A test may therefore be valid for some purposes but
not for others. For example, if the purpose of a test is to assess ability to communicate in a
foreign language, it is valid only if it actually tests ability to communicate; if it consists
largely of grammar questions, it cannot be considered valid. Similarly, if a test is intended to
measure reading ability but also measures writing, it lacks validity as a test of reading.
However, validity is not an all-or-nothing matter: there are degrees of test validity, i.e.
one test may be more valid than another. Accordingly, Moore (1992) defined validity as “the
degree to which a test measures what it is supposed to measure”. There are different types of
validity, such as content, face, construct and criterion-related validity, and each is
discussed below.
1.4.2.1.Content validity
Among the different types of validity, content validity is often said to be the most important,
yet it is also the simplest. “A test is said to have content validity if its content constitutes a
representative sample of the language skills, structures, etc. with which it means to be
concerned.” (Hughes, 1989: 22). In order to judge whether or not a test has content validity,

the same time. Predictive validity, in turn, concerns the degree to which a test can predict
candidates’ future performance.
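In practice, both concurrent and predictive validity are usually reported as a validity coefficient: the correlation between candidates' scores on the test and their scores on an independent criterion, gathered at about the same time (concurrent) or later (predictive). The Python sketch below computes Pearson's product-moment correlation for invented data; the criterion used here (a teacher's rating on a five-point scale) is an assumed example, not one used in this thesis.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (n >= 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five candidates: reading-test scores (out of 40)
# and a criterion measure, here a teacher's rating on a 1-5 scale.
test_scores = [20, 24, 28, 32, 36]
teacher_ratings = [2, 4, 5, 4, 5]

r = pearson_r(test_scores, teacher_ratings)
print(f"validity coefficient r = {r:.2f}")  # 0.77
```

With real data a much larger sample than five would be needed before such a coefficient means much, and the criterion itself has to be a trustworthy measure of the same ability.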
1.4.2.4 Construct validity
Like reliability, construct validity is essential to the usefulness of any language test.
The term construct validity refers to the extent to which we can interpret a given test
score as an indicator of the ability(ies) or construct(s) we want to measure. The purpose of
construct validation is to provide evidence that the underlying theoretical constructs being
measured are themselves valid. Typically, construct validation begins with a psychological
construct that is part of a formal theory. The theory enables certain predictions about how the
construct variable will behave or be influenced under specified conditions. The construct is
then tested under those conditions. If the hypothesized results occur, the hypotheses are
supported and the construct is said to be valid. Often this will involve a series of tests under a
variety of conditions.
Validity always receives the most attention, since it is an indispensable quality of all good
tests. When constructing a test, the first thing to focus on is validity. Hughes (1989: 22)
argues that if important parts of what is to be measured are not defined or not represented in
a test, the test will fail to be accurate. He notes that “the greater a test's content validity is,
the more likely it is to be an accurate measure of what it is to measure.”
1.4.3. Practicality
Another quality of a good test that should not be forgotten is its practicality.
Although it differs in nature from the other qualities, practicality is no less important. Unlike
reliability and validity, practicality does not pertain to the uses that are made of test scores, but
primarily to the ways in which the test will be implemented in a given situation, and to whether
the test will be developed and used at all. Practicality often affects a tester’s decisions at
every stage of test development.
Practicality can be defined as ‘the relationship between the resources that will be
required in the design, development, and use of the test and the resources that will be
available for these activities’. (Bachman & Palmer, 1996: 35). This relationship can be
represented as in the figure below:
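Bachman and Palmer (1996) express this relationship as a ratio, practicality = available resources / required resources: a value of 1 or more means the test can be developed and used with the resources at hand, while a value below 1 means it cannot. The Python sketch below applies that ratio separately to each type of resource, which is an assumption made for illustration; the resource names and figures are hypothetical.

```python
def practicality(available, required):
    """Return the available/required ratio for each resource and an overall verdict.

    A ratio of at least 1 means that resource is sufficient. Here the plan is
    treated as practical only if every resource type meets that threshold
    (an illustrative assumption).
    """
    ratios = {name: available.get(name, 0) / amount
              for name, amount in required.items()}
    practical = all(ratio >= 1 for ratio in ratios.values())
    return ratios, practical


# Hypothetical estimates for one administration of a reading test.
required = {"rater_hours": 12, "exam_rooms": 3, "photocopies": 150}
available = {"rater_hours": 10, "exam_rooms": 3, "photocopies": 200}

ratios, practical = practicality(available, required)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
shortfalls = [name for name, ratio in ratios.items() if ratio < 1]
print("practical" if practical else f"not practical; short of: {shortfalls}")
```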

what has actually been taught.”
In Brown’s view, “an achievement test is related directly to classroom lesson,
units or even a total curriculum within a particular time frame.” (Brown, 1994: 259). In other
words, an achievement test measures a student’s mastery of what should have been taught. It is
thus concerned with covering a sample (or selection) that accurately represents the contents
of a syllabus or a course book. Unlike progress tests, achievement tests should attempt to cover
as much of the syllabus as possible. If we confine our test to only part of the syllabus, the
contents of the test will not reflect all that the student has learnt.
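Because an achievement test should sample as much of the syllabus as possible, it can help to tabulate which syllabus unit each item draws on and to look for gaps before the test is administered. The Python sketch below assumes a hypothetical list of course units and a hypothetical item-to-unit mapping; neither is taken from the HPU Civil Engineering course.

```python
from collections import Counter


def coverage_report(item_units, syllabus_units):
    """Summarize how a draft test samples a syllabus.

    item_units maps each item number to the syllabus unit it tests.
    Returns the number of items per unit, the units with no items,
    and the share of units sampled at least once.
    """
    counts = Counter(item_units.values())
    uncovered = [unit for unit in syllabus_units if counts[unit] == 0]
    share_covered = 1 - len(uncovered) / len(syllabus_units)
    return counts, uncovered, share_covered


# Hypothetical course units and draft item mapping.
syllabus = ["Materials", "Foundations", "Structures", "Surveying", "Site safety"]
items = {1: "Materials", 2: "Materials", 3: "Foundations", 4: "Structures",
         5: "Structures", 6: "Foundations", 7: "Materials", 8: "Structures"}

counts, gaps, share = coverage_report(items, syllabus)
print(dict(counts))
print("units with no items:", gaps)  # ['Surveying', 'Site safety']
print(f"{share:.0%} of units sampled")  # 60% of units sampled
```

A count per unit only shows breadth of coverage; whether the items on each unit are representative still calls for the kind of expert judgement that content validity depends on.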
Achievement tests can be subdivided into class progress tests and final achievement tests.
1.5.1. The class progress test
The class progress test is often conducted during the course and is developed by the
teacher himself after each chapter or each term. He constructs this type of test to judge how
successful his teaching is and to find out what his students have achieved from it. The class
progress test is a teaching device and can be considered a good opportunity for the students to
prepare for the final achievement test.
1.5.2. The final achievement test
The final achievement test is more formal and is intended to measure achievement on a
larger scale (annual exams, entrance exams, final exams). It is not written and administered by
the teacher himself but may be prepared by ministries of education, boards of examiners, or
members of teaching institutions. A final achievement test is often based on an adopted
syllabus and its approach, either a syllabus-content approach or a syllabus-objective
approach. If the test is based on the former, its contents should be based directly on a course
syllabus or on the textbooks and other materials chosen. If it is based on the latter, its contents
are based directly on the objectives of the course.
Summary
In this chapter I have briefly dealt with the concept of a language test, how it is defined
and what is important in designing it. I have also discussed the concept of communicative
language ability, including communicative competence.
Also, in this chapter the definition of an achievement test as well as testing reading skills were
