a corpus-based analysis of the collocates of the word homeland in the 1990s, 2000s and 2010s = nghiên cứu đồng định vị của từ homeland qua các thập niên 1990, 2000 và 2010 trên cơ sở ngôn ngữ học khối liệu - Pdf 25


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE STUDIES
Triệu Tuấn Anh

Title: A corpus-based analysis of the collocates of the word “homeland” in the
1990s, 2000s and 2010s.
(Nghiên cứu đồng định vị của từ “homeland” qua các thập niên 1990, 2000 và 2010
trên cơ sở ngôn ngữ học khối liệu)
Major: English Linguistics
Code: 60.22.15
Supervisor: Assoc. Prof. Tran Xuan Diep
Hanoi, Sep 2013

ACKNOWLEDGEMENT

The fulfillment of this thesis would not have been possible without the support,
assistance, and encouragement of a number of people.
First, I owe my deepest gratitude to my supervisor, Assoc. Prof. Dr Trần Xuân
Điệp, for his valuable guidance and advice throughout every stage of this study. His

noun in the 1990s to refer to the geographic space related to a particular group
whereas it was mainly used as a noun or adjective to modify the word “security” or to
refer to a political department in the 2000s and 2010s

TABLE OF CONTENTS

Part I: Introduction

1. Rationale of the study
2. Objectives of the study
3. Scope of the study

1. The frequency of the use of the word “homeland”
2. The meanings of the word “homeland”
3. Discussion
4. Implications
Part III: Conclusions
References

PART I:
INTRODUCTION
I. Rationale of the study
The homeland is a common topic in people’s conversations and an endless
source of inspiration for authors. Although one may travel or live in many
different places around the world, the homeland still plays an important part
in one’s life, shaping one’s appearance and character, and it is often the
place where one feels most comfortable. This may explain the saying that
“One’s homeland is even greater than heaven.”
To Americans, the homeland is of great importance: they hold values that
differ from those of people in other countries, and they seem to be proud of
their country. This importance is normally expressed through language.
Language, in turn, is recorded in corpora, where a significant number of
written and spoken texts in various forms are stored electronically.
Studying linguistic features of texts discloses the

decades?
III. Scope of the study
As the title of this paper suggests, the research aims to explore the
collocates of the word “homeland” over three periods of time. A great many
corpora now exist, and the writer of this paper does not intend to employ all
of them. Instead, he analyzes the collocates using data from two selected
corpora, namely COCA and the Time Magazine Corpus. The data in these two
corpora are gathered from both spoken and written language through different
sources.
Furthermore, the use of a word may stay unchanged or may change over time.
The writer of this paper does not wish to trace the trend over many periods
of time; only the use of the word in the 1990s, 2000s and 2010s is explored.
IV. Design of the study
The study includes three parts, as follows:
1. Part I: Introduction. This part provides the readers with basic
information, including the rationale, objectives, scope and design of the
study.
2. Part II: Development:
- Chapter 1: Literature review. This chapter presents what other linguists
have done in the field.
- Chapter 2: Theoretical background. This chapter provides the theory for
the study, focusing on corpus linguistics and collocation analysis.
- Chapter 3: Methodology. This chapter introduces the subjects of the
study, the research approach, the instrument of data collection and the
procedures implemented in the study.
- Chapter 4: Findings and Discussion. This is considered the most
important part of any research. This chapter will show which words

writing) from 1931, 1961, 1991 and 2006. He investigated terms related to male and
female pronouns, man, woman, boy and girl, gender-related professions, role
nouns such as chairman, spokesperson and policewoman, and terms of address such
as Mr and Ms. The writer concluded that there were some reductions in the
frequencies of male terms, particularly male pronouns and Mr. It was also found
that while some gender stereotypes had weakened, others were being maintained
(such as a lack of adjectives associated with women’s success or power).
Additionally, the term “girl” was still more likely than the term “boy” to refer
to adults, and it was often used in a sexual way.
Fang (2008) examined the meaning of the text segment “international
community” in two corpora, GuCorpus (British) and PdCorpus (Chinese), which are
broadly typical of discourse communities in Western and Asian countries
respectively. By exploring the different collocates and grammatical structures
within each community, he identified the different ways in which the phrase was
used.
The studies mentioned above confirm that the meaning of a word can only be
understood and interpreted through its collocations, as attested in a corpus of
authentic data.
The writer of this paper has found that, despite the large number of research
papers employing the corpus linguistics approach, no corpus-based study of
“homeland” has been conducted before. This paper is therefore carried out to
fill that gap. The data used for analysis are taken from authentic sources.

web-based format.
Although scholars define the corpus in different ways, most definitions share
the following characteristics:
- The language must be authentic rather than made-up.
- The collection of data must be principled.
- The corpus is electronically saved.
2. Notable corpora
A huge number of corpora now exist thanks to the development of science and
technology. Wynne and Prytz (2012) list some types of corpora, with some
well-known English examples, as shown in the following table:

Types of corpora: Balanced, representative
Features of the corpus: Texts selected in pre-defined proportions to mirror a
particular language or language variety
Examples:
Brown family

Types of corpora: Parallel
Features of the corpus: Same texts in two or more languages
Examples:
OPUS: Open source parallel corpus
- Access to aligned corpora, mainly EU texts
- Unknown size
ENPC: English-Norwegian Parallel Corpus
- Originally English and Norwegian originals with Norwegian and English
translations, now also German, Dutch and Portuguese
- 50 text extracts in each direction, fiction and non-fiction

Types of corpora: Comparable
Features of the corpus: Similar texts in two or more languages or language
varieties
Examples:
ICE: International Corpus of English
- Different varieties of English
- 50% spoken
- Some freely available
ICLE: International Corpus of Learner English

Air Traffic Control Speech Corpus
/>ews_2008_1_ATCOSIM.html
Lampeter Corpus of Early Modern English Tracts
/>MPHOME.HTM
- Historical, written
- Tracts published between 1640 and 1740
- Six domains, ten decades
- 120 different texts, 1.1 million words

Types of corpora and some famous English examples

3. Corpus linguistics
McEnery and Wilson (1996) define corpus linguistics as “the study of
language based on examples of real life language use”. However, unlike
qualitative approaches to research, corpus linguistics uses bodies of
electronically encoded text, implementing a more quantitative method.
Bennett (2010) provides a simpler definition: “corpus linguistics
approaches the study of language in use through corpora. A corpus is a large,
principled collection of naturally occurring examples of language stored
electronically.” Bennett also states that corpus linguistics, in short, serves
to answer two fundamental research questions:
- What particular patterns are associated with lexical and grammatical
features?
- How do these patterns differ within varieties and registers?
Biber, Conrad and Reppen (1998) identify four main features of corpus
linguistics as follows:
- It is empirical, analyzing the actual patterns of language use in natural
texts.
- It utilizes a large and principled collection of natural texts as the basis for

habitual or customary places of that word.”
According to Manning (1999), a collocation is “an expression consisting
of two or more words that correspond to some conventional way of saying things”.
Likewise, Lewis (2000) states that “a collocation is two or more words that tend
to occur together.”
Although linguists hold different viewpoints, they agree that a collocation
is a regular combination of lexical items. Benson (1985) points out that
lexical collocations include:
- Verb + noun (e.g. to do homework)
- Adjective + noun (e.g. a big deal)
- Noun + verb (e.g. alarms go off)
- Noun + of + noun (e.g. a bar of chocolate)
- Adverb + adjective (e.g. terribly sorry)
- Verb + adverb (e.g. affect deeply)

6. Collocation analysis
Baker (2006) offers a clear step-by-step guide to collocation analysis:
1. Build or obtain access to a corpus.
2. Decide a search term, bearing in mind that the terms can be expanded to
include plurals or other forms, euphemisms, anaphora or relevant proper
nouns.
3. Obtain a list of collocates.
4. Decide how many collocates you want to look at.
5. Can the collocates be grouped semantically, thematically or grammatically?
Use this as a basis for the order in which you analyze the words in more detail.
6. Obtain concordances of the collocates and look for patterns within the context.
This should enable you to uncover dominant discourses surrounding the
subject.
7. Consider contesting discourses: concordance lines which go against or

Additionally, because of its design, this corpus is well suited to examining
how language has changed over time. The texts in this corpus come from various
sources:
- Spoken: (95 million words) Transcripts of unscripted conversations from
more than 150 different TV and radio programs (examples: All Things
Considered, Newshour, Good Morning America, Today Show, 60 Minutes,
Hannity and Colmes, and Jerry Springer).
- Fiction: (90 million words) Short stories and plays from literary magazines,
children’s magazines, popular magazines, first chapters of first edition
books from 1990 to present, and movie scripts.
- Popular magazines: (95 million words) Nearly 100 different magazines,
with a good mix (overall and by year) between specific domains (news,
health, home and gardening, women, financial, religion, sports). A few
examples are Time, Men’s Health and Good Housekeeping.
- Newspapers: (92 million words) Ten newspapers from across the US,
including USA Today, New York Times, and Atlanta Journal Constitution.
In most cases, there is a good mix between different sections of the
newspaper, such as local news, opinion, sports and financial.
- Academic journals: (91 million words) Nearly 100 different peer-reviewed
journals. They were selected to cover the entire range of the Library of
Congress classification system.
The Time Magazine Corpus consists of more than 100 million words of
American English from 1923 to the present, as found in Time Magazine. The
Time Magazine Corpus allows users to easily look at:
- The overall frequency over time of words and phrases related to changes in
society and culture or to historical events, such as new age, politically
correct, email and global warming.
- Changes in the language itself, such as the rise and fall of words and
phrases like beauteous, nifty or freak out. Changes with grammatical

were followed:
1. Collect data from the website americancorpus.org:
- In the DISPLAY section, tick the box KWIC (key word in context).
- In the SEARCH STRING section, type the word “homeland” in the box
WORD.
- In the box COLLOCATE, enter the numbers 1 and 1, which means that one
word before and one word after “homeland” will be highlighted for easier
analysis.
- In the box SECTION, choose 1990s, 2000s and 2010s in turn, which
means that the collocates of the word “homeland” in each of these periods
will be displayed.
- Finally, press the button “search”; the data are then displayed in the
form of a table.
2. Similar steps were conducted in the Time Magazine Corpus at
www.corpus.byu.edu/time
3. After all the data from the two corpora had been collected, the top
collocates in each corpus were analyzed in context, and the results from the
two corpora were compared. The research then concluded with how the meaning
of the word “homeland” had changed over the three selected periods of time.
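
The one-word-before, one-word-after search in step 1 can be approximated
offline on any plain text. The following Python sketch is an illustration
only and is not the procedure used by the COCA or Time Magazine interfaces;
the function name count_collocates, the crude tokenizer and the sample
sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def count_collocates(text, node="homeland", left=1, right=1):
    """Count the words found within `left`/`right` tokens of `node`."""
    # Crude tokenizer (an assumption): lower-case runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        before = tokens[max(0, i - left):i]   # words to the left of the node
        after = tokens[i + 1:i + 1 + right]   # words to the right of the node
        counts.update(before + after)
    return counts


# Invented sample text, not taken from COCA or the Time Magazine Corpus.
sample = (
    "He returned to his homeland after the war. "
    "The Department of Homeland Security issued a warning. "
    "Homeland security was debated again."
)
counts = count_collocates(sample)
print(counts.most_common(3))
```

Note that this sketch ignores sentence boundaries, so a word ending one
sentence can be counted as the left-hand collocate of a node word that opens
the next sentence. A fuller implementation would split the text into
sentences first.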

decade. This figure continued to rise slightly to 10.4 per million in the
1940s. There was a small fall to 8.6 per million in the 1950s before a slight
increase of 1.3 per million in the 1960s. The next three decades fluctuated
around 17 per million before a significant increase to 35.8 per million was
recorded in the first decade of the 21st century. Clearly, the use of the word
“homeland” increased greatly in the last decade.
Here is another bar chart, describing the changes in the use of the word
“homeland” in the COCA corpus.
(The changes in the use of the word “homeland” through COCA corpus)

As can be seen on the right-hand side of this chart, the target word
occurred only 8.32 times per million words in the first half of the 1990s.
This figure went down slightly to 7.15 per million in the second half of that
decade before rising remarkably to 22.53 per million in the next decade, and
it remained stable for the five years from 2005 to 2009. Overall,
there was an upward trend in the use of the word “homeland” over the
80 years from 1920 to 2000, and the word has continued to be used frequently
in the current decade. The data from the COCA corpus partly confirm
this trend.
Additionally, the COCA corpus shows in which register the word has been
used most. The word “homeland” was used most in spoken language, with an
average of 24.51 per million. It was used less in newspapers and academic
texts, at 17.94 and 17.99 per million respectively. In fiction and
magazines, by contrast, the word was rarely used.
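
The per-million figures quoted in this section are normalized frequencies:
the raw number of occurrences divided by the size of the corpus section and
multiplied by one million, so that sections of different sizes can be
compared. The Python sketch below illustrates that arithmetic under stated
assumptions: the helper names per_million and fold_change are hypothetical,
and the raw count of 950 is invented for the example rather than taken from
COCA.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000


def fold_change(earlier, later):
    """Ratio of a later normalized frequency to an earlier one."""
    return later / earlier


# Hypothetical raw count for illustration only (not taken from COCA):
# 950 occurrences in a section of 95 million words.
rate = per_million(950, 95_000_000)

# Per-million figures reported above for the second half of the 1990s
# (7.15) and for the following decade (22.53).
growth = fold_change(7.15, 22.53)
print(round(rate, 2), round(growth, 2))
```

Comparing normalized rather than raw frequencies is what allows the
registers and periods to be set side by side, since the sections of the
corpus are not all the same size.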

17
8 Jewish 17
9 My 14
10 ancestral 13

(The most frequent collocates of “homeland” in the 1990s)

The following examples are extracts from articles published in the 1990s.
Example 1:
“The government says over 200 civilians were killed; others say the number
could be around 1,000. Chai Ling survived the killings, only to become a
hunted criminal, branded by the government as China's leading female criminal
counterrevolutionary. As her fellow demonstrators across China were arrested
and jailed by police, Chai Ling went underground for 10 months, moving from
place to place, hidden, she says, by ordinary people and even a few Communist
Party members and officials. Early this month, she and her husband, also a
fugitive, turned up outside their homeland. She reasserted her dedication to
the struggle for democracy, quoting a Chinese poet: " I am not a hero in a time
when there are no heroes, but I will never fall down so that the murderers'
knives could block the wind of freedom. " I'm James Walker for
Nightline KOPPEL When we spoke to Chai Ling in Paris earlier today, we
discussed her escape and also the spirit of the people in the country she left
behind. Our interview with Chai Ling, when we…”

Example 3:
“Lephing ($ 3.50) might appeal to more specialized tastes, but it's by far the
most interesting dish on the menu. Made from mung beans, it looks like a giant
white creme caramel. It has an almost translucent appearance and is flecked
with bits of chive. It's surrounded and topped with a soy-based sauce that has a
powdery heat from dried chiles. The cool, gelatinous texture goes only partway
in taming the spiciness; the thick pitalike bread that comes with it is a much
more effective weapon. # Gyatso says that in his homeland people eat mainly
