VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE STUDIES
NGUYỄN THỊ HIỀN
A COMPARATIVE STUDY ON HOW HESITATION AND
RESERVEDNESS IS EXPRESSED VIA PROSODIC MEANS IN
ENGLISH AND THE EQUIVALENT EXPRESSIONS IN
VIETNAMESE
(Nghiên cứu so sánh sự do dự và dè dặt được thể hiện thông qua phương
tiện ngôn điệu trong tiếng Anh và các hình thức diễn đạt tương đương
trong Tiếng Việt)
M.A. Minor Program Thesis
Field: English Linguistics
Code: 60 22 15
HANOI - 2010
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
PART A: INTRODUCTION
1. Rationale of the study
2. Aims of the study
3. Scope of the study
4. Methods of the study
5. Design of the study
PART B: DEVELOPMENT
Chapter 1: THEORETICAL BACKGROUND
1.1 Literature review
1.2 Hesitation and reservedness
1.2.1 Definition of hesitation
2.2 Data analysis
2.2.1 Prosodic feature analysis of hesitation and reservedness in the English samples
2.2.1.1 Pitch contour
2.2.1.2 Duration
2.2.1.3 Speaking tempo
2.2.1.4 Loudness (or intensity)
2.2.1.5 Summary
2.2.2 Prosodic feature analysis of hesitation and reservedness in the Vietnamese samples
2.2.2.1 Pitch contour
2.2.2.2 Duration
2.2.2.3 Speaking tempo
2.2.2.4 Loudness (or intensity)
2.3 Comparison: prosodic cues for hesitation and reservedness in English and Vietnamese
2.3.1 Similarities
REFERENCES
APPENDIX
LIST OF FIGURES
Figure 1: Praat Editor showing waveform, spectrogram and TextGrid
Figure 2: Illustration of pitch contour of “No” on Praat Screen
Figure 3: Pitch contour of the sentence “Um…u…no. I don’t think so. I can’t think of anything”
Figure 4: Illustration of pitch contour of two hesitation points M and E
LIST OF TABLES
Table 1: Summary of vocalized fillers in English and their equivalents in Vietnamese
Table 2: Speaking rate (syllables/second) before and after each pause
Table 3: Summary of prosodic features which contribute to the expression of hesitation
communicative ability while disfluency is inversely related. The present study examines the
evidence that speech hesitation sometimes supports and enhances communication and
suggests ways in which it may be dealt with in the ELT classroom. Finally, there are few
studies of hesitation across languages from the perspective of prosody that use English as
the instrumental language. I hope that my work can give more insight into the similarities
and differences in how hesitation is expressed via prosodic means in two languages, English
being a stress-timed language and Vietnamese a tonal language.
2. Aims of the study
The main aims of this study are:
- To explore prosodic features in English which express hesitation and
reservedness,
- To provide a brief account of the similarities and differences in how hesitation and
reservedness are expressed via prosodic features in English and Vietnamese,
- To give some proposals for further study and suggestions for improving speaking
skills.
To fully achieve the stated aims, the study should answer the following basic questions:
- What are the prosodic features used to express hesitation and reservedness in
English spontaneous speech?
- What are the similarities and differences in the expression of hesitation and
reservedness by prosody in English and Vietnamese?
- What tips can be used to improve speaking fluency?
3. Scope of the study
Many fields relating to hesitation phenomena and prosodic features need to be
explored. However, due to limited time and available facilities, this thesis focuses only
on the following aspects:
- Hesitation and reservedness in spontaneous speech;
- Typical types of hesitation and reservedness in English and their equivalents in
Vietnamese, including silent pauses, filled pauses, repetitions and syllable lengthening.
study. It is considered the backbone of the study. This part consists of three main
chapters. Chapter 1 presents the theoretical background of hesitation, reservedness and
prosody. Chapter 2 explores the similarities and differences between hesitation and
reservedness as expressed via prosodic features in English and in the equivalent Vietnamese
expressions. Chapter 3 suggests some tips for teachers to improve students’ English
speaking fluency.
Part C is “CONCLUSION”, in which the author gives some concluding remarks as well as
suggestions for further study.
PART B: DEVELOPMENT
Chapter 1: THEORETICAL BACKGROUND
In order to create a basis for analyzing and synthesizing the data in the main part of
the study, a comprehensive understanding of the theoretical background is necessary. This
chapter reviews the history of research on hesitation phenomena, which also forms the basis
for the present study. In addition, the nature of hesitation and reservedness in spontaneous
speech is described through definitions and types. Prosody is another core point of the
study, so its concepts and features are clarified as well, for both English and Vietnamese.
This provides a sound basis for analyzing the similarities and differences between the two
languages in the later chapter.
1.1 Literature review
The existence of hesitation and reservedness phenomena is a universal characteristic of
spontaneous speech in any language. Hence, this phenomenon has really attracted the
Edition
(1995:559) gives a clear definition: “Hesitation is the status of being slow to speak or act
because one feels uncertain or unwilling, to PAUSE in doubt or being worried about or
shy of doing something”. From a similar point of view, the Macmillan Dictionary for
Advanced Learners, 2nd Edition (2002), a popular online dictionary for English learners,
also explains that hesitation is a pause before doing something, or a feeling that you should
not do it, especially because you are nervous, embarrassed, or worried. Synonyms or
related words for this sense of hesitation include “uncertainty, doubt, reservation,
question, reserve”.
With regard to hesitation in spontaneous speech, linguists have given many definitions,
but it is not easy to arrive at a common one. Firstly, Fox Tree and Clark
(1997:152) defined hesitation as a phenomenon which occurs when “the speaker does not
immediately find an adequate option for language production and is compelled to
temporarily delay the output to solve his or her difficulties”. Later, Rolf Carlson, Kjell
Gustafson, and Eva Strangert gave a similar definition of hesitation in another study
(2006: 21–24): “hesitation is the phenomenon when you are uncertain to what to say or
you have problems in lexical access or in the structuring of utterances or in searching
feedback from a listener”.
Clearly, Fox Tree and Clark and Carlson et al. explain in similar terms when
hesitation occurs in spontaneous speech. They focus on the causes of hesitation rather
than on its expression. This study is more concerned with how hesitation is expressed.
1.2.2 Definition of reservedness
The word “reservedness” derives from the English adjective “reserved”, which
means tending to avoid showing one’s feelings or expressing one’s opinions to other
people (the Oxford Advanced Learner’s Dictionary, 5th
pauses are indicative of the strength of association between sequential linguistic elements”.
Silence has its own communicative value. The speaker may deliberately insert
pauses into his speech to make the listener’s job easier, to help them segment the speech, or
to give them time to parse it. Pauses occur at syntactic boundaries, for breathing, and as
hesitation pauses. To differentiate among these types of pauses, consider the example
below, a conversation taken from Davidson’s study into which a silence is inserted.
A: Well did you want me to just pick you- get into Robinson’s so you could buy a
little pair of slippers?
(silence)
A: I mean or can I get you something?
(based on Davidson, 1984: 104)
Here, the silence follows the proposal or request which the speaker makes to the
hearer. The silence leads the speaker to understand that the hearer is reluctant, has not
heard, or is for some other reason slow to respond. Hence, the silence is an intentional
signal, that is, a hesitation pause.
1.2.3.2 Filled pause
Filled pauses are hesitation sounds that speakers employ to indicate uncertainty or to
maintain control of a conversation while thinking of what to say next. Filled pauses do not
add any new information to the conversation and they do not alter the meaning of what is
uttered. They are also called fillers and are regarded as extralinguistic noise. In English, the
set of filled pauses includes /ah, eh, er, uh, um, erm, hm/. Among them, a nasalized “um”
and an oral “ah” are the most common fillers. Other sounds or non-lexemes can
occasionally be used as filled pauses, and some speakers may use the expressions “well”,
“I mean” and “you know”. In Vietnamese, the characteristic fillers are /ừm/, /à/ and /ờ/.
Examples (1) and (2) illustrate the most prevalent forms of filled pauses in
English and Vietnamese.
(1) A: Tomorrow will you go to the cinema with me?
B: I… uh … am busy
ừm | "75 divided by 5 is um 15." (75 chia 5 bằng ừm 15)
Ah | a, à | "Ah well, I will try." (À, tôi sẽ thử.)
1.2.3.3 Repetitions
Repetition is a common phenomenon in spontaneous speech. In a spoken dialogue
system, repetition often occurs when users fail to make themselves
understood. Repetition occurs when a unit of speech, such as a sound, syllable, word, or
phrase, is repeated, e.g. “to-to-to-tomorrow”.
Repetitions in spontaneous speech in most cases involve a first instance of the repeated
word (R1), a possible silent pause (SIL), a second instance of the repeated word (R2), and
the continuation of the utterance. An example is given below:
(a) I might (R1) might (R2) have to go to the cla- class.
(b) I might (R1) (SIL) might (R2) have to go to the cla- class
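The R1 (SIL) R2 pattern described above can be located automatically in a plain transcript. The following is only a minimal sketch, not part of the procedure used in this study: the function name `find_repetitions` is hypothetical, the transcript is assumed to carry the `(SIL)` marker but not the `(R1)`/`(R2)` labels, and only whole-word repetitions are detected, so restarts such as “cla- class” are ignored.

```python
import re
import unicodedata


def find_repetitions(utterance):
    """Return (word, count) for each run of an immediately repeated word."""
    # Normalise so that Vietnamese letters with tone marks are single characters.
    text = unicodedata.normalize("NFC", utterance).lower()
    # Drop the (SIL) marker so that "R1 (SIL) R2" is treated like "R1 R2".
    text = text.replace("(sil)", " ")
    # A word is a run of letters, optionally containing apostrophes.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    runs = []
    i = 0
    while i < len(words):
        j = i
        while j + 1 < len(words) and words[j + 1] == words[i]:
            j += 1
        if j > i:
            runs.append((words[i], j - i + 1))
        i = j + 1
    return runs


print(find_repetitions("I might (SIL) might have to go to the cla- class"))
# prints [('might', 2)]
```

The same function can be applied to the Vietnamese example, where it would report one run of three repeated words.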
In Vietnamese, repetition often occurs in dialogues as a way of expressing the
speaker’s attitude. As in English, Vietnamese speakers may repeat a
word, e.g. “Tôi tôi tôi xin lỗi em. Có lẽ không cần nữa” (from the short story Con
bé và Gã lang thang: 15, by Chiêu Hoàng).
1.2.3.4 Syllable lengthening
Apart from hesitation markers such as filled pauses, syllable lengthening is quite common
in spontaneous speech. The speaker can lengthen syllables or words; e.g. English
speakers often lengthen the words “a:nd” and “we:ll”, as in the following
example: “Yesterday he came a:nd asked about your book”. Moreover, the most common
instance of lengthening occurs when the article “the” is pronounced as “thee” and its
final vowel is drawn out beyond its usual duration.
the way sounds combine in syllables have gone a long way toward mastering the sound
system. It is as important to learners’ ability to make themselves understood as any of the
other features of the sound system. What helps people understand is the characteristic
“melody” of the language, that is, its prosodic features, which include variations in pitch,
loudness, tempo and length.
1.3.2.1 Pitch (or Fundamental Frequency)
Pitch is an important component in establishing the intonation of utterances, especially in
English. It is defined as the frequency of vibration of the vocal cords and the relative height
of speech sounds as perceived by a listener. Pitch corresponds to the fundamental frequency
(F0) of the signal, which is calculated as the number of repetitions, or cycles, of its waveform
per second and is given in Hertz (Hz). Pitch varies over an entire phrase or sentence, which is
manifested in different pitch curves. In English, three basic movements (fall, rise and level)
combine to form the pitch patterns fall, rise, fall-rise, rise-fall and level. Each
pattern carries its own function, e.g. fall (the impression of completeness and
finality), rise (a certain degree of doubt or uncertainty), rise-fall (strong feeling of approval
or certainty), fall-rise (limited agreement or response with reservations or hesitation),
level (a feeling of saying something routine, uninteresting or boring). For example:
- Today we learn English phonetics (With a falling tone, the speaker makes a
statement)
- Today we learn English phonetics (With a rising tone, the speaker puts a question to the
hearer).
Unlike English, Vietnamese is a tonal language with
six tones (Ngo Nhu Binh, 2001: 12-14). The tones have distinctive pitch contours: Ngang has
an almost level contour (e.g. ma); Sắc has a high rising contour (e.g. má); Ngã also has an
overall rising pattern, but is interrupted by glottalization in the middle (e.g. mã); Huyền has
a falling contour (e.g. mà); Nặng has a dropping contour interrupted by glottalization
(e.g. mạ); and Hỏi falls gradually and then rises in the last third back to the original level
(e.g. mả). A change of pitch in Vietnamese can change the meaning of a word, but it cannot change
study of speech, it is usual to use the term “length” for the listener’s impression of how long a
sound lasts, and “duration” for the physical, objectively measurable time. For example, I
might listen to a recording of the following syllables and judge that the first two contained
short vowels while the vowels in the second two are long: / bit bet bi:t bæt /; that is a
judgment of length. But if I use a laboratory instrument to measure those recordings and find
that the vowels last for 100, 110, 170 and 180 milliseconds respectively, I have made a
measurement of duration.
Chapter 2: HESITATION AND RESERVEDNESS VIA PROSODIC FEATURES IN
ENGLISH AND VIETNAMESE
In this main part of the study, I apply the comparative approach to the analysis of
English and Vietnamese samples. The analysis aims to uncover which
prosodic features contribute to the expression of hesitation and reservedness in English and
Vietnamese spontaneous speech. Here, English is used as the instrumental language.
Prosodic features including pitch, duration, loudness and tempo are examined in the Praat
software to determine whether they have any influence on hesitant speech. On the basis of
the data collection and analysis, the author draws out the similarities and differences between
English and Vietnamese prosodic features in showing hesitation and reservedness.
2.1 Procedures
2.1.1 Collecting samples of spontaneous speech
In this study, the following data are utilized:
In English: three job interviews were extracted from online TOEFL tests. Each interview
lasts about 5-7 minutes. The applicant in the first interview is male; the
applicants in the other two interviews are female. Each applicant is in a different mood
(overconfident, shy and technical). However, the study focuses only on the prosodic cues
for hesitation; therefore, psychological factors are not considered.
In Vietnamese: three interviews were recorded from the program “Gõ cửa ngày mới” on
VTV1 (a national television channel). Each interview lasts 4-5 minutes, and famous
people are invited to talk about their career and life. The interviewees
computer software. Besides, transcribing and analyzing audio and video data is extremely
time-consuming, particularly as the author had to calculate the speaking rate. Another
difficulty is that purely automated speech processing is error-prone, so the analysis still
requires subjective judgment. Finally, the samples were downloaded from available resources
rather than recorded directly; therefore, the sound quality was not good and some errors
could occur when playing the recordings.
2.2 Data analysis
2.2.1 Prosodic feature analysis of hesitation and reservedness in the English samples
In order to characterize the speech at the prosodic level, prosodic values such as
duration, F0 slopes, speaking tempo and loudness were measured in Praat version 5.1.3.7.
All the samples were recorded and saved in a single .wav file for each block for each
participant. Not all the sentences in the three English samples were analyzed. After listening
carefully to each corpus, the author selected 20 typical sentences which contain as much
hesitation as possible. These sentences were digitized and then transcribed in Praat using
TextGrids. Figure 1 illustrates a sentence which was entered into Praat with a TextGrid for
analysis. In this figure, the blue line, the grayish image, the red speckles and the yellow or
green line show the pitch contour, spectrogram, formants and pulses respectively. We can read
off the values of fundamental frequency in Hz, duration in seconds and speaking tempo in
syllables/second. Besides, pauses are represented as blank lines (see the Praat screen in Figure 1).
Figure 1: Praat Editor showing waveform, spectrogram and TextGrid
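Speaking tempo in syllables per second, as mentioned above, is the number of syllables in a stretch of speech divided by the duration of that stretch. The sketch below only illustrates that arithmetic; the function name `speaking_rate` and the numbers in the usage lines are hypothetical and are not measurements from the samples.

```python
def speaking_rate(n_syllables, duration_s):
    """Return the speaking rate of a stretch of speech in syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / duration_s


# Hypothetical stretches before and after a pause (not data from this study).
rate_before = speaking_rate(9, 2.0)  # 9 syllables spoken in 2.0 s
rate_after = speaking_rate(6, 2.0)   # 6 syllables spoken in 2.0 s
print(rate_before, rate_after)       # prints 4.5 3.0
```

Comparing the two values in this way shows whether the tempo slows down or speeds up around a pause.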
2.2.1.1 Pitch contour
An important feature to take into account when modeling pitch in hesitation
phenomena is F0 slope. For example, syllables at the end of a sentence are mainly
pronounced with a descending pitch slope, but an interrogative sentence ends with a rising
pitch. Therefore, it is reasonable to investigate whether there is a standard pitch slope for
hesitations or whether, on the other hand, it depends on other aspects, such as semantics or
syntax.
can use fundamental frequency (F0) to demonstrate the change of pitch contour. F0
is measured in Hz and ranges from 75 to 500 Hz. The word “no” /nəʊ/ is a single
syllable which has an onset. The F0 value at the onset (F0 start value) and the F0 value at
the end of the syllable (F0 end value) are measured in Hz. From the automatic analysis using
the Praat TextGrid, we get an F0 start value of 228.1 Hz, which decreases to 181.3 Hz and
then rises again to the F0 end value (200.1 Hz). This means that the pitch descends and then
rises again.
In conclusion, when the speaker wants to imply limited agreement or to respond
with reservations, he can use a fall-rise pitch to convey his intention.
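The reading of the three F0 values for “no” can be expressed as a simple rule. The sketch below is an illustration only, not the analysis carried out in Praat: the function name `classify_contour` and the 5 Hz tolerance below which a movement counts as level are assumptions introduced here.

```python
def classify_contour(f0_start, f0_mid, f0_end, tolerance_hz=5.0):
    """Label a three-point F0 track as fall, rise, fall-rise, rise-fall or level."""

    def direction(a, b):
        # tolerance_hz is an assumed threshold, not a value from the study.
        if b - a > tolerance_hz:
            return "rise"
        if a - b > tolerance_hz:
            return "fall"
        return "level"

    first = direction(f0_start, f0_mid)
    second = direction(f0_mid, f0_end)
    if first == second:
        return first
    if "level" in (first, second):
        # One half is flat, so the other half determines the label.
        return first if second == "level" else second
    return f"{first}-{second}"


# F0 values (Hz) reported above for the word "no".
print(classify_contour(228.1, 181.3, 200.1))  # prints fall-rise
```

With the values reported above, the first movement is a fall of 46.8 Hz and the second a rise of 18.8 Hz, which gives the fall-rise label.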
2.2.1.2 Duration
Duration is the most important cue to the impression of hesitation. Firstly, we pay
attention to the duration of the pauses and determine how it changes in hesitant
speech. The first parameter we test is therefore the length of pauses,
which is measured in milliseconds (ms). Length is scaled automatically using the standard
duration tier manipulation settings of Praat version 5.1.3.7. Consider the following
examples, in which pause lengths are given in milliseconds between braces:
(3) // Yeah, no problem.{120} I left loads of time ’cos you know what the trains are like
nowadays. {240}// And I wasn’t really sure where the Heppleworth site was, but
er…{333} the directions you *er* sent me were crystal clear.// (Interview 1, line 6-11)
(4) // Um…{5005} u {1831} no. {1221}// I don’t think so. {2099} // I can’t think of
anything.// (Interview 2, line 80)
(5) //…{2332} no, sorry {598} that’s the molecular ion.{2710} The base peak is….{2881}
the most intense peak. {1086} // All the other peaks are relative to this. {927}// I can’t
believe I got them mixed up.// (Interview 3, line 29-30)
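Because the pauses in examples (3) to (5) are annotated as millisecond values in braces, they can be collected from a transcript with a short script. This is a minimal sketch under that assumption about the notation; the function name `pause_durations` is hypothetical and the script is not part of the Praat workflow described in this chapter.

```python
import re

# A pause annotation is a whole number of milliseconds inside braces, e.g. {333}.
PAUSE = re.compile(r"\{(\d+)\}")


def pause_durations(transcript):
    """Return the annotated pause lengths (in ms) in order of appearance."""
    return [int(ms) for ms in PAUSE.findall(transcript)]


example_4 = ("// Um…{5005} u {1831} no. {1221}// I don't think so. {2099} "
             "// I can't think of anything.//")
durations = pause_durations(example_4)
print(durations)                        # prints [5005, 1831, 1221, 2099]
print(sum(durations) / len(durations))  # mean pause length in ms: 2539.0
```

Listing the values per example makes it easier to compare the short pauses of example (3) with the much longer ones of examples (4) and (5).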
In example 3, there are very short silent pauses (120 ms, 240 ms) which appear when the
applicant has apparently figured out what he wants to convey (the subsequent delivery is
fairly fluent) and is simply pausing to breathe. In other words, these pauses play a
demarcative role between different syntactic components. The pause after the filler /er/
lasts longer (333 ms), as the applicant is trying to find his next words. However, the