Oral History material as a source for dialectological studies

Lieselotte Anderwald, University of Freiburg

September 2001

Our research project English Dialect Syntax from a Typological Perspective (funded by the Deutsche Forschungsgemeinschaft since March 2000) concentrates on the investigation of morphological and syntactic features of British non-standard dialects. Looking for sufficient material to investigate these features, we came across several criteria that constrained our choice of material:

(1)   Material recording the usage of speakers who grew up before WWI would be ideally suited to our purposes. This makes existing material collected in the 1970s and 80s preferable to present-day material, although some very old speakers are still alive, and we do also use material collected from them.

(2)   Features of syntax are – almost by definition – much rarer than features of phonetics and phonology, and very large quantities of text are therefore necessary. This considerably restricts the practicality of collecting one's own corpus from scratch.

(3)   Some material previously collected by dialectologists already exists, and some colleagues were kind enough to give us access to it, even though we were not part of the original research projects. Any new data we collect should complement this existing material regionally – which again implies that the complementary material should ensure (temporal) comparability with it, either by dating from the 1970s or 80s or by recording speakers from the same age cohort.

Dialectological work in the past has relied mainly on questionnaires. The best-known example (for England) is of course the Survey of English Dialects (for details on the SED cf. Orton 1960, Orton 1962). The SED has the advantage that it was conducted nation-wide; every county of England is covered by a mesh of several informants. In addition, the use of a relatively rigid questionnaire ensures absolute comparability, although only within the frame of the questionnaire. An obvious disadvantage of questionnaire-based studies is that you only get what you ask for. As the overriding interest of the SED was phonetic/phonological and lexical, direct questions about syntactic features are missing almost completely, while morphological features are investigated only very marginally. The SED has the further disadvantage that it was conducted too early for the wide-spread use of tape-recorders. The material is available in phonetic transcription, but as this is a laborious way of taking linguistic notes even for the most practised transcriber, the recording of spontaneous sentences and constructions (where they occur at all in the fieldworkers' notebooks) hardly ever goes beyond the clause level and is embedded in very little context. A third disadvantage of the SED is that it has only a limited range of speakers per location (typically one). As typically only one answer per question was recorded, the SED can only record variation between different speakers, but practically no variation within an individual speaker, so that a possibly variable use of a particular feature can only be determined indirectly, if at all.[1]

Today, most dialectologists still base their studies on questionnaires, but fortunately, thanks to technological progress, tape-recordings are the rule, and these interviews usually also contain stretches of spontaneous discourse. Unfortunately, however, even the material of more modern dialectologists has a number of disadvantages. For obvious practical reasons, researchers have usually concentrated on one locality, or at most a small number of localities, from which they have interviewed speakers. There is thus only rarely comparability across different dialects. A more serious disadvantage is that outsiders do not usually have access to the raw material, other than those excerpts that are published in the context of dialect studies. Many researchers resent the idea of parting with their data, and this is understandable, as a lot of – often unpaid – work has gone into collecting and transcribing the material. In addition, many linguists may want to re-use aspects of their material for new purposes, sometimes over decades. Happily (for us), there are some exceptions to this rule, and we have received some material from researchers that has helped to build up a stock of transcribed regional data.

A third source sometimes employed in dialectology, the collection of non-standard dialogue from works of fiction, is so clearly inferior in linguistic terms to authentic dialect material that it does not really merit further discussion in this context.[2]

A further complication is that the interest in dialect syntax is a relatively new phenomenon, and this might also be the reason why existing corpora are sometimes not sufficient in size. Since a sufficiently large dialect corpus is not available at the moment, we had to search for alternatives. To sum up, we were looking for large quantities of traditional regional speech, preferably by older local speakers with strong family affiliations in the area, that would record the usage of speakers who grew up before WWII, even better before WWI, i.e. we were looking for material preferably from the 1970s and 80s, or from very old speakers recorded in the 1990s. It had to be recorded in acceptable quality for linguistic analyses, preferably even including transcripts that were reliable on a word-by-word basis, and – most important of all – the material had to be more or less freely available to us as researchers not originally part of the research design. These criteria suggest a new source that has so far not – or hardly – been used for dialectological purposes: tape recordings and transcripts from Oral History projects. 'Oral History' is a concept that seems to date from the 1970s (the OED records a single first use in the US from 1971, then a number of occurrences from 1977 and 1978) and is described by the Oral History Society as 'the recording of people's memories and feelings. It is the living history of everyone's unique life experiences'[3], thus placing it in contrast to the received notion of history somehow having to do with 'dusty books and documents, archives and libraries, or remote castles and stately homes'[4].

Oral History collections sometimes originate from projects (short- or long-term) undertaken by an individual (sometimes also a group of individuals or an institution) with an interest in a specific theme or topic, often just recording life memories. Typically, these are lay persons, not professional historians, although some projects were initiated by local and academic historians, museum staff or archivists. The Oral History Society suggests for potential projects that possible topics 'could be for example memories of childhood, leisure, politics, religion or women's experience in wartime or memories of coming to Britain as a migrant'[5]. The recording situation makes Oral History material ideal for linguistic investigations. The interviewers were usually true insiders, coming from the area and often still speaking the dialect themselves, which tends to relax the interview situation considerably. A second advantage is that the informant's attention was genuinely on what was being said, rather than on how it was being said. Thankfully, the Oral History Society advises all potential interviewers to give a copy of the tapes to their local library or archive,[6] and this is where Oral History material can be found today across Great Britain.

Members of the Oral History Society are advised to at least 'write a synopsis of the interview which briefly lists in order all the main themes, topics and stories discussed'[7], but the Society does not explicitly mention verbatim transcripts. While for our purposes existing transcripts are of course extremely welcome, the non-linguistic interests of Oral History Projects make it clear that 'normalization' or 'standardization' of linguistic features during the process of transcription must be expected. After all, non-standard syntax (much more than non-standard pronunciation or lexis) is still considered incorrect by the largest part of the population, and is often corrected almost unconsciously in the process of writing spoken language down.

Other features associated with spoken language are also typically removed during the process of (non-linguistic) transcription in order to make spoken language 'look' more like written language. Word-by-word transcriptions including repetitions, false starts, incomplete sentences etc. are the exception rather than the rule. Where we have access to the original tapes, the purified written version can be remedied relatively easily (if laboriously and time-consumingly) by listening to the tapes and de-standardizing the transcripts. Carefully comparing transcripts with the original tapes has allowed us to re-insert all morphological, syntactic and pragmatic features. For the rest of the material, where no transcripts were available, we transcribed the original tapes from scratch, mostly with the help of native speakers who either worked on the project or are associated with it in related research projects.

As a result, the actual transcripts used for the Freiburg English Dialect Corpus (FRED) are verbatim equivalents of the spoken versions; hesitations, repetitions, false starts etc. are all included. In addition, all morphosyntactic dialect features have been reinserted. A variety of phonological features were kept, either because they were already represented in the original transcripts, or because we suspected that they might interact with morpho-syntax, e.g. features like h-dropping, wanna, gonna, s’pose etc. We also represent certain paralinguistic features like laughter, long pauses and indistinct stretches of conversation (marked as gaps, unclear or truncated words), which are all indicated in the transcriptions by specific tags to minimize the risk of ambiguities and to open up the possibility of analyses on a pragmatic or discourse level. In this way we have tried to remedy the linguistic shortcomings of the original Oral History material.
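To illustrate how tagged transcripts of this kind can serve both discourse-level and morphosyntactic analyses, the following is a minimal sketch in Python. The tag names used here (`<pause>`, `<laugh>`, `<unclear>`, `<trunc>`) and the sample utterance are purely hypothetical stand-ins; the actual tag inventory of the corpus is not specified in this text.

```python
import re
from collections import Counter

# Hypothetical angle-bracket tags; the real tag set of the corpus may differ.
TAG = re.compile(r"</?([a-z]+)>")

def count_tags(transcript: str) -> Counter:
    """Count tags by name, ignoring closing tags so paired tags count once."""
    return Counter(
        m.group(1)
        for m in TAG.finditer(transcript)
        if not m.group(0).startswith("</")
    )

def strip_tags(transcript: str) -> str:
    """Remove all tags, keep the enclosed words, and normalise spacing."""
    return " ".join(TAG.sub(" ", transcript).split())

# Invented utterance for illustration only, not taken from the corpus.
sample = ("Well, I <pause> I were <trunc>goi</trunc> going down "
          "<unclear>the lane</unclear> <laugh>")

print(count_tags(sample))
print(strip_tags(sample))
```

Keeping the tags separable from the words in this way means that the same file can be searched as plain text for morphosyntactic features, or with the tags intact for pragmatic ones.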

Extralinguistic variables in FRED are constrained by intention – FRED is not designed to be a representative sociolinguistic corpus, but a regionally representative corpus of as broad a dialect speech as possible. Our Oral History Projects concentrate on interviewing older people. These older people are typically very local, i.e. they still live in the place where they were born, without having moved outside the region for any considerable stretch of time. Also, typical FRED informants usually left school at about the age of fourteen or younger, certainly not continuing on to higher education (although there is also one informant in our material who is a lawyer, constituting the rare exception). Finally, most of our informants are male – as is well known, women tend to use more prestigious, in many cases more standard, forms of speech where these are available to them. In other words, most of our informants would qualify in dialectology as typical NORMs (e.g. Chambers and Trudgill 1998:29), i.e. non-mobile old rural male speakers with little education. Although this restricts the range of investigations that can be conducted with the help of FRED in sociolinguistic terms, this represents exactly the same bias as in earlier dialectological work, where we find a preponderance of NORM informants as well, so that results from work on FRED will be comparable to earlier studies or to material from earlier investigations.

To conclude, then, although there might be some problems and drawbacks in using Oral History material, the advantages clearly outweigh them. Oral History material is an excellent starting point for a regionally representative investigation into the non-standard dialects of Great Britain, one that seems to have been overlooked by dialectologists until very recently. FRED is a corpus of authentic, conversational dialect texts that is subdivided into a number of large regions. It is available in a computerised format that allows analyses via automatic text retrieval programmes like TACT or WordSmith. The areas that will be represented are Scotland, Northern Ireland and Wales (sometimes called the Celtic Englishes), the North of England, the Midlands, East Anglia, the Southeast, the London area, and the Southwest. For most areas, data from Oral History Projects has been available. We take this opportunity to thank those dedicated individuals, museums, archives and libraries that have helped to create a huge archive of regional English, and hope that using this material for purposes not originally envisaged highlights its immeasurable value not only to contemporary history but also to contemporary linguistics.
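The kind of automatic text retrieval mentioned above can be approximated with a simple keyword-in-context (KWIC) search. The sketch below is written in Python and is only an illustration of the general technique; it does not reproduce the behaviour of TACT or WordSmith, and the function name `kwic`, its `width` parameter and the sample sentence are all assumptions made for this example.

```python
import re

def kwic(text: str, keyword: str, width: int = 3):
    """Return (left, hit, right) triples for each whole-word match.

    `width` is the number of context words kept on either side.
    """
    # Keep apostrophes inside tokens so that "wasn't" stays one word.
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Invented sentence for illustration only, not taken from the corpus.
sample = "We was going down the pit and they was all singing, wasn't they"

for left, hit, right in kwic(sample, "was"):
    print(f"{left:>20}  [{hit}]  {right}")
```

Because matching is done on whole tokens, a search for "was" does not return "wasn't"; a rare syntactic feature would typically need several such searches, each followed by manual inspection of the hits.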





Chambers, J.K. and Peter Trudgill. 1998. Dialectology. 2nd edition. Cambridge: CUP.

Orton, Harold. 1960. 'An English dialect survey: Linguistic Atlas of England', Orbis 9: 331-48.

Orton, Harold. 1962. Survey of English Dialects: Introduction. Leeds: E. J. Arnold.

[1] This is not a serious limitation for questions of phonetics/phonology, where several questions may elicit the use of a particular feature, but it is a serious limitation for morphological items where typically only one question per feature is available.

[2] Some authors are, however, well known for their authentic renditions of local speech (e.g. George Eliot, Emily Brontë or Charles Dickens), and for research into historical stages of specific dialects the use of these materials might have some (limited) justification.

[3] 'What is Oral History?' at http://www.nmgw.ac.uk/~ohs/oralhis.htm, p. 1.

[4] 'How to do ORAL HISTORY' at http://www.nmgw.ac.uk/~ohs/ohs/howto.html, p. 1.

[5] 'How to do ORAL HISTORY' at http://www.nmgw.ac.uk/~ohs/ohs/howto.html, p. 2.

[6] 'How to do ORAL HISTORY' at http://www.nmgw.ac.uk/~ohs/ohs/howto.html, p. 7.

[7] 'How to do ORAL HISTORY' at http://www.nmgw.ac.uk/~ohs/ohs/howto.html, p. 7.