What is Corpus Linguistics?
Corpora, Concordancing, and Usage
To conduct a corpus-based study of language, one needs access to a corpus and a concordancing program. A corpus is a databank of natural texts, compiled from writing and/or transcriptions of recorded speech. A concordancer is a software program that analyzes corpora and lists the results. The main focus of corpus linguistics is to discover patterns of authentic language use through analysis of actual usage. The aim of a corpus-based analysis is not to generate theories of what is possible in the language, as does Chomsky's phrase structure grammar, which can generate an infinite number of sentences but does not account for the probable choices that speakers actually make. Corpus linguistics is concerned only with the usage patterns in the empirical data and what they reveal about language behavior.
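To make the idea of a concordancer concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only, not how any particular concordancing program works: the function names `tokenize` and `concordance`, the `width` parameter, and the sample sentences are all assumptions invented for this example, and a real study would load a genuine corpus instead.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(text, keyword, width=3):
    """Return (left context, hit, right context) for every occurrence of keyword.

    `width` is the number of tokens kept on each side of the hit.
    """
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented sample text standing in for a real corpus (assumption).
sample = (
    "We will begin the meeting at noon. "
    "Please start the engine before you begin the test. "
    "The lessons begin to make sense after a week."
)

# Print the hits aligned on the keyword, as concordance lines usually are.
for left, hit, right in concordance(sample, "begin"):
    print(f"{left:>30}  [{hit}]  {right}")
```

Aligning the keyword in a column is what lets a reader scan the words to its left and right for recurring patterns.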
The Advantages of Doing Corpus-Based Analyses
Corpus linguistics provides a more objective view of language than introspection, intuition, and anecdote. John Sinclair (1998) pointed out that this is because speakers do not have access to the subliminal patterns which run through a language. A corpus-based analysis can investigate almost any language pattern (lexical, structural, lexico-grammatical, discourse, phonological, morphological), often with very specific agendas such as discovering male versus female usage of tag questions, children's acquisition of irregular past participles, or counterfactual statement error patterns of Japanese students. With the proper analytical tools, an investigator can discover not only the patterns of language use, but also the extent to which they are used and the contextual factors that influence variability. For example, one could examine the past perfect to see how often it is used in speech versus writing, or in newspapers versus fiction. Or one might investigate synonyms such as begin and start, or big/large/great, to determine their contextual preferences and frequency distribution.
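The begin/start comparison can be sketched as a normalized frequency count. The example below is a rough illustration under stated assumptions: the two "sub-corpora" are invented 20-word snippets, not real data, and the helper name `per_thousand` is hypothetical. Frequencies are normalized per 1,000 tokens so that sub-corpora of different sizes can be compared.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def per_thousand(text, words):
    """Return occurrences of each word per 1,000 tokens of text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / total for w in words}

# Invented snippets standing in for spoken and written sub-corpora (assumption).
spoken = (
    "So when do we start? Well, I guess we can start now, "
    "and then we will begin again much later."
)
written = (
    "The committee will begin its review in May. "
    "Proceedings begin once all members arrive. "
    "The chair may start the vote."
)

for name, text in [("spoken", spoken), ("written", written)]:
    print(name, per_thousand(text, ["begin", "start"]))
```

With a corpus of realistic size, the same normalization would show whether one synonym is genuinely preferred in a given register; the toy snippets here only demonstrate the arithmetic.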
Applying Corpus Linguistics to Teaching
According to Barlow (2002), three realms in which corpus linguistics can be applied to teaching are syllabus design, materials development, and classroom activities.
Classroom Activities
These can consist of hands-on, student-conducted language analyses in which the students use a concordancing program and a deliberately chosen corpus to make their own discoveries about language use. The teacher can guide a predetermined investigation that will lead to predictable results, or can have the students work on their own, leading to less predictable findings. This exemplifies data-driven learning, which encourages learner autonomy by training students to draw their own conclusions about language use.
Teacher/Student Roles and Benefits
The teacher would act as a research facilitator rather than the more traditional imparter of knowledge. The benefit of such student-centered discovery learning is that the students are given access to the facts of authentic language use, drawn from real contexts rather than constructed for pedagogical purposes, and are challenged to construct generalizations and note patterns of language behavior. Even if this kind of study does not have immediately quantifiable results, studying concordances can make students more aware of language use. Richard Schmidt (1990), a proponent of consciousness-raising, argues that “what language learners become conscious of -- what they pay attention to, what they notice...influences and in some ways determines the outcome of learning.” According to Willis (1998), students may be able to determine:
the potential different meanings and uses of common words
useful phrases and typical collocations they might use themselves
the structure and nature of both written and spoken discourse
that certain language features are more typical of some kinds of text than others
Barlow (1992) suggests that a corpus and concordancer can be used to:
compare language use--student/native speaker, standard English/scientific English, written/spoken
analyze the language in books, readers, and course books
generate exercises and student activities
analyze usage--when is it appropriate to use obtain rather than get?
examine word order
compare similar words--ask vs. request
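As one possible way to compare similar words such as ask and request, the sketch below counts the words that follow each of them. It is a simplified illustration: the sample sentences are invented, and the function name `right_collocates` and its `span` parameter are hypothetical. A real investigation would use a large corpus and typically a statistical association measure rather than raw counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def right_collocates(text, node, span=2):
    """Count the words occurring within `span` tokens to the right of node."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

# Invented sentences standing in for a real corpus (assumption).
sample = (
    "You can ask for help at the desk. "
    "Visitors may request a refund in writing. "
    "Do not ask for a refund by phone. "
    "Guests request a late checkout online."
)

for node in ["ask", "request"]:
    print(node, right_collocates(sample, node).most_common(3))
```

In this toy sample, ask is followed by for while request takes a direct object, which is the kind of contrast students could then check against a real corpus.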