
What is Corpus Linguistics
Corpus linguistics is the study of language, and a method of linguistic analysis, based on collections of natural or “real-world” texts known as corpora (singular: corpus). It is used to investigate a wide range of linguistic questions and offers a unique insight into the dynamics of language, which has made it one of the most widely used linguistic methodologies.
Since corpus linguistics works with large corpora of millions or sometimes even billions of words, it relies heavily on computers to determine what rules govern the language and what patterns (grammatical or lexical, for instance) occur. It is therefore not surprising that corpus linguistics emerged in its modern form only after the computer revolution of the 1980s. The Brown Corpus, the first modern, electronically readable corpus, was however created by Henry Kucera and W. Nelson Francis as early as the 1960s.
What Corpus Linguistics Does

- Gives access to naturalistic linguistic information. As mentioned before, corpora consist of “real-world” texts, most of which are the product of real-life situations. This makes corpora a valuable research source for dialectology, sociolinguistics and stylistics.
- Facilitates linguistic research. Electronically readable corpora have dramatically reduced the time needed to find particular words or phrases. Research that would take days or even years to complete manually can be done in a matter of seconds and with a high degree of accuracy.
- Enables the study of wider patterns and collocation of words. Before the advent of computers, corpus linguistics was limited to studying single words and their frequency. Modern technology has made it possible to study wider patterns and the collocation of words, that is, which words tend to occur together.
- Allows analysis of multiple parameters at the same time. Various corpus linguistics software programmes and analytical tools allow researchers to analyse a large number of parameters simultaneously. In addition, many corpora are enriched with additional linguistic information in the form of annotation.
- Facilitates second language study. Studying a second language through naturally occurring texts gives students a better “feeling” for the language and lets them learn it as it is used in real rather than “invented” situations.
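
The kinds of analysis listed above (word frequency, collocation, and searching for a word in context) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-sentence toy corpus, the simple regular-expression tokenizer, and the helper names `tokenize`, `word_frequencies`, `bigram_counts` and `concordance` are hypothetical and do not come from any particular corpus tool. Real corpora and corpus software are far larger and more sophisticated.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs, a very simple proxy for collocation.

    Punctuation is discarded by the tokenizer, so pairs may span
    sentence boundaries in this simplified version.
    """
    return Counter(zip(tokens, tokens[1:]))


def concordance(tokens, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Toy corpus, invented for this example only.
corpus = "She made a strong argument. He drank strong tea. They made strong tea again."

tokens = tokenize(corpus)
freqs = word_frequencies(tokens)
bigrams = bigram_counts(tokens)

print(freqs.most_common(3))            # the three most frequent words
print(bigrams[("strong", "tea")])      # how often "strong" is followed by "tea"
for line in concordance(tokens, "strong", width=2):
    print(line)                        # keyword-in-context lines
```

Counting adjacent pairs is only the crudest way to approach collocation; dedicated tools usually apply statistical association measures instead of raw counts.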
What Corpus Linguistics Does Not Do
- Does not explain why. The study of corpora tells us what happened and how, but not why; for instance, it does not tell us why the frequency of a particular word has increased over time.
- Does not represent the entire language. Corpus linguistics studies language through randomly or systematically selected corpora. These typically consist of a large number of naturally occurring texts, but they remain a sample of the language. Findings obtained with the methods and tools of corpus linguistics therefore hold for the corpus analysed and cannot be assumed to cover the entire language.