Access to the BAWE Corpus
With your own corpus query tools
The corpus can be downloaded from the Oxford Text Archive. BAWE is listed as resource number 2539. The corpus is suitable for use with concordancing programs such as WordSmith and AntConc.
Parsed versions of the corpus have been created by Phil Durrant using the Stanford Core NLP parser, and are available here: https://phildurrant.net/parsed-bawe-corpus/
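If you prefer to script your own queries on the downloaded text rather than use a dedicated concordancer, a keyword-in-context (KWIC) search can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the BAWE distribution: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for the example, and the code expects plain text that you have already read from a corpus file.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of keyword.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample text; in practice this would be the contents of a corpus file.
sample = (
    "The results suggest that the method works. "
    "However, the results were limited."
)

for line in kwic(sample, "results"):
    print(line)
```

Dedicated tools such as WordSmith and AntConc offer sorting, collocation and keyword functions that this sketch does not attempt to reproduce.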
With Sketch Engine open access*
The Sketch Engine open access interface lets you view concordance lines and their surrounding contexts. You can select the files you want to examine by filtering on features recorded in the file header (for example, you can choose a specific genre family, or the discipline, level, gender or L1 of contributors).
This manual will help you get started with Sketch Engine.
See also:
- Diana McCarthy's slides from the BAAL Corpus Linguistics SIG event, December 2010.
With Lextutor
Concordances from an untagged version of the BAWE corpus can be created using Lextutor. This is a freely accessible, easy-to-use tool for teachers and learners.
With the EAP Foundation Concordancer
Concordances from an untagged version of the corpus can be created with the concordancer on the EAP Foundation website. This is freely accessible and very easy to use for simple searches.
With CorpusMate
CorpusMate enables teachers and learners to discover the language used in different disciplines. BAWE is one of the corpora in the CorpusMate database.
By subscription
Subscribing to Sketch Engine provides access to a number of pre-loaded corpora, including BASE and BAWE, and offers a wider range of search features. You can register for a 30-day free trial account.
*Notes about the BAWE Corpus in Sketch Engine
This version of the corpus has been prepared by Paul Thompson and Alois Heuboeck at Reading University. The files have been tagged by Paul Rayson at Lancaster University for POS (CLAWS tagset) and for semantic category using Wmatrix. Because some of the BAWE markup has been modified, the Sketch Engine website describes the query options for this version.
BAWE contains 6,506,995 running words, but Sketch Engine reports a total of 8,336,262 tokens. This is because Sketch Engine token counts include punctuation.
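The gap between the two figures above (8,336,262 - 6,506,995 = 1,829,267) is easier to picture with a small example. The Python sketch below counts the same sentence twice, once as words only and once with each punctuation mark treated as a separate token. The regular expressions, the function name `count_words_and_tokens` and the sample sentence are assumptions made for illustration; Sketch Engine's actual tokenizer is not described here and will differ in detail.

```python
import re

# A "word" here is a run of letters or digits, optionally joined by an
# internal apostrophe or hyphen (assumed definition, for illustration only).
WORD = r"\w+(?:['-]\w+)*"


def count_words_and_tokens(text):
    """Return (word_count, token_count), where tokens also include punctuation."""
    words = re.findall(WORD, text)
    # Tokens are words plus every single non-space, non-word character.
    tokens = re.findall(WORD + r"|[^\w\s]", text)
    return len(words), len(tokens)


# Invented sample sentence: 8 words plus 4 punctuation marks.
sample = "However, the results (see Table 1) were limited."
n_words, n_tokens = count_words_and_tokens(sample)
print(n_words, n_tokens)
```

In this sentence the comma, the two brackets and the full stop add four tokens to the eight words, which is the same kind of difference seen at corpus scale.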
Teaching and Learning
The BAWE Quicklinks project provides selected links for teachers to use in their feedback to students.
The University of Queensland's online course SLATx: Improving writing through corpora: Data-driven learning uses BAWE to help students with their academic writing, and also provides an introduction to corpus consultation for teachers and researchers.
Word list
- Key lemmas Agriculture
- Key lemmas Biological Sciences
- Key lemmas Business
- Key lemmas Chemistry
- Key lemmas Computer Science
- Key lemmas History
- Key lemmas Leisure & Tourism
- Key lemmas Law
- Key lemmas Linguistics
- Key lemmas Mathematics
- Key lemmas Medicine
- Key lemmas Philosophy
- Key lemmas Physics
- Key lemmas Politics
- Key lemmas Psychology
- Key lemmas Sociology
- Key lemmas Engineering
- Key lemmas Cybernetics & Electronic Engineering
- Key lemmas English and American Studies
- Key lemmas Food Sciences
- Key lemmas Health
- Reporting Verbs
- AWL Alphabetical
- AWL Frequency
- Collocations
- Frequency lists by discipline