John Blake Corpus linguistics

Richard Watson Todd: Frequency and saliency in corpus analysis

While I use corpus analyses fairly frequently in my research, one issue that always worries me is the (over-)reliance of corpus analyses on frequency. Yes, frequency is important, but saliency is also a major issue in how we process texts. If I'm skimming an issue of a journal, I'll look at headings, abstracts and tables, as these are quick to process and should contain the most important information. Converting a text into a corpus, though, is ultra-egalitarian: footnotes are treated as being of equal value to titles.
The only way I can see of accounting for saliency while still using current corpus and concordancing tools (and it is a deeply unsatisfactory one) is to weight the parts of the text that are likely to be perceived as salient. Given the way corpus tools process information, the obvious means of weighting is to duplicate these salient parts of the text. For a research article, titles could be copied five times, abstracts and headings three times, and sub-headings twice, so that the frequencies mirror their salience (these numbers are just off the top of my head; it would be interesting to conduct research into how appropriate weightings could be generated). A minimal sketch of what I mean follows below.
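To make the idea concrete, here is a minimal sketch of the duplication approach in Python. The section labels, the multipliers and the crude whitespace tokenisation are purely illustrative (the weights are just the off-the-top-of-my-head numbers above), not anything empirically grounded.

from collections import Counter

# Illustrative salience weights: how many times each section type is copied.
SALIENCE_WEIGHTS = {
    "title": 5,
    "abstract": 3,
    "heading": 3,
    "subheading": 2,
    "body": 1,
    "footnote": 1,
}

def weighted_tokens(sections):
    # 'sections' is a list of (section_type, text) pairs; unknown section
    # types default to a weight of 1, i.e. no boosting.
    tokens = []
    for section_type, text in sections:
        weight = SALIENCE_WEIGHTS.get(section_type, 1)
        words = text.lower().split()      # deliberately crude tokenisation
        tokens.extend(words * weight)     # duplicate salient parts 'weight' times
    return tokens

# Toy example: words from the title now count five times in the frequency list.
article = [
    ("title", "Salience in corpus analysis"),
    ("abstract", "We ask how salience could be reflected in frequency counts."),
    ("body", "Body text and footnotes keep a weight of one."),
]
print(Counter(weighted_tokens(article)).most_common(5))

In practice one would feed the duplicated text itself, rather than a token list, into the concordancer, but the effect on the raw frequencies is the same.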
Do you know if anyone has ever tried to do something like this? To what extent would this affect corpus findings?
Most importantly, are there any other ways of integrating saliency into a corpus analysis?