Monday 23 February 2009

The corpus-based Behavioral Profile approach to cognitive semantics

In "Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal", Gries and Divjak (in press) argue in favour of quantitative corpus-linguistic methods in cognitive linguistics. At this stage of my project, their paper provides me with methodological tools to combine numbers (i.e. frequency of occurrence of 'may' and 'can') and word senses (i.e. frequency of occurrence of the various senses of 'may' and 'can'). Of particular interest is the attention that the authors pay to cases of polysemy and to cross-linguistic studies.

Gries and Divjak point out that "cognitive linguistics can only benefit from reducing the subjective element in its methods as much as is feasible" (p.4). For that purpose, the authors propose the Behavioral Profile (BP) approach. Behavioral profiling of lexical items is based on distributional properties captured by percentages and "allows researchers to analyze the BP data using statistical techniques as well as to compare the results to data/results from other studies" (p.8).

The BP approach is based on two assumptions:

i) "corpus data provides (nothing but) distributional frequencies" (p.4)
ii) "distributional similarity reflects, or is indicative of, functional similarity" (p.4)

[functional similarity = any function of a particular expression, ranging from syntactic to discourse-pragmatic]

Methodological steps involved in the BP approach:

1) Retrieval of all instances of a word's lemma from a corpus in their context.

2) Semi-manual analysis of many properties of the use of the word forms, following Atkins (1987): morphological, syntactic and semantic characteristics. Identifying these features makes it possible to compile ID tags for the word forms.

3) Generation of a co-occurrence table that specifies how often, in percent, each ID tag level is attested with each sense of a polysemous word. The column containing the percentages for a given sense is referred to as that sense's behavioral profile.
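
To make the three steps concrete for myself, here is a minimal Python sketch of step 3. Everything in it is an assumption for illustration: the annotated concordance lines, the sense labels ('permission', 'possibility') and the ID tags ('subject', 'clause') are invented, and the function name behavioral_profiles is my own, not something taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical output of steps 1 and 2: each concordance line has been
# assigned a sense and annotated with ID tags (tag name -> tag level).
annotated_lines = [
    ("permission", {"subject": "animate", "clause": "main"}),
    ("permission", {"subject": "animate", "clause": "subordinate"}),
    ("possibility", {"subject": "inanimate", "clause": "main"}),
    ("possibility", {"subject": "animate", "clause": "main"}),
    ("possibility", {"subject": "inanimate", "clause": "main"}),
    ("possibility", {"subject": "inanimate", "clause": "subordinate"}),
]


def behavioral_profiles(lines):
    """Return, for each sense, the percentage of its lines showing each ID tag level."""
    counts = defaultdict(Counter)  # sense -> Counter of (tag, level) pairs
    totals = Counter()             # sense -> number of concordance lines
    for sense, tags in lines:
        totals[sense] += 1
        for tag, level in tags.items():
            counts[sense][(tag, level)] += 1
    # Use the same set of (tag, level) rows for every sense so that the
    # resulting columns are directly comparable.
    all_levels = sorted({key for c in counts.values() for key in c})
    return {
        sense: {key: 100.0 * counts[sense][key] / totals[sense] for key in all_levels}
        for sense in counts
    }


profiles = behavioral_profiles(annotated_lines)
for sense, profile in profiles.items():
    print(sense)
    for (tag, level), pct in profile.items():
        print(f"  {tag}:{level:<12} {pct:5.1f}%")
```

Each sense ends up with one column of percentages, and within a single ID tag the percentages of its levels add up to 100 for that sense.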

Application of the BP approach to polysemy

Gries and Divjak show how the BP approach can assist in answering questions related to the phenomenon of POLYSEMY, such as the identification of prototypical senses of specific lexical items, the connection of a particular sense of a polysemous word to the network of already identified senses, and the usefulness of a cluster-analytic approach in the domain of POLYSEMY.
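
As a rough idea of what comparing senses could look like in practice, the sketch below measures how far apart behavioral profiles are, which is the kind of input a cluster analysis works from. The profiles are hypothetical percentage vectors, the sense labels are invented, and the Canberra distance is used purely as an illustrative choice on my part; the papers should be checked for the measure and clustering algorithm actually used.

```python
from itertools import combinations

# Hypothetical behavioral profiles: one list of percentages per sense,
# with the ID tag levels in the same order for every sense.
profiles = {
    "permission":  [100.0, 0.0, 50.0, 50.0],
    "possibility": [25.0, 75.0, 75.0, 25.0],
    "ability":     [90.0, 10.0, 60.0, 40.0],
}


def canberra(a, b):
    """Canberra distance between two equal-length vectors (0/0 terms are skipped)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


# Pairwise distances between all senses; a smaller distance means more
# similar distributional behaviour.
distances = {
    (s1, s2): canberra(profiles[s1], profiles[s2])
    for s1, s2 in combinations(profiles, 2)
}
closest = min(distances, key=distances.get)

for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{pair[0]} / {pair[1]}: {dist:.3f}")
print("most similar pair:", closest)
```

A hierarchical clustering routine would then merge the closest senses first and build a tree from there; with real data the interesting part is which senses end up grouped together.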


Application to cross-linguistic studies

This section is of particular interest to me because of the recent addition of the CODIF corpus to my data set. I will therefore carry out a semantic study of the French and English sub-datasets.

Gries and Divjak recognise that "[c]ross-linguistic semantic studies are notoriously difficult given that different languages carve up conceptual space(s) in different ways (cf. Janda, to appear for discussion); for that reason, linguistic dimensions are difficult to compare across languages" (p.7).

[what is meant here exactly by 'linguistic dimensions'?]

For Gries and Divjak, because the BP approach is based on operationalizable distributional properties, it can be applied to cross-linguistic studies: "concordance lines from different languages can be annotated for a number of common characteristics while at the same time doing justice to any individual languages characteristics and avoiding overly subjective intuitions regarding cross-linguistic semantic differences" (p.7).

It seems that the BP approach could provide a unified model for investigating the semantic domain of POSSIBILITY both cross-linguistically and via polysemous 'may', 'can' and 'pouvoir'.


References to check out:

Janda, Laura A. (to appear) What is the role of semantic maps in cognitive linguistics? In Piotr Stalmaszczyk and Wieslaw Oleksy (eds.). Festschrift for Barbara Lewandowska-Tomaszczyk.


More useful quotes:

"(...) the concordance lines of a particular search expression and the uses of the word and their frequencies constitute an objective database of the kind that made-up sentences do not since researchers cannot invent all uses of an expression in a corpus let alone their frequencies of occurrence" (p.3)

" (...) corpus-linguistics studies meaning in terms of use, which in turn is made tangible through distribution, and hence lends itself better to quantification." (p.4)

4 comments:

  1. Here is a comment I received by email:

    "Nice use of the blog. I look forward to read more summaries/evaluations of what you're reading!"

    Thanks, I think there will be plenty more, judging by the pile of papers waiting on my desk!

  2. "Behavioral profiles: a corpus-based approach to cognitive semantic analysis" (Gries and Divjak, to appear) (http://www.linguistics.ucsb.edu/faculty/stgries/research/BehavioralProfiles.pdf) is an extended version of the paper described in the main post.

    Additional features:

    - a short review of the treatment of POLYSEMY in Cognitive Linguistics

    - an attempt to demonstrate that a corpus-based approach to POLYSEMY (as treated in the CL literature) provides solutions to problems highlighted in previous approaches to the phenomenon (e.g. lack of consideration of contextual information, lack of method in identifying how primary senses develop, and generally, lack of empirical support)

    - presentation of case studies as detailed exemplification of the above

    - clearer explanation of how to statistically process ID tags

  3. A useful quote that I picked up from "Clusters in the mind? Converging evidence from near synonymy in Russian" (Divjak and Gries 2008), The Mental Lexicon 3.2: 188-213:

    "[T]he corpus-based BP approach is an objective data-driven alternative to intuitive approaches to semantics with at least two major advantages. On the one hand it yields descriptions at a previously not utilized level of precision and makes it possible to answer notoriously difficult questions in the domain of polysemy, near synonymy, and lexical fields (cf. Gries, 2006; Dabrowska, in press; Divjak, 2006; Divjak and Gries, 2006) including issues like network construction, prototype identification, and the analysis of similarities of words and word senses (i.e. the structure of word senses and lexical fields). On the other hand, it correlates strongly with different experimental methods: sorting and gap-filling (...), sentence elicitation and video descriptions (...), and forced-choice selection and judgement tasks (...)" (p.210)

  4. 'Behavioral Profiles' are defined by Gries (following Hanks 1996:77) in 'Corpus-based cognitive semantics: a contrastive study of phasal verbs in English and Russian' (http://www.linguistics.ucsb.edu/faculty/stgries/research/ContrastivePhasalVerbs.pdf) as "(...) the totality of complementation patterns of a word that determines its semantics" (p.276)
