Friday 20 February 2009

ICLE and LOCNESS welcome CODIF -- the latest addition to the database

Finally coming back after a two-week immersion in the depths of ICLE and LOCNESS!

The quantitative investigation of the data started with a pilot study comparing the frequency of 'may' and 'can' across ICLE and LOCNESS. It also included comparisons with the frequency of the other central modals ('could', 'might', 'must', 'shall', 'should', 'will' and 'would'), both in LOCNESS and in the other subsections of the ICLE corpus. In the latter case, the purpose was to find out to what extent the use patterns of 'may' and 'can' in French-English IL reflect those observable in second language English in general. The results of the pilot study proved useful: it became clear that 'may' and 'can' play a role in the profiling of French-English interlanguage through different use patterns. The findings are now recorded in a paper entitled "Investigating the typicality of 'may' and 'can' in a corpus of learner English".

I am now at the stage of zooming in on my first general findings to see whether anything striking emerges. To do that, I have had to laboriously count and record the occurrences of 'may' and 'can' in LOCNESS file by file, pretty much manually, by copying and pasting each file into Word and then finding each occurrence in the 324 304-word data set! An exercise that I only wanted to carry out once! At that point I had made no decision about whether to treat 'may' and 'can' as individual modals or as lemmas, which would then have included 'may not', 'cannot', 'can't' and (?)'can not' in the study, so all forms of the two modals were accounted for (whether to include 'can not' as an acceptable spelling is still being debated). So far, these are the data sets that I am able to work from:


- LOCNESS: MAY and CAN (as featuring per essay)
- LOCNESS: MAY NOT, CANNOT, CAN'T (as featuring per essay)
- LOCNESS: MAY and CAN in argumentative and literary texts (as featuring per essay)

- ICLE FR: MAY and CAN (as featuring per essay)
- ICLE FR: MAY NOT, CANNOT, CAN'T (as featuring per essay)
- ICLE FR: MAY and CAN in argumentative texts (as featuring per essay)
- ICLE FR: MAY and CAN in literary texts (as featuring per essay)

- ICLE FR, ICLE (excl FR), LOCNESS: MAY, MAY NOT, CANNOT, CAN'T (as featuring generally across the three data sets -- this count does not include the distinction between individual files/essays)

- ICLE FR, ICLE (excl FR), LOCNESS: control variable AND

NB: Tables indicating occurrences of 'may' do not include cases of 'may not'. Cases of 'may not' are only included alongside cases of 'cannot' and 'can't'. This makes it possible to treat negation as a variable and to investigate its interaction with modality.
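For anyone who would rather not repeat the copy-and-paste-into-Word exercise, the counting rule in the note above can be sketched in a few lines of Python. This is only a minimal sketch, not the procedure I actually used: the function names, the file names and the idea of holding each essay as a plain string are all assumptions made for the illustration. It also counts surface forms only, so 'May' the month or 'can' the noun would still need to be weeded out by hand.

```python
import re
from collections import Counter

# Negated forms are listed first so that 'may not' is never counted as 'may'
# and 'cannot' / 'can't' / 'can not' are never counted as 'can'.
FORMS = re.compile(r"\b(may\s+not|can\s+not|cannot|can't|may|can)\b", re.IGNORECASE)


def count_modals(text):
    """Count each surface form of 'may' and 'can' in one essay."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    counts = Counter()
    for match in FORMS.finditer(text):
        # Lower-case and collapse internal whitespace ('may \n not' -> 'may not').
        counts[" ".join(match.group(1).lower().split())] += 1
    return counts


def count_per_essay(essays):
    """Map each (hypothetical) essay name to its own table of counts."""
    return {name: count_modals(text) for name, text in essays.items()}


if __name__ == "__main__":
    # Invented mini-essays, for illustration only.
    sample = {
        "essay_a.txt": "She can swim, but she cannot dive.",
        "essay_b.txt": "He may not come. He may call instead.",
    }
    for name, counts in count_per_essay(sample).items():
        print(name, dict(counts))
```

Keeping 'can not' as its own key means the decision about whether to accept it as a spelling can be taken later without recounting.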


Recently, the issue of the usefulness of a native French comparison data set was raised in discussion. Such a data set would be particularly helpful at the qualitative stage of the data analysis and would create opportunities for cross-linguistic collocation searches. That way, I would be able to identify which contextual features are generally lexicalised via 'pouvoir' and assess whether those features are also lexicalised via 'may' and 'can' in French-English IL. In other words, it would allow me to establish whether 'may'/'can' in Fr-English IL carry over some semantic features of 'pouvoir' and, if so, to what extent. In order to carry out those collocation searches, I was recently granted access to the COrpus de DIssertations Françaises (CODIF) database, a corpus of native French essay writing (dissertations written by French undergraduates at the University of Louvain, Belgium). The CODIF database was compiled by the Centre for English Corpus Linguistics (CECL) at the Université catholique de Louvain, Belgium. The data set contains around 100 000 words.
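To give an idea of what such a collocation search might look like in its crudest form, here is a rough Python sketch that collects the words occurring within a few tokens of a form of 'pouvoir'. Everything in it is an assumption made for illustration: the function name, the window size, the naive tokenisation and the list of forms, which deliberately covers only the present indicative. A real search would need proper lemmatisation and a dedicated concordancer.

```python
import re
from collections import Counter

# Illustrative subset only: present indicative forms of 'pouvoir'.
POUVOIR_FORMS = {"peux", "peut", "pouvons", "pouvez", "peuvent"}


def window_collocates(text, node_forms, span=3):
    """Count the words found within `span` tokens of any node form."""
    # Naive tokenisation: runs of word characters, lower-cased.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


if __name__ == "__main__":
    # Invented sentence, for illustration only.
    sentence = "On peut dire que les étudiants peuvent écrire."
    print(window_collocates(sentence, POUVOIR_FORMS, span=2).most_common())
```

The same function could be run over ICLE FR with 'may' and 'can' as node forms, so that the two lists of collocates can be set side by side.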

From the perspective of the Cognitive Semantics framework, a three-way database (ICLE FR, LOCNESS and CODIF) allows for an investigation of the conceptual domains recruited by 'may', 'can' and 'pouvoir'. As members of the same semantic domain (i.e. POSSIBILITY), do the three modals recruit the same conceptual domains/frames? What is the nature of the relation between those domains? Does the nature of those relations vary cross-linguistically?

1 comment:

  1. Here is a useful quote from Gries's paper entitled "Corpus-based methods and SLA", published in Robinson, Peter and Nick C. Ellis (eds.) 'Handbook of cognitive linguistics and second language acquisition'. New York: Routledge, Taylor & Francis Group. pp. 406-32

    "Frequency lists are also theoretically relevant in Cognitive Linguistics because, (...), the more frequent a linguistic expression, the more entrenched it is assumed to be and the more likely it has unit status" (p.415)
