Friday 16 October 2009

"can" and "may" in present-day English, Yvan Lebrun, 1965

In an earlier post, I referred to a corpus-based study of "can" and "may" by Yvan Lebrun, namely "can" and "may" in present-day English (1965). In this post, I briefly present, or rather log, the scope of Lebrun's study (I will present his general conclusions in a later post):

  • the study is corpus-based and includes data from both British and American English
  • a variety of genres are featured in the data: short stories, novels, plays, newspapers, scientific texts
  • all texts featuring in the data were published between 1955 and 1962
  • the study includes occurrences of might and could
  • numbers of occurrences:
  1. Total number of occurrences, including may, can, might, could: 4765
  2. Total number of occurrences of can: 2024
  3. Total number of occurrences of could: 1745
  4. Total number of occurrences of may: 491
  5. Total number of occurrences of might: 505
  • Methodologically, Lebrun scanned each instance of the modals to ascertain lexical meanings. The modals were considered to convey the same lexical meaning whenever their semantical contents proved identical once such significant oppositions as "present" vs. "past" or "indicative" vs. "conditional" had been discarded (p.11).
  • In order to decide on the semantical content of the modals, Lebrun relies on the context for each instance
  • Lebrun first carries out a recognition process of all the lexical senses and then attempts to define them
  • the process of defining the lexical senses was first motivated by Sommerfelt's recommendation (i.e. that 'the definition be able to replace the word in an ordinary sentence'). This 'replacement' process was abandoned on the grounds that:

"In none of the lexical meanings CAN, COULD, MIGHT, MAY can be equated with a substitutable word or phrase. In fact, each of their lexical senses is so wide that only a long series of 'synonyms' can cover it" (p.11)

Further,

"Instead of defining CAN, COULD, MIGHT, MAY by means of longish strings of juxtaposed partial equivalents and thus blurring out the internal unity of the lemma's meaning, I renounced Sommerfelt's principle and aimed at definitions that (a) embrace every facets of the sense they are meant to cover, (b) bring out the internal unity of each meaning, and (c) emphasize what the various significations of a lemma have in common." (p.11)

  • Overall methodological strategy:
  1. Based on the three recognised lexical meanings that are common to CAN, COULD, MIGHT, MAY, Lebrun calculated how often each of these three meanings was expressed by MAY rather than by CAN and by MIGHT rather than by COULD.
  2. Lebrun examines cases where MAY and CAN are synonyms.
  3. Based on the discovery that some collocations exclude the use of one of the two synonyms, Lebrun calculated the frequency of MAY relative to CAN in kinds of clauses where either word can be used idiomatically and tried to find out if this relative frequency is independent of the context. [My emphasis: this part of Lebrun's methodology reinforces the idea of including, in my study, two separate variables (i.e. SENSES and CONTEXT) for a treatment of the meanings of MAY and CAN, as featuring in my data. For more details on this, see this previous post.]
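
To make the idea of "frequency of MAY relative to CAN" concrete, here is a minimal Python sketch. It only uses the overall totals logged above, because the per-sense and per-clause counts that Lebrun actually worked with are not reproduced in this post. The helper name `share` is my own and is not taken from Lebrun.

```python
# Overall totals as logged above for Lebrun's corpus.
counts = {"can": 2024, "could": 1745, "may": 491, "might": 505}


def share(a, b, table):
    """Proportion of `a` among all occurrences of `a` or `b`."""
    return table[a] / (table[a] + table[b])


total = sum(counts.values())
may_vs_can = share("may", "can", counts)
might_vs_could = share("might", "could", counts)

print(total)                     # 4765, matching the reported total
print(f"{may_vs_can:.3f}")       # share of MAY among MAY + CAN
print(f"{might_vs_could:.3f}")   # share of MIGHT among MIGHT + COULD
```

The same function could be applied to counts restricted to one lexical meaning or one clause type, which is closer to what steps 1 and 3 describe.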

Further reading (of early studies):

Lebrun, Y., Can and May, A Problem of Multiple Meaning, in Proceedings of the Ninth International Congress of Linguistics, 1962 (The Hague, Mouton, 1964)

Ten Bruggencate, K, The Use of Can and May, in Taalstudie 3 (1882), 94-106

Wood, F., May and Might in Modern English, Moderna Språk 49 (1955), 247-253

Senses vs. context in the coding of the semantics of MAY and CAN

This brief post continues the theme of the previous post, where I raised the issue of coding the senses of "may" and "can" most effectively for the purpose of statistical analysis. It provides a short update on my current line of thinking regarding the design of an optimal coding system for the meanings of "may" and "can".

With regard to my project, I am now about to start annotating the senses of "may" and "can" as they feature in my data, and I am currently concerned with defining an appropriate degree of granularity for that stage of the coding. In other words, I need to establish how much contextual information should be included in the coding of the senses of the modals. With this in mind, I am considering the inclusion of an extra variable (in addition to a SENSES variable) for the investigation of the behaviour of "may" and "can", namely that of CONTEXT. The motivation behind the CONTEXT variable would be ultimately to assess and quantify contextual weight on the semantics of the modals. Including a CONTEXT variable would also allow for the exclusion of 'contextuality' as a level of the SENSES variable. With SENSES, I would therefore approach each occurrence of the modals according to its generally recognised "core" meaning.

Dealing with the senses of the modals from the perspectives of both context and core meanings has two advantages. Firstly, the number of levels included for each variable will be smaller than if only one variable were considered, which would facilitate the recognition of possible patterns in the data. Secondly, the two variables CONTEXT and SENSES could then be tested for possible mutual interaction, which could ultimately be quantified statistically. Such a design would also allow me to address a whole body of literature on the English modals that tries to assess what, semantically, belongs to the modals and what belongs to the context and the situation of utterance, and to what extent. To my knowledge, that line of work still remains to be experimentally challenged.

Identifying and differentiating two meaning-related variables such as SENSES and CONTEXT could facilitate the possible inclusion of an experimental task that would aim to assess potential statistical results. I'm currently exploring the feasibility of that possibility.
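
As a rough illustration of what a two-variable coding could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the CONTEXT level names ("declarative", "interrogative") and the five rows are invented placeholders, not annotations from my data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    modal: str    # "may" or "can"
    sense: str    # level of the SENSES variable (core meaning)
    context: str  # level of the CONTEXT variable (hypothetical levels)


# Invented placeholder rows, NOT real annotations from any corpus.
rows = [
    Occurrence("can", "ability", "declarative"),
    Occurrence("can", "permission", "interrogative"),
    Occurrence("may", "permission", "interrogative"),
    Occurrence("may", "possibility", "declarative"),
    Occurrence("can", "possibility", "declarative"),
]


def cross_tabulate(occurrences):
    """Count occurrences for each (sense, context) pair."""
    return Counter((o.sense, o.context) for o in occurrences)


table = cross_tabulate(rows)
for (sense, context), n in sorted(table.items()):
    print(sense, context, n)
```

A cross-tabulation of this kind is the usual starting point for testing whether two categorical variables interact. Which test is appropriate would depend on the final number of levels and on the cell counts.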

Sunday 11 October 2009

Coding the English modals for senses: Leech & Coates (1980), Coates (1983) and Collins (1988)

Despite the overwhelming literature on the semantics of the English modals and the numerous attempts by many scholars to identify their core meanings and related senses, very few studies have in fact used a corpus-based approach for the purpose of their classification. The current record that I have of such studies counts the following publications, in chronological order of publication:

  • Joos, M. (1964) The English Verb: Form and Meaning. Madison and Milwaukee
  • Lebrun, Y. (1965) "CAN" and "MAY" in present-day English. Presses Universitaires de Bruxelles
  • Ehrman, M.E. (1966) The meanings of the Modals in Present-Day American English. The Hague and Paris
  • Hermeren, L. (1978) On Modality in English: A study of the Semantics of the Modals. Lund: CWK Gleerup
  • Leech, G.N. & Coates, J. (1980) Semantic Indeterminacy and the modals. In Greenbaum, S. et al. (eds) Studies in English Linguistics. The Hague: Mouton.
  • Coates, J. (1983) The Semantics of the Modal Auxiliaries. London & Canberra: Croom Helm.
  • Collins, P. (1988) The semantics of some modals in contemporary Australian English. Australian Journal of Linguistics 8, pp. 261-286
  • Collins, P. (2009) Modals and Quasi-Modals in English. Rodopi

The work of Peter Collins is of particular interest to me, being the most recent and therefore benefiting from the latest developments in both the study of modality and corpus linguistics:

"Modals and Quasi-modals in English" reports the findings of a corpus-based study of the modals and a set of semantically-related 'quasi-modals' in English. The study is the largest and most comprehensive to date in this area, and is informed by recent developments in the study of modality, including grammaticalization and recent diachronic change. The selection of the parallel corpora used, representing British, American and Australian English, was designed to facilitate the exploration of both regional and stylistic variation." (11/10/09)


In his 1988 paper, Collins proposes to investigate possible differences in the distribution and the semantics of can, could, may and might in three varieties of English, namely Australian English, British English and American English. Below, I specifically refer to Collins' 1988 paper.

In terms of theoretical framework, Collins adopts a framework based on Leech and Coates (1980) and Coates (1983), two studies that count amongst the most influential corpus-based studies on the English modals. Collins' motivations behind borrowing an already existing framework are twofold:
  1. To facilitate comparisons between results from his study and those encountered in Coates (1983)
  2. According to Collins, the framework proposed in Leech and Coates (1980) and Coates (1983) "accounts more adequately than any other so far proposed for the complexity and indeterminacy of modal meaning, and is therefore particularly useful in handling the recalcitrant examples that one is forced to confront in a corpus-based study" (p.264)
Considering that Collins' methodological and theoretical approaches are anticipated to feature in my study at one stage or another, I report here his overall framework as well as his taxonomy of the senses of MAY/CAN.

Collins' (borrowed) taxonomy includes the notions of "core" meanings, "periphery" meanings and graded degrees of membership:

"A central concept is that of a fuzzy semantic set, whose members range from the "core" (representing the prototypical meaning) to the "periphery" of the set, with continually graded degrees of membership (the phenomenon of "gradience", as explored by Quirk 1965)" (p.264)

In the case of CAN, the core meaning of the modal is recognised to be that of ability and the periphery meaning that of possibility. More explicitly:

CAN in the sense of ability is paraphrasable as "be able to" or "be capable of". In prototypical, or "core" cases CAN refers to permanent accomplishment, and is more or less synonymous with "know how to".

Collins further notes that core ability cases are "characterised by the presence of animate, agentive subject, a dynamic main verb, and determination of the action by inherent properties of the subject referent". Generally, the more an occurrence lacks these properties, the less prototypical it becomes. In other words, the number of those characteristics present in a given occurrence determines where the meaning of CAN sits between the core and the periphery.

So to sum up, gradience has to do with the nature of class membership.
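
One way to picture gradience for ability CAN is to count how many of the three core properties an occurrence shows. The Python sketch below does just that. Note that the equal weighting of the three properties and the resulting 0-to-1 score are my own simplification; Collins does not propose a numeric measure.

```python
from dataclasses import dataclass


@dataclass
class AbilityFeatures:
    animate_agentive_subject: bool
    dynamic_main_verb: bool
    determined_by_inherent_properties: bool


def prototypicality(f):
    """Fraction of the three core-ability properties that are present.

    Equal weighting is an assumption made for illustration only.
    """
    present = [
        f.animate_agentive_subject,
        f.dynamic_main_verb,
        f.determined_by_inherent_properties,
    ]
    return sum(present) / len(present)


core = AbilityFeatures(True, True, True)
peripheral = AbilityFeatures(False, True, False)

print(prototypicality(core))        # all three properties present
print(prototypicality(peripheral))  # only one property present
```

An occurrence scoring 1.0 would sit at the core of the set, and lower scores would place it progressively closer to the periphery.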

Collins' (borrowed) theoretical framework also includes two other cases, namely ambiguity and merger, which are two different sorts of indeterminacy. Ambiguity refers to cases where "it is not possible to decide from the context which of two (or more) categorically distinct meanings is the correct one" (p.265), and merger refers to cases "where there are two mutually compatible meanings which are neutralised in a certain context" (p.265).

Including both the notions of gradience and indeterminacy, the theoretical framework adopted in Collins (1988) is thus both categorical (i.e. it includes semantic categories such as ability, permission, possibility), on the grounds that:

  • "they co-occur with distinct syntactic and semantic features" (p.266) [see paper for a listing of which syntactic and semantic features typically occur in specific semantic uses of the modals]
  • "they involve distinct paraphrases" (p.266)
  • "ambiguous cases can occur" (p.266)

and fuzzy as the framework allows for gradience.

Semantic categories for CAN in Collins (1988)
  • Root meanings, including ability (possible paraphrase: 'able to', 'capable of'), permission (possible paraphrase: 'allowed', 'permitted'), possibility (possible paraphrase: 'possible for')
Collins notes that

"Root Possibility may be regarded (...) as an 'unmarked' meaning, where there is no clear indication either of an inherent property of the subject or of a restriction. The meaning is simply that the action is free to take place, that nothing in the state of the world stands in its way (...). Root Possibility is sometimes difficult to distinguish from ability because ability implies possibility. (...). Because ability CAN and permission CAN normally require a human or at least animate subject, Root Possibility is generally the only sense available when the subject is inanimate" (p.270)

Semantic categories for MAY in Collins (1988)

  • Epistemic Possibility (possible paraphrase: 'it is possible that ...')
  • Permission
  • Root Possibility

Collins notes that

Epistemic Possibility is to be distinguished from Root Possibility in terms of its commitment to the truth of the associated proposition. Whereas Epistemic Possibility expresses the likelihood of an event's occurrence, Root possibility leaves open the question of truth and falsehood, presenting the event as conceivable, as an idea (p.274)
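
The two sets of categories above can be written down as a small coding scheme. The Python sketch below records the senses and paraphrases exactly as listed in this post; an empty tuple means no paraphrase is listed here for that sense. The function `candidate_senses` is a hypothetical helper of my own. It turns Collins' remark about inanimate subjects into a hard filter, which is a simplification, since he only says "normally" and "generally".

```python
# Sense categories and paraphrases as listed in this post for Collins (1988).
# An empty tuple means no paraphrase is given above for that sense.
TAXONOMY = {
    "can": {
        "ability": ("able to", "capable of"),
        "permission": ("allowed", "permitted"),
        "root possibility": ("possible for",),
    },
    "may": {
        "epistemic possibility": ("it is possible that ...",),
        "permission": (),
        "root possibility": (),
    },
}


def candidate_senses(modal, animate_subject):
    """List the senses of `modal` still available given subject animacy.

    Only the restriction on CAN comes from the quoted passage; applying it
    as a strict rule is an assumption made for illustration.
    """
    senses = list(TAXONOMY[modal])
    if modal == "can" and not animate_subject:
        senses = [s for s in senses if s not in ("ability", "permission")]
    return senses


print(candidate_senses("can", animate_subject=False))
print(candidate_senses("may", animate_subject=True))
```

A table like this could serve as the list of permitted levels for the SENSES variable, so that an annotation outside the scheme is caught early.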

At this point, it will be interesting to see whether the theoretical framework adopted in Collins (2009) has remained the same as the one chosen in Collins (1988) or whether any amendments were made. In the next few days I will investigate Collins' latest framework before I start coding the senses of MAY/CAN as featuring in my data.