<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5805835432542538049</id><updated>2011-07-08T03:53:47.257-07:00</updated><title type='text'>Cognition and Interlanguage -- work in progress...</title><subtitle type='html'>This blog is the product of my ongoing PhD research. It aims to keep track of my reflections on possible semantic cognitive mechanisms at work in French-English interlanguage. My PhD project specifically focuses on the semantic domain of POSSIBILITY and uses the case of 'may' and 'can' to investigate meaning representation in the bilingual mind.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>23</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-3751932328016005632</id><published>2009-10-16T13:14:00.000-07:00</published><updated>2009-10-16T14:18:09.073-07:00</updated><title type='text'>"can" and "may" in present-day English, Yvan Lebrun, 1965</title><content type='html'>In &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/10/coding-english-modals-for-senses-leech.html"&gt;that post&lt;/a&gt;, I referred to a corpus-based study of "can" and "may" by Yvan Lebrun, namely &lt;span style="font-style: italic;"&gt;"can" and "may" in present-day English&lt;/span&gt; (1965). In this post, I briefly present -- or rather log, the scope of Lebrun's study (I will present his general conclusions in a later post):&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the study is corpus-based and includes data from both British and American English&lt;/li&gt;&lt;li&gt;a variety of genres are featured in the data: short stories, novels, plays, newspapers, scientific texts&lt;/li&gt;&lt;li&gt;all texts featuring in the data were published between 1955 and 1962&lt;/li&gt;&lt;li&gt;the study includes occurrences of &lt;span style="font-style: italic;"&gt;might&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;could&lt;/span&gt;&lt;/li&gt;&lt;li&gt;numbers&lt;span style="font-style: italic;"&gt; &lt;/span&gt;of occurrences: &lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;Total number of occurrences, including &lt;span style="font-style: italic; font-weight: bold;"&gt;may, can, might, could&lt;/span&gt;: &lt;span style="font-weight: bold;"&gt;4765&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Total number of occurrences of &lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-style: italic;"&gt;can&lt;/span&gt;: 
2024&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Total number of occurrences of&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;span style="font-style: italic;"&gt;could: 1745&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Total number of occurrences of&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt; may: 491&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Total number of occurrences of&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt; might: 505&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;ul&gt;&lt;li&gt;Methodologically, Lebrun scanned each instance of the modals to ascertain lexical meanings. The modals were considered to convey the same lexical meaning whenever their semantical contents proved identical once such significant oppositions as "present"vs. "past" or "indicative" vs. "conditional" had been discarded (p.11) &lt;/li&gt;&lt;li&gt;In order to decide on the semantical content of the modals, Lebrun relies on the context for each instance&lt;/li&gt;&lt;li&gt;Lebrun first carries out a recognition process of all the lexical senses and then attempts to define them&lt;br /&gt;&lt;/li&gt;&lt;li&gt;the process of defining the lexical senses was first motivated by Sommerfelt's recommendation (i.e. 'the definition be able to replace the word in an ordinary sentence'). Such 'replacement' process was abandoned on the basis that:&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;blockquote&gt;"In none of the lexical meanings CAN, COULD, MIGHT, MAY can be equated with a substitutable word or phrase. 
In fact, each of their lexical senses is so wide that only a long series of 'synonyms' can cover it" (p.11) &lt;/blockquote&gt;&lt;br /&gt;Further,&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"Instead of defining CAN, COULD, MIGHT, MAY by means of longish strings of juxtaposed partial equivalents and thus blurring out the internal unity of the lemma's meaning, I renounced Sommerfelt's principle and aimed at definitions that (a) embrace every facets of the sense they are meant to cover, (b) bring out the internal unity of each meaning, and (c) emphasize what the various significations of a lemma have in common." (p.11)&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Overall methodological strategy&lt;/span&gt;:&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;Based on the three recognised lexical meanings that are common to CAN, COULD, MIGHT, MAY, Lebrun calculated how often each of these three meanings was expressed by MAY rather than by CAN and by MIGHT rather than by COULD.&lt;/li&gt;&lt;li&gt;Lebrun examines cases where MAY and CAN are synonyms&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Based on the discovery that some collocations exclude the use of one of the two synonyms, &lt;span style="font-weight: bold;"&gt;Lebrun calculated the frequency of MAY relative to CAN in kinds of clauses where either word can be used idiomatically and tried to find out if this relative frequency is independent of the context &lt;/span&gt;(my emphasis: this part of Lebrun's methodology reinforces the idea of including, in my study, two separate variables (i.e. SENSES and CONTEXT) for a treatment of the meanings of MAY and CAN, as featuring in my data. 
For more details on this, see &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/10/senses-vs-context-in-coding-of.html"&gt;this previous post&lt;/a&gt;).&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Further reading&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;(of early studies)&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;Lebrun, Y., &lt;span style="font-style: italic;"&gt;Can and May, A Problem of Multiple Meaning,  &lt;/span&gt;in Proceedings of the Ninth International Congress of Linguistics, 1962 (The Hague, Mouton, 1964)&lt;br /&gt;&lt;br /&gt;Ten Bruggencate, K, &lt;span style="font-style: italic;"&gt;The Use of Can and May&lt;/span&gt;, in Taalstudie 3 (1882), 94-106&lt;br /&gt;&lt;br /&gt;Wood, F., &lt;span style="font-style: italic;"&gt;May and Might in Modern English&lt;/span&gt;, Moderna Sprok 49 (1955), 247-253&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-3751932328016005632?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/3751932328016005632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/can-and-may-in-present-day-english-yvan.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/3751932328016005632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/3751932328016005632'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/can-and-may-in-present-day-english-yvan.html' title='&quot;can&quot; and &quot;may&quot; in present-day English, Yvan Lebrun, 
1965'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-2430936874128740727</id><published>2009-10-16T11:56:00.000-07:00</published><updated>2009-10-16T13:14:02.036-07:00</updated><title type='text'>Senses vs. context in the coding of the semantics of MAY and CAN</title><content type='html'>This brief post continues the theme of the previous post, where I raised the issue of how to code the senses of "may" and "can" most effectively for the purpose of statistical analysis. It provides a short update on my current line of thinking regarding the design of an optimal coding system for the meanings of "may" and "can". &lt;br /&gt;&lt;br /&gt;With regard to my project, I am now at a stage where I am about to start annotating the senses of "may" and "can" as featuring in my data, and I am currently concerned with defining an &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;appropriate&lt;/span&gt; degree of granularity for that stage of the coding. In other words, I need to &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_1"&gt;establish&lt;/span&gt; how much contextual information should be included in the coding of the senses of the modals. Further, and with regard to the above, I'm now considering the inclusion of an extra variable (in addition to a SENSES variable) for the investigation of the behaviour of "may" and "can", namely that of CONTEXT. The motivation behind the inclusion of the CONTEXT variable would be ultimately to assess/quantify the weight of context on the semantics of the modals. 
Also, including a CONTEXT variable in the study would allow for the exclusion of '&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;contextuality&lt;/span&gt;' as a level of the SENSES variable. I would therefore approach, with SENSES, each occurrence of the modals according to their generally recognised "core" meanings. Dealing with the senses of the modals from the perspectives of both context and core meanings has two advantages. Firstly, the number of levels included for each variable will be smaller than if only one variable were considered, which would &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_3"&gt;facilitate&lt;/span&gt; the recognition of possible patterns in the data. Secondly, the two variables CONTEXT and SENSES could then be tested for possible mutual interaction, which could ultimately be quantified statistically. Such a design of the data would also allow me to address a whole &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_4"&gt;chunk&lt;/span&gt; of literature on the English modals that tries to assess what, semantically, belongs to the modals and what belongs to the context and the situation of utterance, and to what extent. To my knowledge, that line of work has yet to be tested experimentally. Identifying/differentiating two meaning-related variables such as SENSES and CONTEXT could facilitate the possible inclusion of an experimental task that would aim to assess potential statistical results. 
I'm currently exploring the &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_5"&gt;feasibility&lt;/span&gt; of that possibility.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-2430936874128740727?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/2430936874128740727/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/senses-vs-context-in-coding-of.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2430936874128740727'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2430936874128740727'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/senses-vs-context-in-coding-of.html' title='Senses vs. 
context in the coding of the semantics of MAY and CAN'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-3025927261617026112</id><published>2009-10-11T15:49:00.000-07:00</published><updated>2009-10-16T11:56:13.191-07:00</updated><title type='text'>Coding the English modals for senses : Leech &amp; Coates (1980), Coates (1983) and Collins (1988)</title><content type='html'>Despite the overwhelming literature on the semantics of the English modals and the numerous attempts by many scholars to identify their core meanings and related senses, very few studies have in fact used a corpus-based approach for the purpose of their classification. The current record that I have of such studies counts the following publications, in chronological order of publication:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Joos, M. (1964) &lt;span style="font-style: italic;"&gt;The English Verb: Form and Meaning. &lt;/span&gt;Madison and Milwaukee&lt;/li&gt;&lt;li&gt;Lebrun,Y. (1965) &lt;span style="font-style: italic;"&gt;"CAN" and "MAY" in present-day English.&lt;/span&gt; Presses Universitaires de Bruxelles&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Ehrman,M.E. (1966) &lt;span style="font-style: italic;"&gt;The meanings of the Modals in Present-Day American English&lt;/span&gt;. The Hague and Paris&lt;/li&gt;&lt;li&gt;Hermeren,L.(1978) &lt;span style="font-style: italic;"&gt;On Modality in English: A study of the Semantics of the Modals&lt;/span&gt;, Lund:CWK Gleerup&lt;/li&gt;&lt;li&gt;Leech,G.N &amp;amp; Coates, J. (1980) Semantic Indeterminacy and the modals. In Greenbaum, S. &amp;amp; al. (eds) &lt;span style="font-style: italic;"&gt;Studies in English Linguistics&lt;/span&gt;. 
The Hague: Mouton.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Coates, J. (1983) &lt;span style="font-style: italic;"&gt;The Semantics of the Modal Auxiliaries.&lt;/span&gt; London &amp;amp; Canberra: Croom Helm.&lt;/li&gt;&lt;li&gt;Collins, P. (1988) The semantics of some modals in contemporary Australian English. &lt;span style="font-style: italic;"&gt;Australian Journal of Linguistics &lt;/span&gt;8, p.261-286&lt;/li&gt;&lt;li&gt;Collins, P. (2009) &lt;span style="font-style: italic;"&gt;Modals and Quasi-Modals in English. &lt;/span&gt;Rodopi&lt;/li&gt;&lt;/ul&gt;The work of &lt;a href="http://languages.arts.unsw.edu.au/staff/staff.php?first=Peter%20Craig&amp;amp;last=Collins"&gt;Peter Collins&lt;/a&gt; is of particular interest to me, being the most recent in time and therefore benefiting from the latest developments both in the field of modality and corpus linguistics:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.infibeam.com/Books/info/peter-collins/modals-quasi-modals-english/9789042025325.html"&gt;"Modals and Quasi-modals in English" reports the findings of a corpus-based study of the modals and a set of semantically-related 'quasi-modals' in English. The study is the largest and most comprehensive to date in this area, and is informed by recent developments in the study of modality, including grammaticalization and recent diachronic change. The selection of the parallel corpora used, representing British, American and Australian English, was designed to facilitate the exploration of both regional and stylistic variation." &lt;/a&gt;(11/10/09)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In his 1988 paper, Collins proposes to investigate possible differences in the distribution and the semantics of &lt;span style="font-style: italic;"&gt;can, could, may &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;might&lt;/span&gt; in three varieties of English, namely Australian English, British English and American English. 
Below, I specifically refer to Collins' 1988 paper.&lt;br /&gt;&lt;br /&gt;In terms of theoretical framework, Collins adopts a framework based on Leech and Coates (1980) and Coates (1983), two studies that count amongst the most influential corpus-based studies on the English modals. Collins' motivations behind borrowing an already existing framework are twofold:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;To facilitate comparisons between results from his study and those encountered in Coates (1983)&lt;/li&gt;&lt;li&gt;According to Collins, the framework proposed in Leech and Coates (1980) and Coates (1983) "accounts more adequately than any other so far proposed for the complexity and indeterminacy of modal meaning, and is therefore particularly useful in handling the recalcitrant examples that one is forced to confront in a corpus-based study" (p.264)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Considering that Collins' methodological and theoretical approaches are anticipated to feature in my study at one stage or another, I report here his overall framework as well as his taxonomy of the senses of MAY/CAN.&lt;br /&gt;&lt;br /&gt;Collins' (borrowed) taxonomy includes the notions of "core" meanings, "periphery" meanings and graded degrees of membership:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;A central concept is that of a fuzzy semantic set, whose members range from the "&lt;span style="font-weight: bold;"&gt;core&lt;/span&gt;" (representing the prototypical meaning) to the "&lt;span style="font-weight: bold;"&gt;periphery&lt;/span&gt;" of the set, with continually graded degrees of membership (the phenomenon of "&lt;span style="font-weight: bold;"&gt;gradience&lt;/span&gt;", as explored by Quirk 1965)" p.264&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;In the case of CAN, the core meaning of the modal is recognised to be that of &lt;span style="font-style: italic;"&gt;ability&lt;/span&gt; and the periphery meaning that of &lt;span style="font-style: 
italic;"&gt;possibility&lt;/span&gt;. More explicitly:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;CAN in the sense of ability is paraphrasable as "be able to" or "be capable of". In prototypical, or "core" cases CAN refers to permanent accomplishment, and is more or less synonymous with "know how to".&lt;/blockquote&gt;&lt;br /&gt;Collins further notes that core &lt;span style="font-style: italic;"&gt;ability&lt;/span&gt; cases are "characterised by the presence of animate, agentive subject, a dynamic main verb, and determination of the action by inherent properties of the subject referent". Generally, the fewer of these properties an occurrence has, the less prototypical it becomes. In other words, the number of those characteristics present in a given occurrence determines where the meaning of CAN sits between the core and the periphery.&lt;br /&gt;&lt;br /&gt;To sum up, &lt;span style="font-style: italic;"&gt;gradience&lt;/span&gt; has to do with the nature of class membership.&lt;br /&gt;&lt;br /&gt;Collins' (borrowed) theoretical framework also includes two other notions, namely &lt;span style="font-style: italic;"&gt;ambiguity&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;merger&lt;/span&gt;, which are two different sorts of &lt;span style="font-style: italic; font-weight: bold;"&gt;indeterminacy&lt;/span&gt;. 
&lt;span style="font-style: italic; font-weight: bold;"&gt;Ambiguity&lt;/span&gt; refers to cases where "it is not possible to decide from the context which of two (or more) categorically distinct meanings is the correct one" (p.265) and &lt;span style="font-style: italic; font-weight: bold;"&gt;merger&lt;/span&gt; refers to cases "where there are two mutually compatible meanings which are neutralised in a certain context" (p.265).&lt;br /&gt;&lt;br /&gt;Including the notions of both &lt;span style="font-style: italic;"&gt;gradience&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;indeterminacy&lt;/span&gt;, the theoretical framework adopted in Collins (1988) is thus both &lt;span style="color: rgb(153, 0, 0);"&gt;categorical&lt;/span&gt; (i.e. it includes semantic categories such as &lt;span style="font-style: italic;"&gt;ability, permission, possibility&lt;/span&gt;) on the grounds that:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"they co-occur with distinct syntactic and semantic features" (p.266) [see paper for a listing of which syntactic and semantic features typically occur in specific semantic uses of the modals]&lt;br /&gt;&lt;/li&gt;&lt;li&gt;"they involve distinct paraphrases" (p.266)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;"ambiguous cases can occur" (p.266)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;and &lt;span style="color: rgb(153, 0, 0);"&gt;fuzzy&lt;/span&gt;, as the framework allows for &lt;span style="font-style: italic;"&gt;gradience&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Semantic categories for &lt;span style="font-weight: bold;"&gt;CAN&lt;/span&gt; in Collins (1988)&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Root meanings&lt;/span&gt;, including &lt;span style="font-style: italic; font-weight: bold;"&gt;ability&lt;/span&gt; (possible paraphrase: 'able to', 'capable of')&lt;span style="font-style: italic;"&gt;, &lt;span style="font-weight: bold;"&gt;permission&lt;/span&gt; &lt;/span&gt;(possible paraphrase: 
'allowed',  'permitted'&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;)&lt;/span&gt;&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;possibility&lt;/span&gt; &lt;/span&gt;(possible paraphrase: 'possible for')&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Collins notes that&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Root Possibility may be regarded (...) as an 'unmarked' meaning, where there is no clear indication either of an inherent property of the subject or of a restriction. The meaning is simply that the action is free to take place, that nothing in the state of the world stands in its way (...). Root Possibility is sometimes difficult to distinguish from &lt;span style="font-style: italic;"&gt;ability&lt;/span&gt; because &lt;span style="font-style: italic;"&gt;ability&lt;/span&gt; implies &lt;span style="font-style: italic;"&gt;possibility&lt;/span&gt;. (...). Because &lt;span style="font-style: italic;"&gt;ability&lt;/span&gt; CAN and &lt;span style="font-style: italic;"&gt;permission&lt;/span&gt; CAN normally require a human or at least animate subject, Root Possibility is generally the only sense available when the subject is inanimate" (p.270) &lt;/blockquote&gt;&lt;br /&gt;Semantic categories for &lt;span style="font-weight: bold;"&gt;MAY&lt;/span&gt; in Collins (1988)&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Epistemic Possibility&lt;/span&gt; (possible paraphrase: 'it is possible that ...')&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Permission&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Root Possibility&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Collins notes that&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Epistemic Possibility is to be distinguished from Root Possibility in terms of its commitment to the truth of the associated proposition. 
Whereas Epistemic Possibility expresses the likelihood of an event's occurrence, Root possibility leaves open the question of truth and falsehood, presenting the event as conceivable, as an idea (p.274)&lt;/blockquote&gt;&lt;br /&gt;At this point, it will be interesting to see whether the theoretical framework adopted in Collins (2009) has remained the same as the one chosen in Collins (1988) or whether any amendments were made. In the next few days I will investigate Collins' latest framework before I start coding the senses of MAY/CAN as featuring in my data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-3025927261617026112?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/3025927261617026112/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/coding-english-modals-for-senses-leech.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/3025927261617026112'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/3025927261617026112'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/10/coding-english-modals-for-senses-leech.html' title='Coding the English modals for senses : Leech &amp; Coates (1980), Coates (1983) and Collins (1988)'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-6262050885191964163</id><published>2009-09-26T22:21:00.000-07:00</published><updated>2009-09-27T09:05:41.855-07:00</updated><title type='text'>Polysemy, syntax, and variation -- a usage-based method for Cognitive Semantics (contribution by Dylan Glynn, 2009)</title><content type='html'>Hello again, after three months of quietude during which I have been exclusively concentrating on setting up my data for statistical analysis. I have also recently temporarily relocated to &lt;a href="http://www.linguistics.ucsb.edu/index.html"&gt;UCSB, Santa Barbara&lt;/a&gt; from where I will continue to work on my project as a visiting scholar as well as attend Stefan Gries' &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/teaching/overview-ucsb.html"&gt;courses in statistics for linguists with R. &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This brief post acknowledges Dylan Glynn's contribution to &lt;a href="http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=HCP%2024"&gt;&lt;span style="font-style: italic;"&gt;New Directions in Cognitive Linguistics&lt;/span&gt; (2009)&lt;/a&gt; entitled 'Polysemy, syntax, and variation --  a usage-based method for Cognitive Semantics'. Also, this post mainly deals with the issue of polysemy in relation to Quantitative Multifactorial method and does not cover Glynn's chosen statistical technique of Correspondence Analysis proper.&lt;br /&gt;&lt;br /&gt;In the interest of time, this post does not engage in any discussion that could arise from Glynn's contribution but rather serves as a personal log of potentially useful quotations and points that I will investigate at a later stage.&lt;br /&gt;&lt;br /&gt;Glynn's contribution provides a thorough overview of the treatment of polysemy in Cognitive Linguistics. 
Glynn's overall premise in relation to polysemy is:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;to conserve the network model but to complement [it] with another method: a corpus-driven quantified and multifactorial method (p.76)&lt;/blockquote&gt;Further, Glynn points out that such a multifactorial method inevitably requires approaching polysemy in a non-theoretical fashion:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Such an approach employs a kind of componentional analysis that identifies clusters of features across large numbers of speech events. In other words, rather than analyse the possible meanings of a lexeme, a polysemic network should 'fall out' from an analysis that identifies clusters of the cognitive-functional features of a lexeme's usage. These features do not in any way resemble those of the Structuralist componentional analyses, since they are not based on a hypothetical semantic system, but describe instances of real language usage and are based upon encyclopaedic semantics of that language use in context (p.76)&lt;/blockquote&gt;&lt;br /&gt;In relation to the syntagmatic and paradigmatic dimensions of polysemy, Glynn recognises that the interaction between the schematic and/or morpho-syntactic semantics and lexical semantics is yet to be established. Within a dichotomous CL context where 'one position is that syntactic semantics override lexical semantics' and the other position is that 'there exists a complex interaction between all the various semantic structures in all degrees of schematicity', Glynn makes the working assumption that&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;syntactic variation affects a polysemy network and that its effect cannot be satisfactorily predicted by positing meaning structure associated with grammatical forms and classes &lt;span style="font-style: italic;"&gt;a priori&lt;/span&gt;. We must therefore account for this variable as an integral part of semantic description. (...) 
It means that for a given lemma, or root lexeme, there will be semantic variation depending on its syntagmatic context, in other words, its collocation, grammatical, and even tense or case will necessarily affect the meaning of the item" (p.82)&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;In his approach to polysemy, Glynn treats each lexeme 'as a onomasiological field, or set of parasynonyms' (p.82).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Further reading&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;Zelinsky-Wibbelt, C. (1986). &lt;a href="http://www.aclweb.org/anthology/C/C86/C86-1002.pdf"&gt;An empirically based approach towards a system of semantic features&lt;/a&gt;. &lt;span style="font-style: italic;"&gt;Proceedings of the 11th International Conference on Computational Linguistics &lt;/span&gt;11:7-12&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-6262050885191964163?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/6262050885191964163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/09/polysemy-syntax-and-variation-usage.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6262050885191964163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6262050885191964163'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/09/polysemy-syntax-and-variation-usage.html' title='Polysemy, syntax, and variation -- a usage-based method for Cognitive Semantics (contribution by Dylan Glynn, 
2009)'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-1749292016936746141</id><published>2009-06-18T05:41:00.000-07:00</published><updated>2009-06-18T07:55:32.856-07:00</updated><title type='text'>Profile-based methodology for the comparison of language varieties</title><content type='html'>&lt;div style="text-align: justify;"&gt;In this post, I would like to briefly point out the usefulness of 'profiling' methods for corpus data investigation. For that purpose I specifically refer to a paper entitled &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/PDFPublications/03Profilebased.pdf"&gt;&lt;span style="font-style: italic;"&gt;Profile-based linguistic uniformity as a generic method for comparing language varieties&lt;/span&gt; (2003)&lt;/a&gt;, authored by Dirk Speelman, Stefan Grondelaers and Dirk Geeraerts. The authors' paper is inspired by studies in language varieties and research methods currently used in dialectometry. For my own purposes, it is interesting to note that the authors make a case for the validity of profile-based linguistic methodology for corpus-data investigation, as the annotation process of my data will include profiling occurrences of &lt;span style="font-style: italic;"&gt;may, can&lt;/span&gt; and the lemma &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;In their paper, the authors present "the 'profile-based uniformity', a method designed to compare language varieties on the basis of a wide range of potentially heterogeneous linguistic variables" (abst.). 
The authors aim to show that profiling the lexical items under investigation helps to identify dissimilarities between language varieties on the basis of individual variables, which are ultimately summarised as global dissimilarities. Such a process allows language varieties to be clustered or charted using various multivariate techniques.&lt;br /&gt;&lt;br /&gt;Unlike standard methods of corpus investigation, namely frequency counts, the profile-based method "implies usage-based, but add another criterion. The additional criterion is that the frequency of a word or a construction is not treated as an autonomous piece of information, but is always investigated in the context of a profile." (p.11)&lt;br /&gt;&lt;br /&gt;The profile-based approach assumes that mere frequency differences in a corpus contribute to the identification of differences between language varieties. According to the authors, the profile-based approach presents two advantages: the avoidance of thematic bias and the avoidance of referential ambiguity.&lt;br /&gt;&lt;br /&gt;For the purpose of my project, the authors' paper generally supports my methodological choice to semantically profile the occurrences of &lt;span style="font-style: italic;"&gt;may, can &lt;/span&gt;and the lemma &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt; as found in my data. However, in their case study (see paper on p.18) the authors choose to take an onomasiological perspective (i.e. to use a concept as a starting point, and then investigate which words are associated with that concept). My project, on the other hand, takes the opposite perspective, namely the semasiological approach, which in the first instance considers individual words and looks at the semantic information that may be associated with those words. 
Inevitably, such a difference in approaching the word/sense/concept interface leads to differing understandings of the term 'profile', as the onomasiological and semasiological perspectives have different starting points. In that respect, the authors consider "[a] profile for a particular concept or linguistic function in a particular language variety [to be] the set of alternative linguistic means used to designate that concept or linguistic function in that language variety, together with their frequencies" (p.5).&lt;br /&gt;&lt;br /&gt;For the purpose of my project, the term &lt;span style="font-style: italic;"&gt;profile&lt;/span&gt; necessarily needs to be defined at word level and needs to incorporate the elements of &lt;span style="font-style: italic;"&gt;sense &lt;/span&gt;and&lt;span style="font-style: italic;"&gt; morpho-syntactic&lt;/span&gt; &lt;span style="font-style: italic;"&gt;information&lt;/span&gt;. In that regard, the Behavioural Profile methodology proposed by Gries and Divjak in &lt;a href="http://www.blogger.com/Quantitative%20approaches%20in%20usage-based%20cognitive%20semantics:%20myths,%20erroneous%20assumptions,%20and%20a%20proposal"&gt;&lt;span style="font-style: italic;"&gt;Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal&lt;/span&gt; &lt;/a&gt;(in press) is an appropriate methodology for my project. Broadly, the BP methodology involves the identification of both semantic and morpho-syntactic features characteristic of the investigated lexical item, as found in the data. Ultimately, these identified features are used as linguistic variables and are investigated statistically. 
In the BP model, the identified features are referred to and processed as ID tags, each one of which contributes to the profiling of the lexical item under investigation.&lt;br /&gt;&lt;br /&gt;To sum up, Speelman, Grondelaers and Geeraerts' paper provides me here not only with the opportunity to reflect on the notion of 'profiling' in the context of corpus-data investigation but also with the opportunity to consider the notion from the perspective of my own study.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-1749292016936746141?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/1749292016936746141/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/profile-based-methodology-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/1749292016936746141'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/1749292016936746141'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/profile-based-methodology-for.html' title='Profile-based methodology for the comparison of language varieties'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-7533860093304887428</id><published>2009-06-16T10:57:00.000-07:00</published><updated>2009-06-17T07:59:53.322-07:00</updated><title type='text'>Comparing exploratory statistical techniques for semantic descriptions</title><content type='html'>&lt;div style="text-align: justify;"&gt;As Glynn, Geeraerts and Speelman state in &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/PDFPublications/Corpus_Cognitive_Semantics.pdf"&gt;&lt;span style="font-style: italic;"&gt;Testing the hypothesis. Confirmatory statistical techniques for multifactorial data in Cognitive Semantics&lt;/span&gt;&lt;/a&gt; (paper presented at the 10th International Cognitive Linguistics Conference in Cracow in July 2007):&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;blockquote&gt;Current trends in the study of polysemy have focused on exploratory techniques such as Cluster Analysis and Correspondence Analysis. (abst.)&lt;br /&gt;&lt;/blockquote&gt;&lt;div style="text-align: justify;"&gt;Broadly, exploratory techniques "identify and visualise patterns in the data". Such analysis "does not permit inferences about the language, only the sample, or dataset, investigated" (abst.).&lt;br /&gt;&lt;br /&gt;On the occasion of the Quantitative Investigations in Theoretical Linguistics 3 event in Helsinki on June 3rd 2008, Dylan Glynn presented a comparison of both the Cluster and Correspondence Analysis statistical methods for the purpose of semantic description (&lt;a href="http://www.ling.helsinki.fi/sky/tapahtumat/qitl/Abstracts/Glynn.pdf"&gt;&lt;span style="font-style: italic;"&gt;Clusters and Correspondences. 
A comparison of two exploratory statistical techniques for semantic description&lt;/span&gt;)&lt;/a&gt; [the PowerPoint presentation for this paper can be found &lt;a href="http://www.ling.helsinki.fi/sky/tapahtumat/qitl/Presentations/Glynn.pdf"&gt;here&lt;/a&gt;].&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Over the past fifteen years, corpus-based research in the field of Cognitive Linguistics has produced a number of studies demonstrating the wide use of both statistical techniques. In his paper, Glynn compares both techniques on the grounds of the quality/accuracy of the graphic representation of the data and the accuracy of the relative associations of variables as revealed in the data. The assessment of the accuracy of relative associations of variables for each statistical method is based on a regression analysis which takes into consideration "the relationship between the mean value of a random variable and the corresponding values of one or more variables" (OED).&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;For the purpose of his investigation, Glynn carried out a case study examining the semantic structure of the lexeme &lt;span style="font-style: italic;"&gt;annoy&lt;/span&gt; in comparison with &lt;span style="font-style: italic;"&gt;hassle&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;bother&lt;/span&gt; in a large non-commercial corpus of English specified for the American vs. British English regional difference (for the purpose of that case study Glynn identified the working variables of morpho-syntax and Frame Semantic argument structure). 
Glynn points out that the Cluster Analysis and Multivariate Correspondence Analysis methods involve different types of graphic representations which, in turn, present a number of shortcomings:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;One important difference between the two techniques is that Cluster Analysis is primarily designed to present its results in the form of dendograms where Correspondence Analysis relies on scatter plots. The dendograms of HCA offer clear representations of both the groupings of features and the relative degree of correlation of those features. (...) The principle shortcoming of this representation is that it gives the false impression that all the data falls into groups, where in fact this may not be the case. (...) The scatter plots of Correspondence Analysis, although at times difficult to interpret, offer a much more "analogue" representation of correlation. (...) [T]he representation of the plot is (...) much more approximative than the dendogram. (p.2)&lt;br /&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Through his case study, Glynn confirms the usefulness of both statistical methods as exploratory techniques. He also points out that both methods may be unreliable when processing complex multivariate data and cautions analysts about the use of those methods for the specific purpose of confirmatory analysis. 
However, in the context of exploratory analysis, "the contrast in the result of the complicated analysis across the three lexemes [&lt;span style="font-style: italic;"&gt;annoy, hassle&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;bother&lt;/span&gt;] suggests that MCA [Multivariate Correspondence Analysis] is better suited to a truly multivariate exploratory research" (p.2)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;With regard to my project, Glynn's paper raises a couple of points: &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;i) the need to decide on the statistical nature of my overall project analysis -- exploratory, confirmatory or perhaps both, possibly following a comparative format (?)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;ii) the urgency to clearly identify the number and the nature of the variables through which I intend to investigate my data sets, as those will be influential in the choice of statistical method -- at the exploratory stage at least. 
&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-7533860093304887428?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/7533860093304887428/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/comparing-exploratory-statistical.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/7533860093304887428'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/7533860093304887428'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/comparing-exploratory-statistical.html' title='Comparing exploratory statistical techniques for semantic descriptions'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-6163248771629389918</id><published>2009-06-16T07:35:00.000-07:00</published><updated>2009-06-16T07:43:31.554-07:00</updated><title type='text'>Statistical techniques for an optimal treatment of polysemy</title><content type='html'>&lt;div style="text-align: justify;"&gt;In this &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/06/dylan-glynn-on-theme-of-data-driven.html"&gt;post&lt;/a&gt;, I introduced the work of  &lt;a href="https://perswww.kuleuven.be/%7Eu0049977/"&gt;Dylan Glynn&lt;/a&gt; who is broadly concerned 
with developing methodology for corpus-data investigation. Glynn adheres to the Cognitive Linguistics/Semantics framework. Of interest here is a research project he carried out in collaboration with &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/dirkg.htm"&gt;Dirk Geeraerts&lt;/a&gt; and &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/dirks.htm"&gt;Dirk Speelman&lt;/a&gt;, concerned with assessing the efficacy of two types of statistical technique, namely exploratory vs. confirmatory techniques of statistical analysis. Glynn, Geeraerts and Speelman presented the results of their study at the 10th International Cognitive Linguistics Conference in Cracow in July 2007, in a paper entitled &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/PDFPublications/Corpus_Cognitive_Semantics.pdf"&gt;&lt;span style="font-style: italic;"&gt;Testing the hypothesis. Confirmatory statistical techniques for multifactorial data in Cognitive Semantics&lt;/span&gt;&lt;/a&gt; [the abstract is accessible from page 11 of the link]. For the purpose of this post I can unfortunately only summarise the content of that paper based on its abstract. As I do not have access to the full paper, I am not in a position to critically assess the arguments proposed by Glynn, Geeraerts and Speelman.&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;According to the authors, the two main statistical techniques currently in active use among Cognitive Linguists for corpus-data investigation are i) &lt;span style="font-weight: bold;"&gt;exploratory techniques &lt;/span&gt;(i.e. the Cluster Analysis, used in Gries 2006; the Correspondence Analysis, used in forthcoming Glynn) and ii) &lt;span style="font-weight: bold;"&gt;confirmatory techniques &lt;/span&gt;(i.e. Linear Discriminant Analysis, used in Gries 2003 and Wulff 2004; Logistic Regression Analysis, used in Heylen 2005 and De Sutter &amp;amp; al. 
in press)&lt;br /&gt;&lt;/div&gt; &lt;/div&gt;&lt;br /&gt;The authors define the aim of each technique as follows:&lt;br /&gt;&lt;br /&gt;   &lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;The goal of (...) &lt;span style="font-weight: bold;"&gt;exploratory&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;statistics&lt;/span&gt; is to identify and visualize patterns in the data. These patterns are argued to represent patterns of usage (...). Exploratory statistics analysis does not permit inferences about the language, only the sample, or dataset, investigated. However, in &lt;span style="font-weight: bold;"&gt;confirmatory statistics&lt;/span&gt;, inference is made from the sample to the population. In other words, one claims that what is seen in the data is representative of the language generally. (abst.)&lt;/blockquote&gt;&lt;br /&gt; &lt;br /&gt;   &lt;div style="text-align: justify;"&gt;In the light of my own project, the authors' study is of particular relevance because it identifies the case of polysemy, as an object of investigation, as requiring specific methodological attention:&lt;br /&gt;  &lt;/div&gt;    &lt;blockquote&gt;Current trends in the study of polysemy have focused on exploratory techniques.&lt;/blockquote&gt;However,&lt;br /&gt;&lt;blockquote&gt;[t]he importance of these techniques notwithstanding, the cognitive framework needs to deepen its use of quantitative research especially through the use of confirmatory multivariate statistics.&lt;br /&gt;  &lt;/blockquote&gt; Further,&lt;br /&gt; &lt;br /&gt;&lt;blockquote&gt;Within Cognitive Linguistics, [Linear Discriminant Analysis technique and Logistic Regression Analysis technique] have been successfully used to capture the various conceptual, formal, and extralinguistics factors that lead to the use of one construction over another. &lt;span style="font-weight: bold;"&gt;However, the study of polysemy differs at this point&lt;/span&gt;. 
Instead of examining the variables that effect the use of one parasynonymous forms to another, &lt;span style="font-weight: bold;"&gt;we are examining the interaction  of a range of formal variables (the lemma and its syntagmatic and inflectional variation), semantic variables, and extralinguistic variables, in the search of correlations across all of these&lt;/span&gt;. One possible multivariate technique for this type of data is Log-Linear Modelling. (abst.) &lt;/blockquote&gt;      In the course of their study, the authors identified complex sets of correlations between formal and semantic variables through exploratory studies and then modelled these correlations using the Log-Linear Analysis technique.&lt;br /&gt; &lt;br /&gt;&lt;span style="font-weight: bold;"&gt;At this point, Glynn, Geeraerts and Speelman's paper calls for a comparative study of specific polysemous lexical items contextualised in different language varieties and using, in turn, both the Cluster Analysis exploratory technique and the Log-Linear Modelling confirmatory technique. Such a study would contribute to the identification of a possible optimal statistical technique for the investigation of corpus-data. 
&lt;/span&gt;&lt;br /&gt;  &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-6163248771629389918?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/6163248771629389918/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/statistical-techniques-for-optimal.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6163248771629389918'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6163248771629389918'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/statistical-techniques-for-optimal.html' title='Statistical techniques for an optimal treatment of polysemy'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-6369841845879145396</id><published>2009-06-16T02:53:00.000-07:00</published><updated>2009-06-16T03:17:12.274-07:00</updated><title type='text'>The place of Cognitive Linguistics on the French linguistics scene</title><content type='html'>&lt;div style="text-align: justify;"&gt;As previously described &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-welcome-codif-latest.html"&gt;here&lt;/a&gt;, part of my project involves the investigation of the lemma &lt;span style="font-style: 
italic;"&gt;pouvoir&lt;/span&gt; in a native French subdata set. Analyses of the quantitative results of this investigation will be carried out within the Cognitive Linguistics (CL) framework. Carrying out a literature review covering the polysemy of &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt; in relation to the CL framework has, so far, proved a little tricky. This post provides a little background on the place of CL in France and in French linguistics generally. At the Congrès Mondial de Linguistique Française in Paris in July 2008, &lt;a href="http://wwwling.arts.kuleuven.be/qlvl/dirkg.htm"&gt;Dirk Geeraerts&lt;/a&gt; discussed the situation of CL in the context of French linguistics in a very informative paper entitled &lt;a href="http://www.linguistiquefrancaise.org/index.php?option=article&amp;amp;access=standard&amp;amp;Itemid=129&amp;amp;url=/articles/cmlf/pdf/2008/01/cmlf08310.pdf"&gt;&lt;span style="font-style: italic;"&gt;La Réception de la Linguistique Cognitive dans la Linguistique du Français&lt;/span&gt;&lt;/a&gt;. 
Bonne lecture!&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-6369841845879145396?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/6369841845879145396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/place-of-cognitive-linguistics-on.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6369841845879145396'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6369841845879145396'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/place-of-cognitive-linguistics-on.html' title='The place of Cognitive Linguistics on the French linguistics scene'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-6244767550882299518</id><published>2009-06-15T06:24:00.001-07:00</published><updated>2009-06-15T09:36:03.636-07:00</updated><title type='text'>Dylan Glynn on the theme of data-driven methodology in Cognitive Linguistics and its usefulness for the treatment of polysemy</title><content type='html'>&lt;div style="text-align: justify;"&gt;In this post, I would like to bring attention to the work of &lt;a href="https://perswww.kuleuven.be/%7Eu0049977/"&gt;Dylan Glynn&lt;/a&gt; whose on-going research is concerned with bridging the empirical 
and the cognitive. Here is how Glynn describes his own work:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;table style="text-align: left; margin-left: 0px; margin-right: 0px;" width="551" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr valign="top"&gt;&lt;td rowspan="2" height="146"&gt;&lt;p class="style13"&gt;&lt;/p&gt;&lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;&lt;p class="style13"&gt;The focus of my work is the development of methodology within the theoretical framework of Cognitive Linguistics. This school of thought imposes the minimal theoretical assumptions &lt;span class="style10"&gt;upon&lt;/span&gt; its model of language. It is for this reason that it is best placed to properly capture the complexity of language in a holistic manner. &lt;/p&gt;    &lt;p class="style13 f-lp"&gt;In methodological terms, I am most interested in finding ways &lt;span class="style10"&gt;to&lt;/span&gt; capture the multidimensional nature of language structure, from prosody and morphology through to semantics and &lt;span class="style10"&gt;culture&lt;/span&gt;. 
Specifically, I concentrate on the semantics of Grammatical Constructions, the polysemy and synonymy of lexis, iconicity in morphology, and the interaction of grammar, pragmatics, and metaphor-metonymy.(https://perswww.kuleuven.be/~u0049977/ling.html) [accessed 15/06/09]&lt;br /&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;p class="style13 f-lp"&gt;&lt;/p&gt;    &lt;/td&gt;    &lt;td height="145"&gt;&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As part of a talk given at the 10th International Cognitive Linguistics Conference in July 2007 at the University of Cracow, entitled &lt;span style="font-style: italic;"&gt; &lt;a href="http://wwwling.arts.kuleuven.ac.be/qlvl/PDFPublications/Corpus_Cognitive_Semantics.pdf"&gt;Usage-Based Cognitive Semantics: A Quantitative Approach&lt;/a&gt;&lt;/span&gt;, Glynn makes a case for the quantitative treatment of lexical and constructional semantics and claims that "[c]orpus data respects the complexity of language and, if treated in sufficiently large quantities, enables generalisations about language structure that other methods cannot" (abst.).  Further, "u&lt;span style="font-weight: bold;"&gt;sage-based quantitative methodology (...) facilitates attempts to reveal the interaction between the different parameters of language simultaneously&lt;/span&gt;" (abst.) [my emphasis]&lt;br /&gt;&lt;br /&gt;During his opening talk of the theme session &lt;a href="http://wwwling.arts.kuleuven.ac.be/qlvl/EmpiricalEvidence.pdf"&gt;&lt;span style="font-style: italic;"&gt;Empirical Evidence. 
Converging approaches to constructional meaning&lt;/span&gt;&lt;/a&gt; at the Third International Conference of the German Cognitive Linguistics Association on September 25th-27th 2008, Glynn points out the fast-growing interest in empirical cognitive research, particularly in the field of Cognitive Semantics:&lt;br /&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div style="text-align: justify;"&gt;Cognitive Linguistics has recently witnessed a new and healthy concern for empirical methodology. Using such methods, important in-roads have been made in the study of near-synonymy, syntactic alternation, syntactic variation and lexical licensing.&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;Further,&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;Empirical methods, and methodology generally, are one of the most important concerns for any descriptive science and the recent blossoming of research in this respect in Cognitive Linguistics can be seen as a maturing of the field. A range of recent anthologies on the issue, including Gries &amp;amp; Stefanowitsch (2006), Stefanowitsch &amp;amp; Gries (2006), Gonzales-Marquez &amp;amp; al. (2007), Andor &amp;amp; Pelyvas (forth.), Newman &amp;amp; Rice (forth.), and Glynn &amp;amp; Fischer (in preparation), can be seen as testimony to the importance attached to this issue. Despite the advances in this regard, how the different methods and the results they produce inform each other remains largely ill-understood. Although this question of how elicited, experimental and found data relate has been addressed in the work of Schonefeld (1999,2001), Gries &amp;amp; al. 
(2005, in press), Goldberg (2006), Arppe &amp;amp; Jarvikivi (in press), Gilquin (in press), Divjak (forth.), and Wiechmann (subm.), it warrants further investigation.&lt;br /&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The fast development of data-driven investigation methods within the field of Cognitive Linguistics is further pointed out by Glynn in his opening talk to the theme session &lt;a href="http://wwwling.arts.kuleuven.ac.be/qlvl/EmpiricalApproaches.pdf"&gt;&lt;span style="font-style: italic;"&gt;Empirical Approaches to Polysemy and Synonymy&lt;/span&gt;&lt;/a&gt;, at the Cognitive and Functional Perspectives on Dynamic Tendencies in Languages event, on May 29th-June 1 2008. In that particular address, Glynn presents empirical cognitive approaches as a way to address existing issues in the cognitive treatment of polysemy:&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;Within the cognitive tradition, both the study of polysemy and synonymy have rich traditions. Brugman (1983) and Vandeloise (1984) began the study of sense variation in spatial prepositions that evolved into the radial network model applied to a wide range of linguistic forms, especially grammatical cases and spatial prepositions (Janda 1993, Cuyckens 1995). (...) Despite the success of this research, studies such as Sandra &amp;amp; Rice (1995) and Tyler and Evans (2001) identified serious shortcomings. In light of this, &lt;span style="font-weight: bold;"&gt;empirical cognitive approaches to semantic structure do not question the validity of the radial network model, but seek to develop methods for testing proposed semantic variation and relation&lt;/span&gt;. (abs.) 
[my emphasis]&lt;br /&gt;&lt;/blockquote&gt;&lt;/div&gt;In relation to my project (which includes a Cognitive Linguistics treatment of polysemous &lt;span style="font-style: italic;"&gt;may, can &lt;/span&gt;and&lt;span style="font-style: italic;"&gt; pouvoir&lt;/span&gt; via an investigation of corpus data), it is with much excitement that I begin to explore the work of Dylan Glynn.&lt;br /&gt;&lt;br /&gt;Below is a selected bibliography of Glynn's work that will be of interest for my research (unfortunately, several references are still in press or in preparation!):&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. In press (6pp). Multifactorial Polysemy. Form and meaning variation in the complex web of usage. R. Caballero (ed.). &lt;span class="style29"&gt;Lexicología y lexicografía. Proceedings of the XXVI AESLA Conference&lt;/span&gt;. Almería: University of Almería Press.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. &lt;span class="style41"&gt;2008&lt;/span&gt;. Polysemy, Syntax, and Variation. A usage-based method for Cognitive Semantics. V. Evans &amp;amp; S. Pourcel (eds). &lt;span class="style29"&gt;New Directions in Cognitive Linguistics.&lt;/span&gt; Amsterdam: John Benjamins. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. 2006. Conceptual Metonymy - A study in cognitive models, reference-points, and domain boundaries. &lt;span class="style29"&gt;Poznan Studies in Contemporary Linguistics &lt;/span&gt;42: 85-102.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. 2006. Cognitive Semantics and Lexical Variation. Why we need a quantitative approach to conceptual structure. O. Prokhorova (ed.). &lt;span class="style36"&gt;Edinstvo sistemnogo i functionalnogo andliza yazykov &lt;/span&gt;(Systemic and Functional Analysis of Language). 53-60. 
Belgorod: Belgorod University Press.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;In preparation&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. Multidimensional Polysemy. A case study in usage-based cognitive semantics. &lt;span class="style41"&gt;Will be submitted to &lt;/span&gt;&lt;span class="style29"&gt;Cognitive Linguistics.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D., Geeraerts, D., &amp;amp; Speelman, D. Testing the hypothesis. Confirmatory statistical techniques for multifactorial data in Cognitive Semantics. D. Glynn &amp;amp; K. Fischer (eds). &lt;span class="style29"&gt;Usage-Based Cognitive Semantics. Corpus-Driven methods for the study of meaning&lt;/span&gt;. Berlin: Mouton de Gruyter.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. &amp;amp; Fischer, K. (eds). &lt;span class="style29"&gt;Usage-Based Cognitive Semantics. Corpus-Driven methods for the study of meaning&lt;/span&gt;. Berlin: Mouton de Gruyter.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Glynn, D. &lt;span class="style29"&gt;Mapping Meaning. Toward a usage-based methodology in Cognitive Semantics&lt;/span&gt;. 
&lt;span class="style41"&gt;Will be submitted to &lt;/span&gt;Mouton de Gruyter.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-6244767550882299518?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/6244767550882299518/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/dylan-glynn-on-theme-of-data-driven.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6244767550882299518'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/6244767550882299518'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/dylan-glynn-on-theme-of-data-driven.html' title='Dylan Glynn on the theme of data-driven methodology in Cognitive Linguistics and its usefulness for the treatment of polysemy'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-9040408813111978714</id><published>2009-06-14T03:04:00.000-07:00</published><updated>2009-06-14T05:07:01.099-07:00</updated><title type='text'>Behavioral Profiling and polysemy</title><content type='html'>&lt;div style="text-align: justify;"&gt;In their paper entitled &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/BehavioralProfiles_NWLC.pdf"&gt;&lt;span style="font-style: italic;"&gt;In 
defense of corpus-based methods: A behavioral profile analysis of polysemous 'get' in English&lt;/span&gt; &lt;/a&gt;(presented at the 24th North West Linguistics Conference, 3-4 May 2008), &lt;a href="http://www.uweb.ucsb.edu/%7Eaberez/"&gt;Andrea L. Berez&lt;/a&gt; and &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/"&gt;Stefan Th. Gries &lt;/a&gt;make a general case for the use of corpus data. Their paper serves as a response to &lt;a href="http://www.helsinki.fi/varieng/people/varieng_raukko.html"&gt;Raukko&lt;/a&gt;'s (1999, 2003) proposal to disregard corpus data investigations in favour of experimentally motivated studies. Berez and Gries conclude that:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;[A] rejection of corpus-based investigations of polysemy is premature: our BP approach to &lt;span style="font-style: italic;"&gt;get&lt;/span&gt; not only avoids the pitfalls Raukko mistakenly claims to be inherent in corpus research, it also provides results that are surprisingly similar to his own questionnaire-based results, and Divjak and Gries (to appear) show how predictions following from a BP study are strongly supported in two different psycholinguistic experiments. (p.165)&lt;br /&gt;&lt;/blockquote&gt;Before conducting a case study of polysemous&lt;span style="font-style: italic;"&gt; get&lt;/span&gt; -- the results of which are compared, in the second part of the paper, to those presented in Raukko's &lt;span style="font-style: italic;"&gt;An "intersubjective" method for cognitive semantic research on polysemy: the case of 'get'&lt;/span&gt; (1999) -- the authors briefly state the advantages of corpus data:&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div style="text-align: justify;"&gt;- (...) 
the richness of and diversity of naturally-occurring data often forces the researcher to take a broader range of facts into consideration;&lt;br /&gt;- the corpus output from a particular search expression together constitute an objective database of a kind that made-up sentences or judgements often do not. More pointedly, made-up sentences or introspective judgements involve potentially non-objective (i) data gathering, (ii) classification, (iii) interpretive process on the part of the researcher. Corpus data, on the other hand, at least allow for an objective and replicable data-gathering process; given replicable retrieval operations, the nature, scope and the ideas underlying the classification of examples can be made very explicit (...) (p.159)&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Methodologically, Berez and Gries attempt to make their case by targeting 'polysemy' as their domain of investigation and by applying the Behavioral Profiling method (described &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/corpus-based-behavioral-profile.html"&gt;here&lt;/a&gt;):&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div style="text-align: justify;"&gt;Given the recency of this method, the number of studies that investigate highly polysemous items is still limited. We therefore apply this method to the verb &lt;span style="font-style: italic;"&gt;to get&lt;/span&gt; to illustrate that not only does it not suffer from the problems of the intersubjective approach, but it also allows for a more bottom-up/data-driven analysis of the semantics of lexical elements to determine how many senses of a word to assume and what their similarities and differences are. 
(p.157)&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Generally, the results reported in Berez and Gries' study and in Raukko's study are very similar. However, Berez and Gries' BP approach allows for a finer-grained investigation:&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;blockquote&gt;we show that some of our results are incredibly close to Raukko's, but also provide an illustration of how the BPs can combine syntactic and semantic information in a multifactorial way that is hard to come by using the kinds of production experiments Raukko discusses. (p.159)&lt;br /&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="font-weight: bold;"&gt;With regard to my project, which is broadly concerned with a corpus-driven investigation of polysemous lexical items, Berez and Gries' paper provides, methodologically, a useful illustration of how to exploit corpus data optimally for the retrieval of semantic information. 
&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-9040408813111978714?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/9040408813111978714/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/behavioral-profiling-and-polysemy.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/9040408813111978714'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/9040408813111978714'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/behavioral-profiling-and-polysemy.html' title='Behavioral Profiling and polysemy'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-2758132416964936556</id><published>2009-06-09T12:15:00.000-07:00</published><updated>2009-06-10T04:30:28.668-07:00</updated><title type='text'>Behavioral Profiles, snake plots and cross-linguistic comparisons</title><content type='html'>&lt;p class="MsoNormal"&gt;This post complements this earlier post: &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/corpus-based-behavioral-profile.html"&gt;&lt;i&gt;The corpus-based Behavioral Profile approach to cognitive semantics&lt;/i&gt;&lt;/a&gt; as it revisits the Behavioral Profile (BP) methodology and reports how, according to Divjak and Gries, snake plot representations can graphically reveal the relative significance of ID tags, thus allowing for cross-linguistic ID tag-level comparisons. In this post I make reference to Divjak and Gries' recent paper: &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/ContrastivePhasalVerbs.pdf"&gt;&lt;i&gt;Corpus-based cognitive semantics: a contrastive study of phasal verbs in English and Russian&lt;/i&gt;&lt;/a&gt; (to appear).&lt;br /&gt;&lt;br /&gt;Overall, Divjak and Gries demonstrate that the BP methodology not only picks up dissimilarities between polysemous items and near synonyms but also makes it possible to recognise and simultaneously process dissimilarities of different kinds:&lt;br /&gt;&lt;br /&gt;"Because these dissimilarities are of an entirely different order, they can only be picked up if a methodology is used that adequately captures the multivariate nature of the phenomenon. The Behavioral Profiling approach we have developed and apply here does exactly that." (p.273, abst.).&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;For their investigation of polysemous and near-synonymous lexical items, the authors assume the existence of networks of words/senses. They also assume that the lexical items investigated in their study are included in such networks. 
Further, these networks demonstrate internal structure in the sense that "elements which are similar to each other are connected and the strength of the connection reflects the likelihood that the elements display similar syntactic and semantic behaviour" (p.281).&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Divjak and Gries' paper achieves three goals:&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br /&gt;&lt;/p&gt;&lt;span style=";font-family:&amp;quot;;font-size:12;"  &gt;1/ Presents the BP methodology as a means to provide a usage-based characterisation of the lemma under investigation by identifying individual syntactic and semantic characteristic features.&lt;br /&gt;&lt;br /&gt;2/ Demonstrates that a snake plot representation of those syntactic and semantic characteristic features makes it possible to rank them in order of significance and therefore contributes to the identification of clusters of senses "on the basis of distributional characteristics collected in BPs" (p.292). Consequently, snake plot representations allow for the recognition of prototypical features of the investigated lexical items.&lt;br /&gt;&lt;br /&gt;3/ Illustrates that, semantically, the BP approach allows for a more rigorous investigation of translational cross-linguistic equivalents.&lt;br /&gt;&lt;br /&gt;Overall, the authors are testing the BP approach for a simultaneous treatment of both language-specific data and cross-linguistic data.&lt;br /&gt;&lt;br /&gt; "The (...) purpose is to show that this approach can also be applied to the notoriously difficult area of cross-linguistic comparisons. (...) 
[T]he approach will be put to the test by attempting a simultaneous within-language description and across-languages comparison of polysemous and near-synonymous items belonging to different subfamilies of Indo-European, i.e., English and Russian" (p.277)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Generally, Divjak and Gries' paper encourages putting the BP methodology further to the test by applying it to interlanguage data, where the investigated lexical item in language x, which carves out a specific conceptual space, &lt;/span&gt;&lt;/span&gt;&lt;span style=";font-family:&amp;quot;;font-size:12;"  &gt;&lt;span style="font-weight: bold;"&gt;is used by a native speaker of language y whose conceptual space for the translational equivalent of that item is potentially different. In other words, and with regard to the application of the BP methodology to my project, while the paper raises questions about the nature of conceptual spaces in interlanguage, it convincingly offers a methodology that would allow for the computation of my three-way data (native English, native French and Fr-English interlanguage; details of the three sub-corpora can be found &lt;/span&gt;&lt;a style="font-weight: bold;" href="http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-welcome-codif-latest.html"&gt;here&lt;/a&gt;&lt;span style="font-weight: bold;"&gt;). Simultaneous treatment of may, can and pouvoir can be carried out within language -- taking into account the native English data vs. the Fr-English interlanguage data, and across languages -- taking into account the native French vs. native English vs. 
Fr-English interlanguage data. Finally, the BP approach also provides the opportunity to investigate the possibility of a correlation between the word class membership of &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;may&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;can&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; and &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;pouvoir&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; and their semantic BPs&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-2758132416964936556?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/2758132416964936556/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/behavioral-profiles-snake-plots-and_09.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2758132416964936556'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2758132416964936556'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/06/behavioral-profiles-snake-plots-and_09.html' title='Behavioral Profiles, snake plots and cross-linguistic comparisons'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-2907087500924111434</id><published>2009-05-24T16:06:00.000-07:00</published><updated>2009-05-24T17:02:55.883-07:00</updated><title type='text'>R training at the University of Uppsala</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_IG-WOzwBoRc/ShngGG5f4yI/AAAAAAAAABg/D5pw9Pi17wo/s1600-h/Linguistics+department+-+Uppsala+Univ.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px; height: 160px;" src="http://3.bp.blogspot.com/_IG-WOzwBoRc/ShngGG5f4yI/AAAAAAAAABg/D5pw9Pi17wo/s200/Linguistics+department+-+Uppsala+Univ.JPG" alt="" id="BLOGGER_PHOTO_ID_5339545228760048418" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Finally ... back after too long! ...&lt;br /&gt;&lt;br /&gt;In previous posts I tried to point out the advantages of using R as a methodological tool for my research project (&lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/03/case-for-using-r-for-statistical.html"&gt;here&lt;/a&gt; and &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/03/from-corpus-to-clusters-gries-and.html"&gt;here&lt;/a&gt;). 
Since the publication of Gries's &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.routledgelanguages.com/books/Quantitative-Corpus-Linguistics-with-R-isbn9780415962704"&gt;Quantitative Corpus Linguistics with R: A Practical Introduction&lt;/a&gt; &lt;/span&gt;at the end of March, I have started familiarising myself with the R language and working on possible scripts for the application of R to my data. The process has been taking longer than anticipated and is still at an initial stage -- hence the long absence from the blog!&lt;br /&gt;&lt;br /&gt;On 18-19 May 2009, the &lt;a href="http://www.lingfil.uu.se/lingfil_eng/"&gt;linguistics department at the University of Uppsala&lt;/a&gt; organised an R training workshop led by Stefan Gries (&lt;span style="font-style: italic;"&gt;Statistics for linguistics with R: monofactorial tests and beyond&lt;/span&gt;), along with a research seminar on 20 May 2009, also given by Stefan Gries. I am extremely grateful to the Linguistics department at the University of Uppsala, and particularly to &lt;a href="http://www.anst.uu.se/chrisgl/"&gt;Christer Geisler&lt;/a&gt; and &lt;a href="http://www.anst.uu.se/merjkyto/"&gt;Merja Kytö&lt;/a&gt;, for welcoming me so warmly on that occasion and letting me attend Stefan Gries' workshop and research seminar.&lt;br /&gt;&lt;br /&gt;The experience was extremely enriching and motivating; I am now planning to put my new skills to the test within the next few days ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-2907087500924111434?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/2907087500924111434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://cognitionandinterlanguage.blogspot.com/2009/05/r-training-at-university-of-uppsala.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2907087500924111434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2907087500924111434'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/05/r-training-at-university-of-uppsala.html' title='R training at the University of Uppsala'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_IG-WOzwBoRc/ShngGG5f4yI/AAAAAAAAABg/D5pw9Pi17wo/s72-c/Linguistics+department+-+Uppsala+Univ.JPG' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-5706440674161498781</id><published>2009-03-27T06:29:00.000-07:00</published><updated>2009-03-28T07:15:32.757-07:00</updated><title type='text'>Image-Schema transformations and cross-linguistic polysemy: a matter of terminology</title><content type='html'>In her 2004 paper (&lt;a href="http://www.hf.uib.no/forskerskole/Tre_upps.Transf_.pdf"&gt;&lt;span style="font-style: italic;"&gt;Transformation on image schemas and cross-linguistic polysemy&lt;/span&gt;&lt;/a&gt;), &lt;a href="http://www.sol.lu.se/person/LenaEkberg"&gt;Lena Ekberg&lt;/a&gt; is generally concerned with diachronic semantic change across different languages and she argues that cross-linguistic semantic change is &lt;span style="font-style: italic;"&gt;cognitively&lt;/span&gt; motivated. 
She recognises that "[m]odern research within the field of historical lexical semantics and grammaticalization in fact has provided arguments that meaning change is motivated by cognitive principles independent of specific languages" (p.42). Although Ekberg (2004) links with my project in the sense that it takes a cross-linguistic approach to investigate polysemous lexical items while trying to incorporate a Cognitive Semantics approach, it differs from my project in two major ways: i) it identifies specific semantic changes in specific languages and then compares those changes cross-linguistically; and ii) it considers semantic variance diachronically. My project, on the other hand, is concerned with cross-linguistic semantic change in terms of word senses in language x affecting the senses of &lt;span style="font-style: italic;"&gt;corresponding&lt;/span&gt; words in language y. Further, my project is concerned with on-line cross-linguistic semantic interference and is not concerned with the development of word senses over time. Despite these differences, Ekberg (2004) is of interest to me because it raises a number of terminology-, methodology- and theoretical framework-related issues.&lt;br /&gt;&lt;br /&gt;Ekberg's overall stance on semantic change is stated in &lt;a href="http://www.hf.uib.no/forskerskole/Tre_upps.Construal_.pdf"&gt;&lt;span style="font-style: italic;"&gt;Construal operations in semantic change: the case of abstract nouns&lt;/span&gt;&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;"The prerequisites of meaning variation of a lexeme are intrinsic in the underlying schematic structure as well as in the construal operations that may apply to that structure. 
Thus every instance of semantic change and variation - either resulting in polysemy or contextual meaning variation, is motivated by the possibilities of varying a given schematized structure by means of general and cognitively motivated construal operations" (p.63)&lt;br /&gt;&lt;br /&gt;Further,&lt;br /&gt;&lt;br /&gt;"[T]he processes generating semantic variation and change operate on the schematized structure underlying the lexical representation of a linguistic expression" (p. ).&lt;br /&gt;&lt;br /&gt;Ekberg investigates cross-linguistic semantic change by considering and trying to bring together two theoretical approaches with different theoretical assumptions: the lexical semantics approach and the cognitive semantics approach. In her investigation of "the potential polysemy of lexemes based on a common schema" (p.25), Ekberg (2004) attempts to deal simultaneously with lexical patterns, conceptual processes and cognitive mechanisms. Overall, the paper highlights the limitations of such an inclusive methodology that ultimately relies on loose use of terminology.&lt;br /&gt;&lt;br /&gt;Ekberg's (2004) working assumption is that:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"semantic structures at a certain level of abstraction, as well as the principles of meaning change, are universal devices for generating new lexical meaning variants" (p.26)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Ekberg (2004) claims that:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;polysemy results from a process of image-schema transformation which itself results from a mental construal process&lt;/li&gt;&lt;li&gt;polysemy refers to meaning variants of the same lexeme related by means of image-schema transformations and which are regarded as separate senses, i.e. 
instantiations of polysemy&lt;br /&gt;&lt;/li&gt;&lt;li&gt;lexical meaning extensions reflecting transformations of image-schematic structure are cognitively motivated and thus arise cross-linguistically&lt;/li&gt;&lt;li&gt;image-schema transformations are motivated by mental construal processes&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Issues raised:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Ekberg recognises image-schema transformation as a central process in the emergence of new senses. However, in the paper, the term &lt;span style="font-style: italic; font-weight: bold;"&gt;image schema&lt;/span&gt; lacks a reliable working definition. The term is first defined on page 28, in the sense of Johnson (1987), as "a recurring dynamic pattern [...] that gives coherence and structure to our experience". It is later described on page 36 as "the most abstract basis of lexical meaning", and on page 43 as an "underlying abstract semantic structure". In other words, throughout the paper, it is unclear whether the term refers to schematic representations of word senses or to schematic representations of physical experiences. In the first case, the approach to cross-linguistic semantic change and polysemy is lexically based. In the second case, the approach is experientially based and therefore conceptual in nature (i.e. pre-linguistic). Distinguishing between the two cases is important because they ultimately refer to different stages/levels in the construction of meaning. The author's attempt to bridge lexical matters (i.e. linguistic in nature) and conceptual matters (i.e. 
pre-linguistic in nature) creates a degree of confusion about the level of abstraction targeted in the discussion.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Similarly, the term &lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;cognitively motivated&lt;/span&gt; &lt;/span&gt;("lexical meaning extensions reflecting transformations of image-schematic structure are cognitively motivated and thus arise cross-linguistically") calls for clarification. Assuming that lexical meaning extensions do reflect transformations of image-schematic structure (as understood in the CL framework), then those meaning extensions are by definition &lt;span style="font-style: italic;"&gt;cognitively motivated&lt;/span&gt;, and the phrase quoted above is redundant and therefore not useful. Alternatively, the term (in the context of the example) could refer to a specific cognitive ability of speakers which could be applied to the process of lexical meaning extension. With the term &lt;span style="font-style: italic;"&gt;cognitive&lt;/span&gt;, it is unclear whether the author refers to a cognitive ability allowing speakers to extend lexical meanings in similar ways in different languages or to a conceptual process (i.e. image schema, as understood in the CL framework). Without a solid working definition of the term &lt;span style="font-style: italic;"&gt;image schema&lt;/span&gt;, it is difficult to accept that polysemy results from a process of image-schema transformation. It is also difficult to tell what exactly is being transformed in the process of meaning extension: the schematic representation of lexical meanings or the image schema as an analog representation of a physical experience.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt; &lt;span style="font-weight: bold;"&gt;Ekberg (2004) raises questions about the possibility and feasibility of bridging the lexical and the conceptual via the cognitive process of image schema. 
As far as my study is concerned, even though an overall CL approach to &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;may/can&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; in French-English IL will allow for an analysis of how the senses of &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;may/can&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; are represented in the French-English bilingual mind, the study may well be restricted to showing just that! Talmy, Sweetser and Johnson have investigated the English modals as linguistic tools referring to the image schema of Force Dynamics. Although I cannot ignore such studies, the question now is how they can be exploited empirically. &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-5706440674161498781?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/5706440674161498781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/image-schema-transformations-and-cross.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/5706440674161498781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/5706440674161498781'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/image-schema-transformations-and-cross.html' title='Image-Schema transformations and cross-linguistic polysemy: a matter of terminology'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-4300423969258984270</id><published>2009-03-23T04:30:00.000-07:00</published><updated>2009-03-23T06:42:52.033-07:00</updated><title type='text'>From corpus to clusters: Gries and Divjak's suggested methodology </title><content type='html'>&lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;In &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/BehavioralProfiles.pdf"&gt;&lt;span style="font-style: italic;"&gt;Behavioral profiles: a corpus approach to cognitive semantic analysis&lt;/span&gt;&lt;/a&gt; (to appear), Gries and Divjak propose a methodology that approaches polysemy empirically while following the Cognitive Linguistics (CL) framework. The authors' methodology is of interest for my project because I adopt an empirical approach, I follow the CL framework and the words I investigate (i.e. &lt;span style="font-style: italic;"&gt;may, can &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt;) are all polysemous lexical items.
&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br /&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;      &lt;/span&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;In their introduction, the authors review:&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal" style="margin-left: 54pt; text-indent: -36pt;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;i)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;              &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The treatment of polysemy in CL &lt;o:p&gt;&lt;/o:p&gt;&lt;span style=""&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin-left: 54pt; text-indent: -36pt;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;ii)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Present existing issues behind the identification of the prototypical sense(s) of a word &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 54pt; text-indent: -36pt;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;iii)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;    &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Claim that a more sophisticated quantitative approach to corpus investigation 
would provide cognitive-linguistically relevant results.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin-left: 54pt; text-indent: -36pt;"&gt;&lt;br /&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Gries and Divjak present their methodology as one that “is &lt;i style=""&gt;radically &lt;/i&gt;corpus-based because it relies on the correlation between distributional patterns and functional characteristics to a much larger extent than most previous cognitive-linguistic work” (p.60). The authors claim that their methodology “aims at providing the best of both worlds, i.e. a precise, quantitative corpus-based approach that yields cognitive-linguistically relevant results” (p.60). &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;b style=""&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Method: &lt;/span&gt;&lt;/u&gt;&lt;/b&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The method has four stages and is based on the concept of &lt;i style=""&gt;ID tags&lt;/i&gt; (cf. Atkins 1987) and the notion of &lt;i style=""&gt;Behavioral Profile&lt;/i&gt; (cf. Hanks 1996). 
&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The method assumes that “the words or sense investigated are part of a network of words/senses”: &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;“In this network, elements which are similar to each other are connected in such a way that the strength of the connection reflects the likelihood that the elements display similar behavior with respect to phonological, syntactic, semantic or other type of linguistic behaviour” (p.61) &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;b style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The four stages&lt;/span&gt;&lt;/b&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stages 1-3 are concerned with data processing.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stage 4 is concerned with meaningful data evaluation. 
&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;b style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;ol style="margin-top: 0cm;" start="1" type="1"&gt;&lt;li class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The retrieval of all instances of a word’s lemma from a corpus&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;A manual analysis of many properties of the word form (i.e. the annotation of the ID tags)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The generation of a co-occurrence table&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The evaluation of the table by means of exploratory and other statistical techniques &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;  &lt;p class="MsoNormal" style="margin-left: 18pt; line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;i style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Data processing&lt;/span&gt;&lt;/i&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stage 1&lt;/span&gt;&lt;/u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: use of a concordance program to retrieve all hits of the lemma of a word&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;u&gt;&lt;span 
style=";font-family:&amp;quot;;" &gt;Stage 2&lt;/span&gt;&lt;/u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: all hits are annotated for ID tags&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Results from step 2 are displayed in a co-occurrence table where each row contains: &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 72pt; text-indent: -18pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family:Symbol;"&gt;&lt;span style=""&gt;·&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;one citation of the word in question&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 72pt; text-indent: -18pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family:Symbol;"&gt;&lt;span style=""&gt;·&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;each column contains an ID tag&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal" style="margin-left: 54pt; line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p class="MsoNormal" style="margin-left: 72pt; text-indent: -18pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style="font-family:Symbol;"&gt;&lt;span style=""&gt;·&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;each cell contains the level of the ID tag for this citation &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/u&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin-left: 72pt; text-indent: -18pt; line-height: 200%;"&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stage 3&lt;/span&gt;&lt;/u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: &lt;o:p&gt;&lt;/o:p&gt;The co-occurrence table is turned into a frequency table: every row contains a level of an ID tag, while every column contains a sense of the polysemous word. Each cell in the table provides the frequency with which that ID tag level occurs with that word sense.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 72pt; line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;[NB: to compare senses that occur at different frequencies, absolute frequencies need to be turned into relative frequencies (i.e. within-ID tag percentages)]&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stage 3 results in the &lt;i style=""&gt;Behavioral profile &lt;/i&gt;for a word sense: “each sense of a word (…) is characterized by one co-occurrence vector of within-ID tag relative frequencies” (p.63) &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Stage 4 of Gries and Divjak’s methodology evaluates the vector-based behavioural profiles identified in stage 3.
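&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;To make the stage 3 conversion concrete for myself, here is a minimal Python sketch of how within-ID tag relative frequencies could be computed from annotated hits. It is only an illustration under my own assumptions, not Gries and Divjak's implementation: the sense labels, the ID tags (subject, polarity), their levels and the function name behavioral_profiles are all hypothetical.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter, defaultdict

# Hypothetical annotated hits (stage 2): one (sense, {ID tag: level}) pair per citation.
# The senses, ID tags and levels are invented for illustration only.
hits = [
    ("possibility", {"subject": "inanimate", "polarity": "positive"}),
    ("possibility", {"subject": "animate", "polarity": "positive"}),
    ("possibility", {"subject": "inanimate", "polarity": "negative"}),
    ("possibility", {"subject": "inanimate", "polarity": "positive"}),
    ("permission", {"subject": "animate", "polarity": "positive"}),
    ("permission", {"subject": "animate", "polarity": "negative"}),
]


def behavioral_profiles(annotated_hits):
    """Return {sense: {(ID tag, level): within-ID tag relative frequency}}."""
    level_counts = defaultdict(Counter)  # per sense: counts of each (ID tag, level)
    tag_totals = defaultdict(Counter)    # per sense: counts of each ID tag
    for sense, tags in annotated_hits:
        for tag, level in tags.items():
            level_counts[sense][(tag, level)] += 1
            tag_totals[sense][tag] += 1

    # Every (ID tag, level) pair seen anywhere, so that all vectors share one layout.
    all_levels = sorted({key for counts in level_counts.values() for key in counts})

    profiles = {}
    for sense, counts in level_counts.items():
        vector = {}
        for tag, level in all_levels:
            total = tag_totals[sense][tag]
            # Relative frequency of this level within its ID tag, for this sense.
            vector[(tag, level)] = counts[(tag, level)] / total if total else 0.0
        profiles[sense] = vector
    return profiles


profiles = behavioral_profiles(hits)
for sense, vector in profiles.items():
    print(sense, vector)
```
&lt;/pre&gt;
&lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Each sense ends up with one vector whose values sum to 1 within each ID tag, which is what allows senses of very different overall frequency to be compared. With real data the hits would of course come from the manual annotation in stage 2 rather than from a hand-written list.&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;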
&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style="font-style: italic;"&gt;Data evaluation &lt;/span&gt;&lt;br /&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The evaluation can be carried out using quantitative approaches (i.e. standardized statistical tests). &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Gries and Divjak recognise two types of evaluation: &lt;span style="font-weight: bold;"&gt;monofactorial&lt;/span&gt; and &lt;i style=""&gt;&lt;span style=""&gt; &lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight: bold;"&gt;multifactorial:&lt;/span&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;&lt;li&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Monofactorial evaluation&lt;/span&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: looks at token frequency and type frequency. “A useful strategy to start with is identifying in one’s corpus the most frequent senses of the word(s) one is investigating” (p.64) &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;&lt;li&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Multifactorial evaluation&lt;/span&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style=";font-family:&amp;quot;;" &gt;The authors specifically focus on the exploratory technique of hierarchical agglomerative cluster analysis. 
The Hierarchical agglomerative cluster analysis (HAC) is a family of methods that aims at identifying and representing (dis)similarity relations between different items.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;br /&gt;&lt;/span&gt;    &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;i style=""&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;How to do a Hierarchical agglomerative cluster analysis:&lt;/span&gt;&lt;/u&gt;&lt;/i&gt;&lt;i style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 36pt; text-indent: -36pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;i)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;              &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Relative co-occurrence frequency table needs to be turned into a &lt;b style=""&gt;similarity/dissimilarity matrix&lt;/b&gt; (need to settle on a specific measure) &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 36pt; text-indent: -36pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;ii)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Selection of an &lt;b style=""&gt;amalgamation strategy&lt;/b&gt; ( =algorithm that defines how the elements that need to be clustered will be joined together on the basis of the variables or the ID tags that they were inspected for (most widely used amalgamation strategy is Ward’s rule)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 36pt; text-indent: -36pt; 
line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;iii)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;    &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Results appear in the form of a &lt;b style=""&gt;hierarchical tree diagram&lt;/b&gt; representing distinguishable clusters with high within-cluster similarity and low between-cluster similarity&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="margin-left: 36pt; text-indent: -36pt; line-height: 200%;"&gt;&lt;br /&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt; &lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;i style=""&gt;&lt;u&gt;&lt;span style=";font-family:&amp;quot;;" &gt;Detailed analysis of the clustering solution &lt;/span&gt;&lt;/u&gt;&lt;/i&gt;&lt;i style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;i) Assessment of the ‘cleanliness’ of the tree diagram&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style=";font-family:&amp;quot;;" &gt;ii) Assessment of the clearest similarities emerging from the tree diagram &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="margin-left: 36pt; text-indent: -36pt; line-height: 200%;"&gt;&lt;!--[if !supportLists]--&gt;&lt;span style=";font-family:&amp;quot;;" &gt;&lt;span style=""&gt;iii)&lt;span style=";font-family:&amp;quot;;font-size:7;"  &gt;         &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;!--[endif]--&gt;&lt;span style=";font-family:&amp;quot;;" 
&gt;Between-cluster differences can be assessed using &lt;i style=""&gt;t&lt;/i&gt;-values&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;b style=""&gt;&lt;span style=";font-family:&amp;quot;;" &gt;NB&lt;/span&gt;&lt;/b&gt;&lt;span style=";font-family:&amp;quot;;" &gt;: “the fact that a cluster analysis has grouped together particular sense/words does not necessarily imply that these senses or words are identical or even highly similar – it only shows that these sense/words are more similar to each other than they are to the rest of the senses/words investigated. By means of standardized &lt;i style=""&gt;z-&lt;/i&gt;scores, one can tease apart the difference between otherwise highly similar senses/words and shed light on what the internal structure of a cluster looks like” (p.67)&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;span style="font-weight: bold;"&gt;The authors' methodology and my project&lt;/span&gt;:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Can the authors' method lead to the identification of semantic clusters between the different senses of &lt;span style="font-style: italic;"&gt;may, can&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt;?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;If so, what semantic features characterise each cluster? Can between-cluster differences be identified?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;How useful is the proposed methodology for the elaboration of a cross-linguistic semantic network of the senses of &lt;span style="font-style: italic;"&gt;may, can&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt;?
&lt;br /&gt;&lt;/li&gt;&lt;li&gt;How useful is the proposed methodology for both the identification of cross-linguistic between-cluster differences and the identification of within-cluster characteristics? &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Overall, exploring the authors' proposed methodology with my data should prove a useful exercise because it provides the opportunity to investigate the mental semantic organisation of word senses at a cross-linguistic level. &lt;/span&gt;&lt;br /&gt;&lt;p class="MsoNormal" style="line-height: 200%;"&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-4300423969258984270?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/4300423969258984270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/from-corpus-to-clusters-gries-and.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4300423969258984270'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4300423969258984270'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/from-corpus-to-clusters-gries-and.html' title='From corpus to clusters: Gries and Divjak&apos;s suggested methodology 
'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-4969759712903386667</id><published>2009-03-06T09:22:00.000-08:00</published><updated>2009-03-06T14:15:18.731-08:00</updated><title type='text'>Approaching the data statistically: what to test, how and why ?</title><content type='html'>At this point in the project, the investigation of the data is broadly anticipated to include two separate stages, each resting on different methodological assumptions. The first stage is purely quantitative in nature and follows a traditional trend in corpus linguistics to assess "the distribution of a single variable such as word frequency" (Oakes 1998). The literature refers to that type of approach as univariate. By adopting the traditional approach in the first stage of the investigation of the data, my aim is to provide a preliminary overview of the behaviour of &lt;span style="font-style: italic;"&gt;may, can &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;pouvoir&lt;/span&gt; in &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-welcome-codif-latest.html"&gt;all three subcorpora&lt;/a&gt;. However, although that stage will reveal general patterns of use of the modals in the different subcorpora, the results gathered from frequency tests will need to be interpreted cautiously because of &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/point-of-variability-assessment-within.html"&gt;variability within and between corpora&lt;/a&gt;. 
The second stage of the data investigation process includes the computation of qualitative information such as word senses and contextual/pragmatic information. That stage is anticipated to consist mainly of cluster analyses. A description of that type of analysis and its implications for my study will be presented in a later post.&lt;br /&gt;&lt;br /&gt;This post is only concerned with the first stage of investigation. I present an overview of the range of statistical tests that are available and that I judge suitable for word-frequency motivated investigations. I then show the relevance of those tests in the context of my data. The information presented below is drawn from Michael P. Oakes's &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.eupjournals.com/book/9780748608171"&gt;Statistics for Corpus Linguistics&lt;/a&gt;&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;As a first step into the quantitative stage, the &lt;span style="font-style: italic;"&gt;central tendency&lt;/span&gt; of the data needs to be identified. The &lt;span style="font-style: italic;"&gt;central tendency measure&lt;/span&gt; represents the data of a group of items as a single score, taken to be the most typical score for that data set (p.2). 
There are three possible types of measure to identify the central tendency of a data set: the &lt;span style="font-weight: bold;"&gt;median&lt;/span&gt; (the central score of the distribution with half of the scores being above the median and the other half falling below), the &lt;span style="font-weight: bold;"&gt;mode&lt;/span&gt; (the most frequently obtained score in the data set) and the &lt;span style="font-weight: bold;"&gt;mean&lt;/span&gt; (the average of all scores in the data set). The mode is recognised to have the disadvantage of being easily affected by chance scores in smaller data sets. The disadvantage of the mean, on the other hand, is that it is affected by extreme values and might not be reliable in cases where the data is not normally distributed. In the context of my data, the mean is judged to be the most appropriate central tendency measure (a preliminary investigation of the frequency of the occurrences of &lt;span style="font-style: italic;"&gt;may&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;can, may not, cannot &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;can't &lt;/span&gt;did not reveal cases of extremely low/high numbers of uses; parametric tests (described below) assume that the mean is an appropriate measure of central tendency). The mean is also necessary for the calculation of &lt;span style="font-style: italic; font-weight: bold;"&gt;z&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; scores&lt;/span&gt; (a statistical measure of the closeness of an element to the mean value for all the elements in a group) and the &lt;span style="font-weight: bold;"&gt;standard deviation&lt;/span&gt; (a measure which takes into account the distance of every data item from the mean).&lt;br /&gt;&lt;br /&gt;Once the central tendency of individual data sets is identified, specific statistical tests will allow for the comparison of those data sets. 
Broadly, there are two types of tests: &lt;span style="font-weight: bold;"&gt;parametric &lt;/span&gt;tests and &lt;span style="font-weight: bold;"&gt;non-parametric&lt;/span&gt; tests. Parametric tests assume that: i) the data is normally distributed, ii) the mean and the standard deviation (described above) are appropriate measures of central tendency and dispersion, iii) observations are independent and scores assigned to one case must not bias the score given to any other. Non-parametric tests work with frequencies and rank-ordered scales and they do not depend on the population being normally distributed.&lt;br /&gt;&lt;br /&gt;Generally, parametric tests are considered to be more powerful and are the recommended tests of choice if all the necessary assumptions apply.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 51, 0);"&gt;Parametric tests&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;t&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; test&lt;/span&gt;: statistical significance test based on the difference between observed and expected results. In other words, the &lt;span style="font-style: italic;"&gt;t &lt;/span&gt;test allows for the comparison of the means of two different data sets. In that way, the &lt;span style="font-style: italic;"&gt;t&lt;/span&gt; test assesses the difference between two groups for normally distributed interval data where the mean and standard deviation are appropriate measures of central tendency and variability of the scores.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;T&lt;/span&gt; tests are used rather than &lt;span style="font-style: italic;"&gt;z&lt;/span&gt; score tests whenever the analyst is dealing with a small sample (i.e. where either group has fewer than 30 items). A z-score of +1 indicates one standard deviation above the mean.  
A z-score of -1.5 indicates 1.5 SDs below the mean. Once the standard deviation is calculated, the z-score indicates how far from the mean a particular data item is located.&lt;br /&gt;&lt;br /&gt;In the context of my data, a &lt;span style="font-style: italic;"&gt;t &lt;/span&gt;test would establish whether there is any statistically significant difference (i.e. a difference that is unlikely to be purely due to chance) between:&lt;br /&gt;&lt;br /&gt;- the uses of &lt;span style="font-style: italic;"&gt;may&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;can &lt;/span&gt;in ICLE FR and LOCNESS&lt;br /&gt;- the uses of &lt;span style="font-style: italic;"&gt;may not, cannot &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;can't &lt;/span&gt;in ICLE FR and LOCNESS&lt;br /&gt;- the uses of &lt;span style="font-style: italic;"&gt;may &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;can &lt;/span&gt;in ICLE FR and LOCNESS, in argumentative texts&lt;br /&gt;- the uses of &lt;span style="font-style: italic;"&gt;may &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;can &lt;/span&gt;in ICLE FR and LOCNESS, in literary texts&lt;br /&gt;&lt;br /&gt;Based on the calculation of the mean, calculating the standard deviation for the ICLE FR subcorpus would make it possible to identify the overall proportion of that data set that departs from the expected results and is consequently atypical of that data set. 
Further, a calculation of the &lt;span style="font-style: italic;"&gt;z&lt;/span&gt; scores in ICLE FR would make it possible to identify the uses of &lt;span style="font-style: italic;"&gt;may/can &lt;/span&gt;that are the most typical of native French speakers (those would be represented by the &lt;span style="font-style: italic;"&gt;z &lt;/span&gt;scores closest to the mean) and the least typical uses (those would be represented by the &lt;span style="font-style: italic;"&gt;z&lt;/span&gt; scores furthest away from the mean). &lt;br /&gt;&lt;br /&gt;The calculations will be useful because they will also make it possible to establish whether there are statistically significant differences in the uses of &lt;span style="font-style: italic;"&gt;may/can&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;between&lt;/span&gt; individual native French speakers. Such information will ultimately be useful at the qualitative stage of the investigation, when examining the possible motivation for such differences at the cognitive level.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(153, 51, 0);"&gt;Non-parametric tests:&lt;br /&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102);"&gt;In the above section, I pointed out the usefulness of parametric tests for the purpose of my study. However, it is worth noting that, as a non-parametric test, the &lt;/span&gt;&lt;span style="font-style: italic; color: rgb(102, 102, 102);"&gt;Chi-Square &lt;/span&gt;&lt;span style="color: rgb(102, 102, 102);"&gt;test assesses the relationship between frequencies in a display table. That test allows for an estimation of whether the frequencies in a table differ significantly from each other. 
Oakes (1998) notes that when working with frequency data, the &lt;/span&gt;&lt;span style="font-style: italic; color: rgb(102, 102, 102);"&gt;Chi-Square&lt;/span&gt;&lt;span style="color: rgb(102, 102, 102);"&gt; test is a good technique for modelling a two-variable table. In my study, the &lt;/span&gt;&lt;span style="font-style: italic; color: rgb(102, 102, 102);"&gt;Chi-Square &lt;/span&gt;&lt;span style="color: rgb(102, 102, 102);"&gt;test could perhaps be used as an additional test to confirm results found from the standard deviation test. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 102, 102);"&gt;So what's next?:&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;calculate the mean of the uses of &lt;span style="font-style: italic;"&gt;may/can&lt;/span&gt; in ICLE FR&lt;/li&gt;&lt;li&gt;calculate the mean of the uses of &lt;span style="font-style: italic;"&gt;may/can&lt;/span&gt; in LOCNESS&lt;/li&gt;&lt;li&gt;calculate the mean of the uses of &lt;span style="font-style: italic;"&gt;may not/cannot/can't &lt;/span&gt;in ICLE FR&lt;/li&gt;&lt;li&gt;calculate the mean of the uses of &lt;span style="font-style: italic;"&gt;may not/cannot/can't &lt;/span&gt;in LOCNESS&lt;/li&gt;&lt;li&gt;calculate the standard deviation in all of the above&lt;/li&gt;&lt;li&gt;carry out a &lt;span style="font-style: italic;"&gt;t &lt;/span&gt;test in all of the above&lt;/li&gt;&lt;li&gt;calculate the &lt;span style="font-style: italic;"&gt;z &lt;/span&gt;scores in all of the above&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="color: rgb(153, 51, 0);"&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-4969759712903386667?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://cognitionandinterlanguage.blogspot.com/feeds/4969759712903386667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/approaching-data-statistically-what-to.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4969759712903386667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4969759712903386667'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/approaching-data-statistically-what-to.html' title='Approaching the data statistically: what to test, how and why ?'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-8259959001044991034</id><published>2009-03-02T15:36:00.000-08:00</published><updated>2009-03-03T02:41:27.484-08:00</updated><title type='text'>The semantic map model raises an issue for the comparison of 'may', 'can' and 'pouvoir' </title><content type='html'>&lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/corpus-based-behavioral-profile.html"&gt;In this post&lt;/a&gt;, I briefly touched on the difficulty of carrying out cross-linguistic studies:&lt;br /&gt;&lt;br /&gt;'Gries and Divjak recognise that "[c]ross-linguistic semantic studies are notoriously difficult given that different languages carve up conceptual space(s) in different ways (cf. 
Janda, to appear for discussion); for that reason, linguistic dimensions are difficult to compare across languages" (p.7)'&lt;br /&gt;&lt;br /&gt;Here, I raise a methodological difficulty involved in the cross-linguistic comparison of 'may', 'can' and 'pouvoir' on the basis of a forthcoming paper by &lt;a href="http://hum.uit.no/lajanda/"&gt;Laura Janda&lt;/a&gt;: &lt;a href="http://209.85.229.132/search?q=cache:NDWNDEjjwbEJ:www.unc.edu/depts/slavdept/lajanda/jandabltfestschrift.doc+%22What+is+the+role+of+semantic+maps+in+cognitive+linguistics%3F+%22&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;cd=1&amp;amp;gl=uk"&gt;&lt;span style="font-style: italic;"&gt;What is the role of semantic maps in cognitive linguistics?&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;(&lt;a href="http://209.85.229.132/search?q=cache:eda8RMu1_LgJ:hum.uit.no/lajanda/conference%2520presentations/Lodz%25202008/Semantic%2520maps.ppt+%22What+is+the+role+of+semantic+maps+in+cognitive+linguistics%3F+%22&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;cd=3&amp;amp;gl=uk"&gt;here is the PowerPoint version&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;In her paper, although Janda grants some degree of usefulness to the semantic map model (it helps identify patterns across languages and helps visualise complex data), she nevertheless identifies the limitations of the model, particularly in the context of cognitive linguistic analysis.&lt;br /&gt;&lt;br /&gt;Broadly, semantic maps are designed to compare large numbers of languages. 
The semantic map model assumes that:&lt;br /&gt;&lt;br /&gt;i) a single universal conceptual space exists&lt;br /&gt;ii) the grammar of each language is the sum of the 'lines' drawn by that language across this single shared space&lt;br /&gt;iii) all languages are based on the same parameters&lt;br /&gt;&lt;br /&gt;The semantic map model implies a &lt;span style="font-weight: bold;"&gt;conceptual space&lt;/span&gt;, that is the "universal backdrop of possible distinctions that human beings can recognise (and might grammaticalise)" and a &lt;span style="font-weight: bold;"&gt;conceptual map&lt;/span&gt;, that is "a distribution of actual distinctions made by one or a number of languages across the parameters of conceptual space" (p.5).&lt;br /&gt;&lt;br /&gt;Janda follows Langacker (2006) in her distinction between discrete and continuous linguistic models:&lt;br /&gt;&lt;br /&gt;
&lt;p class="MsoNormal" style="text-align: justify;"&gt;&lt;span style="font-size:11;"&gt;&lt;span style="color: rgb(204, 102, 0);"&gt;"I would like to frame this discussion of semantic maps in terms of Langacker’s (2006) concerns about continuity and discreteness in linguistic models. As Langacker points out, all models are metaphorical, and all metaphors are potentially misleading, particularly if one forgets that the metaphor may be suppressing some information, and/or if the metaphor is excessively discrete or continuous. Most phenomena, including linguistic phenomena, are complex enough to justify applying both discrete and continuous models in their interpretation (Langacker 2006:107). 
Imposing discreteness on a system means that grouping and reification facilitate the identification of units that would not be available in a continuous description, such as galaxies, archipelagos, villages, and discrete (yet related) languages. Continuity has the advantage of facilitating focus on the relationships among parts of a system, making it possible to identify fields of similarity that discreteness ignores, such as dialect continua and all manner of gradients. We have the option of choosing various models, some of which will be relatively discrete and some of which will be relatively continuous." (p.12) &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;
In other words, semantic maps only show distances and are not semantically meaningful. Further, as the semantic map model focuses on the discrete points, it ignores the continuous zones and the relations between points. Janda insists that in cross-linguistic studies, these characteristics are amplified. Further, according to Janda, semantic maps fail to capture in detail "differences in metaphor, construal and scalability, all of which are key to a cognitive analysis" (p.30).&lt;br /&gt;&lt;br /&gt;Finally, Janda points out that the semantic map model fails to take into consideration the qualitative differences between languages. Indeed, she notes that a concept can be expressed by a grammatical category in one language but be expressed lexically in another language (p. 21). That point is of particular relevance to my project as, in English, 'may' and 'can' are grammatical words and therefore belong to a closed word class. 
French 'pouvoir', on the other hand, is a lexical verb which belongs to an open word class and which takes on inflections. So 'may'/'can' and 'pouvoir' show different degrees of grammaticalisation. Such a difference in the lexicalisation process of the semantic domain of &lt;span style="font-size:85%;"&gt;POSSIBILITY&lt;/span&gt; raises the issue of a possible cross-linguistic lexico-grammatical continuum, which naturally contradicts the discrete quality of the semantic map model. Indeed, in her paper, Janda mainly uses the case of cross-linguistic polyfunctional grams to illustrate that there is no direct correlation between grams and concepts and that cross-linguistic studies will reveal overlaps between markers and what they express. Janda's cross-linguistic illustrations mainly include languages that share similar grams and the discussion is centred around the various senses of those grams. In the case of 'may', 'can' and 'pouvoir', French and English are not comparable in that way. As mentioned above, in English the forms are fully grammaticalised whereas in French, 'pouvoir' inflects. Janda's paper raises the issue of the comparability of the three modals and the necessity to identify clear comparison criteria.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;In sum, the semantic map model is not attractive for the purpose of my study because, being discrete by nature, it does not allow inferences about construal mechanisms. 
Further, it is mainly concerned with quantitative external differences and does not address qualitative properties.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-8259959001044991034?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/8259959001044991034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/semantic-map-model-raises-issue-for.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8259959001044991034'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8259959001044991034'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/semantic-map-model-raises-issue-for.html' title='The semantic map model raises an issue for the comparison of 
&apos;may&apos;, &apos;can&apos; and &apos;pouvoir&apos; '/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-1159362760312463620</id><published>2009-03-01T08:11:00.000-08:00</published><updated>2009-03-01T11:10:10.906-08:00</updated><title type='text'>A case for using R for the statistical computation of my data</title><content type='html'>As a usage-based study, my project involves quantitative data analyses. This post makes a brief case for the use of R as the chosen statistical computation program for the quantitative analyses of my data.&lt;br /&gt;&lt;br /&gt;R is:&lt;br /&gt;- a language and environment for statistical computing and graphics&lt;br /&gt;- a program providing a variety of statistical and graphical techniques&lt;br /&gt;- a free open-source program&lt;br /&gt;&lt;br /&gt;The use of R is rapidly growing in the fields of statistics, engineering and science. &lt;a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=1&amp;amp;scp=1&amp;amp;sq=%22data%20analysts%20captivated%20by%20R%27s%20power%22&amp;amp;st=cse"&gt;This article&lt;/a&gt; from The New York Times (07/01/2009) provides an overview of the various uses of R by data analysts from differing professional backgrounds.&lt;br /&gt;&lt;br /&gt;In corpus linguistics, the use of R is steadily spreading as it allows analysts to carry out multifactorial searches and approach data with fine degrees of granularity. 
&lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/other/private.html"&gt;Stefan Gries&lt;/a&gt; is actively contributing to the development of R and its application to the field of corpus linguistics and is the author of the recently published &lt;a href="http://www.routledge.com/books/Quantitative-Corpus-Linguistics-with-R-isbn9780415962704"&gt;&lt;span style="font-style: italic;"&gt;Quantitative Corpus Linguistics with R&lt;/span&gt;&lt;/a&gt;. As an open-source program, R is continually being improved and updated with new code. In that respect, Gries provides linguists using R with &lt;a href="http://groups.google.com/group/corpling-with-r/web/quantitative-corpus-linguistics-with-r"&gt;downloadable updated code&lt;/a&gt; on a regular basis.&lt;br /&gt;&lt;br /&gt;Generally, the use of R has been praised in the literature concerned with the analysis of linguistic data. As Larson-Hall writes in her review of Baayen's (2008) &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.equinoxjournals.com/ojs/index.php/SS/article/viewFile/5557/4070"&gt;Analysing linguistic data: A practical introduction to statistics using R&lt;/a&gt;&lt;/span&gt;: "(...) the statistical program you use guides the way you think about statistical analysis, and I do think R is far superior to any menu-driven program in this way" (p.472).&lt;br /&gt;&lt;br /&gt;In the field of cognitive semantics, Dagmar Divjak and Stefan Gries (2008) (&lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/ClustersMind_ML.pdf"&gt;&lt;span style="font-style: italic;"&gt;Clusters in the mind? 
Converging evidence from near synonymy in Russian&lt;/span&gt;&lt;/a&gt;) (&lt;span style="font-style: italic;"&gt;The Mental Lexicon&lt;/span&gt; 3.2:188-213) provide illustrations of the use of R. Further, in her CMLLP-2008 [Corpus Methods in Linguistics and Language Teaching] Masterclass material used at the University of Chicago, Dagmar Divjak provides a suggested procedure to approach semantic issues via the use of R. Divjak uses the semantics of &lt;span style="font-style: italic;"&gt;be&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;have&lt;/span&gt; as a case study. The suggested methodology is as follows:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Identify problem&lt;/li&gt;&lt;li&gt;Come up with a list of variables&lt;/li&gt;&lt;li&gt;Operationalize variables: ensure that a unique value is assigned during the manual annotation process&lt;/li&gt;&lt;li&gt;Annotate corpus extractions&lt;/li&gt;&lt;li&gt;Hypothesis?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;ul&gt;&lt;li&gt;no &gt; exploratory analysis&lt;/li&gt;&lt;li&gt;yes &gt; confirmatory analysis&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Considering all of the above, the use of R, for the purpose of my project, would methodologically place my investigation in line with other recognised current studies.&lt;/span&gt; However, it should be noted that the actual use of R is not recognised as straightforward. As Larson-Hall notes in her above-mentioned review:&lt;br /&gt;&lt;br /&gt;"While I myself have become fairly familiar with R and think it is an excellent statistical program, I have to admit that there is something of a learning curve when it comes to using it for one's own data. (...) Although R is elegant and useful, I would not label it as an 'easy to learn' program (...)" (p. 
472)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-1159362760312463620?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/1159362760312463620/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/case-for-using-r-for-statistical.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/1159362760312463620'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/1159362760312463620'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/03/case-for-using-r-for-statistical.html' title='A case for using R for the statistical computation of my data'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-5478481756576053801</id><published>2009-02-25T04:35:00.000-08:00</published><updated>2009-02-25T08:08:43.672-08:00</updated><title type='text'>Bridging the conceptual and the contextual using evidence from corpus data</title><content type='html'>&lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/MandarinAdverbs_Corpora.pdf"&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;Zhuo and Gries (to appear) (&lt;a 
href="http://www.linguistics.ucsb.edu/faculty/stgries/research/MandarinAdverbs_Corpora.pdf"&gt;&lt;span style="font-style: italic;"&gt;Schematic meaning and pragmatic inference: The Mandarin adverbs 'hai', 'you', and 'zai' &lt;/span&gt;&lt;/a&gt;) could contribute to the development of a method to investigate my data with a view to identifying the semantic and image-schematic profiling characteristics of Fr-Eng IL 'may' and 'can'.&lt;br /&gt;&lt;br /&gt;In their paper, Zhuo and Gries are concerned with the relation between the abstract schematic meaning of specific lexical items and the variety of concrete contextual messages those lexical items give rise to. The authors use the case of the Mandarin adverbs 'hai', 'you' and 'zai', all loosely translating into 'again', to demonstrate that although the three lexical items belong to a common &lt;span style="font-style: italic;"&gt;semantic&lt;/span&gt; &lt;span style="font-style: italic;"&gt;system&lt;/span&gt; [the term is understood here as referring to a 'semantic notion'; the authors also use the term &lt;span style="font-style: italic;"&gt;semantic substance&lt;/span&gt; to express the same idea], each individual lexical item refers to a specific facet of the &lt;span style="font-style: italic;"&gt;semantic system&lt;/span&gt; it is a member of. Within specific &lt;span style="font-style: italic;"&gt;systems&lt;/span&gt;, all members contrast semantically with one another:&lt;br /&gt;&lt;br /&gt;"[W]e shall treat the three adverbs as signs in semantic opposition. 
We shall assign each word a schematic meaning as a salient component of a semantic system in which they contrast" (p.7)&lt;br /&gt;&lt;br /&gt;Although schematic meanings can be contextually enriched (via idiosyncratic lexical input, encyclopaedic knowledge of a particular word or the &lt;span style="font-style: italic;"&gt;human factor&lt;/span&gt;), the authors consider them &lt;span style="font-style: italic;"&gt;semantic values&lt;/span&gt; that are to be dissociated from contextual inference. Further, Zhuo and Gries recognise that "the human ability to utilize all kinds of knowledge including knowledge of language as well as world and cultural knowledge and the ability to pick up contextual cues in discourse" (p.8) is part of the contextual enrichment process of the schematic meaning of a particular lexical item.&lt;br /&gt;&lt;br /&gt;One of the core issues addressed in the paper is that of semantic compatibility between a particular lexical item and its discourse environment. On the basis of discourse coherence and semantic compatibility, the authors predict and confirm that, due to their individual schematic meanings, the three lexical items 'hai', 'you' and 'zai' show different collocation preferences in discourse, thus providing evidence that different lexical items from a common &lt;span style="font-style: italic;"&gt;semantic system &lt;/span&gt;do profile, semantically, different facets of that system.&lt;br /&gt;&lt;br /&gt;Methodologically, the authors investigated a small corpus made up of two subcorpora. Their data are multifactorial, based on 4 variables, including CORPUS: narrative vs. non-narrative, TEMP_REF: non-past vs. past, ADVERB: 'hai' vs. 'you' vs. 'zai'.&lt;br /&gt;&lt;br /&gt;In principle, Zhuo and Gries's study reminds me of a paper by Clausner and Croft (1999) that I briefly mentioned in &lt;a href="http://cognitionandinterlanguage.blogspot.com/2009/02/getting-strated.html"&gt;this post&lt;/a&gt;. 
Although Clausner and Croft are concerned with image-schemas and Zhuo and Gries are concerned with schematic (but still linguistic) meaning, both studies have in common the idea of a general category including various contrasting members. Clausner and Croft (1999) make a case for image-schematic domains and argue that image-schemas are a subtype of domain. They also argue that image-schematic domains show internal structure and that the image-schemas included within a specific image-schematic domain stand in various relationships and profile different aspects of the image-schematic domain they belong to. This parallel between the two studies raises the question of whether Zhuo and Gries's methodology (i.e. investigating discourse collocations as a way to differentiate members of a &lt;span style="font-style: italic;"&gt;semantic system&lt;/span&gt;) could be applied at image-schema level.&lt;br /&gt;&lt;br /&gt;Should Zhuo and Gries's methodology be applied to my project, one may speculate that:&lt;br /&gt;&lt;br /&gt;Preferred collocation sets for 'can' and 'may' would generally allow for the identification of individual image-schemas. In the case of 'may' and 'can' as produced in native English, the preferred collocation sets would be expected to confirm Talmy's (1998) finding. In the case of 'pouvoir', the preferred collocation set would be expected to be in line with Achard's (1996) finding. Ultimately, the investigation of the preferred collocation sets for native English 'may'/'can' and native French 'pouvoir' would contribute to the &lt;span style="font-weight: bold;"&gt;identification of the image-schematic representation of 'may' and 'can' in Fr-Eng IL.  
&lt;/span&gt;In other words, this method could be useful in the identification of the profiling characteristics of IL 'may'/'can' at the conceptual level [by 'conceptual level' I mean pre-linguistic level], in contrast with native English 'may'/'can' and native French 'pouvoir'.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;In sum, an analysis of the cross-linguistic collocation patterns of 'may', 'can' and 'pouvoir' in native and second language English and native French corpora could provide a way of bridging the &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;contextual&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;, the &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;linguistic&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; and the &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;conceptual&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; in bilingual mental meaning representation. 
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-5478481756576053801?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/5478481756576053801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/bridging-conceptual-and-contextual.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/5478481756576053801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/5478481756576053801'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/bridging-conceptual-and-contextual.html' title='Bridging the conceptual and the contextual using evidence from corpus data'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-8492091191623800145</id><published>2009-02-24T07:54:00.000-08:00</published><updated>2009-02-24T10:43:50.283-08:00</updated><title type='text'>The point of variability assessment within and between corpora </title><content type='html'>Gries (2006) ( &lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/ExploringVariability_Corpora.pdf"&gt;&lt;span style="font-style: italic;"&gt;Exploring variability within and between corpora: some methodological considerations&lt;/span&gt;&lt;/a&gt; ) generally shows that the reliability of corpus 
findings depends on corpus variability/homogeneity assessments both within and between corpora. Variability/homogeneity assessments can be carried out via a range of statistical tests in which the analyst identifies the parameters whose variability will be measured and chooses the level of granularity at which the corpus will be investigated. Ultimately, Gries's method aims to improve descriptive accuracy in corpus-based studies.&lt;br /&gt;&lt;br /&gt;On the basis that "no corpora are alike, and that sometimes different results are reported for very similar corpora (or even the same corpus)" (abstract), Gries addresses three core issues:&lt;br /&gt;&lt;br /&gt;i) "how to identify and quantify the degree of variation coming with one's results" (abst.)&lt;br /&gt;ii) "how to investigate the source of the observed variation in corpora" (abst.)&lt;br /&gt;iii) "how homogeneous one's corpus is with respect to a particular phenomenon" (abst.)&lt;br /&gt;&lt;br /&gt;Gries recognises that many quantitative studies limit themselves to reporting word frequencies, but holds that such a methodology is not the most useful way to approach phenomenon X in corpus Y. He points out that those approaches are not sufficient and argues that statistical testing (more on the specific statistical tests relevant to my study in a later post) and subsequent interpretation of the summarised data are necessary to reach reliable corpus findings:&lt;br /&gt;&lt;br /&gt;"This [methodological] choice seriously limits the range of applicability of these approaches [word frequency approaches]. First, an approach to corpus homogeneity based on word frequency is much more likely to produce biased results when applied to corpora containing text samples focusing on a particular topic." [An initial investigation of ICLE FR and LOCNESS has verified the point that depending on the nature of the topics discussed in the corpus, 'may' and 'can' are used more or less frequently. 
Note: although the topics discussed in ICLE FR and LOCNESS independently, are similar, they are not systematically identical.]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Within and between corpora variability&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;- Gries provides a brief literature review of the studies addressing both types of corpora variability.&lt;br /&gt;&lt;br /&gt;- provides statistical evidence that different corpus-based studies on the overall frequency of the present perfect in English bring different results. Such evidence suggests that i) a word frequency approach is not reliable and ii) alternative and more reliable ways to approach the corpus are needed.&lt;br /&gt;&lt;br /&gt;- case studies are presented as detailed exemplifications of the above&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Assessment of variability: limitations for ICLE FR, LOCNESS and CODIF:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Although Gries convincingly argues in favour of corpora variability assessment, practically and in the context of learner corpora, the application of Gries' methodology is not straight forward and can only be applied partially. This leads to the necessity to integrate the word frequency approach and the corpora variability approach  to a single corpus investigation.  
The nature of corpus investigation in learner language involves the comparison of (at least) two sub data sets: one set consisting of language produced by non-native speakers of language x in language y (the investigated corpus), and one set consisting of language produced in language y by native speakers of language y (used as the control corpus). In terms of background information on the participants, the amount of information provided alongside the investigated and control data sets is not even. ICLE FR, my investigated corpus, for instance provides background information that would, in the context of a corpora variability assessment, allow me to identify a range of parameters such as: female/male writers, literary/argumentative texts, writing conditions (e.g. exam condition, timed/not timed conditions, use of reference tools/reference tools not allowed). LOCNESS and CODIF, on the other hand, as my control data sets, provide very limited background information on the participants. In the case of LOCNESS, the only identifiable working parameters are genre (i.e. literary/argumentative texts), individual essays/files and negation (although genre is not a very reliable parameter as literary texts are generally under-represented in the corpus). As for CODIF, although I have the information that the data set is directly comparable with ICLE FR and LOCNESS, I have no detailed information allowing me to identify workable parameters with a view to a corpora variability assessment. In sum, there are limitations to the application of Gries' methodology to the corpora I am specifically using. 
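&lt;br /&gt;&lt;br /&gt;Where individual essays/files are the only workable parameter, a basic within-corpus assessment can still be sketched. The Python snippet below is only a minimal illustration under my own assumptions, not Gries's procedure: the three essay texts are invented, the tokenisation is deliberately crude, and the mean, standard deviation and coefficient of variation stand in for the more elaborate statistical tests he discusses.&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import re
from statistics import mean, stdev

# Hypothetical essay files (invented text); in practice each value would be
# the content of one essay file read from the corpus.
essays = {
    "essay_01": "We can argue that this may change. People can adapt.",
    "essay_02": "This policy may fail. It may also succeed, one may hope.",
    "essay_03": "Nobody can say. Students can and do disagree on this point.",
}


def per_thousand(text, word):
    """Relative frequency of `word` per 1,000 tokens in `text`."""
    # Crude tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return 1000.0 * tokens.count(word) / len(tokens)


def variability(texts, word):
    """Mean, standard deviation and coefficient of variation across files."""
    rates = [per_thousand(t, word) for t in texts.values()]
    m, sd = mean(rates), stdev(rates)
    return {"mean": m, "sd": sd, "cv": sd / m if m else float("nan")}


for modal in ("may", "can"):
    print(modal, variability(essays, modal))
```
&lt;/pre&gt;
A large standard deviation relative to the mean would suggest that an aggregated frequency for the whole corpus hides considerable differences between individual essays.&lt;br /&gt;&lt;br /&gt;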
Some degree of assessment can, however, be carried out:&lt;br /&gt;&lt;br /&gt;- corpora variability between ICLE FR and LOCNESS:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;possible parameters: individual essays, negation, genre&lt;/li&gt;&lt;/ul&gt;- corpora variability between the ICLE corpus and its French subsection ICLE FR:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;possible parameters: male/female writers, genre, individual essays/files, writing conditions, negation&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;- corpora variability between ICLE FR and CODIF:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;possible parameters: individual essays/files, negation&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;- corpora variability between LOCNESS and CODIF:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;possible parameters: individual essays/files, negation&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;- corpora variability between ICLE FR, LOCNESS, CODIF:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;possible parameter: individual essays/files&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;The above shows that in the case of the ICLE FR, LOCNESS and CODIF corpora, the within-corpus variability assessment would prove much more thorough and useful than the between-corpora variability assessment (incl. ICLE FR vs LOCNESS, ICLE FR vs CODIF, LOCNESS vs CODIF). Ultimately and following Gries, we may speculate that results of the overall investigation could be affected by the limited applicability of his methodology to the project. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;More useful quote:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;"One of the most important concepts within corpus linguistics is variability. Variability is a key issue on several levels, simultaneously. 
First, variability is always of prime importance when reporting one's results: without an indication of the variability found in one's data, the interpretation of, say, aggregated frequencies/percentages or measures of the central tendency of a single study is usually quite difficult, and the comparison of results between different studies is seriously impaired" (p.110)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-8492091191623800145?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/8492091191623800145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/point-of-variability-assessment-within.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8492091191623800145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8492091191623800145'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/point-of-variability-assessment-within.html' title='The point of variability assessment within and between corpora '/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-4023603326340554152</id><published>2009-02-23T07:06:00.000-08:00</published><updated>2009-02-28T03:37:21.941-08:00</updated><title type='text'>The corpus-based 
Behavioral Profile approach to cognitive semantics</title><content type='html'>Gries and Divjak (in press) (&lt;span style="font-style: italic;"&gt;&lt;a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/QuantCognSem.pdf"&gt;Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal&lt;/a&gt;) &lt;/span&gt;generally argue in favour of quantitative corpus-linguistics methods in cognitive linguistics. At this stage of my project, Gries and Divjak's paper provides me with methodological tools to combine numbers (i.e. frequency of occurrences of 'may' and 'can') and word senses (i.e. frequency of occurrences of the various senses of 'may' and 'can'). Of particular interest is the attention that the authors pay to cases of polysemy and to cross-linguistic studies.&lt;br /&gt;&lt;br /&gt;Gries and Divjak point out that "cognitive linguistics can only benefit from reducing the subjective element in its methods as much as is feasible" (p.4). For that purpose, the authors propose&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;span style="font-weight: bold;"&gt;the Behavioral Profile approach (BP)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;. 
&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;Behavioral profiling of lexical items is based in distributional properties captured by percentages and "allows researchers to analyze the BP data using statistical techniques as well as to compare the results to data/results from other studies" (p.8)&lt;br /&gt;&lt;br /&gt;The BP approach is based on &lt;span style="font-weight: bold;"&gt;two assumptions&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;i) "corpus data provides (nothing but) distributional frequencies" (p.4)&lt;br /&gt;ii) "distributional similarity reflects, or is indicative of, functional similarity" (p4)&lt;br /&gt;&lt;br /&gt;[functional similarity = any function of a particular expression, ranging from syntactic to discourse-pragmatic]&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Methodological steps&lt;/span&gt; involved in the BP approach:&lt;br /&gt;&lt;br /&gt;1) &lt;span style="font-weight: bold;"&gt;Retrieval of all instances&lt;/span&gt; of a word's lemma from a corpus in their context.&lt;br /&gt;&lt;br /&gt;2) &lt;span style="font-weight: bold;"&gt;Semi-manual analysis of many properties&lt;/span&gt; of the use of the word forms (following Atkins (1987): morphological characteristics, syntactic characteristics, semantic characteristics. The identification of those features allows to compile ID tags for the word forms).&lt;br /&gt;&lt;br /&gt;3) Generation of a &lt;span style="font-weight: bold;"&gt;co-occurrence table&lt;/span&gt; that specifies which ID tag level is attested how often in percent with each sense of a polysemous word. 
The column containing the percentages for a given sense is referred to as that sense's behavioral profile.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Application of the BP approach to polysemy &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Gries and Divjak show how the BP approach can assist in answering questions related to the phenomenon of POLYSEMY, such as the &lt;span style="font-weight: bold;"&gt;identification of prototypical senses&lt;/span&gt; of specific lexical items, the &lt;span style="font-weight: bold;"&gt;connection of a particular sense of a polysemous word&lt;/span&gt; to the network of already identified senses, and the usefulness of a &lt;span style="font-weight: bold;"&gt;cluster-analytic approach &lt;/span&gt;in the domain of POLYSEMY.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Application to cross-linguistic studies &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This section is of particular interest to me because of the recent addition of the CODIF corpus to my data set, which means that a semantic study of the French and English sub data sets will be carried out.&lt;br /&gt;&lt;br /&gt;Gries and Divjak recognise that "[c]ross-linguistic semantic studies are notoriously difficult given that different languages carve up conceptual space(s) in different ways (cf. 
Janda, to appear for discussion); for that reason, linguistic dimensions are difficult to compare across languages" (p.7)&lt;br /&gt;&lt;br /&gt;[what is meant here exactly by 'linguistic dimensions'?]&lt;br /&gt;&lt;br /&gt;For Gries and Divjak, because the BP approach is based on operationalizable distributional properties, it can be applied to cross-linguistic studies: "concordance lines from different languages can be annotated for a number of common characteristics while at the same time doing justice to any individual languages characteristics and avoiding overly subjective intuitions regarding cross-linguistic semantic differences" (p.7)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;It seems that the BP approach could provide a unified model to investigate the semantic domain of POSSIBILITY both cross-linguistically and via polysemous 'may', 'can' and 'pouvoir'.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;References to check out&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;Janda, Laura A. (to appear) What is the role of semantic maps in cognitive linguistics? In Piotr Stalmaszczyk and Wieslaw Oleksy (eds.). &lt;span style="font-style: italic;"&gt;Festschrift for Barbara Lewandowska-Tomaszczyk.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;More useful quotes&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;"(...) the concordance lines of a particular search expression and the uses of the word and their frequencies constitute an objective database of the kind that made-up sentences do not since researchers cannot invent all uses of an expression in a corpus let alone their frequencies of occurrence" (p.3)&lt;br /&gt;&lt;br /&gt;" (...) corpus-linguistics studies meaning in terms of use, which in turn is made tangible  through distribution, and hence lends itself better to quantification." 
(p.4)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-4023603326340554152?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/4023603326340554152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/corpus-based-behavioral-profile.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4023603326340554152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/4023603326340554152'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/corpus-based-behavioral-profile.html' title='The corpus-based Behavioral Profile approach to cognitive semantics'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-2012819511391141701</id><published>2009-02-20T08:07:00.000-08:00</published><updated>2009-02-20T10:46:55.762-08:00</updated><title type='text'>ICLE and LOCNESS welcome CODIF -- the latest addition to the database</title><content type='html'>Finally coming back after a two-week immersion in the depth of ICLE and LOCNESS!&lt;br /&gt;&lt;br /&gt;The quantitative investigation of the data started with a pilot study comparing the frequency of occurrences of 'may' and 'can' across ICLE and LOCNESS, including comparisons with the frequency of occurrences of the other central 
modals ('could', 'might', 'must', 'shall', 'should', 'will' and 'would') both in LOCNESS and in the other subsections of the ICLE corpus. In the latter case, the purpose of the investigation was to find out to what extent the use patterns of 'may' and 'can' in French-English IL reflect those observable in second language English in general. The results from the pilot study proved useful as it became clear that 'may' and 'can' play a role in the profiling of French-English interlanguage through different use patterns. The findings of the pilot study are now recorded in the form of a paper entitled &lt;span style="font-style: italic;"&gt;Investigating the typicality of 'may' and 'can' in a corpus of learner English&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;I am now at a stage where I am trying to zoom in on my first general findings to see if there is anything striking there. In order to do that, I have had to laboriously count and record the occurrences of 'may' and 'can' in LOCNESS file by file, pretty much manually, by copying and pasting each file into Word and then finding each occurrence in the 324,304-word data set! An exercise that I only wanted to carry out once! So at that point, having made no decision about whether to consider 'may' and 'can' as individual modals or as lemmas -- which would then have included 'may not', 'cannot', 'can't' and (?)'can not' in the study -- I accounted for all forms of the two modals (the decision to include 'can not' as an acceptable spelling is still being debated). 
So far, these are the data sets that I am able to work from:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;- LOCNESS: MAY and CAN (as featuring per essay)&lt;br /&gt;- LOCNESS: MAY NOT, CANNOT, CAN'T (as featuring per essay)&lt;br /&gt;- LOCNESS: MAY and CAN in argumentative and literary texts (as featuring per essay)&lt;br /&gt;&lt;br /&gt;- ICLE FR: MAY and CAN (as featuring per essay)&lt;br /&gt;- ICLE FR: MAY NOT, CANNOT, CAN'T (as featuring per essay)&lt;br /&gt;- ICLE FR: MAY and CAN in argumentative texts (as featuring per essay)&lt;br /&gt;- ICLE FR: MAY and CAN in literary texts (as featuring per essay)&lt;br /&gt;&lt;br /&gt;- ICLE FR, ICLE (excl FR), LOCNESS: MAY, MAY NOT, CANNOT, CAN'T (as featuring generally across the three data sets -- this count does not include the distinction between individual files/essays)&lt;br /&gt;&lt;br /&gt;- ICLE FR, ICLE (excl FR), LOCNESS: control variable AND&lt;br /&gt;&lt;br /&gt;NB: Tables indicating occurrences of 'may' do not include cases of 'may not'. Cases of 'may not' are only included alongside cases of 'cannot' and 'can't'. That allows negation to be treated as a variable and its interaction with modality to be investigated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Recently, the issue of the usefulness of a native French comparison data set was raised in discussion. Such a data set would be particularly helpful at the qualitative stage of the data analysis process and would create opportunities for cross-linguistic collocation searches. That way, I would be able to identify what contextual features are generally lexicalised via 'pouvoir' and assess whether those features are also lexicalised via 'may' and 'can' in French-English IL.  In other words, it would allow me to establish whether 'may'/'can' in Fr-English IL carry over some semantic features of 'pouvoir' and, if so, to what extent. 
In order to carry out those collocation searches I was recently granted access to the COrpus de DIssertations Francaises (CODIF) database which is a corpus of native French essay writing (dissertations written by French undergraduates at the University of Louvain, Belgium). The CODIF database was compiled by the Centre for English Corpus Linguistics (CECL) at the Universite Catholique de Louvain, Belgium. The data set counts around 100 000 words.&lt;br /&gt;&lt;br /&gt;From the perspective of the Cognitive Semantics framework, a three-way database (ICLE FR, LOCNESS and CODIF) allows for an investigation of the conceptual domains recruited by 'may', 'can' and 'pouvoir'. As members of the same semantic domain (i.e. POSSIBILITY), do the three modals recruit  the same conceptual domains/frames? What is the nature of the relation between those domains? Does the nature of those relations vary cross-linguistically?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-2012819511391141701?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/2012819511391141701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-welcome-codif-latest.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2012819511391141701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/2012819511391141701'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-welcome-codif-latest.html' title='ICLE and LOCNESS welcome CODIF -- the latest addition to the 
database'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-8795528673872902412</id><published>2009-02-08T09:39:00.000-08:00</published><updated>2009-02-08T11:28:08.553-08:00</updated><title type='text'>ICLE and LOCNESS: words and figures</title><content type='html'>A little bit about the data I am using for my project:&lt;br /&gt;&lt;br /&gt;The data is drawn from two corpora: the International Corpus of Learner English (ICLE) and the Louvain Corpus of Native English Essays (LOCNESS).&lt;br /&gt;&lt;br /&gt;ICLE is a corpus of written learner English including essays written by native speakers of Bulgarian, Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish and Swedish. &lt;strong&gt;ICLE&lt;/strong&gt; counts a total number of &lt;strong&gt;2,500,353 words&lt;/strong&gt; distributed evenly across the eleven national subcorpora. My project focuses specifically on the French subcorpus (namely here &lt;strong&gt;ICLE FR&lt;/strong&gt;) which counts a total of &lt;strong&gt;228,081 words&lt;/strong&gt;. The French subcorpus comprises a further two subcorpora: a subcorpus of &lt;strong&gt;argumentative texts&lt;/strong&gt; -- counting &lt;strong&gt;177,963 words&lt;/strong&gt;, and a subcorpus of &lt;strong&gt;literary texts&lt;/strong&gt; – counting &lt;strong&gt;50,118 words&lt;/strong&gt;. The French subcorpus comprises &lt;strong&gt;347 essays&lt;/strong&gt; averaging 500 words each.  
All participants in the ICLE corpus “are university undergraduates in English (usually in their third or fourth year)”, and “the proficiency level ranges from higher intermediate to advanced” (Granger et al., 2002).&lt;br /&gt;&lt;br /&gt;LOCNESS is a corpus of native English essays comparable with ICLE (i.e. the participants are also university undergraduates, and the essays average the same length and deal with similar topics). &lt;strong&gt;LOCNESS&lt;/strong&gt; counts a total of &lt;strong&gt;324,304 words&lt;/strong&gt; and comprises &lt;strong&gt;three subcorpora&lt;/strong&gt;: a &lt;strong&gt;British pupils’ A level essays&lt;/strong&gt; subcorpus of &lt;strong&gt;60,209 words&lt;/strong&gt;, a &lt;strong&gt;British university students’ essays&lt;/strong&gt; subcorpus of  &lt;strong&gt;95,695 words&lt;/strong&gt; and an &lt;strong&gt;American university essays&lt;/strong&gt; subcorpus of &lt;strong&gt;168,400 words&lt;/strong&gt;. Like ICLE, LOCNESS also includes argumentative and literary texts.&lt;br /&gt;&lt;br /&gt;What are the figures telling us so far?&lt;br /&gt;&lt;br /&gt;Early results of quantitative analyses of ICLE(FR) and LOCNESS have allowed me to establish that the patterns of use of &lt;em&gt;may &lt;/em&gt;and &lt;em&gt;can &lt;/em&gt;in the French subsection of ICLE do play a role in the profiling of French-English IL. With the help of my kind friend B, statistics expert, I am now planning to continue to approach the data quantitatively and to dig deeper into it by running a number of variance tests that should i) consolidate nicely the results I have so far and ii) provide a much sharper picture of the uses of &lt;em&gt;may/can &lt;/em&gt;in ICLE FR. 
Results from the variance tests should be ready to be analysed by the end of this week, early next week max.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Soon, the data will be looked into qualitatively -- counting up occurrences of specific meanings of &lt;em&gt;may/can &lt;/em&gt;instead of occurrences of the actual words (I've already started to think about manual searches of image-schemas in ICLE FR and LOCNESS). However, before I start the process, it might be useful to see whether I can pick up a few tips from &lt;strong&gt;Adam Kilgarriff&lt;/strong&gt; (&lt;a href="http://www.kilgarriff.co.uk/"&gt;http://www.kilgarriff.co.uk/&lt;/a&gt;). Particular papers he wrote that could be of interest to me:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.kilgarriff.co.uk/Publications/1997-K-CHum-believe.pdf"&gt;"I don't believe in word senses"&lt;/a&gt; (1997). Computers and the Humanities 31: 91-113.&lt;br /&gt;Reprinted in Practical Lexicography: a Reader. Fontenelle, editor. Oxford University Press. 2008.&lt;br /&gt;Reprinted in Polysemy: Flexible patterns of meaning in language and mind. Nerlich, Todd, Herman and Clarke, editors. Walter de Gruyter.  Pp 361-392.&lt;br /&gt;To be reprinted in Readings in the Lexicon. Pustejovsky and Wilks, editors. MIT Press.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.kilgarriff.co.uk/Publications/2007-K-CLLT-grammarlaw.doc"&gt;Grammar is to meaning as the law is to good behaviour&lt;/a&gt; (2007) Corpus Linguistics and Linguistic Theory 3 (2): 195-198.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf"&gt;Comparing Corpora&lt;/a&gt; (2001) International Journal of Corpus Linguistics 6 (1): 1-37.&lt;br /&gt;Reprinted in Corpus Linguistics: Critical Concepts in Linguistics. Teubert and Krishnamurthy, editors. Routledge. 
2007.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.kilgarriff.co.uk/Publications/2004-K-TSD-CommonestSense.pdf"&gt;How dominant is the commonest sense of a word?&lt;/a&gt; (2004) In: Text, Speech, Dialogue. Lecture Notes in Artificial Intelligence Vol.  3206.   Sojka, Kopecek and Pala, Eds.  Springer Verlag: 103-112.&lt;br /&gt;Reprinted in Lexicology: Critical concepts in Linguistics Hanks, editor. Routledge, 2007&lt;br /&gt;&lt;br /&gt;Busy week ahead!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-8795528673872902412?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/8795528673872902412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-words-and-figures.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8795528673872902412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8795528673872902412'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/icle-and-locness-words-and-figures.html' title='ICLE and LOCNESS: words and figures'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5805835432542538049.post-8705634694447265740</id><published>2009-02-04T13:38:00.001-08:00</published><updated>2009-02-04T13:41:51.433-08:00</updated><title type='text'>Getting 
started ...</title><content type='html'>A year and a half into my PhD project, this blog is long overdue! It will, I hope, serve the purpose of helping me keep track of my readings and ongoing thoughts as well as helping me to remain focused and ultimately achieve a real sense of direction -- at last! &lt;br /&gt;&lt;br /&gt;Here is a little bit of background for my research: &lt;br /&gt;&lt;br /&gt;The specificity of my project lies in that it brings together the fields of Interlanguage and Cognitive semantics. &lt;br /&gt;&lt;br /&gt;First, interlanguage: &lt;br /&gt;&lt;br /&gt;Interlanguage, as defined by the OED, refers to ‘a linguistic system typically developed by a student before acquiring fluency in a foreign language, and containing elements of either his or her native tongue and of the target language’. So broadly, interlanguage could be considered as a sort of hybrid of two linguistic systems. Indeed, there are many types of interlanguage, depending on the native language of the speaker and his/her second language. My research focuses particularly on the French-English type of interlanguage where the speakers’ first language is French and their second language is English. &lt;br /&gt;&lt;br /&gt;The case of Interlanguage is currently attracting some interest in the fields of psycholinguistics and neuroscience as researchers are trying to identify the nature of the relations between L1 and L2 in the bilingual mind (e.g. Obler 1993, Snellings 2002, Finkbeiner, Almeida, Janssen and Caramazza 2006, Kovelman, Baker and Petitto 2008). Recent research in neurolinguistics (Kovelman, Baker and Petitto 2008) supports the existing view that “bilinguals have differentiated neural representations of their two languages” (p. 165). 
Further, another recent study concerned with lexical selection in bilingual speech production, Finkbeiner, Almeida, Janssen and Caramazza (2006), recognises the potentially complicated process of bilingual lexical access in which “concept selection serves to activate two lexical representations to an equal extent” (p. 1075). In other words, there is the possibility of interference between the bilingual’s two linguistic systems. This view is generally recognised in cross-linguistic investigations of interlanguage and second language (L2) knowledge organisation. However, the issue of cross-linguistic interference from a semantic perspective remains under-investigated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;My project proposes to investigate first language (L1) and L2 interference using the cognitive semantics framework. So I am looking at possible interference between L1 and L2 at the conceptual level. Cognitive linguists, generally, are concerned with language use in relation to conceptual representation, that is, conceptual structure (i.e. knowledge representation) and conceptualisation processes (i.e. meaning representation), and they postulate that our bodily experiences contribute to the way we conceptualise the physical world. &lt;br /&gt;&lt;br /&gt;On that basis, Image-Schemas have been recognised as one possible cognitive process that reinterprets sensory information as conceptual representation. So Image-Schemas are like analogue representations of perceptual states from which lexical meanings can derive, and they profile word meanings. Talmy (1981) argues that the meanings of the English modals (may, can, must, etc.) derive from the experiential domain of force dynamics which itself includes a number of Image-Schemas: compulsion, restraint, enablement, blockage, counterforce, attraction, resistance. The literature recognises MAY as referring to the Image-Schema of ‘removal of restraint’ and CAN as referring to that of ‘enablement’. 
The semantic domain of force dynamics is also applied to the French modal verb POUVOIR in Achard (1996). It is worth noting here that French doesn’t differentiate lexically between MAY and CAN. Both lexical forms are included under the umbrella of POUVOIR. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;According to Lakoff (1987), Image-Schemas can be transformed, which means that shifts in the profiling of specific lexical items can take place and thus allow for semantic shifts to be observed. I here question whether those shifts (image-schematic and ultimately semantic shifts) can be equally observed in French-English interlanguage, on the basis of the cognitive economy principle. One way to start tackling the question is to carry out a quantitative analysis of the corpus to find out whether the schemas of ‘enablement’ and ‘restraint’ are activated with equal frequency by French learners of English and native English speakers. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Generally, within the Cognitive Linguistics framework it is assumed that the meanings of linguistic forms are understood relative to background/encyclopaedic knowledge. In other words, they are understood as part of a specific experiential domain. Clausner and Croft (1999) make a case for Image-Schematic domains and they argue that Image-Schemas are a subtype of domain. They also argue that Image-Schematic domains show internal structure and that the Image-Schemas included within a specific Image-Schematic domain stand in various relationships. Their argument leads to the speculation that i) POUVOIR, MAY and CAN are all included in the same experiential domain of force dynamics, ii) as separate lexical items, they profile word meanings in different ways and iii) as part of a common structured domain they stand in various relationships. 
This implies that, theoretically, image-schematic shifts could take place cross-linguistically, which allows the speculation that cross-linguistic semantic shifts take place at the conceptual level. &lt;br /&gt;&lt;br /&gt;At this point, a question would be: how do image-schematic shifts (i.e. transformations) take place? Clausner and Croft argue that Image-Schema transformations are the result of the mapping of one Image-Schema onto another (1999:23). A few months ago, as an experiment, I started exploring the idea of possible Image-Schema mappings between MAY, CAN and POUVOIR. Although the idea needs to be further investigated, early results seemed to point towards a possible metonymic relation between the Image-Schema profiled by POUVOIR and those profiled by MAY/CAN. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Now the corpus data really needs to be scrutinised quantitatively and qualitatively! More on that in the next post …&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5805835432542538049-8705634694447265740?l=cognitionandinterlanguage.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cognitionandinterlanguage.blogspot.com/feeds/8705634694447265740/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/getting-strated.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8705634694447265740'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5805835432542538049/posts/default/8705634694447265740'/><link rel='alternate' type='text/html' href='http://cognitionandinterlanguage.blogspot.com/2009/02/getting-strated.html' title='Getting started 
...'/><author><name>sandra</name><uri>http://www.blogger.com/profile/01848338272106760827</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry></feed>
