Thursday 18 June 2009

Profile-based methodology for the comparison of language varieties

In this post, I would like to briefly point out the usefulness of 'profiling' methods for the investigation of corpus data. For that purpose, I refer specifically to a paper entitled Profile-based linguistic uniformity as a generic method for comparing language varieties (2003), authored by Dirk Speelman, Stefan Grondelaers and Dirk Geeraerts. The paper is inspired by studies of language varieties and by research methods currently used in dialectometry. For my own purposes, it is worth noting that the authors make a case for the validity of profile-based methodology in corpus-data investigation. This matters to me because the annotation of my data will include profiling the occurrences of may, can and the lemma pouvoir.

In their paper, the authors present "the 'profile-based uniformity', a method designed to compare language varieties on the basis of a wide range of potentially heterogeneous linguistic variables" (abstract). Their aim is to show that profiling the lexical items under investigation helps to identify dissimilarities between language varieties: dissimilarities are first measured for individual variables and are then summarised into a global dissimilarity. Such a process allows language varieties to be clustered or charted by means of various multivariate techniques.
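To make the aggregation step concrete, here is a minimal Python sketch. It assumes that the uniformity of two varieties for a single concept is the overlap of their profiles (the sum, over all alternatives, of the smaller of the two relative frequencies), and that the global dissimilarity is simply 1 minus the unweighted mean uniformity over the concepts both varieties share; the paper may weight concepts differently. The variety names, concepts and counts are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: for each variety, each concept maps to the counts of the
# alternative terms used to name it. All names and counts are invented.
PROFILES = {
    "variety_A": {"CONCEPT_1": Counter({"term_a": 3, "term_b": 1}),
                  "CONCEPT_2": Counter({"term_c": 2, "term_d": 2})},
    "variety_B": {"CONCEPT_1": Counter({"term_a": 1, "term_b": 1}),
                  "CONCEPT_2": Counter({"term_c": 4})},
    "variety_C": {"CONCEPT_1": Counter({"term_a": 3, "term_b": 1}),
                  "CONCEPT_2": Counter({"term_c": 2, "term_d": 2})},
}


def uniformity(p1: Counter, p2: Counter) -> float:
    """Profile overlap: sum of the smaller relative frequency of each alternative (0..1)."""
    n1, n2 = sum(p1.values()), sum(p2.values())
    # A Counter returns 0 for an alternative it has never seen.
    return sum(min(p1[t] / n1, p2[t] / n2) for t in set(p1) | set(p2))


def global_dissimilarity(v1: dict, v2: dict) -> float:
    """1 minus the unweighted mean uniformity over the concepts both varieties share."""
    shared = set(v1) & set(v2)
    if not shared:
        raise ValueError("the two varieties share no concepts")
    return 1 - sum(uniformity(v1[c], v2[c]) for c in shared) / len(shared)


def dissimilarity_matrix(profiles: dict) -> dict:
    """Pairwise global dissimilarities, keyed by (variety, variety)."""
    return {
        (a, b): global_dissimilarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    for pair, d in dissimilarity_matrix(PROFILES).items():
        print(pair, round(d, 3))
```

With these invented counts, variety_A and variety_C come out identical (dissimilarity 0), while each differs from variety_B by 0.375. A pairwise table of this kind is the sort of input a clustering or scaling technique would take; that step is not shown here.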

Unlike standard methods of corpus investigation such as plain frequency counts, the profile-based method "implies usage-based, but add another criterion. The additional criterion is that the frequency of a word or a construction is not treated as an autonomous piece of information, but is always investigated in the context of a profile." (p. 11)

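Rather than assuming that mere frequency differences in a corpus reveal differences between language varieties, the profile-based approach compares the frequency of a word with the frequencies of its alternatives within the same profile. According to the authors, this presents two advantages: it avoids thematic bias (a word may be more frequent in one corpus simply because the topic it refers to is discussed more often) and it avoids referential ambiguity (a global count of a word may include uses in which it does not designate the concept under study).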
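The thematic-bias point can be illustrated with a small Python sketch. The counts are invented, and the overlap measure is my own simplified assumption of how two profiles might be compared: a raw frequency comparison suggests a large difference, whereas comparing the two profiles shows the same choice between alternatives.

```python
from collections import Counter


def relative_profile(counts: Counter) -> dict:
    """Turn raw counts of alternatives into relative frequencies within the profile."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def uniformity(p1: Counter, p2: Counter) -> float:
    """Overlap of two profiles (1.0 = identical choice between alternatives)."""
    r1, r2 = relative_profile(p1), relative_profile(p2)
    return sum(min(r1.get(t, 0.0), r2.get(t, 0.0)) for t in set(r1) | set(r2))


# Hypothetical counts of two alternative terms for one concept in two equally
# sized corpora; corpus B simply discusses the topic three times as often.
corpus_a = Counter({"term_x": 20, "term_y": 10})
corpus_b = Counter({"term_x": 60, "term_y": 30})

raw_ratio = corpus_b["term_x"] / corpus_a["term_x"]  # 3.0: looks like a large difference
profile_overlap = uniformity(corpus_a, corpus_b)     # about 1.0: same choice between alternatives

print(raw_ratio, round(profile_overlap, 6))
```

Because each corpus's counts are divided by the size of its own profile, the topic's overall frequency cancels out, and only the preference between term_x and term_y is compared.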

For the purpose of my project, the authors' paper generally supports my methodological choice to profile semantically the occurrences of may, can and the lemma pouvoir found in my data. However, in their case study (see p. 18 of the paper), the authors take an onomasiological perspective, i.e. they use a concept as a starting point and then investigate which words are associated with that concept. My project, by contrast, takes the opposite perspective, namely the semasiological one, which starts from individual words and looks at the semantic information that may be associated with them. Inevitably, this difference in approaching the word/sense/concept interface leads to different acceptations of the term 'profile', since the onomasiological and semasiological perspectives have different starting points. In that respect, the authors consider "[a] profile for a particular concept or linguistic function in a particular language variety [to be] the set of alternative linguistic means used to designate that concept or linguistic function in that language variety, together with their frequencies" (p. 5).
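The difference between the two perspectives can be sketched in Python by grouping the same annotated occurrences in two directions. The sense labels and counts below are hypothetical and do not reflect my actual annotation scheme. The onomasiological grouping corresponds to the authors' notion of a profile (the alternatives used for a concept, with their frequencies); the semasiological grouping corresponds to the word-level notion I need.

```python
from collections import Counter, defaultdict

# Hypothetical annotated occurrences: (word, sense label). Labels are illustrative only.
OCCURRENCES = [
    ("may", "epistemic"), ("may", "epistemic"), ("may", "permission"),
    ("can", "ability"), ("can", "ability"), ("can", "permission"),
]


def onomasiological_profiles(occurrences):
    """Concept (here: sense label) -> frequencies of the words used to express it."""
    profiles = defaultdict(Counter)
    for word, sense in occurrences:
        profiles[sense][word] += 1
    return dict(profiles)


def semasiological_profiles(occurrences):
    """Word -> frequencies of the senses it is used to express."""
    profiles = defaultdict(Counter)
    for word, sense in occurrences:
        profiles[word][sense] += 1
    return dict(profiles)


if __name__ == "__main__":
    print(onomasiological_profiles(OCCURRENCES)["permission"])
    print(semasiological_profiles(OCCURRENCES)["may"])
```

The data are the same in both cases; only the starting point of the grouping changes, which is exactly why the two perspectives end up with different notions of 'profile'.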

For the purpose of my project, the term 'profile' needs to be defined at word level and needs to incorporate both sense and morpho-syntactic information. In that regard, the Behavioural Profile (BP) methodology proposed by Gries and Divjak in Quantitative approaches in usage-based cognitive semantics: myths, erroneous assumptions, and a proposal (in press) is appropriate for my project. Broadly, the BP methodology involves identifying the semantic and morpho-syntactic features that characterise the investigated lexical item in the data. These features are then used as linguistic variables and investigated statistically. In the BP model, the identified features are referred to and processed as ID tags, each of which contributes to the profile of the lexical item under investigation.
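Here is a minimal Python sketch of how ID tags could be turned into a BP vector, under my own simplifying assumptions: each occurrence carries one level per ID tag, a lemma's vector holds the relative frequency of every level within its tag, and two lemmas are compared with a plain Euclidean distance. The ID tags, levels and counts are hypothetical, and the statistical analysis of the actual BP proposal is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical ID tags and annotated occurrences; the tags, their levels and
# the counts are invented for illustration, not taken from a real annotation scheme.
ID_TAGS = ("sense", "negation", "subject_person")
_RAW = [
    ("may", "epistemic", "no", "3"),
    ("may", "epistemic", "no", "3"),
    ("may", "permission", "no", "2"),
    ("may", "epistemic", "yes", "3"),
    ("can", "ability", "no", "1"),
    ("can", "ability", "no", "3"),
    ("can", "permission", "yes", "2"),
    ("can", "ability", "no", "3"),
]
ANNOTATED = [dict(zip(("lemma",) + ID_TAGS, row)) for row in _RAW]


def behavioural_profile(rows, lemma, id_tags=ID_TAGS):
    """Map each (ID tag, level) to its relative frequency within that tag for one lemma."""
    subset = [r for r in rows if r["lemma"] == lemma]
    if not subset:
        raise ValueError(f"no occurrences of {lemma!r}")
    bp = {}
    for tag in id_tags:
        counts = Counter(r[tag] for r in subset)
        for level, n in counts.items():
            # Every occurrence has exactly one level per tag, so each tag sums to 1.
            bp[(tag, level)] = n / len(subset)
    return bp


def bp_distance(bp1, bp2):
    """Euclidean distance between two BP vectors; missing cells count as 0."""
    cells = set(bp1) | set(bp2)
    return math.sqrt(sum((bp1.get(c, 0.0) - bp2.get(c, 0.0)) ** 2 for c in cells))


if __name__ == "__main__":
    bp_may = behavioural_profile(ANNOTATED, "may")
    bp_can = behavioural_profile(ANNOTATED, "can")
    print(round(bp_distance(bp_may, bp_can), 3))
```

With the invented data above, may and can share their negation profile but differ in sense and subject person, giving a distance of about 1.118. The choice of Euclidean distance is only for illustration; other distance or similarity measures could be plugged in at the same point.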

To sum up, Speelman, Grondelaers and Geeraerts's paper has given me the opportunity not only to reflect on the notion of 'profiling' in corpus-data investigation but also to consider that notion from the perspective of my own study.
