
Our main lines of work are:

 

● Theoretical and practical research on grammar formalisms and parser types (among others, dependency analysis, categorial grammars, and HPSG grammars), with the aim of reaching a consensus on the formalism and analyzer that will implement the project's results.

● Survey and integration of lexical resources from different sources and possibly different languages: integration of verbal lexicons and annotated corpora, transfer of annotations from corpora in one language to another, and use of comparable corpora for terminology extraction.

● Feasibility study of applying machine learning techniques to the chosen formalism and the enriched lexicon.

 

The groups involved in this project include a number of undergraduate and graduate students working in the areas listed above.

 

Specific lines of work

 

LIGM (Takuya Nakamura) and UFSCar (Oto Vale) are collaborating on a comparison between the lexicon-grammar tables of frozen expressions of French and of Brazilian Portuguese (Vale, 2001), in order to prepare their possible conversion into an LGLex syntactic lexicon and their integration into a parser. Lexicon-grammar tables (Gross, 1975) are currently one of the major sources of syntactic lexical information for French, and tables also exist for other languages, such as Italian, Brazilian Portuguese, Modern Greek, Korean, and Romanian.

 

Since October 1st, 2014, Eric Laporte has co-supervised the doctoral thesis of Aline Evers (UFRGS) on the application of multilingual resources to the clustering of texts in Portuguese.

 

Matthieu Constant uses supervised machine learning and dictionaries of multiword expressions (MWEs) to detect MWEs in the context of syntactic parsing. He has experimented on French (Candito & Constant, 2014) and Serbian (Constant et al., 2014), and the technique can be transferred to Spanish and Portuguese.
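As a rough illustration of the dictionary side of this approach (not the actual system, which combines such lexicons with supervised sequence models), the following toy sketch marks dictionary MWEs in a token sequence by greedy longest-match lookup; the lexicon and sentence are invented examples:

```python
# Toy sketch of dictionary-based MWE detection: greedy longest-match
# lookup of known expressions in a token sequence. The lexicon and
# example sentence are invented; real systems combine such lookups
# with supervised sequence models.

MWE_LEXICON = {
    ("by", "and", "large"),
    ("in", "spite", "of"),
    ("kick", "the", "bucket"),
}
MAX_LEN = max(len(m) for m in MWE_LEXICON)

def tag_mwes(tokens):
    """Return IOB-style tags marking dictionary MWEs."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWE_LEXICON:
                tags[i] = "B-MWE"
                for j in range(i + 1, i + n):
                    tags[j] = "I-MWE"
                i += n
                break
        else:
            i += 1
    return tags

print(tag_mwes(["he", "resigned", "in", "spite", "of", "the", "offer"]))
# → ['O', 'O', 'B-MWE', 'I-MWE', 'I-MWE', 'O', 'O']
```

The IOB output format is the usual interface between such a lookup component and a downstream statistical parser or tagger.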

 

Bibliographical references
  • Marie Candito, Matthieu Constant. 2014. "Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing". Proceedings of ACL 2014, Baltimore, USA.

  • Matthieu Constant, Cvetana Krstev, Duško Vitas. 2014. "Joint Compound/Named Entity Recognition and POS Tagging for Serbian: Preliminary Results". Poster, PARSEME meeting, Frankfurt, Germany.

  • Maurice Gross. 1975. Méthodes en syntaxe : régime des constructions complétives. Paris: Hermann.

  • Oto Araújo Vale. 2001. "Transparência e opacidade de expressões cristalizadas". In Hirata-Vale, Flávia B. M. (ed.), Anais do IV Seminário Nacional de Literatura e Crítica e do II Seminário Nacional de Lingüística e Língua Portuguesa, Goiânia: Gráfica e Editora Vieira, p. 240-246.


The PLN group at UdelaR, Montevideo, has been working on several issues related to this project's proposal. On the one hand, between 2000 and 2002 we built partial parsers that segment sentences into clauses, for Spanish (Clatex) and for French (Propos).
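To give a flavor of what clause segmentation involves (this is a hypothetical miniature, not the Clatex or Propos implementation), one can split a sentence at punctuation and at a small, invented list of clause-introducing words; the real systems relied on much richer linguistic rules:

```python
import re

# Hypothetical illustration of clause-level segmentation: split a
# Spanish sentence at commas/semicolons and at a small, invented list
# of clause-introducing words. Not the actual Clatex/Propos rules.

CLAUSE_MARKERS = {"que", "cuando", "porque", "aunque", "si"}

def segment_clauses(sentence):
    # Tokenize into words and clause-delimiting punctuation.
    tokens = re.findall(r"\w+|[,;]", sentence)
    clauses, current = [], []
    for tok in tokens:
        if tok in {",", ";"} or tok.lower() in CLAUSE_MARKERS:
            # Close the clause built so far.
            if current:
                clauses.append(" ".join(current))
                current = []
            # A marker word opens the next clause; punctuation does not.
            if tok.lower() in CLAUSE_MARKERS:
                current.append(tok)
        else:
            current.append(tok)
    if current:
        clauses.append(" ".join(current))
    return clauses

print(segment_clauses("Dijo que vendría cuando terminara el trabajo"))
# → ['Dijo', 'que vendría', 'cuando terminara el trabajo']
```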

 

The group will work on an analysis of the main grammatical formalisms applicable to our interests in NLP (constituent grammars, dependency grammars, HPSG grammars and categorial grammars), together with a study of existing parsers and their performance. This has been an area of interest for the group, and several theses on these issues are currently under way. In particular, one master's thesis involves the development of a statistical parser based on HPSG.

On the other hand, another line investigated within different projects of the group is the identification of events. Identifying events in texts has some aspects in common with parsing: a reference to an event tends to correspond closely to a clause at the syntactic and semantic levels, since the distinctive element of an event is the predicative element of an utterance. In this sense, the identification of nominal events bears on the syntactico-semantic analysis of text.


MoDyCo is currently working on a methodology for annotating variations in enunciative and modal commitment in a text. We are developing a corpus of French newswire texts automatically annotated with enunciative and modal commitment information. The annotation scheme we propose is based on the detection of predicative cues, referring to an enunciative and/or modal variation, and of their scopes at the sentence level. The evaluation results show that the most challenging task is not finding the predicative cues but delimiting their scopes, and, beyond this delimitation question, defining how to assess whether a scope is correct. The next step of our work is to launch a larger annotation campaign involving more human annotators and a bigger corpus. In this second step, our model will integrate discursive cues that can affect more than a single sentence.
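One common way to frame the scope-assessment question (a sketch under assumed conventions, not our actual evaluation protocol) is to compare strict exact-span matching against a softer token-overlap criterion; the spans below are invented `(start, end)` token offsets:

```python
# Hedged sketch of scoring predicted scopes against gold scopes.
# Exact span matching is harsh, so a Jaccard token-overlap criterion
# is often reported alongside it. Spans are (start, end) token
# offsets, end-exclusive; the data below is invented.

def jaccard(a, b):
    """Token-overlap ratio between two (start, end) spans."""
    sa, sb = set(range(*a)), set(range(*b))
    return len(sa & sb) / len(sa | sb)

def score_scopes(gold, pred, threshold=0.5):
    """Count predicted scopes as correct under exact matching and
    under a soft criterion (Jaccard overlap >= threshold)."""
    exact = sum(p in gold for p in pred)
    soft = sum(any(jaccard(p, g) >= threshold for g in gold) for p in pred)
    return exact, soft

gold = [(3, 10), (12, 20)]
pred = [(3, 10), (13, 20)]
print(score_scopes(gold, pred))
# → (1, 2)
```

The gap between the two counts makes visible how much of the apparent error comes from boundary disagreements rather than missed cues.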
