MC2: Computational Linguistics

Speaker:

Walter Daelemans

Content:

Computational Linguistics is part of Artificial Intelligence and relies heavily on insights and techniques from Knowledge Representation, Search, and Machine Learning (statistics). At the same time, the field borrows heavily from the theory, methodology, and results of Linguistics, Psycholinguistics, and Cognitive Science in general. In this course I will illustrate this multidisciplinary character of the field. I will sketch a brief history of the field and how it evolved, and then survey the current state of the art in computational modeling of morphology, syntax, semantics, and discourse. Computational Linguistics started out rich in linguistic knowledge and relied on handcrafted resources. Today the field is predominantly statistical, corpus-based (data-oriented), and knowledge-poor. I will focus on current trends that try to bring deeper linguistic knowledge back into the field through hybrid approaches, and discuss whether this is a good idea.

To illustrate these interdisciplinary concepts, I will describe three application areas in some detail.

  1. Computational modeling of human language acquisition and processing. Computational psycholinguistics is an interesting interdisciplinary area of computational linguistics that combines insights and methodologies from corpus linguistics, psycholinguistics, and computational modeling. I will illustrate this with work on morphological acquisition and processing.
  2. Computational Stylometry. The data-oriented, machine-learning-based approach to language processing has led to applications that identify the author of a text or infer properties of its author (e.g. gender, age, personality, region). These techniques, which characterize the style of (groups of) authors in terms of fairly superficial linguistic features, are useful in areas as diverse as semantic web meta-tagging, literary studies, and forensics.
  3. Negation and modality. As an illustration of the principle that deep text analysis can be achieved with statistical, knowledge-poor techniques, I will describe a corpus-based statistical approach to analyzing negation and speculation (modality) in language.

Disciplines:
Computer Science, Artificial Intelligence, Cognitive Science, Linguistics, Psycholinguistics, Corpus Linguistics


References:

  • Jurafsky & Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Second Edition. 2008.
  • Bird, Klein & Loper, Natural Language Processing with Python. 2009. A free online version of the book is available at http://www.nltk.org/book.
  • Clark, Fox & Lappin (eds.), The Handbook of Computational Linguistics and Natural Language Processing (Blackwell Handbooks in Linguistics). 2010.

CV:
Walter Daelemans studied linguistics and psycholinguistics in Antwerp and Leuven and trained as a computational linguist in Nijmegen and at the Brussels AI-lab. While teaching computational linguistics at the University of Tilburg, he created a research group on Machine Learning of Language (ILK) and began investigating, among other things, memory-based learning approaches to language processing. He is currently professor of computational linguistics at the University of Antwerp. http://www.clips.ua.ac.be/~walter/

Last update: 26.01.2011, Webadmin