Spoken Language Understanding

28 03 2007

Dilek Hakkani-Tür, ICSI and Gokhan Tur, SRI International

Abstract

Understanding language is about extracting the “meaning” from natural language input. Although meaning is the holy grail not only of linguistics but also of philosophy, psychology, and neuroscience [1], a variety of practical language understanding systems have been built in the last decade. Language understanding tasks include information extraction, topic detection and tracking, question answering, summarization, information distillation, and understanding in spoken dialog systems.

These systems mostly use task-specific meaning representations, hand-crafted according to the requirements of the application. An example is the set of user intents used in some customer care spoken dialog systems [2]. On the other hand, in the computational linguistics domain, task-independent semantic representations have been proposed over the last few decades. Two notable studies are the FrameNet [3] and PropBank [4] projects.

One of the biggest challenges of spoken language understanding lies in the characteristics of naturally spoken language, which varies greatly orthographically and incorporates prosody and syntax. The same meaning can be expressed in many different surface forms, and the same surface form can express many different meanings. Another challenge for spoken language understanding is robustness to noise in the input, resulting from errors in the speech recognizer output and from disfluencies in spontaneously spoken language. Furthermore, one has to deal with the lack of typographic cues such as paragraphs and punctuation in the speech recognizer output.

In this tutorial, we begin by describing spoken language understanding and summarizing its challenges. We briefly present related work on domain-dependent and domain-independent meaning representations. We then describe in detail the state of the art for some of the popular language understanding tasks. We have categorized the SLU tasks into two groups: lower-level and higher-level. Lower-level tasks are typically enabling technologies which are then used by higher-level SLU tasks. For example, named entity extraction is a basic understanding task whose output is used in distillation or spoken dialog systems.

We hope that this tutorial will provide introductory knowledge of the existing SLU tasks and the state-of-the-art methods used for each of them. We will compare and contrast these with each other as well as with well-known speech processing tasks with which the audience may be more familiar.

  • Introduction (30 minutes)
    • What is understanding?
    • What are the challenges?
    • Additional challenges when dealing with speech
  • Semantic Representations (30 minutes)
    • Task-dependent representations (e.g. user intents)
    • Task-independent representations (e.g. PropBank, FrameNet)
  • Survey of recent work on understanding tasks (2 hours)
    • Lower-level SLU tasks: (1 hour)
      • Sentence Segmentation
      • Information Extraction
        • Mostly covering MUC [5] and ACE [6] for
          • Named-Entity Extraction
          • Co-reference resolution
          • Event determination
      • Topic/Sub-Task Segmentation/Detection/Tracking
        • Mostly covering TDT [7]
      • Dialog Act Tagging
    • Higher-level SLU tasks: (1 hour)
      • Understanding in Spoken Dialog Systems
        • For Human/Machine SDS
        • For Human/Human SDS
      • Question Answering
        • Mostly covering TREC QA track [8] and AQUAINT [9]
      • Summarization
        • Mostly covering DUC [10]
      • Information Distillation
        • Mostly covering GALE [11]
  • Conclusions

References

[1] R. Jackendoff. Foundations of Language. Chapter 9. Oxford University Press. 2002.
[2] N. Gupta, G. Tur, D. Hakkani-Tur, S. Bangalore, G. Riccardi, M. Rahim. The AT&T Spoken Language Understanding System. In the IEEE Transactions on Speech and Audio Processing. Vol. 14, No. 1, pp. 213-222, January 2006.
[3] J. B. Lowe, C. F. Baker, C. J. Fillmore. A Frame-Semantic Approach to Semantic Annotation. Proceedings of the ACL - SIGLEX Workshop. Washington D.C. April 1997.
[4] P. Kingsbury, M. Marcus, M. Palmer. Adding Semantic Annotation to the Penn TreeBank. Proceedings of the HLT workshop. San Diego, CA. March 2002.
[5] Proceedings of the 7th Message Understanding Conference (MUC-7), Fairfax, VA, April 1998.
[6] “Automatic content extraction (ACE),” http://projects.ldc.upenn.edu/ace
[7] Charles L. Wayne, “Topic Detection and Tracking (TDT) Overview and Perspective,” in Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, June 1998.
[8] “Text retrieval conference (TREC),” http://trec.nist.gov
[9] ARDA Aquaint Programme, http://www.ic-arda.org/InfoExploit/aquaint
[10] “Document understanding conference (DUC),” http://www-nlpir.nist.gov/projects/duc
[11] “Global Autonomous Language Exploitation (GALE),” http://www.darpa.mil/IPTO/programs/gale

Speaker Biographies

Dilek Hakkani-Tür is a senior researcher in the ICSI speech group. Prior to joining ICSI, she was a senior technical staff member in the Voice Enabled Services Research Department at AT&T Labs-Research in Florham Park, NJ. She received her BSc degree from Middle East Technical University in 1994, and her MSc and PhD degrees from Bilkent University, Department of Computer Engineering, in 1996 and 2000, respectively. Her PhD thesis is on statistical language modeling for agglutinative languages. She worked on machine translation during her visit to Carnegie Mellon University, Language Technologies Institute, in 1997, and her visit to Johns Hopkins University, Computer Science Department, in 1998. In 1998 and 1999, she visited SRI International, Speech Technology and Research Labs, and worked on using lexical and prosodic information for information extraction from speech. In 2000, she worked in the Natural Sciences and Engineering Faculty of Sabanci University, Turkey. Her research interests include natural language and speech processing, spoken dialog systems, and active and unsupervised learning for language processing. She has co-authored more than 50 papers in natural language and speech processing. Dr. Hakkani-Tür is the organizer of the NAACL’04 and AAAI’05 Workshops on SLU, and the editor of the Speech Communication Special Issue on SLU. She has also given tutorials at the EuroSpeech’03 and ACL’04 conferences on adaptive learning for spoken language understanding systems. She is a member of ISCA, IEEE, and the Association for Computational Linguistics, and an associate editor of IEEE Transactions on Audio, Speech and Language Processing.

Gokhan Tur was born in Ankara, Turkey in 1972. He received his B.S., M.S., and Ph.D. degrees from the Department of Computer Science, Bilkent University, Turkey in 1994, 1996, and 2000, respectively. Between 1997 and 1999, he visited the Center for Machine Translation of CMU, then the Department of Computer Science of Johns Hopkins University, and then the Speech Technology and Research Lab of SRI International. He worked at AT&T Labs - Research from 2001 to 2006. He is currently with the Speech Technology and Research Lab of SRI International. His research interests include spoken language understanding (SLU), speech and language processing, machine learning, and information retrieval and extraction. His work has been published in several refereed journals and presented at more than 30 international conferences. Dr. Tur is the organizer of the HLT-NAACL 2007 Workshop on Spoken Dialog Technologies and of the HLT-NAACL 2004 and AAAI 2005 Workshops on SLU, and the editor of the Speech Communication Special Issue on SLU in 2006. He is also the spoken language processing area chair for IEEE ICASSP 2007, spoken dialog area chair for HLT-NAACL 2007, finance chair for the IEEE/ACL SLT 2006 Workshop, and SLU area chair for the IEEE ASRU 2005 Workshop. Dr. Tur is a senior member of IEEE, ACL, and ISCA, and a member of the IEEE Signal Processing Society (SPS) Speech and Language Technical Committee (SLTC) for 2006-2008. Dr. Tur is also an editor of the IEEE SLTC Newsletter.

