TRAINS-95

Task:
- interactive route planning

Architecture:
- speech recognition/generation - including a post-processor for recognition error correction (p.3) and a parser (p.5)
- discourse manager - maintains a goal stack, including the goal motivating the segment, object focus and history list for the segment, and status of the problem solution (p.5)
- reference resolver - identifies unresolved objects, may correct parser (p.6)
- verbal reasoner - matches patterns in input speech acts and discourse state, generates responses (p.6)
- problem solver - deals with plan extensions and corrections (p.6)
- domain reasoner - plans routes (p.6)

Theoretical Background:
- uses the overall structure of plan-based systems, which are the only ones that can account for interactions, but uses domain-specific reasoning for real-time performance (p.1)

Evaluation:
- contrasted speech and keyboard input to test robustness of spoken dialogue (p.7,8)

TRAINS-93

Task:
- interactive route planning

Architecture:
- Linguistic Reasoning (Parser, Deindexing, Conversation Act Analysis) using Episodic Logic (p.17)
- Plan Reasoning (Dialogue Manager, Domain Plan Reasoner, Execution Planning and Monitoring, NL generator) using Event-Based Temporal Logic (p.17)

Theoretical Background:
- uses planning mechanisms and a BDI-like architecture (p.6)

Evaluation:
- discusses validity as a research platform, and limitations as a working system (p.47-48)

ARTIMIS

Task:
- information services, spoken query

Architecture:
- rational unit - first-order modal logic (p.1031)
- natural language input/output (p.1033)

Theoretical Background:
- generic axioms in a first-order modal logic that model rational behavior (p.1030)

Evaluation:
- none presented

VERBMOBIL

Task:
- speech-to-speech translation

Architecture:
- dialogue sequence memory: information about speaker and utterance (p.2)
- intentional structure - built by plan recognizer, displays phase of the dialogue (p.2)
- thematic structure - displays integrated time/date information (p.2)
- FSA for clarification dialogues (p.2), FSA or plan recognizer for dialogue modelling (p.7)

Theoretical Background:
- "the solution to dialog processing...[depends]...on a combination of several simple, efficient, and robust components." (p.1). Combines knowledge-based and statistical methods.

Evaluation:
- none presented

>From soller+@pitt.edu Wed Jan 23 09:19:40 2002
Date: Tue, 22 Jan 2002 18:20:03 -0500
From: Amy Soller
To: roque@pitt.edu
Subject: questions for wed.

Allen et al. 1994 (TRAINS)

246 phrase structures and 1844 lexical rules. Amazing! Is there some way of automating this acquisition process? Also, I don't understand exactly how their algorithm gets around the problem that the constituent probabilities are not independent of the input length.

Sadek et al., 1997

Isn't it important that an agent know whether its intention is to convince another agent to achieve a proposition, or to achieve the goal itself? It seems that the authors' model could capture this, but it also seems that they claim it's not critical.

Allen et al. 1996

I would like to know the additional computational complexity, or time, that the speech post-processor adds to Sphinx-II.

>From hcl@cs.pitt.edu Wed Jan 23 09:19:48 2002
Date: Tue, 22 Jan 2002 20:26:18 -0500
From: H. Chad Lane
To: Antonio Roque
Cc: H. Chad Lane
Subject: 3710 questions

Robust Understanding in a Dialogue System (Allen96):

In the sample dialogue on p.2, the system asks for help "choosing a route from Montreal to Lexington". I was hoping for an explanation of the decision to ask for help. Was this done for efficiency? To keep the user involved? A little of both? Something else?

The output of the parser is a minimal cover of speech acts for the utterance (p.5).
Contrasting this with what is stated in Allen94 (p.35) regarding conversation act analysis (which states that each utterance contains acts at the four levels of turn-taking, grounding, core speech acts, and argumentation), something seems to conflict here. On one hand, we want to minimize, but on the other we're saying each utterance contributes at all levels. Is this a problem? Maybe a more appropriate question: what exactly is the relationship between conversation analysis and speech act analysis?

-- Insights into ... VERBMOBIL (Alexandersson97)

It is totally unclear to me how this is a translation system. The point of the paper seems to be that dialogue understanding can contribute to ambiguity resolution (which is cool), but I don't see how translation fits into this, and their example involves no translation at all (both speakers are German!). It's likely that this is an irrelevant issue since the points made seem very general.

-- ARTIMIS... (Sadek97)

Basing communication acts on rational behavior seems very reasonable to me... but I have to wonder, what sort of limitations does this commitment result in? Comedians are often completely irrational. Psychiatrists don't always follow common sense and rationality when working with patients. Kids are frequently irrational when talking to other kids or parents, etc. What if (someday) we want to simulate irrational behavior (which I propose is more common, and easier, in language than in physical action)? I know this is sort of a stretch, but if Sadek et al. are seeking a theoretical foundation for all dialogue behavior, then I think it's ok to ask this question.

>From mbell@cs.pitt.edu Wed Jan 23 09:20:07 2002
Date: Wed, 23 Jan 2002 00:03:05 -0500 (EST)
From: Matthew T. Bell
To: roque@pitt.edu
Cc: Diane J. Litman

Antonio,

Here are my questions for tomorrow.
Best,
Matt

--------------------------------------
Nu sculon herigean heovonrices Weard
Meotodes mighte ond His modgethanc

Now we must praise the Kingdom of Heaven's Sovereign,
The Measurer's power and his completed design
--First two lines, Caedmon's Hymn
---------------------------------------

============================================================
Sadek et al.

The authors take a key-word, almost info-extraction approach to detecting the topic of conversation and planning a response. Has any work been done showing where this limits dialog systems, or vice versa, theoretical work showing where e.g. knowledge of syntax is required for dialog systems of a given complexity? Put another way, does any extant theory of discourse or dialog specify a hierarchy of complexity similar to Chomsky's well-known hierarchy of grammars?

============================================================
Allen et al. (AI, 1995)

The paper states that they built both the dialog system itself and a TRAINS world simulator in which the TRAINS intelligent agent would go, subsequent to interaction with the human, and "communicate" with the operators at different train stations. Why was this done? It doesn't seem to be directly related to or aid the task of building the dialog system itself, and computer-computer interaction, although interesting, seems somewhat far afield from the other system goals.

============================================================
Allen et al. (ACL, 1996)

The system was evaluated in part by measuring the degree to which humans chose to use the voice interface rather than the keyboard interface. Although most chose the voice interface, some chose the keyboard interface instead, believing that it was more "efficient" despite evidence that this was not the case (presumably in terms of the amount of time spent). Has research addressed the question of how humans measure "efficiency" w.r.t. the sorts of tasks to which dialog systems are applied?
Perhaps human interlocutors are influenced by factors other than strict time utility in measuring conversation desirability?

============================================================
Alexandersson et al.

They mention several times that a key element of their system was that it, if I understood correctly, would attempt to predict human conversational moves. Why was this done? Did it aid in computational efficiency when processing later utterances?

>From alandale@cs.pitt.edu Wed Jan 23 09:20:16 2002
Date: Wed, 23 Jan 2002 08:30:58 -0500 (EST)
From: Alan D. Berfield
To: roque@pitt.edu
Subject: ai questions

Robust Understanding

How much is the error rate reduced with the use of the fertility model mentioned at the end of section 4 (speech recognition error correction)? What are some examples of productions in the grammar of the speech act parser? How fast is this parser?

VERBMOBIL

It seems that there is a set collection of semantic tags (GREET, INIT_DATE, etc.). How was this set determined? How difficult was it to come up with this set, and how hard is it to modify it?

ARTIMIS

In the section on mental attitude transfer, there are axioms for setting beliefs, but all of them seem to require that a conflicting belief not already be held. So how does an agent change an existing belief?

The TRAINS Project

I did not fully understand the comment about temporal databases being structured into "chains." What exactly does this mean?

>From andyg@cs.pitt.edu Wed Jan 23 09:20:37 2002
Date: Wed, 23 Jan 2002 08:39:41 -0500 (EST)
From: Andy P. Gaydos
To: Antonio Roque
Subject: Dialogue Systems Question

Clarification/General Question:

For the TRAINS system, it seemed there were a lot of errors from the speech recognition output. Is it possible to make use of the context that is used during parsing or planning at the speech recognition stage? Does the speech recognition system output sounds/pauses or words/fragments?
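
The idea behind the question just above, feeding dialogue context back into recognition, can be pictured as rescoring a recognizer's candidate transcriptions. What follows is only a minimal sketch under stated assumptions, not a description of how TRAINS actually works: the `Hypothesis` class, the `context_score` and `rescore` functions, the scores, and the set of expected terms are all hypothetical, and it assumes the recognizer can hand over several word-level candidates at all.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription from a recognizer (hypothetical structure)."""
    words: str
    acoustic_score: float  # log-probability, higher is better (made-up values below)


def context_score(words, expected_terms):
    """Count the words that the current dialogue context makes likely."""
    return sum(1.0 for w in words.lower().split() if w in expected_terms)


def rescore(hypotheses, expected_terms, weight=1.0):
    """Return the hypothesis with the best combined acoustic and context score."""
    return max(
        hypotheses,
        key=lambda h: h.acoustic_score + weight * context_score(h.words, expected_terms),
    )


# Assumption: the planner is currently discussing a route between these cities.
expected = {"route", "montreal", "lexington"}
candidates = [
    Hypothesis("choose a root from mont real to lexington", -11.5),
    Hypothesis("choose a route from montreal to lexington", -12.0),
]
best = rescore(candidates, expected)
print(best.words)
```

With `weight=0.0` the acoustically best candidate wins unchanged; raising the weight lets the context terms override it. How such a weight would be set, and whether context terms are available early enough, are open questions that this sketch does not answer.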
>From rwilson+@pitt.edu Wed Jan 23 09:20:56 2002
Date: Tue, 22 Jan 2002 08:47:27 -0500
From: Roy Wilson
To: "roque@pitt.edu", Roy Wilson
Subject: Questions

(1) Suppose you are designing a speech-to-speech, mixed-initiative, Intelligent Tutoring System for physics called PITS (aka Pie-In-The-Sky :-)). How would PITS go about planning communicative acts?

(2) Although TRAINS-93 could plan speech acts, it couldn't handle incompletely parsed text input. Allen et al. proposed running shallow and deep understanding processes in parallel and adjudicating conflicts. Was this necessary and was it an example of good/bad science/engineering?

(3) Although the "Robust Understanding ... " paper describes a symbolic/statistical approach that reduces the likelihood of an incomplete parse in TRAINS-96, it seems that the ability to predict the next dialog act (a la VERBMOBIL) provides a somewhat better basis for handling incomplete parses that actually occur. Do you agree?

--
Roy Wilson              AI Programmer/Natural Language Generation
rwilson@pitt.edu        CIRCLE Group
LRDC #701               Learning Research and Development Center
412-624-7464            University of Pittsburgh

>From mrotaru@cs.pitt.edu Wed Jan 23 09:21:05 2002
Date: Wed, 23 Jan 2002 09:02:32 -0500 (EST)
From: Mihai Rotaru
To: roque@pitt.edu
Subject: CS3710 questions

Robust Understanding in a Dialog System (J. Allen) & VERBMOBIL (J. Alexandersson)

Q: Would integrating the prediction module from VERBMOBIL into the SPEECHPP module from TRAINS-95 further enhance speech recognition accuracy?

Details: One of the most interesting parts of the TRAINS-95 architecture is (for me) the speech post-processing (SPEECHPP) module (section 4). It is used to "adapt" the speech recognizer to the intended domain, thus boosting its accuracy. On the other hand, VERBMOBIL has a prediction module used to predict the next dialog acts (section 5). I was wondering how a prediction module from VERBMOBIL could be integrated into SPEECHPP to further increase speech recognizer accuracy.
Mihai Rotaru

>From goldin+@pitt.edu Wed Jan 23 09:21:30 2002
Date: Wed, 23 Jan 2002 09:18:00 -0500
From: Ilya Goldin
To: roque@pitt.edu
Subject: dialog systems q's, 1/23/02

ARTIMIS: Natural Dialogue meets Rational Agency. Sadek, P. Bretier, F. Panaget. Proceedings of IJCAI '97, 1997.

In this paper we get a definition of island-driven parsing, a technique we first saw in the previous AGS/ARTIMIS paper by Sadek & De Mori. "Island-driven parsing simply means finding small syntactic structures in the text, with as few long-range dependencies as possible. ... The result is a set of /mentioned concepts/, or a list of possible alternatives when overlapping phrases yield nondeterminism." (section 4.1) This is in some ways a reasonable extension of the concept of stopword removal, but it also seems more dramatic because it eliminates semantic heavyweights like verbs. Is there something about the AGS domain that permits the use of this technique? Would it be applicable in a tutoring system? In analyzing human-human CMC?

---

The TRAINS Project. James F. Allen et al. Journal of Experimental and Theoretical AI, 1995.

A Robust System for Natural Spoken Dialogue. James F. Allen, Bradford W. Miller, Eric K. Ringger, and Teresa Sikorski. Proceedings of the Association for Computational Linguistics (ACL), 1996.

Insights into the dialogue processing of Verbmobil. Jan Alexandersson, Norbert Reithinger, and Elisabeth Maier. Proceedings of the Conference on Applied Natural Language Processing (ANLP), 1997.

In all these papers we begin to see a number of system architectures, sometimes similar, sometimes orthogonal.
The early TRAINS-93 system is close to a bare minimum as far as architecture is concerned: parser and deindexer, conversation act analyzer, dialogue manager, domain plan reasoner, execution planner and monitor, and NL output generator. The other systems modify this skeleton to suit their own objectives. Although system design is obviously largely goal-driven, many functions (and thus components) begin to repeat. What are the essential characteristics of a dialog system that let us call it by that name? What are the basic functions, and which of them can be factored out for reuse elsewhere?

-- Ilya Goldin

>From twilson@cs.pitt.edu Wed Jan 23 09:21:41 2002
Date: Wed, 23 Jan 2002 09:20:07 -0500 (EST)
From: twilson@cs.pitt.edu
To: roque@pitt.edu
Subject: question for today

In reading about these real world/demonstration systems, the dialogues that they can understand and reason about are impressive. It seems that the architecture and modules of many of these systems could be easily ported for use in a domain other than the one for which they were originally designed, for example, the dialogue manager in TRAINS. On the other hand, it also seems that if/when these dialogue systems are grown or extended to become more general purpose, they will quickly run into the knowledge acquisition and knowledge representation bottleneck that plagues many an AI system. Has any work been done, or would it be possible, to add a learning component to the dialogue systems so that they could attempt to acquire and extend their knowledge about the world or even about new dialogue situations? I'm not referring here to adding information about beliefs and intentions to a system's knowledge, but rather to learning knowledge about a new domain through its dialogue.

Theresa