====================================

JUPITER paper: I liked this paper a *lot*; it was a pleasure to read. It was very clear about what it claimed and how the system was described. In particular, the authors did not give the impression of claiming more than they actually did! In contrast to most papers we read, this project assumes that there is a strong domain model and that the interaction between the user and the system is well-behaved (e.g., the beep-thingy on p. 101). There seems to be no real "dialog management", which makes me wonder what the benefits and costs of adding that functionality would be for the discussed application.

Elephant paper: This paper introduces two very intuitive design ideas for dialog systems, namely adding as much knowledge as soon as possible and keeping alternative hypotheses as long as possible. It has lots of interesting information: the authors describe a large collection of very interesting "tricks" in language processing and how these tricks make their systems better, unfortunately without presenting illustrative examples. After being overwhelmed by all these interesting details, I am wondering about a few things:

1. What are the disadvantages of all the tricks from the bag? Which sort of leads to:
2. When would one use which tricks? Is it always a great idea to use a template-based generator or a spelling filter (Section II)?
3. What should the overall strategy of someone developing a spoken dialog system be for integrating the methods described in the paper?
4. How general are the presented techniques? Are they only applicable in the few domains the authors used them in? How would they generalize to, for instance, building a tutorial dialog application, or perhaps a more planning-oriented application, like a spoken dialog recipe agent?
(Sort of "Call Martha Stewart" or "Call Julia Child", depending on one's cooking style.)

Stefanie Bruninghaus

======================================

In Zue et al.'s paper, the authors specify that Jupiter's error rates decrease by 25% when utterances containing crosstalk or other nonspeech signals are removed. I am guessing it might be difficult to recognize crosstalk, but the nonspeech items should be fairly straightforward to remove, shouldn't they? This would apparently boost performance significantly.

In the Souvignier et al. paper, I assume that the concepts c_i are automatically determined during the clustering process. If this is the case, then any misclassification during clustering gets propagated through the system. I wonder if the errors due to these initial misclassifications are significant.

A comment: The supervisor mode (in the Rudnicky et al., 1999, paper) seems like a useful idea. I wonder if it has had any success.

Amy Soller

======================================

Abella & Gorin: The SLU output is obviously very important to the overall success of the system. I'm having trouble seeing the connection between it and the original sentences in their examples. Does anybody know how this works? I'm also curious about the boolean formulas: where do they come from? In the "Superseding Call Types" example, I don't understand why the "DIAL FOR ME" portion of the SLU output can be dropped in the boolean formula.

Rudnicky et al.: Perhaps it was due to space constraints, but I was somewhat disappointed by the lack of better descriptions of the hierarchical tree structure used to represent itineraries, and of schemas. They point out the problems of form-based approaches, but don't really make it obvious why schemas are superior. To turn this complaint into a question, then... are they superior? It seems that schemas are highly domain dependent, so what does this say about the generality of the approach?
Chad Lane

=========================================

"The Thoughtful Elephant..." (and the rest of these papers) work from the assumption that spoken dialog interfaces are intuitive and efficient at exchanging information. But what are the limitations of this, and are they evident in the papers we've read? One main limitation that spoken telephone-like interfaces have compared with, for example, text-based Unix command-line-like interfaces is that previous turns are unavailable to the human for immediate review. In shorter, less complex dialogues, such as those with a weather information interface, that wouldn't be an issue, but for train timetable or air travel information systems it might be.

In "Creating Natural Dialogs in the Carnegie Mellon Communicator System", the authors mention implementing scripts that capture the conversational activities, based on analyses of human-human travel planning (section 3.4). It'd be interesting to see exactly what those analyses were, and to what extent they could be automated.

Antonio Roque

============================================

"Thoughtful Elephant" paper: The authors point out their use of an n-best unsupervised method to learn from new utterances supplied to the system in an online fashion, discarding each utterance when done. I'm not clear on how this works, although it sounds cool. Could we cover how this was done in greater detail? Motivating my interest is their claim that this minimized the need for data to learn from. This seems roughly parallel to the motivations for Chomsky's nativistic transformational grammar, GB theory, and its dozen other incarnations, yet it seems to do so through the use of semantic rather than syntactic primitives. Thoughts?

Abella and Gorin: Just a comment: This seems to be a shift back toward using logical reasoning to govern a dialogue, as opposed to heavy reliance upon surface cues. The reasoning power used, however, is lightweight compared to the approaches discussed earlier.
It seems this has also involved a departure from the goal of generality. Has it?

Rudnicky et al.: To be honest, I had trouble understanding this paper. They mention using schemas and scripts, which sounds reminiscent of things I've heard about in psycholinguistics and knowledge representation, but they do very little in the way of defining what a schema is to them. Did I just miss it? What do they mean by schema?

Jupiter weather dialogue system paper: The amount of human intervention required for this system sounds incredible. They mention maintaining a corpus of about 40 sentences per day that their system cannot parse; these are set aside and handled manually by humans later. They do not describe how they do this, but given my small amount of experience writing grammars for parsing and NLU in linguistics classes, I find this thought staggering. By contrast, they seem to mention it almost nonchalantly, as if this step were no big deal. Is their method for studying these new cases and producing meaningful grammar rules to cover them detailed anywhere? Are they attempting any abstraction at all, or does their method reduce to some sort of human-mediated memory-based learning?

Matt Bell

============================================

(1) It seems that neither of the following two papers offers an evaluation: Abella and Gorin; Rudnicky et al. Is that a shortcoming?

(2) I was very impressed by the way in which Souvignier et al. construct and propose to use confidence measures for semantic items. Might this approach, however, be vulnerable to certain kinds of problems?

(3) Abella and Gorin say "To use the dialog manager for a different application requires creating a new inheritance hierarchy but not a new dialog manager". The fact that the algorithm presented is very simple and elegant inclines me to accept their claim, but I wonder if they provide "enough" justification for it.
Roy Wilson

===========================================

Abella and Gorin: This system doesn't seem nearly as extensible as others we've seen. Does anyone else get this impression? To me it seems that the SLU is too domain specific. The authors use Karnaugh maps to find a minimal cover of Boolean functions. Aren't there more computationally efficient ways to find minimal covers? (Semi-rhetorical... my background in computer engineering gives me high confidence that this is the case, and so I wonder why they chose this method.)

Rudnicky et al.: The authors state on p. 1, "This expertise reflects what humans know about performing tasks; it is both domain-specific and more general, representing 'conversational skills' that may be transferable between domains." While I tend to agree with their assessment of conversational skills as domain independent, I would still like to know what psychological and linguistic theories support their claims. Why does it seem that so many researchers in computational linguistics are bold enough to use their systems as theoretical models of human dialogue, but are unwilling to bear the burdens thus placed upon them by the scientific method?

What are class-based trigrams (as mentioned on p. 3)?

This project struck me as having similar goals and theory to TRAINS. Was this project a reaction to, inspiration for, or continuation of TRAINS?

The authors mention a rule learner on p. 4 that is used to initiate supervisor mode in the event of a dialogue gone awry. Does anybody know which one they used?

Eric Williams

==================================