H. Chad Lane

Levin, et al.
The first example (day & month) and the flight information task are fairly constrained, and yet the space of possible strategies is still huge (p. 14, middle of left column). I understand this is important to test the MDP/RL approach to finding dialogue policies, but what does it say about the practical limitations of the approach? Is there any hope of it applying to more open-ended dialogue situations like troubleshooting or (of course) tutoring? Is it just as much work to set up the simulated users as it is to manually set (domain-reasonable) strategies to begin with (as Singh et al. do)? Are there any long-term benefits? One potential answer is that the same simulated audience can be reused in new domains, but how much work would it be to re-orient them for new systems?

Singh, et al.
I agree that initiative is very important, but I'm not convinced confirmation is up there with it. What is the worst thing that can happen if the system asks for confirmation too often? Why not just give the user the option to request less confirmation at the risk of being misunderstood once in a while? Early in the paper it is mentioned that the level of confirmation often comes down to user preference. I expected to see mention of user modeling, then, as future work, but only found the expert/novice difference discussed.

Alan D. Berfield

Levin et al.
------------
The ATIS domain is really pretty simple. How well do these techniques translate to more complicated dialogues?

Roy et al.
----------
I found the Augmented MDP approach very interesting. The assumption about uncertainty makes intuitive sense, and seems to result in increased performance. In the conclusions, they mention learning the reward structure. Have they done anything with this yet?

Singh et al.
------------
The amount of training data is clearly a problem. What are some of the advantages/disadvantages of this paper's approach versus the simulated user approach?

Antonio Roque

Although the NJFun system allowed a variable amount of user initiative in the dialogue itself, it was always very restrictive in the feedback part of the dialogue. It might have been interesting to have longer feedback dialogues involving greater user initiative.

Matthew T. Bell

Singh et al.: Just out of curiosity, could the benefits of using reinforcement learning over, e.g., rule learning or decision trees for this task be discussed? It seems like the output of the algorithm is very like that of a rule learner. I'm also having trouble understanding the connection between MDPs and state->action strategies. MDPs are probabilistic models. Are the MDPs viewed as generative of the strategies?

Roy et al.: To summarize (and then query): The benefit of POMDPs as opposed to MDPs is that, in on-the-fly performance situations, a system with ASR and other AI components in fact only models what the user is doing, intending, etc.; it cannot say what the user's state actually is. My question, then, is what making this uncertainty explicit gives us. It seems that MDPs are already probabilistic, and so already implicitly keep track of some uncertainty. Are POMDPs giving explicit treatment to this somehow? The way it sounds to me:

MDPs: I have probability P of being in state S; I know nothing about my probability of being in any other state.
POMDPs: I have probabilities P1, P2, ..., PN of being in S1, S2, ..., SN.

Is this the difference? Just a bit confused, and I'm reading this rather late :-)

Andy P. Gaydos

All of these learning techniques were tested using fairly small search spaces and needed to approximate solutions. Would these techniques be able to handle larger search spaces?

Singh et al.
The popular activities that occur in a town would vary by season, and a certain event might last for only one or two weeks in the summer. Could these systems using learning techniques, or any dialog system, be adjusted (or adjust itself) to handle temporary dialog topics?
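
A note on the MDP/POMDP question raised by Matthew T. Bell above. In an MDP the current state is taken as known, and only the transitions between states are probabilistic. In a POMDP the state is hidden, so the system keeps a probability distribution (a belief) over all states and revises it after each noisy observation. The following is a minimal sketch of one such belief update. It assumes a toy two-state user goal and a made-up ASR confusion model; the state names, observation names, and all numbers are hypothetical and are not taken from Roy et al. For brevity the transition model is not conditioned on the system's action, which a full POMDP would include.

```python
def belief_update(belief, transition, obs_model, observation):
    """One Bayes-filter step over a discrete hidden state.

    belief:      dict state -> probability (current belief, sums to 1)
    transition:  dict state -> dict next_state -> P(next_state | state)
    obs_model:   dict state -> dict observation -> P(observation | state)
    observation: the (noisy) observation just received
    """
    states = list(belief)

    # Prediction step: push the current belief through the transition model.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
        for s2 in states
    }

    # Correction step: weight each state by how well it explains the observation.
    unnormalized = {s: obs_model[s][observation] * predicted[s] for s in states}

    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")

    # Normalize so the new belief is again a probability distribution.
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: the user's goal does not change between turns,
# and the recognizer is right more often than not.
prior = {"wants_time": 0.5, "wants_place": 0.5}
transition = {
    "wants_time": {"wants_time": 1.0, "wants_place": 0.0},
    "wants_place": {"wants_time": 0.0, "wants_place": 1.0},
}
obs_model = {
    "wants_time": {"heard_time": 0.8, "heard_place": 0.2},
    "wants_place": {"heard_time": 0.3, "heard_place": 0.7},
}

posterior = belief_update(prior, transition, obs_model, "heard_time")
print(posterior)
```

With these made-up numbers, hearing "time" raises the belief in `wants_time` from 0.5 to 0.4/0.55 (about 0.73) without ever ruling out `wants_place`. An MDP-based system would instead commit to a single state from the recognizer output.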

Roy Wilson

(1) What are the effect sizes for the three evaluation measures proposed by Singh et al.: that is, how big were the changes (in standard deviation units)? I wonder whether, with a much larger state space, the approach could be applied to each subdialog with a single entry/exit point, thus "partitioning" the state space (a la what Levin terms macroactions?). This approach might make it possible to empirically build an MDP model for each subdialog using direct human-machine interaction.

(2) Singh et al. refer to the POMDP approach of Roy (no relation) et al. as a possible way to reduce state spaces. Although Figure 4 shows what appears to be a large gain in reward over time, Table 3 does not analyze the average difference for statistical significance (it appears to be n.s.) or effect size. Since only three real users actually used the system, the evaluation done by Roy et al. lacks the credibility of the evaluation in the Singh paper. So, it seems that the jury is/was still out on the usability of the POMDP approach for designing large-state systems to be used by a substantial number of naive users.

(3) Using a corpus to simulate users is very ingenious: with a large state space that cannot be partitioned, it may be the only way (in the absence of POMDPs) to design the system and to evaluate it. Unfortunately, Levin et al. did not or could not compare the behavior of simulated and real users, so the usability of their modeling approach is not yet established.

Eric Williams

Singh, et al.
The NJFun system uses RL to perform a bias space search for a good dialog policy. This is done automatically, whereas it must normally be done by hand with a fair amount of black magic (like most other bias space searches). On the surface this automation seems good, until one realizes that the authors introduced another bias space search: design of an appropriate state space. To me this seems to be every bit as "mystical" as designing a good dialog policy.
Has anything really been gained by this system, or have the authors simply dodged an old problem in favor of a new one?

Ilya Goldin

1) In Levin et al., I find the use of a simulated user to provide feedback during learning a fascinating idea. I wonder whether there are special properties of their domain that make it possible to use a simulation, or whether this approach can be applied more generally. Specifically, I suspect that the simulation approach requires properties like short turns on the part of both user and system, or constraints on user initiative.

2) Roy et al. claim that their representation of state is a representation of the user rather than the system. It seems to me that their state represents not the user, but the task.

Amy Soller

In the Levin paper - While estimating cost during the training phase, how does the algorithm determine how satisfied the simulated user is? This information is subjective, but apparently critical to establishing the optimal strategy. Also, I wonder how much work would be required for the system to learn a bidirectional link between the retrieval and output states. This small change would allow the system to carry on a more interesting dialog.

In the Singh et al. paper - How many of the total 311 dialogs are actually necessary for the system to stabilize to the optimal policies?