======================================================================
From: Amy Soller
Date: Tue, 19 Feb 2002 15:28:35 -0500

In the Core et al paper - If a particular plan (i.e. TEACH-STEP-BY-STEP) doesn't seem to be working, can the dialog planner change its strategy in mid-dialog? Or, can the student request that the planner use a different strategy? Also, I wonder just how much of line 25 in the example the system actually understands.

In the Chi et al paper - it seems that the s-hypothesis and the t-hypothesis are the same hypothesis, just viewed from different perspectives. For example, it is likely the case that certain actions by the tutor at certain instances will provoke constructive responses by the student.

======================================================================
From: Vincent Aleven
Date: Tue, 19 Feb 2002 17:02:59 -0500

Chi, et al.

Very thorough analysis of learning through tutoring with interesting and surprising results. The study found evidence for three hypotheses regarding tutoring: the tutors' moves, the student's construction of knowledge, and the tutor-student interaction all contribute to the effectiveness of one-on-one tutoring. Further, the study found that in the hands of inexperienced tutors, a more interactive, less didactic style of tutoring works just as well as the more didactic style that these tutors naturally engage in. However, the resulting learning processes are different.

I have a few comments on the limitations: First, the discussion in the paper does not highlight the fact that the results were obtained with inexperienced tutors - but this seems rather an important factor. The paper shows that in the hands of inexperienced tutors, both a more didactic and a more interactive style of tutoring have their advantages. It seems plausible (or at least an interesting hypothesis) that experienced tutors are better able to combine the advantages of the two styles.
That would suggest a follow-up study with the same design but with expert human tutors.

Further, the difference between statements that are shallow follow-ups and statements that are deep follow-ups seems somewhat ill-defined or subjective. No inter-rater reliability results are reported. (The raters did code some amount of material and discussed their codings before working on their own (p. 492), but is that sufficient to make the case that the rating scheme has a reasonable level of objectivity? I'd say no.) The same can be said about the coding of scaffolding episodes as either deep or shallow.

What are the implications of this paper for the building of tutorial dialogue systems? Not an easy question to answer. At minimum, the study suggests that system designers should be aware that differences in tutorial style may influence the learning process. Further, it seems rather likely that the more interactive the dialog, the harder the task of natural language understanding will become. (So, given that the more didactic and more interactive tutoring styles were equally effective, one might start out by building a more didactic tutor.) Finally, it is important to keep in mind that what goes for human tutors does not necessarily go for machine tutors, so that would suggest that experimentation is key.

======================================================================
From: Steffi Bruninghaus
Date: Tue, 19 Feb 2002 17:21:02 -0500

Core et al. paper: It seems that the major point of this paper is the architecture, which separates the dialog planner from the tutoring agent. While the paper claims that it supports a variety of human teaching tactics, it does not give a good intuition for how that would happen. I would assume that there would be different operators in the dialog planning in figure 4. But then, wouldn't that violate the assumption that one separates the tutoring from the dialog planning?!?
The paper mentions a student model, but again, I missed the intuition for how that may be integrated into the overall system.

Katz & Aronis: This paper shows how one can use specialized and novel data mining techniques to find typical structures of tutorial dialogs, which is a pretty neat idea. However, I was left with somewhat of a "so what?" feeling. What would one do with the insights found in the experiment? A totally different question concerns the ML aspects of the experiments: how good are the rules that they found? One may wonder what percentage of the dialogs could be covered, and what kind of errors the ML algorithm made.

Graesser et al.: The authors make the point early on (p. 497) that feedback, error diagnosis and remediation are not overly relevant in real-world tutoring, which I find hard to swallow. It seems that for a tutor, it is very important to have some idea about what the student is doing - and to detect errors. In fact, isn't error handling/diagnosis inevitable for doing the loop in Figure 1? How can a tutor work with the student on improving the student's understanding without error handling?

======================================================================
From: "Alan D. Berfield"
Date: Wed, 20 Feb 2002 00:08:07 -0500 (EST)

Core et al

Have they done anything yet with an actual implementation? Performed any experiments to test their architecture and ideas? Does this paper seem reasonable/innovative to the tutoring people among us?

Graesser et al

I found the issue of politeness to be interesting. Have any studies looked into how different levels or strategies affect learning? How polite are current tutoring systems?

Katz et al

How could the information and relations they found be used in tutoring systems?

======================================================================
From: "Matthew T. Bell"
Date: Wed, 20 Feb 2002 01:36:14 -0500 (EST)

Chi et al

My questions below will seem a bit critical, only because I truly enjoyed this paper.
Reading it encourages me to pose several questions, including one that has bothered me throughout the term's discussions. See question (2) below.

1) "Scaffolding episodes" never seem to be given a precise definition. Is there one?

2) The paper doesn't seem to address the question of whether performance gains are truly due to the interaction, or due to some intangible factor such as a human desire to connect with other humans. Granted, the latter would not be relevant to computerized dialog systems, which is precisely the point. Is some scepticism about the goals for a computer tutor called for by the very definition of tutoring? When we build a dialog system of any sort, tutor or otherwise, are we really aiming to do all the things that a human would do (and if not, how does that impact our design)? Conversely, are there things that a computer could do effectively that a human could not, which we should take advantage of? This relates to the question of how to best implement an artificial intelligence: Is it by mimicking humans? Planes aren't built by mimicking birds. Are we going the wrong way focussing on human dialog when building computer dialog systems?

3) It seems that one factor not controlled for in the experiment was that of interlocutor beliefs, expectations, and goals with respect to tutorial communication situations. Might a tutor's belief that s/he is the cause of the learning, or vice versa a student's analogous belief, have a powerful impact upon an experiment such as this? Given the non-confirmatory nature of their results w.r.t. their sets of hypotheses, it seems that non-controlled factors such as this (and that raised in 2) may well have impacted their experiment.

Core et al

Do I understand correctly that they built, or at least proposed, a tutorial system but did not evaluate it? Have they evaluated it since? How would one evaluate a successful system?
Since they are aiming to model human tactics, what sorts of methods could be used to measure not simply student performance, but also the similarity between dialogs produced by the system and dialogs produced by humans under like circumstances?

======================================================================
From: Antonio Roque
Date: Wed, 20 Feb 2002 05:10:13 -0500 (EST)

Graesser et al. p 512: "It was the good students who said that they did not understand." is closely followed by "The feedback is misleading". Are the authors dismissing a correlation between being a good student and identifying when you don't understand? It's not really as counterintuitive as they claim: in fact, it might help explain how tutoring is more conducive to learning than classrooms, where asking questions is often frowned upon (by the other students, at least).

Katz et al. p 544: comparing aggregate properties of each set - is it true that they're comparing complex speech acts in terms of their atomic speech acts?

======================================================================
From: "H."
Date: Wed, 20 Feb 2002 07:07:56 -0500

Graesser, et al.

The 5-step frame is obviously a very important contribution. As they point out, the most interesting steps are 4 and 5 (p.505). These two steps are what make tutoring distinct from classroom instruction, but aren't they also what make it distinct from other sorts of dialogue? These two steps make the imbalance between the participants very clear.

They observed that "only 5% of tutor questions were driven by student errors" (p. 514), and then conclude that "it is very difficult for tutors to identify underlying bugs and misconceptions." My problem with this is that since they did not ask the tutors to explain their tutorial actions offline, the observation should be changed to say "only 5% of tutor questions REFLECTED student errors."
In other words, tutors very well could have identified the existence of them and simply chosen not to pursue the details, because sometimes it is simply a bad idea to flesh out stinkin' thinkin'.

Core, et al.

I really like the modularity of the system, especially the clear demarcation between domain-dependent and domain-independent operators. Unfortunately, I think the dialogue planning operators (fig 4) are less interesting than the content plan operators (that "capture teaching tactics"), and I see no examples of the latter. Am I missing something? What teaching tactics are implemented?

======================================================================
From: Eric Williams
Date: Wed, 20 Feb 2002 08:08:25 -0500

Katz et al

The name Kleene rings a bell from my Finite Automata class, but I've long since forgotten what it means. Could you explain Kleene-star and Kleene-plus?

The use of a PCFG to break down utterance grammar is interesting. Have there been any attempts to automatically learn CFGs or CSGs for dialogs and/or sub-dialogs?

======================================================================
From: "Andy P. Gaydos"
Date: Wed, 20 Feb 2002 08:15:33 -0500 (EST)

In what ways do dialog systems for tutoring differ from dialog systems for other tasks? Which components of tutoring and non-tutoring dialog systems are different, and which could be used for both?

======================================================================
From: twilson@cs.pitt.edu
Date: Wed, 20 Feb 2002 09:54:37 -0500 (EST)

Graesser et al.

At the end of this paper, the authors discuss the 'politeness principle' as a norm of conversation, including conversation during tutoring, but note that politeness goals can be incompatible with cognitive pedagogical goals. They go on to speculate about what tutors would be like if they violated politeness maxims and Grice's maxims of conversation. Until now, the dialog agents we have examined in class have been cooperative.
It is interesting to think about cases, such as an intelligent tutoring system, where following politeness principles may not always be the best route. Has research been done into building such an ITS?

Chi et al.

What exactly is a tutoring protocol? Also, could you give some clarification as to the difference between moves, turns, and episodes? For example, the authors at times discuss "scaffolding moves" and at other times "scaffolding episodes". These are all terms that the authors used in reference to coding the tutoring dialogues.

======================================================================
From: Mihai Rotaru
Date: Wed, 20 Feb 2002 10:12:19 -0500

General:

Has anyone studied tutoring with no visual interaction between tutor and student? How about with no audio interaction? What were the results in terms of the ability of the tutor to model the student state? It seems that modelling the student state is very important in tutoring.

Sandra Katz et al.

Probabilistic CFGs seem very interesting. Can they be modeled by MDPs? Were they applied to part-of-speech recognition?

Core et al.

I understand that their model can follow the dialog given in the appendix. But will it be able to generate it?

Graesser et al.

I am not really convinced by their experiments. In the first experiment (research method corpus) they used only 3 tutors, who were graduate students. My problem is that there are only 3 tutors and that they are graduate students. I guess that graduate students have a specific propensity since they are in graduate school. And this can be reflected in their tutoring style.

======================================================================
From: Ilya Goldin
Date: Wed, 20 Feb 2002 10:23:02 -0500

Katz et al provide a wonderful follow-up to Graesser et al: where the latter describe patterns in tutoring dialogues found through the methods of educational psychology, the former find them automatically through the methods of machine learning.
These papers are in contrast to the literature on detecting patterns in task-oriented dialogue (e.g. Reinforcement Learning work). While tutorial and task-oriented applications are different, could we apply similar methods to analyzing their corpora?

One contrast between the two approaches is that the psychology approach literally has a theoretical basis:
- the experimenters claim a hypothesis ("Tutorial dialogues carry the feature X")
- come up with a coding that tests for the presence of X
- report their results.

The ML approach moves the first step to the end:
- the experimenters come up with a coding that categorizes linguistic features (speech acts in Katz et al)
- report statistics and correlations on how speech acts are used in tutorial dialogue
- Katz et al go one step further -- they claim that certain speech act patterns detected by their PCFG have certain effects in tutorial dialogues.

The problem is that Katz et al do not evaluate their claims. Is the ML approach methodologically flawed by definition? Is it just this paper (which, by the way, I thought was really excellent in terms of their ideas and technology)? How should such studies be done?

--
Noboru Matsuda (http://www.pitt.edu/~mazda)
Intelligent Systems Program, University of Pittsburgh
3939 O'Hara St. LRDC #810, Pittsburgh, PA 15260-5189 USA
Voice. (412) 624-2662
Fax. (412) 624-9149