===========================================================================
Graesser et al. note that 56% of their system's chosen pathways ended in prompts, and that this is generally undesirable for a tutoring system. This is an interesting deviation from the other sorts of systems we have been discussing in class, where prompts are desirable in that they help the system obtain the information it needs to achieve the goal. I guess this goes along with some of the comments in the last class, in which the goals of tutoring systems were contrasted with the goals of computer-based assistants.

Aleven et al. - If the time and number of attempts needed to arrive at a complete explanation decreased over time, it does not necessarily follow that the students were learning the theorems. It is quite possible that the students were simply adapting to the system's dialog format. The only way to know for sure whether the students were learning the theorems is to give a pre-test and a post-test. What did the students say about the system? Was there any sort of questionnaire?

A few more questions/comments:
p. 8 - 14% regression seems high. Was there any qualitative analysis of why students regressed?
p. 9 - How many of the 55 total examples were classified as "difficult"? If all the dialogs with the most rating discrepancies were removed, then the sample on which kappa was reported is not representative of the total population of examples.
p. 10 - If the raters could assign multiple labels to the explanations, which label was used for computing kappa?
p. 11 - The "chance agreement" values seem awfully low (3%, 4%), which suggests to me that the kappas ought to be higher than the percentage agreement ratings, since kappa factors in chance.

Amy
===========================================================================
Core, Moore, and Zinn: To be honest, I found this paper difficult to read and understand.
I'm curious, though, whether others found their claim that "high-level tutorial planning" must be separated from "low-level communication management" questionable. If we accept the claims of discourse pragmatics at all, wouldn't we have to posit some "high-level" motivations for every "low-level" communication decision?

Aleven, Popescu, Koedinger: A simple question: What do the numbers on p. 6 mean? Some of them state that it took, e.g., 3.6 +/- 4.5 attempts on the part of the students to give a correct explanation. How could that be? It would imply that some of the students' responses took a negative number of attempts. (I'm obviously reading something wrong here :-))

Graesser, Person, Harter: On page 4, five steps in a tutoring dialog frame are listed. Steps 4 and 5 are qualitatively different from the others in that they themselves can be broken down into smaller substeps. Can these substeps be listed in the same way as the list of five?

Graesser, VanLehn, Rose, Jordan, Harter: One of the criticisms of Andes given on p. 13 is that "The tutor does not ask students to explain their actions, so students may not learn the domain's language." The subsequent explanation of this criticism makes it sound as if part of the educators' goal is to instruct students in how to use a sociolinguistic register; being able to "talk science" seems to be neither more nor less than that. Given the amount of language exposure needed to acquire a register, is it reasonable to make a computer tutor responsible for this, even partially? (This is not to say it's bad to have the computer take natural language explanations; only that I find suspect the goal of having a computer be responsible for teaching high-level social interaction skills. Are tutorial systems being successfully deployed for this task?)
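On the question above about the Aleven et al. figures on p. 6: assuming "3.6 +/- 4.5" reports a mean plus or minus one standard deviation (the usual convention in such tables, though this is an assumption and the paper should be checked), the figure describes spread, not a range of observed values. A standard deviation larger than the mean just indicates a right-skewed distribution: most students need few attempts, while a few need many. The sketch below uses ten made-up attempt counts (hypothetical, not data from the paper) to show this.

```python
import statistics

# Hypothetical attempt counts for ten students (NOT data from the paper):
# nine students succeed on the first try, one needs eleven attempts.
attempts = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]

mean = statistics.mean(attempts)       # 2
spread = statistics.pstdev(attempts)   # 3.0 (population standard deviation)

# "mean +/- SD" summarizes spread around the mean; it is not a min/max range.
print(f"{mean} +/- {spread}")                 # prints "2 +/- 3.0"
print("lowest observed:", min(attempts))      # 1, never negative
print("mean - SD:", mean - spread)            # -1.0, below every observation
```

So a mean minus one standard deviation that falls below zero does not imply any negative counts; it only signals that a few large values are stretching the distribution.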
Rickel, Lesh, Rich, Sidner, Gertner: This paper was interesting in that it tried to make an explicit connection between work in dialog systems and tutorial systems, and also because it tried to make use of an instantiation of the discourse theories of Grosz and Sidner. It seems from their description that Collagen was designed to be readily applied to new domains. Has it been? Are others outside their group now using Collagen?

Matt
===========================================================================
Vincent, can you describe the advantages and disadvantages you've found in using knowledge-based (as opposed to statistical or hybrid) NLU?

Antonio
===========================================================================
Graesser et al., Teaching Tactics and Dialog in AutoTutor: Three examples of AutoTutor's original conversational deficits were given. What changes did they make, and what would a similar set of utterances look like in the modified version?

Graesser et al., ITS With Conversational Dialogue, p. 19: "dialogue styles may need to be distinctively tailored to particular classes of knowledge domains." If so, does this tailoring affect Core et al.'s strategy of creating a domain-independent architecture?

Andy
===========================================================================
(1) Graesser et al. suggest that the effect size of approximately 0.5 standard deviations in their experiment is due to the fact that AutoTutor's dialog tactics are modeled on the actions of relatively unaccomplished tutors, which might account for the larger effect sizes found for sophisticated ITSs. Even if a more sophisticated version of AutoTutor performs as well as or better than the systems referenced by Graesser, I wonder whether the success of this LSA-based approach depends on the form of knowledge that students are to learn and/or have reinforced. In particular, I am skeptical whether the LSA approach (Antonio?) could work well in domains that require what Aleven terms "mathematical precision". Although LSA is more sophisticated than a bag-of-words (BOW) approach, I wonder whether it can (unlike a BOW approach) distinguish between the sentences "a is greater than b" and "b is greater than a" and, if not, how it could work "well" for domains where formal reasoning is, if not the target of instruction, indispensable.

(2) Core et al. extol the virtues of a modular, reusable architecture that facilitates effective tutorial dialogue (the ITS equivalent of motherhood and apple pie?), choosing the TRINDI framework to implement it, which makes it possible to "profit from the framework's modularity but also from the work done on currently implemented TRINDI systems." Unless I am mistaken, Diane reported that TRINDI has been abandoned. Does this mean that the profit has been forgone and, if so, what does this imply about the architectural virtues that TRINDI presumably embodied?

(3) Rickel et al. state "Our experience has been that using Collagen as a starting point for implementing Paco was a great improvement over programming tutorial agents 'from scratch' ..." Although the idea of leveraging models of collaborative dialogue is appealing, especially a model that builds on GST (vs GMT :-)), I wonder whether the perceived ease of use/development (per Amy's earlier comment) isn't more dependent on the theoretical commitments of the investigators/workers than Rickel et al.'s comments suggest.

(4a) Aleven et al. state that the combination of curriculum and tutor has been shown to be better than traditional classroom geometry instruction: better in what way, and by how much? (What are the effect sizes?)

(4b) Aleven et al. state that the number of attempts and the time to completion measure "how effective the system is in helping students to explain." Don't these variables also measure the quality of the user interface (largely independent of the content being provided)?
(4c) Why do Aleven et al. provide no significance information or directional hypothesis tests for the set of kappas reported?

Roy
===========================================================================
Here's a question that combines several papers: I am wondering what tradeoff we can find between how well one understands the students and how hard one has to work to tutor them. When I look at the Geometry tutor and AutoTutor, I am somewhat surprised. In Vincent's paper, the understanding and the tutoring are very deep and very precise, and it seems that the dialog pretty much falls into place because the rest is done "right". In AutoTutor, there is much less understanding, and the focus is much more on which tutoring moves should be selected. On the other hand, one would expect that the more is known about the students (as in the PACT tutors, using NLU), the more possibilities one has to work out a dialog strategy, whereas when one has a rather imprecise idea about the student (as in AutoTutor, using LSA), one also has less knowledge on which to base the dialog strategy.

And one question/comment about LSA: While the idea behind LSA is absolutely intriguing, I am still not convinced that using LSA for tutoring is the right approach. Wouldn't a more case-based approach be much better, one in which the student's answer is compared to a right answer using a semantic, knowledge-based measure of relatedness? It seems to me that this could also be preferable to the hybrid approach suggested for Why-2.

Steffi
===========================================================================
Aleven et al., 2002
What they mean by "explanation" here seems to be a recollection of the correct definition of a geometry postulate, and students' explanations are assumed to improve gradually, following a hierarchical structure that represents degrees of elaboration. I wonder how realistic this assumption is and how useful the developed system is.
Since this is research on tutoring, we need to know exactly what students need as scaffolding, and I always wonder whether we are doing the right thing... Were the students taught to provide a complete explanation in one sentence? In other words, did they understand the goal of the tutoring sessions? Out of 791 explanation attempts, half were rushed by a ring and the other half were terminated abnormally? So the analysis is based on a set of "not-so-good" samples? Maybe I've misunderstood…

Noboru
===========================================================================
In both Core et al. and Rickel et al., researchers are trying to use existing dialogue systems/research to inform the development of intelligent tutoring systems. In both papers, the tutoring system seems to reside as a front end to a more general dialogue system. This approach seems like an intuitive one to pursue, and the ideas in both papers were interesting. However, is the research presented in these papers not far enough along to present even a preliminary evaluation of the tutoring systems they are developing? I didn't notice this lack of evaluation until I read about AutoTutor, where, in addition to evaluating student learning, the authors compared AutoTutor's moves to those generated by human tutors. Since the researchers in Core et al. and Rickel et al. are taking a new approach to ITS, I would like to see evaluations of how well their systems compare to ITSs that do not take their approach of building an ITS as a front end to a general dialogue system. Is the dialogue produced by such a system more natural?

Terry
===========================================================================
Aleven et al.
How valid are the authors' claims that students' improvements resulted from learning how to explain themselves better? It seems to me the improvements could just as easily be accounted for by the students' ever-increasing familiarity with the tutor and its quirks and flaws.
General
How has ITS research benefited classroom teaching? Granted, tutors are ideal, but they are often impractical given this country's educational system. What lessons for classroom learning can be drawn from ITSs?

Eric
===========================================================================