|
Extended Executive Summary
Edited by
Panos K. Chrysanthis, University of Pittsburgh, Pittsburgh, PA, panos@cs.pitt.edu, http://www.cs.pitt.edu/~panos
Judith L. Klavans, Columbia University, New York, NY, klavans@cs.columbia.edu, http://www.cs.columbia.edu/~klavans
The Information Management (IM) and Data Management (DM) communities have been remarkably successful in making fundamental contributions to basic research and technology development. IM and DM systems represent a vital and growing market. Recent advances in computer and network technologies have dramatically changed the sources, volume and form of the data to be managed, as well as the forms and ways in which information is used. In light of these radical changes, the two research communities have recognized that it is no longer enough for each to identify new research opportunities in isolation, as in the past, and that success in meeting the technology challenges of the 21st century will depend crucially on increased cooperation and collaboration.
Under the aegis of the Information and Data Management Program (IDM) of the National Science Foundation, 84 investigators funded by IDM and 40 federal officials and invited industrial guests participated in a three-day workshop that enabled them to articulate near- and long-term goals for the concerned research communities.
Objectives. The objectives of this workshop were to provide an opportunity to (1) assess the IDM program and highlight its many successes, (2) identify areas for improvement, (3) inform researchers of targeted government strategic directions, (4) advise funders on community-generated research directions, (5) encourage interdisciplinary interaction between the structured database, unstructured database, and information retrieval communities, and (6) cooperatively set out a visionary yet realistic research agenda for the future.
Workshop Structure. Plenary presentations and demos provided wide sharing of information and achievements, while workgroup discussions provided ample opportunities for exchanging views and debating selected issues in more depth. To fulfill the objective of transferring information from government and industry to the academic research community, invitees presented briefings and panels. The meeting opened with presentations from the NSF by Juris Hartmanis, Assistant Director for the Computer and Information Sciences and Engineering Directorate; Michael Lesk, Director of the Information and Intelligent Systems Division; Maria Zemankova, manager of the Information and Data Management Program; and Les Gasser, Director of the NSF Computation and Social Systems Program and CISE Coordinator for Knowledge and Distributed Intelligence. These presentations covered aspects of NSF's policy and programmatic frameworks, areas of IDM research and possible research directions, stressing the need for evaluation and measurement of success for research and technology. Thomas Kalil, from the White House National Economic Council, gave a motivational talk on high-level U.S. Information Technology policy, which stressed the long-term need for research to leverage government funding in making new headway. From industry, the talks by Avi Silberschatz, Lucent/Bell-Laboratories, and David Lewis, AT&T Research, together with the industrial/academic panel on "Achievements and Vision", provided perspectives on potential collaborations between academic researchers and the computer industry, and between the information management and data management research communities.
Major Conclusions. "Omnipresent, omniscient data", supporting access at and from anywhere, at any time, in any form and by any means within the digital globe, emerged as the primary challenge for the two research communities. The participants organized themselves into six working groups, each of which discussed specific needs and opportunities from its particular perspective: next generation information access; information presentation and visualization; languages and data models, application models; new environments and data management systems; large-scale open environments of autonomous sources; and multi-modal issues, systems and applications. All working groups identified omnipresent, omniscient data access as the core technological infrastructure of the next century, one that will enable an unprecedented number of new and important applications such as electronic commerce, distance learning and health care, empowering tools for physically challenged people, and tools for new scientific research methods, to name only a few. The following summarizes the conclusions of the workshop.
1. Research topics fundamental to making progress towards omnipresent, omniscient data can be grouped into the following three broad areas:
(i) Next Generation Information Access, which includes intuitive and efficient access-by-content to multi-dimensional and multimedia/multi-modal data. The focus will move beyond text and business data to multilingual text, the incorporation of advanced linguistically oriented techniques, multidimensional numeric data, spoken documents, music, images, and video management and retrieval. This will require support for user-defined Quality-of-Service requirements with respect to data consistency, completeness, relevance, timeliness and cost, and for new query types on data, resources and processes, including data mining for implied rules and relationships hidden within data, using both statistical and rule-based techniques.
(ii) Information Presentation and Visualization, which includes collaborative evaluation of modes of interaction, development of evaluation models, strengthening of the designer-user and user-user interfaces, support for rapid construction of visualizations by non-experts, spatial and temporal integration of data of various scales and resolutions, exploitation of spatial metaphors for interaction with non-spatial information, and summarization of results using both text and visualization. This will involve leveraging investments in studies of user needs to ensure that information presentation technologies match user-driven specifications.
(iii) Large-scale Management and Integration of Distributed and Migrating Data, which includes application-aware information management and integration of data from autonomous and heterogeneous sources. New languages and middleware are needed to deal with data of different media and structures, such as text, images, signals, presentations, videos, audio clips, dynamic data, scientific data, and software. Further, new techniques are needed for large-scale and efficient storage of multimedia data and metadata, and for scalable and flexible management of migrating workflows and of transactions across autonomous systems and within different computing and communication environments. These environments include the Internet/Web, high-speed networking environments, and wireless and mobile environments.
2. The Information and Data Management (IDM) research communities will continue their foundational role in creating the information infrastructure into the next century. The proceedings of the workshop, which include reports on all projects funded by IDM during fiscal year 1997, provide ample evidence for this. Sustaining this role demands increased support from both government and industry for basic and experimental research. An important aspect of this support should be the development of research facilities that include both general purpose computing and network equipment and systems, and special purpose repositories providing access to data and document sets, test collections and procedure depositories, transaction and query traces, IM/DM tools, etc. Industry, which has a vested interest in information and data management research, should contribute to the development of these academic research facilities. Further, collaboration between academic research and industry needs to be expanded. NSF can become instrumental in fostering such collaborations, thus enabling industry to provide funding for academic research and accelerating technology transfer. A step in this direction could be the establishment of National IM/DM Research Centers.
3. NSF should retain its leadership role in supporting basic and speculative research, which shapes new information and database technologies. The Small Grants for Exploratory Research (SGER) program has been a step in the right direction, and NSF should continue it at higher funding levels over multi-year periods. At the same time, the IM/DM research community should take the initiative in discussing SGER with NSF program directors. In order to facilitate collaboration and better dissemination of results, NSF should increase its support for specialized research meetings and workshops, in particular interdisciplinary ones.
By articulating a common challenge, the workshop successfully initiated and stimulated future collaborations among researchers from the different research communities involved in IDM. We anticipate that this will result in new research methods and innovative technologies. The discussions during the workshop made clear the need for close collaboration with researchers from other disciplines, including other sciences, engineering, medicine and the humanities, who are involved in building new data-intensive applications. NSF currently encourages such collaborations actively through several initiatives. What also became clear was the need for closer collaboration within computer science, for example with the human-computer interaction, artificial intelligence, natural language processing, image processing, process and workflow management, agent and distributed systems, networks, and languages research groups. Although implicitly encouraged, such collaborations should be more actively initiated by NSF through both large-scale multi-disciplinary initiatives and inter-division programs supporting integrative basic research and experimentation.
In conclusion, the workshop provided an effective forum for discussing the issues that will drive a significant part of the research of the IDM communities in the future. Further, the participation of NSF officials, as well as officials from other governmental agencies, allowed for direct dissemination of the current successes of the PIs funded by IDM and of the formulated research directions. The workshop proceedings and findings, along with other background resources, are available on the Web (at http://www.cs.pitt.edu/~panos/idm98 ) to allow for wide dissemination of the key issues addressed in the workshop to the entire research community. |