HYPERTEXT, HTML and XML

Hypertext is based on the concept of "Expanding Information".

Blocks of information (text or graphics) can be linked to other blocks of information.

Once two blocks of information have been linked together, each provides an instant "gateway" to the other.

A single block of information can lead to another and so on. Information can literally expand at the user's request.

A hypertext document is a NONLINEAR document with many links.

FATHER OF HYPERTEXT

Vannevar Bush, "As We May Think" (1945), described the Memex machine:

"One might have the contents of a thousand volumes located in a couple of cubic feet in a desk, so that by depressing a few keys one could have a given page instantly projected before him."

HYPERTEXT CONCEPTS

Link Originator - The starting point of a link. It is marked with symbols that indicate it is a hypertext link. The link originator is an anchor.

Link End - The other side of a link, where the reader is taken after the link is traversed. The link end is also an anchor.

Traverse - The act of traveling from a link originator to its associated link end.

INTERNET

The growth of the Internet and the World Wide Web is truly phenomenal.

Internet Concepts

The following are commonly used application layer protocols:
  • HTTP Hypertext Transfer Protocol
  • FTP File Transfer Protocol

    TCP (Transmission Control Protocol) operates at the Transport Layer.

    IP (Internet Protocol) operates at the Internetwork Layer.

    For modem connections, two protocols are commonly used:
  • SLIP Serial Line Internet Protocol
  • PPP Point-to-Point Protocol

    IP address: contains sufficient information to uniquely identify a network and a specific computer on that network.

    Under classful addressing, an IP address consists of a Class ID, a Net ID and a Host ID.
    For example, nova.ksi.edu has IP address 192.217.188.1:
     11000000.11011001.10111100.00000001
     ^^^----- -------- -------- ********
      The first 3 bits (110) indicate Class C
      The next 21 bits are the Net ID
      The last 8 bits are the Host ID
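
The bit layout above can be checked mechanically. The following is a minimal Python sketch, assuming the classful rules for Classes A, B and C (prefixes 0, 10 and 110 with 7, 14 and 21 Net ID bits); the function name classful_split is made up for this illustration.

```python
import ipaddress


def classful_split(dotted: str):
    """Split an IPv4 address into (class letter, Net ID, Host ID) under classful rules."""
    value = int(ipaddress.IPv4Address(dotted))
    bits = f"{value:032b}"  # the address as a 32-character bit string
    # Each entry: (Class ID prefix, class letter, number of Net ID bits)
    for prefix, letter, net_bits in (("0", "A", 7), ("10", "B", 14), ("110", "C", 21)):
        if bits.startswith(prefix):
            start = len(prefix)
            net_id = int(bits[start:start + net_bits], 2)
            host_id = int(bits[start + net_bits:], 2)
            return letter, net_id, host_id
    raise ValueError("Class D and E addresses are not split into Net ID and Host ID")


print(classful_split("192.217.188.1"))  # ('C', 55740, 1)
```

The Net ID 55740 is the 21-bit value 00000.11011001.10111100, that is 217 * 256 + 188.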
    

    HTML CONCEPTS

    Hypertext: Text with links to other texts

    Home Page: The root of a hypertext structure

    Web Browser: A tool to retrieve a document using URL from the Web server, interpret its HTML and present it to the user

    URL: Uniform Resource Locator, the "address" of a Web page

    HTML: Hypertext Markup Language, the language used to write Web pages

    Tag: HTML is essentially made up of tags, which are strings enclosed in angle brackets.
    <HTML>
    <HEAD>
    <TITLE>Sample HTML document</TITLE>
    </HEAD>
    <BODY>
    <H3>Sample HTML document</H3>
    <A HREF="http://www.ksi.edu/">KSI Graduate School of Computer Science</A>
    <FORM METHOD="get" ACTION="http://www.webbase.com/docs/webbase/examples/getname.htf">
    Please enter the name to search for: <INPUT NAME="name" SIZE=15 VALUE="Denny">
    <INPUT TYPE=SUBMIT VALUE="Enter"></FORM>
    </BODY>
    </HTML>
    
    The above HTML form as seen by the user looks like this:

    Sample HTML document

    KSI Graduate School of Computer Science
    Please enter the name to search for:

    If you click the "Enter" button, the browser sends a "GET" request to the server, with the form data appended to the URL.

    The server passes the information, which is a long string called the Query_String containing the (name,value) pairs, to: http://www.WebBase.com/docs/webbase/examples/getname.htf.

    This form 'getname.htf' takes the Query_String as input and returns another form which is then presented to the user.
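
As an illustration of how the (name,value) pairs become a query string, here is a minimal Python sketch using the standard urllib.parse module. It assumes the form is submitted with its default value "Denny"; the second value, "Denny Smith", is made up to show how a space is encoded.

```python
from urllib.parse import urlencode

action = "http://www.webbase.com/docs/webbase/examples/getname.htf"

# Only named fields are sent. The submit button in the sample form has no
# NAME attribute, so it contributes no (name, value) pair.
fields = [("name", "Denny")]

query_string = urlencode(fields)
url = f"{action}?{query_string}"
print(url)
# http://www.webbase.com/docs/webbase/examples/getname.htf?name=Denny

# Characters that are not URL-safe are encoded; a space becomes "+".
print(urlencode([("name", "Denny Smith")]))  # name=Denny+Smith
```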

    This technique of passing information to the server is used by the Common Gateway Interface (CGI) to perform tasks using CGI scripts (programs). It is also used by WebBase to pass information to a WebBase form (htf) to perform database retrieval and update operations.
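
On the server side, a CGI program receives the query string in the QUERY_STRING environment variable. The sketch below shows roughly what such a script might do in Python; the function name handle_request and the plain-text reply are assumptions for illustration, and no real database lookup is performed.

```python
import os
from urllib.parse import parse_qs


def handle_request(environ=os.environ) -> str:
    """Build a CGI-style response from the (name, value) pairs in QUERY_STRING."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", [""])[0]  # first value of the "name" field, if present
    # A CGI program writes a header block, a blank line, then the body.
    # A real script would search a database here instead of echoing the name.
    return "Content-Type: text/plain\r\n\r\n" + f"Searching for: {name}\n"


# Simulate the request produced by the sample form.
print(handle_request({"QUERY_STRING": "name=Denny"}), end="")
```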

    For more information, consult A Beginner's Guide to HTML (1995)


    COMPARISON OF DATABASE AND HYPERBASE

    Hyperbase supports:

    * Non-linear text
    * Information expansion
    * Information navigation

    The difference between database and hyperbase:
    HYPERBASE            DATABASE
    ---------            --------
    arbitrary links      predefined indexes

    irregularity         regularity (all records
                         have the same attributes)

    good for navigation  good for data processing &
    & exploration        information processing

    easy to handle       more difficult to handle
    exceptions           exceptions

    User may get lost    User must formulate queries
    in hyperspace        to retrieve information and
    (disorientation)     may not find what he or she
                         does not already know about
    

    A GRAPH-THEORETIC MODEL FOR HYPERBASE

    A hyperbase Hg is an ordered triple (N,A,L) where

    N = {N1,...,Nm} is a set of information nodes

    A = {A11,...,Amn} is a set of anchors, or subregions within nodes, such that each node Ni has one or more anchors {Ai1, ..., Ain}

    L = {<Aij, Akl>} is a set of links, each an ordered pair of anchors.

    Browsing Operator:

    A function mapping a link's first anchor (link originator) to the link's second anchor (link end).

    Authoring Operator:

    Defines a link by identifying its two anchors.
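
The triple (N,A,L) and the two operators can be sketched as a small data structure. This is one possible Python representation, not a definitive one: anchors are written as (node id, anchor id) pairs, and each link originator is assumed to lead to exactly one link end so that browsing is a function. The class and method names are made up for this illustration.

```python
from dataclasses import dataclass, field

Anchor = tuple[str, str]  # (node id, anchor id), e.g. ("N1", "A11")


@dataclass
class Hyperbase:
    nodes: dict[str, str] = field(default_factory=dict)        # N: node id -> content
    anchors: set[Anchor] = field(default_factory=set)          # A: anchors within nodes
    links: dict[Anchor, Anchor] = field(default_factory=dict)  # L: originator -> end

    def add_node(self, node_id: str, content: str, anchor_ids: list[str]) -> None:
        """Add an information node together with its anchors."""
        self.nodes[node_id] = content
        self.anchors.update((node_id, a) for a in anchor_ids)

    def author(self, originator: Anchor, end: Anchor) -> None:
        """Authoring operator: define a link by identifying its two anchors."""
        if originator not in self.anchors or end not in self.anchors:
            raise KeyError("both anchors must already exist")
        self.links[originator] = end

    def browse(self, originator: Anchor) -> Anchor:
        """Browsing operator: map a link originator to its link end."""
        return self.links[originator]


hb = Hyperbase()
hb.add_node("N1", "Hypertext overview", ["A11"])
hb.add_node("N2", "The Memex machine", ["A21"])
hb.author(("N1", "A11"), ("N2", "A21"))
end = hb.browse(("N1", "A11"))
print(end, hb.nodes[end[0]])  # ('N2', 'A21') The Memex machine
```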

    LOST IN HYPERSPACE

    A hyperbase becomes very complex when it has many nodes and links.

    To solve the problem, we can try to cluster the nodes and/or links to form more abstract structures.

    Aggregate: A set of distinct concepts that, taken together, form a more abstract concept.



    Aggregate Clustering with Exceptions (ACE):

    From the input hypertext graph, construct an aggregated E-R diagram (regularity) and an exception graph (irregularity).
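
To make the split between regularity and irregularity concrete, here is a toy Python sketch. It is not the ACE algorithm from the reference below; it only illustrates the idea under stated assumptions: every node has a known type, a link pattern (source type, target type) counts as regular when it occurs at least min_support times, and all other links go to the exception graph. The node types, the data and the threshold are hypothetical.

```python
from collections import Counter


def split_regular_and_exceptions(node_type, links, min_support=2):
    """Return (regular link patterns, exceptional links) for a typed link graph."""
    # Count how often each (source type, target type) pattern occurs.
    patterns = Counter((node_type[s], node_type[t]) for s, t in links)
    # Frequent patterns play the role of relationships in an aggregated schema.
    schema = {p for p, count in patterns.items() if count >= min_support}
    # Links that match no frequent pattern are kept individually as exceptions.
    exceptions = [(s, t) for s, t in links if (node_type[s], node_type[t]) not in schema]
    return schema, exceptions


node_type = {"p1": "Paper", "p2": "Paper", "a1": "Author", "a2": "Author", "m1": "Memo"}
links = [("p1", "a1"), ("p2", "a2"), ("p2", "a1"), ("m1", "p1")]

schema, exceptions = split_regular_and_exceptions(node_type, links)
print(schema)      # {('Paper', 'Author')}
print(exceptions)  # [('m1', 'p1')]
```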



    Reference:

    Y. Hara, A. M. Keller, G. Wiederhold, "Implementing Hypertext Database Relationships through Aggregations and Exceptions", Proc. of 3rd ACM Conference on Hypertext, San Antonio, Texas, Dec 15-18, 1991, 75-90.

    XML, WML AND VRML

    In addition to HTML, we now have the eXtensible Markup Language (XML), the Wireless Markup Language (WML), and the Virtual Reality Modeling Language (VRML). These standardize markup for semantic information, wireless applications and virtual reality applications, respectively.
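
As a small illustration of markup that carries semantic information, the Python sketch below parses an XML fragment with the standard xml.etree.ElementTree module. The element names school, name and homepage are made up for this example; in XML the tags describe what the data means rather than how it is displayed.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML record; the tag names are chosen by the document author.
doc = """<school>
  <name>KSI Graduate School of Computer Science</name>
  <homepage>http://www.ksi.edu/</homepage>
</school>"""

root = ET.fromstring(doc)
# Each child element becomes a (tag, text) entry, so a program can look up data by meaning.
record = {child.tag: child.text for child in root}
print(record["name"])      # KSI Graduate School of Computer Science
print(record["homepage"])  # http://www.ksi.edu/
```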