Chapter 4
Video and Image Content
Representation and Retrieval

Here we discuss the following topics:

Modelling methods for content-based retrieval of video information use one of the following:

manual annotations for extracting descriptive information -- (slow for large collections)
iconic representations derived using automatic methods for detecting scene changes (cuts) -- (misses moving objects)
static properties derived using image analysis techniques -- (concentrates on the individual image, so it loses the temporal aspect of video).
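
The automatic scene-change (cut) detection mentioned above can be sketched as follows. This is only a minimal illustration, assuming grayscale frames given as flat lists of intensities in 0..255 and a histogram-difference test. The names `histogram` and `detect_cuts`, the bin count and the threshold are hypothetical choices, not taken from the chapter.

```python
from typing import List, Sequence

# Assumption: a frame is a non-empty flat sequence of grayscale values 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram with `bins` equal-width buckets."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return every index i at which frame i appears to start a new shot.

    The distance is half the L1 difference of consecutive histograms,
    so it always lies in [0, 1]. The threshold is an arbitrary choice.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev_hist = histogram(frames[i - 1])
        hist = histogram(frames[i])
        distance = 0.5 * sum(abs(a - b) for a, b in zip(prev_hist, hist))
        if distance > threshold:
            cuts.append(i)
    return cuts


dark = [10] * 16     # toy 4x4 frame, all dark pixels
bright = [240] * 16  # toy 4x4 frame, all bright pixels
print(detect_cuts([dark, dark, bright, bright]))  # [2]
```

One frame per detected shot could then serve as the icon for that shot. A histogram test of this kind looks only at global intensity, which is one way to see why such representations miss moving objects.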

Types of physical information associated with video stream:

What can be derived from a video stream?

The model: data types and functions.

Kinds of data types:

"deliverable types" -- system defined and fixed: string, integer and boolean are printable objects; audio and video are deliverable (presentable) types. These are application-independent parts of the system and are present in any specification.
"entity-types" -- user defined, such as PERSON or STUDENT; they represent objects or concepts from the real world.
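
The distinction between the two kinds of types might be pictured as follows. This is a sketch under assumptions: the field names, the `Video` and `Audio` handle classes and the helper `is_deliverable` are all hypothetical, and the idea that entity attributes hold deliverable values is an illustration rather than something the model states here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Video:
    """Deliverable (presentable) type: a handle to stored video (hypothetical fields)."""
    uri: str
    frame_count: int


@dataclass(frozen=True)
class Audio:
    """Deliverable (presentable) type: a handle to stored audio (hypothetical fields)."""
    uri: str
    sample_count: int


# System-defined, fixed set of deliverable types.
Deliverable = Union[str, int, bool, Audio, Video]


@dataclass
class Person:
    """User-defined entity type (PERSON)."""
    name: str
    age: int


@dataclass
class Student(Person):
    """User-defined entity type (STUDENT), modelled here as a kind of PERSON."""
    student_id: str
    introduction: Video


def is_deliverable(value: object) -> bool:
    """True if the value belongs to one of the system-defined deliverable types."""
    return isinstance(value, (str, int, bool, Audio, Video))


ann = Student("Ann", 20, "s1", Video("ann.mpg", 300))
print(is_deliverable(ann.name), is_deliverable(ann.introduction), is_deliverable(ann))
```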

A data type may have operators associated with it, and users may define functions over user-defined data. Function types have the general form:

F: A0 x A1 x ... x An-1 -> An

where each Ai, called a type expression, is inductively defined as:

We also need a collection of operators that are independent of the application domain and operate on the system-defined data types. Here are some examples:

set operation symbols: isIn, isSubsetOf, intersection, etc.
equality: is, isnot
temporal synchronization: sim, before, meets, equals, starts_at, finishes
spatial composition (for graphics, images and video): left, right, bottom, up, showIn, arrange
integer operation symbols: +, -, *, /, <, >, <=, >=, min, max, ave, sum, prod
string operators: concat, strLen
logical operations: and, or, implies, not
text: appendPar, cutPar, eqPar, keyword, isKeywordIn, parSim
graphics: insPatch, pictureSum, fill, domain, colors, getPatch, getColor, restriction, scale, translate, dot, lineSeg, box, coincident, contains, disjoint, visible, bounded
audio: intensity, extract, audioIns, audioLen, audioSim
images: shift, zoom, superimpose, overlay, imageSim
video: videoLen, pace, videoClip, videoIns
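
As an illustration of how a few of these operators could be typed, here is a sketch of three temporal synchronization operators over time intervals. The notes only name the operators; the semantics below follow the usual interval relations (Allen's interval algebra) and are an assumption, as is the `Interval` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Playback interval of a media object, in seconds (hypothetical representation)."""
    start: float
    end: float


def before(a: Interval, b: Interval) -> bool:
    """a ends strictly before b starts, leaving a gap between them."""
    return a.end < b.start


def meets(a: Interval, b: Interval) -> bool:
    """b starts exactly when a ends."""
    return a.end == b.start


def equals(a: Interval, b: Interval) -> bool:
    """a and b start and end together."""
    return a.start == b.start and a.end == b.end


intro = Interval(0.0, 5.0)
scene = Interval(5.0, 12.0)
credits = Interval(20.0, 25.0)
print(meets(intro, scene), before(intro, credits), equals(intro, scene))
```

Each function has the shape Interval x Interval -> boolean, so it fits the general function-type form given earlier.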

Hierarchically structured data type:

Physical image (digitized image representation): intensity values assigned to pixels on a 2D grid.
Object descriptors -- set of extracted boundaries: objects, regions, chain codes.
Image features (for object recognition and further classification).
Object semantics association -- attaches semantics to objects or identified features.
Semantic level -- real world description.
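
The layers listed above can be pictured as one nested record. The following is only a sketch: the class names `ObjectDescriptor` and `ImageData` and all field names are hypothetical, and a real system would likely store each layer differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectDescriptor:
    """One extracted object: boundary, features and attached semantics."""
    boundary: List[Tuple[int, int]]                            # boundary pixels (row, col)
    features: Dict[str, float] = field(default_factory=dict)   # image features
    semantics: str = ""                                        # object semantics association


@dataclass
class ImageData:
    """Hierarchically structured image: physical, object and semantic levels."""
    pixels: List[List[int]]                                    # physical image: intensities on a 2D grid
    objects: List[ObjectDescriptor] = field(default_factory=list)
    description: str = ""                                      # semantic level: real-world description

    def size(self) -> Tuple[int, int]:
        """(rows, columns) of the pixel grid."""
        return (len(self.pixels), len(self.pixels[0]) if self.pixels else 0)


image = ImageData(
    pixels=[[0, 0, 255], [0, 255, 255]],
    objects=[ObjectDescriptor(boundary=[(0, 2), (1, 1), (1, 2)],
                              features={"area": 3.0},
                              semantics="bright region")],
    description="toy example image",
)
print(image.size(), image.objects[0].semantics)
```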

[Here goes a picture of the Image Data Type - Fig 4-2 pg 109]

Categorization of image operators:

Low-Level Operators: image registration, enhancement, noise removal, contrast adjustment and spatial filtering -- operators for image preprocessing. After preprocessing, the image is ready for analysis and region/edge detection. Examples:

Linear and nonlinear filtering: noise suppression, smoothing/sharpening, edge/feature enhancement.
Image arithmetic and logic operators: to detect differences between images, etc.
Intensity mapping: for contrast stretching/compression.
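
Two of these low-level operators are simple enough to sketch directly: a linear intensity mapping for contrast stretching and an arithmetic difference between two images. The sketch assumes grayscale images stored as lists of rows; the function names are hypothetical.

```python
from typing import List

# Assumption: an image is a non-empty list of rows of grayscale values.
Image = List[List[int]]


def stretch_contrast(image: Image, lo: int = 0, hi: int = 255) -> Image:
    """Linearly map the image's intensity range onto [lo, hi]."""
    flat = [p for row in image for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:
        # A constant image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    span = p_max - p_min
    return [[lo + round((p - p_min) * (hi - lo) / span) for p in row]
            for row in image]


def abs_difference(a: Image, b: Image) -> Image:
    """Pixel-wise absolute difference, used to detect changes between two images."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


print(stretch_contrast([[50, 100], [150, 200]]))           # [[0, 85], [170, 255]]
print(abs_difference([[1, 2], [3, 4]], [[1, 5], [0, 4]]))  # [[0, 3], [3, 0]]
```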

Object-Representation (extraction) Operators: extract the shape, boundary and skeleton of the objects inside the image, and associate world-related information with the extracted features and objects. Examples:

Retrieval Operators: use all the information obtained by the other operators. The output is a boolean or an integer expressing similarity with the input object.
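
To make the two output forms concrete, here is a toy retrieval operator. It is a stand-in measure, not the similarity function used by the model: it scores two equally sized grayscale images by the percentage of pixels whose intensities are close, and a second function turns that integer score into a boolean. The names, the tolerance and the minimum score are assumptions.

```python
from typing import List

# Assumption: images are lists of rows of grayscale values, same dimensions.
Image = List[List[int]]


def image_sim(a: Image, b: Image, tolerance: int = 10) -> int:
    """Integer similarity in 0..100: percentage of pixel pairs within `tolerance`."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    if not pairs:
        return 0
    close = sum(1 for x, y in pairs if abs(x - y) <= tolerance)
    return 100 * close // len(pairs)


def is_similar(a: Image, b: Image, min_score: int = 75) -> bool:
    """Boolean form of the operator: does the score reach `min_score`?"""
    return image_sim(a, b) >= min_score


query = [[10, 20], [30, 40]]
candidate = [[12, 25], [90, 41]]
print(image_sim(query, candidate), is_similar(query, candidate))  # 75 True
```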

Editing operators:

Now we can introduce more elaborate operators [pg 111]:

Some more complex operators:

Delivery operators:

Play and reverse

Video attribute operators: