Chapter 4
Video and Image Content
Representation and Retrieval
Video Information Model
Here we discuss the following topics:
- Video Information Characterization
- A Framework for Information Modeling
- Image Data Type
- Video Data Type
Modelling methods for content-based retrieval of video information use
one of the following:
- manual annotations for extracting descriptive information -- (slow
for large video collections)
- iconic representations derived using automatic methods for detecting
scene changes (cuts) -- (misses moving objects)
- static properties derived using image analysis techniques -- (concentrates
on individual images and loses the temporal aspect of video).
Video Information Characterization
Types of physical information associated with a video stream:
- Physical object -- the video stream itself
- Physical attributes (length, size, frame numbering)
- Video-related info (format, resolution, headers, frame rate)
What can be derived from a video stream?
- O -- set of objects present in video
- M -- set of motion representations
- Features and spatial relationships derived from O
- Spatiotemporal info derived only from O and M together
- Spatiotemporal info supplied by the application designer
- Temporal relationships inferred from M
- Image word information
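A minimal sketch of how spatial relationships can be derived from O, assuming (as a hypothetical representation) that each object is located in a frame by an axis-aligned bounding box in image coordinates, with y growing downwards. `VideoObject`, `spatial_relations` and the relation names are illustrative choices, not part of the model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoObject:
    """An object from O, located by a bounding box (hypothetical representation)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def spatial_relations(a: VideoObject, b: VideoObject) -> set[str]:
    """Qualitative spatial relationships of a relative to b.

    Image coordinates are assumed, so a smaller y means higher in the frame.
    """
    rels: set[str] = set()
    if a.x_max < b.x_min:
        rels.add("left")
    if a.x_min > b.x_max:
        rels.add("right")
    if a.y_max < b.y_min:
        rels.add("above")
    if a.y_min > b.y_max:
        rels.add("below")
    # Boxes overlap when their projections overlap on both axes.
    if (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max):
        rels.add("overlaps")
    return rels


car = VideoObject("car", 0, 50, 40, 80)
tree = VideoObject("tree", 60, 10, 80, 40)
print(spatial_relations(car, tree))  # the car is left of and below the tree
```

Computing such relations per frame for the objects in O gives the kind of derived spatial information listed above; tracking how they change across frames would additionally need M.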
A Framework for Information Modeling
The model: data types and functions.
Kinds of data types:
- "deliverable types" -- system-defined and fixed, e.g. string,
integer, boolean; these are printable objects. Audio and video are also deliverable
(presentable) types. These types are application-independent parts of the system
and are present in any specification.
- "entity types" -- user-defined, such as PERSON or STUDENT; they
represent objects or concepts from the real world.
A data type may have operators associated with it, and user-defined functions
can be defined on user-defined data. Function types have the general form:
f: A1 × A2 × ... × An -> A
where A and each Ai, called type expressions, are inductively defined as:
- a data type
- A1 ∪ A2 (union), A1 × A2 (Cartesian product) or P(A1) (power set), where A1 and A2 are type expressions.
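The inductive definition of type expressions maps naturally onto a small recursive data structure. The sketch assumes a Python encoding with hypothetical class names (`Base`, `UnionT`, `Product`, `PowerSet`, `FunctionType`); it only builds and prints type expressions, for example the signature of the `cuts` operator used later in the chapter, and does no type checking.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """A data type, e.g. Video, Integer or a user-defined entity type like PERSON."""
    name: str


@dataclass(frozen=True)
class UnionT:
    """A1 ∪ A2"""
    left: TypeExpr
    right: TypeExpr


@dataclass(frozen=True)
class Product:
    """A1 × A2 (Cartesian product)"""
    left: TypeExpr
    right: TypeExpr


@dataclass(frozen=True)
class PowerSet:
    """P(A1): the set of all subsets of A1"""
    elem: TypeExpr


TypeExpr = Union[Base, UnionT, Product, PowerSet]


def show(t: TypeExpr) -> str:
    """Render a type expression by structural recursion over the inductive cases."""
    if isinstance(t, Base):
        return t.name
    if isinstance(t, UnionT):
        return f"({show(t.left)} ∪ {show(t.right)})"
    if isinstance(t, Product):
        return f"({show(t.left)} × {show(t.right)})"
    if isinstance(t, PowerSet):
        return f"P({show(t.elem)})"
    raise TypeError(f"not a type expression: {t!r}")


@dataclass(frozen=True)
class FunctionType:
    """Signature of a function: argument type expressions -> result type expression."""
    args: Tuple[TypeExpr, ...]
    result: TypeExpr

    def __str__(self) -> str:
        return " × ".join(show(a) for a in self.args) + " -> " + show(self.result)


# cuts: Video -> P(Integer)
print(FunctionType((Base("Video"),), PowerSet(Base("Integer"))))  # Video -> P(Integer)
```

Because each case only refers back to type expressions, nested forms such as P(Video × Integer) are built the same way.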
We also need a collection of operators that are independent of the application domain
and operate on the system-defined data types. Some examples:
- set operation symbols: isIn, isSubsetOf, intersection, etc.
- equality: is, isnot
- temporal synchronization: sim, before, meets, equals, starts_at, finishes
- spatial composition (for graphics, images and video): left, right,
bottom, up, showIn, arrange
- integer operation symbols: +, -, *, /, <, >, <=, >=, min,
max, ave, sum, prod
- string operators: concat, strLen
- logical operations: and, or, implies, not
- text: appendPar, cutPar, eqPar, keyword, isKeywordIn, parSim
- graphics: insPatch, pictureSum, fill, domain, colors, getPatch, getColor,
restriction, scale, translate, dot, lineSeg, box, coincident, contains,
disjoint, visible, bounded
- audio: intensity, extract, audioIns, audioLen, audioSim
- images: shift, zoom, superimpose, overlay, imageSim
- video: videoLen, pace, videoClip, videoIns
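The temporal synchronization operators read naturally as predicates on presentation intervals. The sketch assumes each media object plays over a closed interval [start, end] and interprets `before`, `meets`, `equals` and `finishes` in the style of Allen's interval relations; the meanings chosen for `sim` (the intervals overlap in time) and `starts_at` (both start at the same instant) are assumptions, since the notes do not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Presentation interval [start, end] of a media object (seconds assumed)."""
    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("an interval cannot end before it starts")


def before(a: Interval, b: Interval) -> bool:
    """a ends strictly before b starts (there is a gap between them)."""
    return a.end < b.start


def meets(a: Interval, b: Interval) -> bool:
    """b starts exactly when a ends."""
    return a.end == b.start


def equals(a: Interval, b: Interval) -> bool:
    """Same start and same end."""
    return a.start == b.start and a.end == b.end


def starts_at(a: Interval, b: Interval) -> bool:
    """Assumed meaning: both begin at the same instant."""
    return a.start == b.start


def finishes(a: Interval, b: Interval) -> bool:
    """Allen-style: a starts after b but both end together."""
    return a.end == b.end and a.start > b.start


def sim(a: Interval, b: Interval) -> bool:
    """Assumed meaning: a and b play simultaneously for some positive duration."""
    return a.start < b.end and b.start < a.end


intro, narration, music = Interval(0, 10), Interval(10, 25), Interval(5, 25)
print(meets(intro, narration), finishes(narration, music), sim(intro, music))  # True True True
```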
Image Data Type
Hierarchically structured data type:
- Physical image (digitized image representation): intensity values assigned to
pixels on a 2D grid.
- Object descriptors -- set of extracted boundaries: objects, regions,
chain codes
- Image features (for object recognition and further classification)
- Object semantics association -- associates semantics with objects or identified
features
- Semantic level -- real world description.
[Fig. 4-2: the Image Data Type (p. 109)]
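A minimal sketch of the hierarchy as nested records, plus a Freeman chain-code helper for the object-descriptor level. The field names, the class names `ObjectDescriptor` and `ImageObject`, and the choice of an 8-direction Freeman code (0 = east, counting counter-clockwise, rows growing downwards) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Freeman 8-direction code for a step (dx, dy) between neighbouring boundary
# pixels: 0 = east, counting counter-clockwise; image rows grow downwards,
# so a step "north" has dy = -1.
FREEMAN = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
           (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}


def chain_code(boundary: list[tuple[int, int]]) -> list[int]:
    """Chain code of a closed boundary given as 8-connected (x, y) pixels in traversal order."""
    closed = boundary[1:] + boundary[:1]  # pair the last point with the first
    return [FREEMAN[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in zip(boundary, closed)]


@dataclass
class ObjectDescriptor:
    """Object-descriptor level: one extracted region and its boundary chain code."""
    label: str
    chain: list[int]


@dataclass
class ImageObject:
    """The hierarchy from the physical image up to the semantic level."""
    pixels: list[list[int]]                                              # physical image
    objects: list[ObjectDescriptor] = field(default_factory=list)        # object descriptors
    features: dict[str, dict[str, float]] = field(default_factory=dict)  # per-object features
    semantics: dict[str, str] = field(default_factory=dict)              # object label -> meaning
    description: str = ""                                                # real-world description


square = [(1, 1), (2, 1), (2, 2), (1, 2)]  # boundary of the bright 2x2 block below
img = ImageObject(pixels=[[0, 0, 0, 0], [0, 255, 255, 0], [0, 255, 255, 0], [0, 0, 0, 0]])
img.objects.append(ObjectDescriptor("region1", chain_code(square)))
img.semantics["region1"] = "window"
print(img.objects[0].chain)  # [0, 6, 4, 2]
```

Each level only refers to the one beneath it (semantics refer to descriptor labels, descriptors to pixel coordinates), mirroring the bottom-up structure listed above.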
Categorization of image operators:
- Low-Level Operators: image registration, enhancement, noise removal,
contrast adjustment, spatial filtering -- operators for image preprocessing, after
which the image is ready for analysis and region/edge detection. Examples:
- Linear and nonlinear filtering: noise suppression, smoothing/sharpening,
edge/feature enhancement.
- Image arithmetic and logic operators: to detect differences between
images, etc.
- Intensity mapping: for contrast stretching/compression.
- Object-Representation (Extraction) Operators: extract the shape,
boundary and skeleton of the objects in the image, and associate
world-related information with the extracted features and objects. Examples:
- Area counting: determine presence/absence of an object
- Gray scale analysis: determine surface features, roughness
- Connectivity analysis: count objects, distances
- Edge detection: find features or objects
- Template matching: locate specified patterns
- Boundary detection: locate partially visible or low contrast objects
- Retrieval Operators: use all the information obtained by the other operators. The output
is a boolean or an integer expressing similarity with the input (query) object.
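The three operator categories can be illustrated on a tiny grayscale image stored as a list of rows. The sketch assumes 8-bit intensities and picks one representative per category: linear contrast stretching (intensity mapping), 4-connected component counting (connectivity analysis), and histogram intersection as the retrieval similarity. The function names are hypothetical, and histogram intersection is just one possible similarity measure.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # grayscale intensities 0..255, rows of equal length


def contrast_stretch(img: Image, out_min: int = 0, out_max: int = 255) -> Image:
    """Low-level operator (intensity mapping): linearly map the image's
    intensity range [lo, hi] onto [out_min, out_max], rounding down."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[out_min + (p - lo) * (out_max - out_min) // (hi - lo) for p in row]
            for row in img]


def count_objects(binary: Image) -> int:
    """Object-representation operator (connectivity analysis): count the
    4-connected regions of non-zero pixels using breadth-first flood fill."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1  # found a new region; mark all of it as seen
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def histogram(img: Image, bins: int = 8) -> List[int]:
    """Count pixels per intensity bin (8-bit intensities assumed)."""
    h = [0] * bins
    for row in img:
        for p in row:
            h[min(p * bins // 256, bins - 1)] += 1
    return h


def similarity(a: Image, b: Image, bins: int = 8) -> int:
    """Retrieval operator: histogram intersection as an integer percentage 0..100."""
    ha, hb = histogram(a, bins), histogram(b, bins)
    total = min(sum(ha), sum(hb))
    if total == 0:
        return 0
    return 100 * sum(min(x, y) for x, y in zip(ha, hb)) // total


def matches(a: Image, b: Image, threshold: int = 80) -> bool:
    """Boolean retrieval operator: similarity at or above an (assumed) threshold."""
    return similarity(a, b) >= threshold
```

`similarity` gives the integer form of a retrieval result and `matches` the boolean form; in practice the threshold would be tuned per application.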
Video Data Type
Editing operators:
- ↓ -- returns the first frame of a video sequence
- ↑ -- returns the video sequence without its first frame
- similarly, ↓a -- returns the first a frames of the sequence
- ↑a -- returns the rest of the sequence without the first a frames
- ∘ (circ) -- appends one video sequence to another
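The editing operators can be sketched on a video modelled as an immutable sequence of frames. The function names `first`, `rest`, `first_n`, `drop_n` and `concat` are hypothetical stand-ins for the arrow and circ operators, and frames can be any values (here, frame numbers).

```python
from typing import Any, Tuple

Video = Tuple[Any, ...]  # a video sequence as an immutable tuple of frames


def first(v: Video) -> Any:
    """Down-arrow: the first frame of a video sequence."""
    if not v:
        raise ValueError("an empty sequence has no first frame")
    return v[0]


def rest(v: Video) -> Video:
    """Up-arrow: the sequence without its first frame."""
    return v[1:]


def first_n(v: Video, a: int) -> Video:
    """Down-arrow with a: the first a frames."""
    if a < 0:
        raise ValueError("a must be non-negative")
    return v[:a]


def drop_n(v: Video, a: int) -> Video:
    """Up-arrow with a: the sequence without its first a frames."""
    if a < 0:
        raise ValueError("a must be non-negative")
    return v[a:]


def concat(v: Video, w: Video) -> Video:
    """circ: append sequence w to sequence v."""
    return v + w


v = (1, 2, 3, 4, 5)  # frames identified by frame number
assert concat(first_n(v, 2), drop_n(v, 2)) == v
```

Splitting a sequence with the a-frame operators and appending the two parts with circ gives back the original sequence, which the last line checks.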
Now we can introduce more elaborate operators [pg 111]:
- insert video stream in another
- extract a video clip
- video cut extraction: cuts: Video -> P(Integer), the set of frame numbers at which cuts occur
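The more elaborate operators can be expressed with the basic ones. In this sketch (hypothetical names, Python slicing standing in for the arrow operators), `insert` places one stream after the first k frames of another, `extract` takes frames start to end - 1, and `cuts` returns the set of frame numbers at which a new shot starts. The cut detector is an assumption: a simple mean absolute pixel difference between consecutive grayscale frames compared against a fixed threshold, with frames numbered from 0.

```python
from typing import Any, List, Set, Tuple

Frame = List[List[int]]  # grayscale frame: rows of intensities
Video = Tuple[Any, ...]  # a video as an immutable tuple of frames


def insert(v: Video, w: Video, k: int) -> Video:
    """Insert stream w into v after its first k frames: (first k of v) circ w circ (rest of v)."""
    return v[:k] + w + v[k:]


def extract(v: Video, start: int, end: int) -> Video:
    """Extract frames start..end-1 (start <= end assumed): drop start frames, keep end-start."""
    return v[start:][: end - start]


def mean_abs_diff(f: Frame, g: Frame) -> float:
    """Mean absolute intensity difference between two equally sized frames."""
    total = sum(abs(p - q) for row_f, row_g in zip(f, g) for p, q in zip(row_f, row_g))
    return total / sum(len(row) for row in f)


def cuts(v: Video, threshold: float = 50.0) -> Set[int]:
    """cuts: Video -> P(Integer). Frame numbers where a new shot starts, i.e. where
    a frame differs from its predecessor by more than the threshold."""
    return {i for i in range(1, len(v)) if mean_abs_diff(v[i - 1], v[i]) > threshold}


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
print(sorted(cuts((dark, dark, bright, bright, dark))))  # [2, 4]
```

Pixel differencing is easy to fool with camera or object motion; histogram-based comparisons are a common, more robust alternative.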
Some more complex operators:
- Extract a set of motion icons for a video stream
- Extract a still image from a video sequence
Delivery operators:
- play
- reverse
Video attribute operators:
- v_length:Video -> Integer
- frame_rate: Video -> Integer
- size: Video -> Integer
- resolution: Video -> String
- compression: Video -> String
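The attribute operators are plain accessors over the header information of a stream. The sketch assumes a hypothetical `Video` record; it reads `v_length` as the number of frames and `size` as the size in bytes, and renders `resolution` as a "WIDTHxHEIGHT" string. These interpretations are assumptions, not fixed by the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical header information for a stored video stream."""
    frame_count: int
    fps: int
    size_bytes: int
    width: int
    height: int
    codec: str


def v_length(v: Video) -> int:
    """Length, read here as the number of frames."""
    return v.frame_count


def frame_rate(v: Video) -> int:
    """Frames per second."""
    return v.fps


def size(v: Video) -> int:
    """Storage size, read here as bytes."""
    return v.size_bytes


def resolution(v: Video) -> str:
    """Resolution as a "WIDTHxHEIGHT" string."""
    return f"{v.width}x{v.height}"


def compression(v: Video) -> str:
    """Name of the compression scheme."""
    return v.codec


clip = Video(frame_count=750, fps=25, size_bytes=1_000_000, width=352, height=288, codec="MPEG-1")
print(resolution(clip), compression(clip))  # 352x288 MPEG-1
```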