Video clips
created July 11, 2003 and January 20, 2004
For the CAVIAR
project, a number of video clips were recorded of people acting out the different scenarios
of interest. These include people walking alone, meeting with others, window
shopping, entering and exiting shops, fighting and passing out and, last but
not least, leaving a package in a public place.
The first set
of video clips was filmed for the CAVIAR project with
a wide-angle camera lens in the entrance lobby of the INRIA Labs at Grenoble,
France. The resolution is half-resolution PAL standard (384 x 288 pixels, 25
frames per second), compressed using MPEG2. The file sizes are mostly
between 6 and 12 MB, a few up to 21 MB.
A typical frame
from the image sequences is shown below. It shows three individual boxes (yellow) and
one group box (green). Several people in the video sequence are
not boxed because they do not move during the sequence.
The second set of data also
used a wide-angle lens, filming along and across the hallway of a shopping centre in
Lisbon. For each sequence there are two time-synchronised videos, one with the
view across the hallway and the other along it. The resolution is half-resolution
PAL standard (384 x 288 pixels, 25 frames per second), compressed using
MPEG2. The MPEG file sizes are mostly between 6 and 12 MB, a few up to 21 MB.
All data is
publicly available; on this page it can be downloaded in MPEG2 format or
split into JPEGs. If you publish results using the data, please acknowledge
the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at
The data is licensed under Creative Commons BY-SA:
The ground truth for these
sequences was obtained by hand-labelling the images, as in the example shown above.
The Java programs for the interactive labeller can be found here (unsupported). This is the user guide.
The finite state automata
that describe the allowable roles, activities and sequences of situations in
each context are here.
The XML grammar
for the ground truth labeling files is here.
Note that these XML files are large (e.g., averaging 1.5 MB), so viewing one of them in a browser window can take a minute
or more to load because of the browser's default XML formatting.
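Because the files are large, a streaming parser avoids building the whole document in memory. The sketch below uses the standard-library `xml.etree.ElementTree.iterparse` and assumes, without confirmation from this page, that each frame is a `<frame number="...">` element holding an `<objectlist>` of `<object>` elements; check the tag names against the XML grammar before relying on it. `count_objects_per_frame` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def count_objects_per_frame(source):
    """Stream a ground-truth file and return {frame number: number of objects}.

    Assumed layout (verify against the XML grammar):
    <frame number="N"><objectlist><object .../>...</objectlist></frame>
    """
    counts = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "frame":
            counts[int(elem.get("number"))] = len(elem.findall("./objectlist/object"))
            elem.clear()  # discard the finished frame's children to keep memory low
    return counts


# Tiny in-memory example in the assumed layout; pass a file path for real data.
SAMPLE = b"""<dataset name="example">
  <frame number="0"><objectlist><object id="0"/><object id="1"/></objectlist></frame>
  <frame number="1"><objectlist><object id="0"/></objectlist></frame>
</dataset>"""

print(count_objects_per_frame(io.BytesIO(SAMPLE)))  # {0: 2, 1: 1}
```

`iterparse` also accepts a file name, so the same function can be pointed directly at a downloaded ground-truth file.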
The ground-truth XML is based on CVML (CVML - An XML-based Computer Vision Markup Language), which was
presented at ICPR.
CVML is described in more
detail at
The CVML detail includes a C++ based
which supports
reading and writing the XML (amongst other things).
A discussion of the INRIA datasets was presented at PETS04. The ground-truth labelling notation described in that paper has since changed to XML and some minor details differ, but most of the concepts and discussion are still useful.
Some tracked targets in 17 of the sequences have additional annotations giving their head, gaze direction, hand, feet and shoulder positions. (Not all targets are annotated because we have no more time or funding at the moment.)
INRIA Sequences
Datafile | Sequence | Comments |
fomdgt2.xml | Fight_OneManDown | only targets marked: 0,1,4 |
mc1gt.xml | Meet_Crowd | all marked |
mwt2gt.xml | Meet_WalkTogether2 | all marked |
fcgt.xml | Fight_Chase | all marked |
mws1gt.xml | Meet_WalkSplit | all marked |
ms3ggt.xml | Meet_Split_3rdGuy | only targets marked: 0 |
fra1gt.xml | Fight_RunAway1 | only targets marked: 7 |
fra2gt.xml | Fight_RunAway2 | only targets marked: 4 |
Lisbon Sequences
Datafile | Sequence | Comments |
c2es1gt.xml | TwoEnterShop1cor | no gaze directions annotated |
fsa1gt.xml | ShopAssistant1front | all marked |
csa1gt.xml | ShopAssistant1cor | all marked |
c3ps1gt.xml | ThreePastShop1cor | all marked |
f3ps1gt.xml | ThreePastShop1front | all marked |
f2es1gt.xml | TwoEnterShop1front | all marked |
fosow2gt.xml | OneShopOneWait2front | all marked |
cosow2gt.xml | OneShopOneWait2cor | all marked |
cosme2gt.xml | OneStopMoveEnter2cor | only targets marked: 0,2,11 |
c3ps2gt.xml | ThreePastShop2cor | all marked |
f3ps2gt.xml | ThreePastShop2front | all marked |
An example of a marked-up frame with heads, gaze, hands, feet and shoulders is shown below:
Clips from INRIA (1st Set)
Six basic
scenarios were acted out by the CAVIAR team members. Most clips start with one
member signalling the scenario number in body sign language. This segment can be used for
calibration or removed at will.
For those interested in the ground-plane homography, the mapping can be computed from the point correspondences below. The image pixel positions depend on your image scaling; here they refer to the 384x288 JPEG images. For the image just below, the corresponding pixel positions are:
Point | (Col,Row) (pixels) | (X,Y) (cm) |
1 | (64,88) | (0,671.5) |
2 | (211,40) | (1116,670) |
3 | (349,184) | (1545,190) |
4 | (39,187) | (0,0) |
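The four correspondences in the table above are enough to estimate an image-to-ground homography. The sketch below uses a standard direct linear transform (DLT) with Hartley-style coordinate normalisation and the element h33 fixed to 1, solved by least squares through the normal equations; with exactly four points the fit is exact, and with more points it averages out labelling error. This is one common technique, not necessarily how the CAVIAR team computed their mapping, and the function names are hypothetical.

```python
import math


def map_point(h, x, y):
    """Apply a 3x3 homography (nested lists) to the point (x, y)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def _normalizer(points):
    """Similarity transform: centroid to the origin, mean distance sqrt(2)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    mean_dist = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n
    s = math.sqrt(2) / mean_dist
    return [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _solve(m, rhs):
    """Gaussian elimination with partial pivoting for a square linear system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - acc) / aug[r][r]
    return sol


def fit_homography(pixels, world):
    """Least-squares image->ground homography from n >= 4 correspondences."""
    if len(pixels) != len(world) or len(pixels) < 4:
        raise ValueError("need at least four matching correspondences")
    tp, tw = _normalizer(pixels), _normalizer(world)
    ata = [[0.0] * 8 for _ in range(8)]
    atb = [0.0] * 8
    for (col, row), (gx, gy) in zip(pixels, world):
        x, y = map_point(tp, col, row)
        gxn, gyn = map_point(tw, gx, gy)
        # Two linear equations per correspondence in the unknowns h11..h32.
        for coeffs, rhs in (([x, y, 1.0, 0.0, 0.0, 0.0, -x * gxn, -y * gxn], gxn),
                            ([0.0, 0.0, 0.0, x, y, 1.0, -x * gyn, -y * gyn], gyn)):
            for i in range(8):
                atb[i] += coeffs[i] * rhs
                for j in range(8):
                    ata[i][j] += coeffs[i] * coeffs[j]
    h = _solve(ata, atb) + [1.0]
    hn = [h[0:3], h[3:6], h[6:9]]
    # Undo the normalisation: H = Tw^-1 * Hn * Tp.
    s, ox, oy = tw[0][0], -tw[0][2] / tw[0][0], -tw[1][2] / tw[1][1]
    tw_inv = [[1.0 / s, 0.0, ox], [0.0, 1.0 / s, oy], [0.0, 0.0, 1.0]]
    return _matmul(tw_inv, _matmul(hn, tp))


PIXELS = [(64, 88), (211, 40), (349, 184), (39, 187)]   # (Col, Row) from the table
WORLD = [(0, 671.5), (1116, 670), (1545, 190), (0, 0)]  # (X, Y) in cm
H = fit_homography(PIXELS, WORLD)
print(map_point(H, 39, 187))  # close to (0, 0), point 4 of the table
```

Normalising both point sets first keeps the 8x8 system well conditioned even though raw pixel and centimetre products reach several hundred thousand.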
One person walking – straight line | Walk1.mpg (7 MB), JPEGs: Walk1_jpg.tar.gz (13 MB) |
One person walking – straight line and return | Walk2.mpg (12 MB) |
One person walking – B-line | Walk3.mpg (16 MB) |
Person browsing back and forth | Browse1.mpg (12 MB) |
Person browsing and reading for a while | Browse2.mpg (10 MB) |
Person browsing and reading with back turned | Browse3.mpg (11 MB) |
Person browsing at the reception desk | Browse4.mpg (13 MB), JPEGs: Browse4_jpg.tar.gz (24 MB) |
Person browsing while waiting (short) | Browse_WhileWaiting1.mpg (9 MB) |
Person browsing while waiting (long) | Browse_WhileWaiting2.mpg (21 MB) |
Slumping or fainting
Person resting in chair | Rest_InChair.mpg (12 MB) |
Person slumps on floor | Rest_SlumpOnFloor.mpg (11 MB), JPEGs: Rest_SlumpOnFloor_jpg.tar.gz (19 MB) |
Person wiggles on floor | Rest_WiggleOnFloor.mpg (15 MB) |
Person falls down and lies immobile | Rest_FallOnFloor.mpg (12 MB) |
Leaving bags
Person leaving bag by wall | LeftBag.mpg (17 MB) |
Person leaving bag at chairs | LeftBag_AtChair.mpg (13 MB) |
Person leaving bag behind chairs | LeftBag_BehindChair.mpg (13 MB) |
Person leaving box | LeftBox.mpg (10 MB) |
Person leaving bag but then picking it up again | LeftBag_PickedUp.mpg (16 MB) |
Meeting, walking together and splitting up
Two people meet and walk together | Meet_WalkTogether1.mpg (8 MB) |
Two other people meet and walk together | Meet_WalkTogether2.mpg (9 MB) |
Two people meet, walk together and split | Meet_WalkSplit.mpg (7 MB) |
Two people meet, walk, split with a third person | Meet_Split_3rdGuy.mpg (11 MB) |
Crowd of four people meet, walk and split | Meet_Crowd.mpg (6 MB) |
Two people enter, walk and split | missing |
Two people meet, fight and run away | Fight_RunAway1.mpg (6 MB) |
Two other people meet, fight and run away | Fight_RunAway2.mpg (6 MB) |
Two people meet, fight, one goes down, the other runs away | Fight_OneManDown.mpg (11 MB), JPEGs: Fight_OneManDown_jpg.tar.gz (20 MB) |
Two people meet, fight and chase each other | Fight_Chase.mpg (5 MB), JPEGs: Fight_Chase_jpg.tar.gz (9 MB) |
Clips from Shopping Center in Portugal (2nd Set)
In the second set
of experiments each clip was recorded from two different points of view. The
first shows a view of the corridor, while the second shows a frontal view of
the scenario. The two video sequences should be time-synchronised frame by frame.
However, each video in a pair may start at a slightly different time, so you need to
work out the frame correspondences. The time-code in the
upper left of the images gives the necessary information.
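Once the time-code of the first frame of each view has been read, the frame correspondence is a fixed offset. The sketch below assumes the on-screen time-code has the form `HH:MM:SS:FF` with FF counting frames at the stated 25 frames per second; that layout is an assumption, and the function names are hypothetical.

```python
FPS = 25  # PAL frame rate stated on this page


def timecode_to_frames(tc, fps=FPS):
    """Convert an 'HH:MM:SS:FF' time-code (assumed layout) to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not 0 <= ff < fps:
        raise ValueError(f"frame field {ff} is out of range for {fps} fps")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def front_index_for(cor_index, first_tc_cor, first_tc_front, fps=FPS):
    """Frontal-view frame index showing the same instant as corridor frame cor_index."""
    offset = timecode_to_frames(first_tc_cor, fps) - timecode_to_frames(first_tc_front, fps)
    return cor_index + offset


# The frontal clip started 10 frames before the corridor clip.
print(front_index_for(0, "00:01:10:05", "00:01:09:20"))  # 10
```

A negative result means the frontal clip had not started yet at that corridor frame. Because some frames in this set are duplicated, reading the time-code on each frame is more robust than trusting a single start offset over a long clip.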
During capture and digitisation, a frame segmentation error caused some
frames to be duplicated and the expected next frame to be missing. Thus the
overall frame rate is correct, but occasionally two consecutive frames are identical.
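Such repeated frames can be flagged by comparing each frame with its predecessor. The sketch below hashes each frame and records the indices whose digest matches the previous one. It assumes you already have each frame's decoded pixel data as `bytes` (the decoding step is not shown); hashing the raw JPEG files instead only works if identical frames were encoded identically, which is an assumption. `duplicated_frames` is a hypothetical name.

```python
import hashlib


def duplicated_frames(frames):
    """Return indices i where frame i is identical to frame i - 1.

    `frames` is any iterable of bytes objects, e.g. decoded pixel buffers
    (decoding is assumed and not shown). Only the previous frame's digest
    is kept, so long sequences can be streamed.
    """
    duplicates = []
    previous = None
    for index, data in enumerate(frames):
        digest = hashlib.sha256(data).digest()
        if digest == previous:
            duplicates.append(index)
        previous = digest
    return duplicates


print(duplicated_frames([b"f0", b"f1", b"f1", b"f2"]))  # [2]
```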
This set of
sequences is longer (1500 frames on average) and contains
more individuals and groups than the first set. Example synchronised
images are shown here:
For those interested in the ground-plane homography, the mapping can be
computed from the point correspondences below. The image pixel positions depend on
your image scaling; here they refer to the 384x288 JPEG images.
For the image just below, the corresponding pixel
positions are:
Point | (Col,Row) (pixels) | (X,Y) (cm) |
1 | (91,163) | (0,975) |
2 | (241,163) | (290,975) |
3 | (98,266) | (0,-110) |
4 | (322,265) | (290,-110) |
5 | (60,153) | (0,0) |
6 | (359,153) | (0,975) |
7 | (50,201) | (382,98) |
8 | (367,200) | (382,878) |
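Once an image-to-ground homography has been fitted from correspondences like those above, tracked positions can be turned into ground-plane speeds. The sketch below assumes boxes are given as a centre `(xc, yc)` plus width and height in pixels, and that a person's feet lie at the bottom centre of the box (image rows increase downwards); both are assumptions, not statements about the ground-truth format. It uses the 25 frames per second stated on this page, and `ground_speeds` is a hypothetical name.

```python
import math

FPS = 25  # frame rate stated on this page


def map_point(h, col, row):
    """Apply a 3x3 image->ground homography (nested lists) to a pixel."""
    x = h[0][0] * col + h[0][1] * row + h[0][2]
    y = h[1][0] * col + h[1][1] * row + h[1][2]
    w = h[2][0] * col + h[2][1] * row + h[2][2]
    return x / w, y / w


def ground_speeds(h, boxes, fps=FPS):
    """Ground speed (cm/s) between consecutive frames for one tracked target.

    boxes: (xc, yc, width, height) pixel boxes in consecutive frames. The foot
    point is assumed to be the bottom centre (xc, yc + height / 2).
    """
    feet = [map_point(h, xc, yc + bh / 2) for xc, yc, _bw, bh in boxes]
    return [math.dist(a, b) * fps for a, b in zip(feet, feet[1:])]


# Toy homography: 2 cm per pixel on both axes (illustrative only).
SCALE = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
print(ground_speeds(SCALE, [(100, 100, 20, 40), (103, 104, 20, 40)]))  # [250.0]
```

In the Lisbon clips, where some frames are duplicated, a speed of zero followed by a roughly doubled value is an artefact of that capture error rather than real motion, so smoothing over a few frames is advisable.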
Couple walking along corridor browsing, persons going inside and coming out of stores | WalkByShop1cor.mpg (14 MB) | WalkByShop1front.mpg (14 MB) |
Two persons cross paths at the entrance of a store, couple walking on the corridor | EnterExitCrossingPaths1cor.mpg (2 MB), Ground truth XML version: ceecp1gt.xml | EnterExitCrossingPaths1front.mpg (2 MB) |
persons cross paths at the entrance of a store | EnterExitCrossingPaths2cor.mpg (3 MB), Ground truth XML version: ceecp2gt.xml | EnterExitCrossingPaths2front.mpg (3 MB) |
goes outside a store. Visible on corridor view only: Three persons walking together in the corridor | OneLeaveShop1cor.mpg (2 MB) | OneLeaveShop1front.mpg (2 MB) |
goes outside a store. Visible on corridor view only: Four persons walking together in the corridor | OneLeaveShop2cor.mpg (6 MB) | OneLeaveShop2front.mpg (6 MB) |
comes out of store and later reenters. Visible on corridor view only: Person coming out of store and walking on the corridor | OneLeaveShopReenter1cor.mpg (2 MB) | OneLeaveShopReenter1front.mpg (2 MB) |
comes out of store and later reenters. Visible on corridor view only: Four persons walk on the corridor. | OneLeaveShopReenter2cor.mpg (3 MB) | OneLeaveShopReenter2front.mpg (3 MB) |
walking on the corridor, one goes inside a store, the other waits outside, later they rejoin and leave together. | OneShopOneWait1cor.mpg (8 MB) | OneShopOneWait1front.mpg (8 MB) |
to OneShopOneWait1front, but contains various groups of persons walking along the corridor. | OneShopOneWait2cor.mpg (5 MB) | OneShopOneWait2front.mpg (8 MB) |
goes inside store | OneStopEnter1cor.mpg (9 MB) | OneStopEnter1front.mpg (9 MB) |
browses stores and goes inside and out, a couple of walkers along the corridor. Visible on corridor view only: Two persons go inside stores. | OneStopEnter2cor.mpg (16 MB) | OneStopEnter2front.mpg (16 MB) |
Person stops outside store, goes inside and out of store. Five groups of people walking along the corridor. | OneStopMoveEnter1cor.mpg (9 MB) | OneStopMoveEnter1front.mpg (9 MB) |
goes inside and out of a store twice. Visible on corridor view only: Group of 4 people comes out of store | OneStopMoveEnter2cor.mpg (13 MB) | OneStopMoveEnter2front.mpg (13 MB) |
stops outside store, goes inside and out of store. Visible on corridor view only: Couple come out of store | OneStopMoveNoEnter1cor.mpg (10 MB) | OneStopMoveNoEnter1front.mpg (10 MB) |
goes inside store, browses, and leaves store. Visible on corridor view only: Couple walking along the corridor | OneStopMoveNoEnter2cor.mpg (6 MB) | OneStopMoveNoEnter2front.mpg (6 MB) |
stops outside a store and continues walking along the corridor | OneStopNoEnter1cor.mpg (4 MB) | OneStopNoEnter1front.mpg (4 MB) |
stops outside a store and continues walking, another person goes inside a store, browses and then leaves the store. | OneStopNoEnter2cor.mpg (9 MB) | OneStopNoEnter2front.mpg (9 MB) |
goes inside a store and browses, another person joins and they leave the store together | ShopAssistant1cor.mpg (10 MB) | ShopAssistant1front.mpg (10 MB) |
goes inside a store and browses, another person joins, later they split; 3 people walking together along the corridor | ShopAssistant2cor.mpg (21 MB) | ShopAssistant2front.mpg (21 MB) |
3 persons walking in the corridor | ThreePastShop1cor.mpg (10 MB) | ThreePastShop1front.mpg (10 MB) |
Another 3 persons walking in the corridor | ThreePastShop2cor.mpg (9 MB) | ThreePastShop2front.mpg (9 MB) |
goes inside a store and later comes out | TwoEnterShop1cor.mpg (10 MB) | TwoEnterShop1front.mpg (10 MB) |
goes inside a store and later comes out | TwoEnterShop2cor.mpg (9 MB), Ground truth XML version: c2es2gt.xml | TwoEnterShop2front.mpg (9 MB), Ground truth XML version: f2es2gt.xml |
couples go inside store and later one comes out | TwoEnterShop3cor.mpg (7 MB) | TwoEnterShop3front.mpg (6 MB) |
leaves a store while browsing | TwoLeaveShop1cor.mpg (8 MB) | TwoLeaveShop1front.mpg (8 MB) |
A couple leaves a store | TwoLeaveShop2cor.mpg (4 MB) | TwoLeaveShop2front.mpg (4 MB) |
Date of last change to this page: 09/02/2022 13:16:37
© 2007 Robert Fisher