The dataset comprises two views of various scenarios in which people act out different interactions.

The data is captured at 25 frames per second at a resolution of 640x480. The videos are available either as AVI files or as numbered sets of single-image JPEG files.

Researchers can freely use the dataset. If results based on the dataset appear in a publication, please include a citation to: S. J. Blunsden, R. B. Fisher, "The BEHAVE video dataset: ground truthed video for multi-person behavior classification", Annals of the BMVA, Vol 2010(4), pp 1-12.

Many (but not all) of the video sequences have ground truth bounding boxes for the pedestrians in the scene. The bounding boxes are given in the VIPER XML format. The site will be updated when more of the ground truth becomes available.

Ten basic scenarios were acted out by some of the Vision Group team members: InGroup (IG), Approach (A), WalkTogether (WT), Split (S), Ignore (I), Following (FO), Chase (C), Fight (FI), RunTogether (RT), and Meet (M). Many of the interactions in the video sequence (Clip 1) are labelled accordingly. The labels are given in a separate file called markup.txt. Each entry in markup.txt has the following fields:

ID1 - This is the first group which is interacting. The numbers within the square brackets [] are the members of the group.
ID2 - This is the second group which is interacting with the first (in ID1). This entry is optional.
Start - The starting frame number.
End - The ending frame number.
Label - This is the class label which describes the activity taking place (the scenarios being enacted). A short description of each scenario follows:

InGroup - The people are in a group and not moving very much
Approach - Two people or groups with one (or both) approaching the other
WalkTogether - People walking together
Meet - Two or more people meeting one another
Split - Two or more people splitting from one another
Ignore - People ignoring one another
Chase - One group chasing another
Fight - Two or more groups fighting
RunTogether - The group is running together
Following - Being followed

All entries are delimited by a semicolon (;), as in the following examples.


ID1 ID2 Start End Label
[3,4] ;5826 ;5926 ;WalkTogether

In this example, the persons with IDs 3 and 4 form a group (represented as [3,4]) that has been labelled "WalkTogether" between frames 5826 and 5926.

ID1 ID2 Start End Label
[2] [0,1] ;60296 ;60349 ;Approach

This example shows that person 2 is being approached by persons 0 and 1 ([0,1]) between frames 60296 and 60349. Persons 0 and 1 are in the approaching group.
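
The two examples above can be read programmatically. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that each record sits on one line, that groups are written in square brackets exactly as in the examples, and that Start, End and Label follow the last bracket, separated by semicolons. The names Interaction, parse_markup_line and duration_seconds are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

FPS = 25  # capture rate stated for the dataset


class Interaction(NamedTuple):
    id1: List[int]            # members of the first group
    id2: Optional[List[int]]  # members of the second group, if present
    start: int                # starting frame number
    end: int                  # ending frame number
    label: str                # class label, e.g. "WalkTogether"


_GROUP = re.compile(r"\[([^\]]*)\]")


def parse_markup_line(line: str) -> Optional[Interaction]:
    """Parse one record; return None for lines without a bracketed group."""
    groups = _GROUP.findall(line)
    if not groups:
        return None  # header line ("ID1 ID2 Start End Label") or blank line
    ids = [[int(tok) for tok in g.split(",") if tok.strip()] for g in groups]
    # Assumption: Start, End and Label come after the last closing bracket.
    tail = line[line.rfind("]") + 1:]
    fields = [f.strip() for f in tail.split(";") if f.strip()]
    if len(fields) != 3:
        raise ValueError("expected Start;End;Label, got: %r" % tail)
    second = ids[1] if len(ids) > 1 else None  # ID2 is optional
    return Interaction(ids[0], second, int(fields[0]), int(fields[1]), fields[2])


def duration_seconds(item: Interaction) -> float:
    """Length of the labelled interval in seconds at 25 frames per second."""
    return (item.end - item.start) / FPS
```

Applied to the first example, this yields group [3, 4], no second group, and a 100-frame interval, which corresponds to 4 seconds at 25 frames per second. If the real markup.txt deviates from the layout shown in the examples, the parsing rules will need adjusting.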

Corresponding pixel positions in image:

For those interested in the ground plane homography, the mapping can be computed from the point correspondences listed here. The image pixel positions depend on your image scaling; here the image size is 640x480. For the reference image, the corresponding pixel positions are:

Point (Col,Row)(pixels) (X,Y)(cm)
A (453,250) (913,-358)
B (416,317) (926,-454)
C (63,327) (348,-876)
D (44,155) (-148,-290)
E (125,158) (0,-253)
F (259,152) (346,-120)
G (249,130) (270,0)
H (158,113) (0,0)
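
One way to compute the mapping from these eight correspondences is a direct linear transform (DLT). The sketch below is an illustration under stated assumptions, not necessarily the method used by the dataset authors: it uses NumPy, applies a plain unnormalised DLT (no coordinate normalisation or outlier handling), and the function names estimate_homography and apply_homography are hypothetical.

```python
import numpy as np

# (Col, Row) pixel positions and (X, Y) ground-plane positions in cm,
# copied from the table above (points A to H, 640x480 images).
PIXELS = np.array([[453, 250], [416, 317], [63, 327], [44, 155],
                   [125, 158], [259, 152], [249, 130], [158, 113]], dtype=float)
GROUND_CM = np.array([[913, -358], [926, -454], [348, -876], [-148, -290],
                      [0, -253], [346, -120], [270, 0], [0, 0]], dtype=float)


def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H * src (plain DLT).

    Needs at least four correspondences. The result is defined only up
    to scale; it is returned as the unit-norm SVD solution.
    """
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    # The right singular vector of the smallest singular value minimises
    # the algebraic error |A h| subject to |h| = 1.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def apply_homography(h, points):
    """Map an (N, 2) array of points through the homography h."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale


H = estimate_homography(PIXELS, GROUND_CM)
ground_estimate = apply_homography(H, PIXELS)  # compare with GROUND_CM
```

Comparing ground_estimate with GROUND_CM gives a quick check of the fit; with hand-measured points some residual error is to be expected. If the images are rescaled, the pixel coordinates must be rescaled accordingly before estimating the mapping.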


There are 4 clips that can be downloaded as .wmv files.


Clip 1 has been divided into 8 sequences. The AVI files are named N-M.avi and contain video frames N through M. Similar naming conventions apply to the other files. Some segments of video with little or no action have been removed (e.g. frames 11201 through 11499). The table below contains information about the kinds of scenarios enacted in each clip, along with links to each clip's video, ground truth file, archived JPEGs, and the .info file used by VIPER.

The .xgtf extension is used by VIPER, but these are essentially text files. Please note that the ground truth files are not yet complete. To run the .info files, the .xgtf files need to be edited to reflect the location in which the .info files are saved. For example, if the file is saved on drive D:\, then 1-11200.xgtf should reflect this as:

<sourcefile filename="file:/D:/">

Moreover, the .info file has to be in the same directory as the unzipped archived JPEGs for each sequence. Please also note that the .info files contain information about the whole sequence; therefore, when the .xgtf files are run in VIPER, the screen will appear blank until the frame range of the ground truth data is reached.
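
Because the .xgtf files are essentially text, the sourcefile edit described above can also be scripted. The following is a minimal sketch that assumes the element appears in the form shown above (a sourcefile tag with a double-quoted filename attribute) and that the files are UTF-8 encoded. The function names are hypothetical, and the location string to use depends on where your files are saved.

```python
import re
from pathlib import Path

# Matches the opening tag shown above and captures the text around the value.
_SOURCEFILE = re.compile(r'(<sourcefile\s+filename=")[^"]*(")')


def set_sourcefile_location(xgtf_text: str, location: str) -> str:
    """Return xgtf_text with every sourcefile filename value replaced."""
    new_text, count = _SOURCEFILE.subn(
        lambda m: m.group(1) + location + m.group(2), xgtf_text)
    if count == 0:
        raise ValueError("no <sourcefile filename=...> element found")
    return new_text


def update_xgtf(path: str, location: str) -> None:
    """Rewrite one .xgtf file in place (assumes UTF-8 text)."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(set_sourcefile_location(text, location), encoding="utf-8")
```

For the example in the text, update_xgtf("1-11200.xgtf", "file:/D:/") would produce the sourcefile line shown above. Keeping a backup copy of the original .xgtf file before rewriting it is advisable.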

Clips      Scenarios          AVI video                 Ground truth XML version  JPEGs                         INFO file
Sequence0  IG,A,WT,I,S,FO     1-11200.avi (18.2MB)      1-11200.xgtf              1-11200.tar.gz (874MB)
Sequence1  WT,A,IG,S          11500-17450.avi (11.7MB)  Not Available             11500-17450.tar.gz (970.5MB)
Sequence2  S,WT,A,IG          18000-23700.avi (9MB)     18000-23700F.xgtf         18000-23700.tar.gz (897.5MB)
Sequence3  Not Available      24300-35200.avi (17.2MB)  24300-35200F.xgtf         24300-35200.tar.gz (1.6GB)
Sequence4  C,RT,I,IG,WT,FI,S  35450-47160.avi (16.1MB)  35450-47160F.xgtf         35450-47160.tar.gz (1.7GB)
Sequence5  A,M,RT,IG,WT,FI,S  47300-58400.avi (16MB)    47300-58400F.xgtf         47300-58400.tar.gz (1.6GB)
Sequence6  A,IG,WT,FI,S       59800-66750.avi (9.2MB)   59800-66750F.xgtf         59800-66750.tar.gz (1.1GB)
Sequence7  Not Available      67210-76800.avi (11MB)    67210-76800F.xgtf         67210-76800.tar.gz (1.4GB)
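
Since the file names encode frame ranges, a frame number from markup.txt can be mapped back to the file that contains it. The sketch below illustrates this using the AVI names and scenario abbreviations listed above. The helper names frame_range, file_for_frame and expand_scenarios are hypothetical, and the ranges are treated as inclusive, following "frames N through M".

```python
import re
from typing import List, Optional, Tuple

# Abbreviations as given in the scenario list.
SCENARIOS = {
    "IG": "InGroup", "A": "Approach", "WT": "WalkTogether", "S": "Split",
    "I": "Ignore", "FO": "Following", "C": "Chase", "FI": "Fight",
    "RT": "RunTogether", "M": "Meet",
}

# AVI file names from the table above.
AVI_FILES = [
    "1-11200.avi", "11500-17450.avi", "18000-23700.avi", "24300-35200.avi",
    "35450-47160.avi", "47300-58400.avi", "59800-66750.avi", "67210-76800.avi",
]


def frame_range(filename: str) -> Tuple[int, int]:
    """Extract (N, M) from a name that starts with N-M, e.g. 1-11200.avi."""
    match = re.match(r"(\d+)-(\d+)", filename)
    if match is None:
        raise ValueError("not an N-M style file name: %r" % filename)
    return int(match.group(1)), int(match.group(2))


def file_for_frame(frame: int) -> Optional[str]:
    """Return the AVI file containing the frame, or None if it was removed."""
    for name in AVI_FILES:
        first, last = frame_range(name)
        if first <= frame <= last:
            return name
    return None


def expand_scenarios(codes: str) -> List[str]:
    """Turn a table entry such as 'IG,A,WT' into full scenario names."""
    return [SCENARIOS[code.strip()] for code in codes.split(",")]
```

For instance, frame 5826 from the WalkTogether example falls in 1-11200.avi, while a frame in the removed segment between 11201 and 11499 maps to no file.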

