ETHZ PASCAL Stickmen V1.11

Annotated data and evaluation routines for 2D human pose estimation

Marcin Eichner, Vittorio Ferrari


[Figure: annotated example]
[Figure: dataset sticks distribution]

We release here annotations for 549 images from the PASCAL VOC 2008 trainval release [2]. The dataset consists mainly of amateur photographs with difficult illumination and low image quality. In each image, one roughly upright and approximately frontal person is annotated by a 6-part stickman, i.e. one line segment indicating the location, size and orientation of each part: head, torso, upper and lower arms. The annotated person is visible at least from the waist up. Results on this dataset were first published in [1].
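As an illustration of what one line segment per part encodes, here is a minimal Python sketch. The part names, their order and the coordinates are hypothetical and are not taken from the annotation files in the package; the README.txt describes the actual layout.

```python
import math

# Hypothetical 6-part stickman: each part is one segment (x1, y1, x2, y2)
# in image coordinates. Names, order and values are illustrative only.
stickman = {
    "torso":           (100.0,  80.0, 100.0, 180.0),
    "left_upper_arm":  ( 80.0,  90.0,  70.0, 140.0),
    "right_upper_arm": (120.0,  90.0, 130.0, 140.0),
    "left_lower_arm":  ( 70.0, 140.0,  75.0, 185.0),
    "right_lower_arm": (130.0, 140.0, 125.0, 185.0),
    "head":            (100.0,  40.0, 100.0,  80.0),
}

def segment_geometry(x1, y1, x2, y2):
    """Return (center, length, orientation in degrees) of one segment."""
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)      # location
    length = math.hypot(x2 - x1, y2 - y1)            # size
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))  # orientation
    return center, length, angle

for name, seg in stickman.items():
    print(name, segment_geometry(*seg))
```

Storing the two endpoints is enough: location, size and orientation can all be derived from them, as the helper shows.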

NOTE: this dataset has no overlap with VOC08/09/10 test sets.

In addition, the package includes the official Matlab routines for evaluating the performance of your pose estimation system on this dataset and comparing it to our results from [3].

On the right, the scatter plot inspired by [5] depicts pose variability over this dataset. Stickmen are centered on the neck and scale normalized. Hence the plot captures only pose variability and does not show scale and location variability.
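The normalization behind such a plot can be sketched as follows. This is not the code used to produce the figure. It assumes that the neck is the upper endpoint of the torso segment and that scale is measured by torso length; the page specifies neither, and the function name is hypothetical.

```python
import math

def normalize_stickman(sticks, torso="torso"):
    """Center a stickman on the neck and divide all coordinates by torso length.

    sticks maps part name -> (x1, y1, x2, y2) in image coordinates.
    Assumptions (not from the dataset documentation): the neck is the torso
    endpoint with the smaller y, and torso length is the scale reference.
    """
    x1, y1, x2, y2 = sticks[torso]
    neck = (x1, y1) if y1 <= y2 else (x2, y2)
    scale = math.hypot(x2 - x1, y2 - y1)
    return {
        name: ((ax - neck[0]) / scale, (ay - neck[1]) / scale,
               (bx - neck[0]) / scale, (by - neck[1]) / scale)
        for name, (ax, ay, bx, by) in sticks.items()
    }

example = {"torso": (100.0, 80.0, 100.0, 180.0), "head": (100.0, 40.0, 100.0, 80.0)}
print(normalize_stickman(example))
```

After this step two people in the same pose map to the same coordinates regardless of where they stand in the image or how large they appear, which is why only pose variability remains visible.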

Clarification of the PCP evaluation criterion

The Matlab code to evaluate PCP provided with this dataset represents the official evaluation protocol for the following datasets: Buffy Stickmen, ETHZ PASCAL Stickmen, We Are Family Stickmen. In our PCP implementation, a body part produced by an algorithm is considered correctly localized if its endpoints are closer to their ground-truth locations than a threshold (on average over the two endpoints). Using this code ensures that results are comparable to the vast majority of results previously reported on these datasets.
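The averaged criterion can be sketched in a few lines of Python. This is an illustration only, not the official Matlab routine, which remains the reference for reported numbers. It assumes that the threshold is a fraction alpha of the ground-truth part length (0.5 is the commonly used value) and that predicted and ground-truth endpoints are listed in the same order; the function names are hypothetical.

```python
import math

def part_correct_avg(pred, gt, alpha=0.5):
    """Averaged PCP test for one part; pred and gt are (x1, y1, x2, y2).

    Assumption: threshold = alpha * length of the ground-truth segment.
    """
    d1 = math.hypot(pred[0] - gt[0], pred[1] - gt[1])  # error of first endpoint
    d2 = math.hypot(pred[2] - gt[2], pred[3] - gt[3])  # error of second endpoint
    threshold = alpha * math.hypot(gt[2] - gt[0], gt[3] - gt[1])
    return (d1 + d2) / 2.0 <= threshold

def pcp(preds, gts, alpha=0.5):
    """Fraction of parts that pass the averaged test."""
    correct = sum(part_correct_avg(p, g, alpha) for p, g in zip(preds, gts))
    return correct / len(gts)

gt = (0.0, 0.0, 0.0, 100.0)
print(pcp([(0.0, 0.0, 0.0, 160.0), (0.0, 60.0, 0.0, 160.0)], [gt, gt]))
```

In the usage line the first prediction has endpoint errors of 0 and 60 pixels (mean 30) and passes the 50-pixel threshold, while the second has errors of 60 and 60 and fails.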

Recently, an alternative implementation of the PCP criterion, based on a stricter interpretation of its description in Ferrari et al. CVPR 2008, has been used in some works, including Johnson et al. BMVC 2010 and Pishchulin et al. CVPR 2012. In this implementation, a body part is considered correct only if both of its endpoints are closer to their ground-truth locations than a threshold. These two different PCP measures are the consequence of the ambiguous wording in the original verbal description of PCP in Ferrari et al. CVPR 2008 (which did not mention averaging over endpoints). Importantly, the stricter PCP version has essentially been used only on datasets other than the ones mentioned above, in particular on IIP (Iterative Image Parsing dataset, Ramanan NIPS 2006) and LSP (Leeds Sports Pose dataset, Johnson et al. BMVC 2010).
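To make the difference between the two interpretations concrete, the following Python sketch applies both tests to the same prediction. It is not taken from any of the cited implementations. It assumes that the threshold is a fraction alpha of the ground-truth part length and that endpoints are listed in matching order; all function names are hypothetical.

```python
import math

def endpoint_errors(pred, gt):
    """Distances between corresponding endpoints of two (x1, y1, x2, y2) segments."""
    d1 = math.hypot(pred[0] - gt[0], pred[1] - gt[1])
    d2 = math.hypot(pred[2] - gt[2], pred[3] - gt[3])
    return d1, d2

def threshold_for(gt, alpha=0.5):
    # Assumption: threshold is alpha times the ground-truth part length.
    return alpha * math.hypot(gt[2] - gt[0], gt[3] - gt[1])

def correct_averaged(pred, gt, alpha=0.5):
    """Averaged interpretation: mean endpoint error within the threshold."""
    d1, d2 = endpoint_errors(pred, gt)
    return (d1 + d2) / 2.0 <= threshold_for(gt, alpha)

def correct_strict(pred, gt, alpha=0.5):
    """Strict interpretation: both endpoint errors within the threshold."""
    d1, d2 = endpoint_errors(pred, gt)
    thr = threshold_for(gt, alpha)
    return d1 <= thr and d2 <= thr

gt = (0.0, 0.0, 0.0, 100.0)    # ground-truth part, 100 pixels long
pred = (0.0, 0.0, 0.0, 160.0)  # one endpoint exact, the other 60 pixels off
print(correct_averaged(pred, gt), correct_strict(pred, gt))
```

Here the averaged test accepts the part (mean error 30 against a threshold of 50) while the strict test rejects it (one endpoint error of 60 exceeds 50). Any part that passes the strict test also passes the averaged one, so strict PCP scores can never be higher, and numbers computed with the two variants should not be compared directly.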

In order to keep a healthy research environment and guarantee the comparability of results across different research groups and different years, we recommend the following policy:

D. Ramanan, "Learning to Parse Images of Articulated Objects", In NIPS, 2006.
S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation", In BMVC, 2010.
L. Pishchulin, A. Jain, M. Andriluka, T. Thormaehlen and B. Schiele, "Articulated People Detection and Pose Estimation: Reshaping the Future", In CVPR, 2012.


new in v1.11:

new in v1.1:


ETHZ_PASCAL_Stickmen_v1.11.tgz: annotations for the included frames and Matlab code to read and display the annotations and evaluate pose estimation performance (79 MB)
README.txt: description of contents (14 kB)
PCP_techrep2010_Pascal.png: plot showing pose estimation performance of [3] on this dataset (13 kB)
PCP_techrep2010_Pascal.fig: Matlab figure of the same plot; you can overlay your performance curve on it to compare to our results from [3] (18 kB)

Related Publications

[1] M. Eichner and V. Ferrari
Better Appearance Models for Pictorial Structures
Proceedings of British Machine Vision Conference (BMVC), 2009.

[2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman.
The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results.
Webpage, 2008.

[3] M. Eichner, M. Marin-Jimenez, A. Zisserman, V. Ferrari
2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images
International Journal of Computer Vision, 2012

[4] Calvin upper-body detector
Webpage, 2010.

[5] D. Tran, D. Forsyth
Improved Human Parsing with a Full Relational Model
Proceedings of European Conference on Computer Vision (ECCV) 2010


This work is funded by the Swiss National Science Foundation (SNSF).

Please report problems with this page to
Marcin Eichner
Last updated on Tuesday, 02nd February, 2016