Edinburgh Simulated Surgical Tools Dataset (RGBD)

Introduction

The dataset contains RGB-D images, both synthetic and real, of five simulated surgical tools (two kinds of scalpel, two kinds of clamp, and one pair of tweezers), for a total of 64,728 images (38,469 RGB and 26,259 depth). The tools are simulated because the main project was concerned with human-computer interaction rather than visual recognition, so the domain was simplified: the tools are non-specular, their parts are color-coded, and they are larger than real life so that the depth sensor can acquire multiple points across their width. The real tools were 3D printed.

Ground-truth labels for the tool properties are provided for each synthetic image. The labels differ slightly from file to file; see the detailed descriptions in the tables below.

This dataset was created as part of a visual recognition subtask for the Advanced Autonomy Project at the University of Edinburgh, funded by the Turing Institute. Vision tasks include surgical tool classification, 6D pose estimation, and tool attribute recognition (size, color, relative position, grasping points, etc.) for doctor-robot language interaction tasks and robotic arm picking tasks. Synthetic tools are created in Blender using 3D meshes, while real tools are 3D printed from the synthetic models. The original mesh files and point cloud files can also be downloaded below.


Synthetic Image	Real image

Dataset

Synthetic Images

See below for txt and json file formats.

File	Type	Images (RGB/D)	Size (MB)	Ground-truth Description
single_bbox_6000	rgb, single tool	5970/0	566	2D bounding boxes only (.txt)
multi_bbox_3000	rgb, multiple tools	3154/0	252	2D bounding boxes only (.txt)
multi_fullGT_500	rgb-d, multiple tools	500/500	76.3	ground truth description (.json)
multi_fullGT_1000	rgb-d, multiple tools	1110/1110	135	ground truth description (.json)
multi_spoon_3500	rgb-d, multiple tools	3500/3500	415	ground truth description (.json) + new object 'spoon.1', class index '5', one new color 'O' (orange)
multi_grasp_1000	rgb-d, multiple tools	1000/1000	114	ground truth description (.json) + grasp points for each tool
single_grasp_9000	rgb-d, single tool	9000/9000	931.5	ground truth description (.json) + grasp points for single tool

Real Images

As well as the images, the download files include detection boxes found by YOLOv5. We estimate that the boxes are 99.5% correct (a few detections are missing). No ground truth for tool identity, color, or location is included.

File	Type	Images (RGB/D)	Size (MB)	Detection Label Description
multi_real_1000	rgb-d, multiple tools	1185/1185	686	detected 2D bounding boxes (.json) + black background
multi_real_1600	rgb-d, multiple tools	1685/1685	537	detected 2D bounding boxes (.json) + white background
multi_real_2200	rgb-d, multiple tools	2298/2298	953	detected 2D bounding boxes (.json) + normal background
single_paper_real_1000	rgb, single tool	1120/0	370	paper tools, detected 2D bounding boxes (.json) + white background
single_real_2000	rgb, single tool	1966/0	805	single tools, detected 2D bounding boxes (.json) + normal background
single_real_5000	rgb-d, single tool	5981/5981	3174	single tools, detected 2D bounding boxes (.json) + black background
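
As a cross-check, the per-file image counts in the two tables above can be tallied against the totals quoted in the introduction. The sketch below only copies the (RGB, depth) counts from the tables; the variable and function names are illustrative and not part of the dataset.

```python
# (RGB, depth) image counts copied from the Synthetic Images and Real Images tables.
SYNTHETIC = {
    "single_bbox_6000": (5970, 0),
    "multi_bbox_3000": (3154, 0),
    "multi_fullGT_500": (500, 500),
    "multi_fullGT_1000": (1110, 1110),
    "multi_spoon_3500": (3500, 3500),
    "multi_grasp_1000": (1000, 1000),
    "single_grasp_9000": (9000, 9000),
}
REAL = {
    "multi_real_1000": (1185, 1185),
    "multi_real_1600": (1685, 1685),
    "multi_real_2200": (2298, 2298),
    "single_paper_real_1000": (1120, 0),
    "single_real_2000": (1966, 0),
    "single_real_5000": (5981, 5981),
}


def tally(*tables):
    """Sum RGB and depth counts over any number of {file: (rgb, depth)} tables."""
    rgb = sum(counts[0] for table in tables for counts in table.values())
    depth = sum(counts[1] for table in tables for counts in table.values())
    return rgb, depth


rgb_total, depth_total = tally(SYNTHETIC, REAL)
print(rgb_total, depth_total, rgb_total + depth_total)  # 38469 26259 64728
```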

Raw 3D Files

These are the source files for creating the synthetic tools. Examples of the parts are shown below.

File	Size (MB)	Description
ptc_and_mesh_files	1.82	7 point cloud files (.pcd) for the tools (clamps with half parts); 1 file (.blend) for tool meshes with grasp points

Acknowledgements

Contact

Email: Prof. Robert Fisher at rbf -a-t- inf.ed.ac.uk.
School of Informatics, Univ. of Edinburgh
1.11 Bayes Centre, 47 Potterrow, Edinburgh EH8 9BT, UK
Tel: +44-(131)-651-3441 (direct line), +44-(131)-651-3443 (secretary)

Ground-truth and detection JSON file format

A ground-truth .json file is provided for each synthetic image; see the details below.
For real images, detected 2D bounding boxes are provided for each tool.

# Tool Classes

nc: 5          #number of classes  #predefined

"class_label":   0 1 2 3 4      #class indexes #predefined

"type": ['scalpel', 'scalpel', 'clamp', 'clamp', 'tweezers']    #type names  #predefined

# Tool Attributes

- Full scene description (read 'gtxxx.json' and gtdata[0])
{
- I CAN SEE X OBJECTS ON THE TABLE. 
'object_indices': [0, 1, ...], 
'objects': ['spoon', 'clamp', ...], 
'object_size': ['small', 'small', ...], 
'object_colors': ['blue', 'red', ...], 
'which_side_on_table': ['middle', 'middle', ...], 
}
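
The scene description stores one list per attribute, with the same position in every list referring to the same object. A minimal sketch of turning those parallel lists into one record per object is given below. It assumes the scene description has already been read into a Python dict with the keys listed above; the function name, the record field names, and the example values are illustrative only.

```python
# Keys of the full scene description, as listed above, mapped to
# shorter (hypothetical) field names for the per-object records.
SCENE_KEYS = {
    "object_indices": "index",
    "objects": "name",
    "object_size": "size",
    "object_colors": "color",
    "which_side_on_table": "side",
}


def scene_to_records(scene):
    """Convert the parallel attribute lists into a list of per-object dicts."""
    columns = [scene[key] for key in SCENE_KEYS]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("scene attribute lists have different lengths")
    fields = list(SCENE_KEYS.values())
    return [dict(zip(fields, row)) for row in zip(*columns)]


# Example values only, not taken from a real ground-truth file.
example_scene = {
    "object_indices": [0, 1],
    "objects": ["spoon", "clamp"],
    "object_size": ["small", "small"],
    "object_colors": ["blue", "red"],
    "which_side_on_table": ["middle", "middle"],
}

for record in scene_to_records(example_scene):
    print(record)
```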

- Each tool description (read 'gtxxx.json' and gtdata[1])

"real_name": ['scalpel.1', 'scalpel.2', 'clamp.1', 'clamp.2', 'tweezers.1']    #object_names  #predefined

"size": ['big', 'big', 'small', 'big', 'small'];  #predefined #object_size

"maincolor": ['R', 'G', 'B', 'P', 'C', 'Y']   (i.e., 'red', 'green', 'blue', 'purple', 'cyan', 'yellow'); #random

"2D_box_image": [x_center   y_center   width   height]   (YOLOv5 format value 0-1); #random #2D_image_coordinate

"location_world": [x, y, z] (m) ;   "rotation_world": [x, y, z] (Euler) ;  # 6d_pose #random #3D_world_coordinate

"open_angle":  0-70 degree for clamps (counterclockwise); others 0 degree;  #random #Z_axis_3D_world_coordinate

"3D_box_local": eight vertices of 3D bounding box (m); #predefined #object_3D_size #3D_local_coordinate

(*partial synthetic files only) 

"grasp3D_handle", "grasp3D_joint_blade", [x, y, z] (m) ; #grasp_points #predefined  #3D_local_coordinate

"below": [], "above": [], "near": [], "which_side_of_table": [];   # Relative_location

*Duplicated tools are named ['scalpel.1x', 'scalpel.2x', 'clamp.1x', 'clamp.2x', 'tweezers.1x'] in the ground-truth file.
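
A small sketch of reading one ground-truth file is shown below. It assumes, from the "gtdata[0]" and "gtdata[1]" notes above, that the top-level JSON value is a sequence whose first element is the full scene description and whose second element holds the per-tool attributes; the exact layout of the second element is not specified here, so it is returned as-is. The helper names are hypothetical. The color codes come from the "maincolor" list, plus 'O' (orange) from multi_spoon_3500.

```python
import json

# One-letter color codes from "maincolor"; 'O' appears only in multi_spoon_3500.
COLOR_NAMES = {
    "R": "red", "G": "green", "B": "blue",
    "P": "purple", "C": "cyan", "Y": "yellow", "O": "orange",
}


def split_ground_truth(json_text):
    """Parse ground-truth JSON text and return (scene, tools).

    Assumption: element 0 is the scene description, element 1 the tool data.
    """
    gtdata = json.loads(json_text)
    return gtdata[0], gtdata[1]


def load_ground_truth(path):
    """Read a ground-truth .json file from disk and split it as above."""
    with open(path, encoding="utf-8") as handle:
        return split_ground_truth(handle.read())


def is_duplicate(real_name):
    """Duplicated tools carry a trailing 'x', e.g. 'clamp.1x'."""
    return real_name.endswith("x")


def base_tool_name(real_name):
    """Strip the duplicate marker: 'clamp.1x' -> 'clamp.1'."""
    return real_name[:-1] if is_duplicate(real_name) else real_name


print(base_tool_name("clamp.1x"), COLOR_NAMES["P"])
```

Usage would be along the lines of `scene, tools = load_ground_truth("gtxxx.json")`, with "gtxxx.json" standing for an actual file name from one of the downloads.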

Ground-truth and detection TXT file format

    
In the *.txt files, each row is one bounding box (YOLOv5 format, values in the range 0-1):
[class x_center y_center width height]

where class is one of these values:  0 1 2 3 4
corresponding to these class names:
( ['scalpel.1', 'scalpel.2', 'clamp.1', 'clamp.2', 'tweezers.1'] )
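
The rows described above can be converted to pixel-space corner boxes as sketched below. This is a minimal example, not the authors' tooling: the function names are hypothetical, the class names use the "real_name" spelling from the JSON section, and the image width and height must be supplied by the caller because the image resolution is not stated here (the 640 x 480 used in the example is only a placeholder).

```python
# Class index -> tool name, following the class list above.
CLASS_NAMES = ["scalpel.1", "scalpel.2", "clamp.1", "clamp.2", "tweezers.1"]


def parse_yolo_row(row, image_width, image_height):
    """Parse 'class x_center y_center width height' (all box values in 0-1).

    Returns (class_name, (x_min, y_min, x_max, y_max)) in pixels.
    """
    class_text, *values = row.split()
    if len(values) != 4:
        raise ValueError(f"expected 5 fields, got: {row!r}")
    x_center, y_center, width, height = (float(v) for v in values)
    x_min = (x_center - width / 2) * image_width
    x_max = (x_center + width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    y_max = (y_center + height / 2) * image_height
    return CLASS_NAMES[int(class_text)], (x_min, y_min, x_max, y_max)


def parse_yolo_text(text, image_width, image_height):
    """Parse every non-empty row of a label file's contents."""
    return [
        parse_yolo_row(row, image_width, image_height)
        for row in text.splitlines()
        if row.strip()
    ]


# Made-up label rows and a placeholder image size, for illustration only.
example = "2 0.5 0.5 0.25 0.5\n4 0.25 0.75 0.5 0.25\n"
for name, box in parse_yolo_text(example, 640, 480):
    print(name, box)
```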