The dataset contains RGBD images of five simulated surgical tools (two kinds of scalpels, two kinds of clamps, and one pair of tweezers), in both synthetic and real images, for a total of 64,728 images (RGB 38,469, Depth 26,259). The tools are simulated because the main project was concerned with human-computer interaction rather than visual recognition, so the domain was simplified: the tools are non-specular, their parts are color-coded, and they are larger than real-life so that the depth sensor can acquire multiple points across the width. The real parts were 3D printed.
Ground-truth labels for the tool properties are provided for each synthetic image (slightly different from file to file, see detailed description in the tables below).
This dataset was created as part of a visual recognition subtask for the Advanced Autonomy Project at the University of Edinburgh, funded by the Turing Institute. Vision tasks include surgical tool classification, 6D pose estimation, and tool attribute recognition (size, color, relative position, grasping points, etc.) for doctor-robot language interaction tasks and robotic arm picking tasks. Synthetic tools are created in Blender using 3D meshes, while real tools are 3D printed from the synthetic models. Original mesh files and point cloud files can also be downloaded below.
Synthetic Image | Real image |
See below for txt and json file formats.
File | Type | Images (RGB/D) | Size (MB) | Ground-truth Description |
---|---|---|---|---|
single_bbox_6000 | rgb, single tool | 5970/0 | 566 | 2D bounding boxes only (.txt) |
multi_bbox_3000 | rgb, multiple tools | 3154/0 | 252 | 2D bounding boxes only (.txt) |
multi_fullGT_500 | rgb-d, multiple tools | 500/500 | 76.3 | ground truth description (.json) |
multi_fullGT_1000 | rgb-d, multiple tools | 1110/1110 | 135 | ground truth description (.json) |
multi_spoon_3500 | rgb-d, multiple tools | 3500/3500 | 415 | ground truth description (.json) + new object 'spoon.1', class index '5', one new color 'O' (orange) |
multi_grasp_1000 | rgb-d, multiple tools | 1000/1000 | 114 | ground truth description (.json) + grasp points for each tool |
single_grasp_9000 | rgb-d, single tool | 9000/9000 | 931.5 | ground truth description (.json) + grasp points for single tool |
As well as the real images, the download files include detection boxes found by YOLOv5. We estimate that the boxes are 99.5% correct (a few detections are missing). No ground truth on identity, color, or location is included.
File | Type | Images (RGB/D) | Size (MB) | Detection Label Description |
---|---|---|---|---|
multi_real_1000 | rgb-d, multiple tools | 1185/1185 | 686 | detected 2D bounding boxes (.json) + black background |
multi_real_1600 | rgb-d, multiple tools | 1685/1685 | 537 | detected 2D bounding boxes (.json) + white background |
multi_real_2200 | rgb-d, multiple tools | 2298/2298 | 953 | detected 2D bounding boxes (.json) + normal background |
single_paper_real_1000 | rgb, single tool | 1120/0 | 370 | paper tools, detected 2D bounding boxes (.json) + white background |
single_real_2000 | rgb, single tool | 1966/0 | 805 | single tools, detected 2D bounding boxes (.json) + normal background |
single_real_5000 | rgb-d, single tool | 5981/5981 | 3174 | single tools, detected 2D bounding boxes (.json) + black background |
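
As a quick consistency check, the per-file counts in the two tables above can be summed and compared with the totals quoted in the introduction. The sketch below hard-codes the (RGB, depth) counts from the tables; the dictionary and variable names are illustrative only and are not part of the dataset.

```python
# (RGB, depth) image counts copied from the synthetic and real tables above.
synthetic = {
    "single_bbox_6000": (5970, 0),
    "multi_bbox_3000": (3154, 0),
    "multi_fullGT_500": (500, 500),
    "multi_fullGT_1000": (1110, 1110),
    "multi_spoon_3500": (3500, 3500),
    "multi_grasp_1000": (1000, 1000),
    "single_grasp_9000": (9000, 9000),
}
real = {
    "multi_real_1000": (1185, 1185),
    "multi_real_1600": (1685, 1685),
    "multi_real_2200": (2298, 2298),
    "single_paper_real_1000": (1120, 0),
    "single_real_2000": (1966, 0),
    "single_real_5000": (5981, 5981),
}

counts = list(synthetic.values()) + list(real.values())
total_rgb = sum(rgb for rgb, _ in counts)
total_depth = sum(depth for _, depth in counts)

# Prints: 38469 26259 64728
print(total_rgb, total_depth, total_rgb + total_depth)
```

The sums agree with the stated totals of 38,469 RGB images, 26,259 depth images, and 64,728 images overall.
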
These are the source files for creating the synthetic tools. Examples of the parts are shown below.
File | Size (MB) | Description |
---|---|---|
ptc_and_mesh_files | 1.82 | (.pcd) 7 point cloud files for tools (clamps with half parts); (.blend) 1 file for tool meshes with grasp points |
The data is freely available for research use. Acknowledge the University of Edinburgh and Turing Institute. The data is the property of the University of Edinburgh. All rights reserved.
Email: Prof. Robert Fisher at rbf -a-t- inf.ed.ac.uk.
Ground-truth is a .json file made for each synthetic image. Please see details below.
Detected 2D bounding boxes are provided for each tool in real images.
nc: 5 #number of classes #predefined
"class_label": 0 1 2 3 4 #class indexes #predefined
"type": ['scalpel', 'scalpel', 'clamp', 'clamp', 'tweezers'] #type names #predefined
- Full scene description (read 'gtxxx.json' and gtdata[0])
  {
  - I CAN SEE X OBJECTS ON THE TABLE.
  'object_indices': [0, 1, ...],
  'objects': ['spoon', 'clamp', ...],
  'object_size': ['small', 'small', ...],
  'object_colors': ['blue', 'red', ...],
  'which_side_on_table': ['middle', 'middle', ...],
  }
- Each tool description (read 'gtxxx.json' and gtdata[1])
  "real_name": ['scalpel.1', 'scalpel.2', 'clamp.1', 'clamp.2', 'tweezers.1'] #object_names #predefined
  "size": ['big', 'big', 'small', 'big', 'small']; #predefined #object_size
  "maincolor": ['R', 'G', 'B', 'P', 'C', 'Y'] (i.e., 'red', 'green', 'blue', 'purple', 'cyan', 'yellow'); #random
  "2D_box_image": [x_center y_center width height] (YOLOv5 format value 0-1); #random #2D_image_coordinate
  "location_world": [x, y, z] (m); "rotation_world": [x, y, z] (Euler); # 6d_pose #random #3D_world_coordinate
  "open_angle": 0-70 degree for clamps (counterclockwise); others 0 degree; #random #Z_axis_3D_world_coordinate
  "3D_box_local": eight vertices of 3D bounding box (m); #predefined #object_3D_size #3D_local_coordinate
  (*partial synthetic files only) "grasp3D_handle", "grasp3D_joint_blade", [x, y, z] (m); #grasp_points #predefined #3D_local_coordinate
  "below": [], "above": [], "near": [], "which_side_of_table": []; # Relative_location
  *duplicated tools will be named as ['scalpel.1x', 'scalpel.2x', 'clamp.1x', 'clamp.2x', 'tweezers.1x'] in the ground-truth file
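
A minimal sketch of reading the full scene description is given below. It assumes that the top level of each ground-truth .json file is a list whose first element (gtdata[0]) is a dictionary with the scene keys listed above; this layout is inferred from the description and should be checked against the actual files. The function name `describe_scene` and the inline example values are hypothetical, and the per-tool entry gtdata[1] is left as an empty placeholder because its exact layout is not assumed here.

```python
import json


def describe_scene(gtdata):
    """Return one summary string per object from the scene description gtdata[0]."""
    scene = gtdata[0]
    lines = []
    for idx, name, size, color, side in zip(
        scene["object_indices"],
        scene["objects"],
        scene["object_size"],
        scene["object_colors"],
        scene["which_side_on_table"],
    ):
        lines.append(f"{idx}: {size} {color} {name} ({side})")
    return lines


# Made-up stand-in for the text of one ground-truth file. With a real file, use:
#     with open(path_to_json) as f:
#         gtdata = json.load(f)
example_text = json.dumps([
    {
        "object_indices": [0, 1],
        "objects": ["spoon", "clamp"],
        "object_size": ["small", "small"],
        "object_colors": ["blue", "red"],
        "which_side_on_table": ["middle", "middle"],
    },
    {},  # placeholder for the per-tool description (layout not assumed)
])

gtdata = json.loads(example_text)
summary = describe_scene(gtdata)
for line in summary:
    print(line)
```
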
In the *.txt files, each row is a bounding box (YOLOv5 format, values 0-1) [class x_center y_center width height], where class is one of the values 0 1 2 3 4, corresponding to the class names ['scalpel.1', 'scalpel.2', 'clamp.1', 'clamp.2', 'tweezers.1'].
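
The normalized box format can be converted to pixel corner coordinates as sketched below. This is an illustration under assumptions: the class names follow the "real_name" list given earlier, the function name `parse_yolo_line` is hypothetical, and the 640x480 image size in the example is made up (the image resolution is not stated here, so read it from the image itself).

```python
CLASS_NAMES = ["scalpel.1", "scalpel.2", "clamp.1", "clamp.2", "tweezers.1"]


def parse_yolo_line(line, image_width, image_height):
    """Convert one 'class x_center y_center width height' row to pixel corners."""
    fields = line.split()
    class_index = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    # Values are fractions of the image size, so scale by width and height.
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return CLASS_NAMES[class_index], (x_min, y_min, x_max, y_max)


# Example row with made-up values, for an assumed 640x480 image.
result = parse_yolo_line("2 0.5 0.5 0.25 0.5", 640, 480)
print(result)  # ('clamp.1', (240.0, 120.0, 400.0, 360.0))
```

The same conversion applies to the "2D_box_image" field in the .json ground truth, which uses the same normalized center-and-size convention.
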
© 2022 Robert Fisher