
Wide Surveillance Images from Different Heights: Dataset Description

Vision and Robotics Lab., Wakayama University    2023/4/1




Dataset Overview

This dataset consists of images of people captured by a wide-angle ceiling camera, together with annotations. The images show multiple people in scenes that contain furniture such as desks and chairs, as well as luggage. A truck with a crane, used to change the height of the wide-angle camera, is also visible in the images. The annotations give a tilted bounding box and a posture label (sit/stand) for each person.

A unique feature of this dataset is that it was captured from three different heights (3m, 4m, and 5m). To the best of our knowledge, no similar dataset currently exists.

The height of the ceiling camera is varied because, when detecting people with a wide-angle ceiling camera, detection accuracy can change significantly if the images used for inference differ from the images used to train the detector, for example when they are captured from a different height.




Conditions of use:

Non-profit organizations may use this dataset without further conditions for research or implementation that receives no financial support from commercial firms, after submitting a written application. For all other uses, please contact twada@ieee.org for permission.


Application for use:

At this time, we do not have an automated application system, so please send us an email with the following information.

----------------------------------------------------

To: twada@ieee.org

Subject: wsi-dh dataset download request

 

Purpose of Use:

Applicant Name:

Applicant Organization:

Applicant Address:

Applicant E-mail address:

----------------------------------------------------


Acknowledgements:

We thank Giken Truststem Co., Ltd. for providing this data, and Ms. Sugisaki and Mr. Miura, second-year students at Wakayama University, for their help in annotating the data.

 

Toshikazu Wada



 

The following is detailed information.

  • There are three types of stored data: original video data, image data, and annotation data.
  • The original video data is stored under the directory "Video".
    • A total of 6 videos (2 types of scenes x 3 types of heights)
    • Two types of scene names: "store in shopping center" (sc) and "food court" (fc).
    • Installation heights: 3m, 4m, 5m (e.g. height name is "3000" for 3m)
    • Each video is named "<scene name>_<height name>.mp4" (e.g. "fc_3000.mp4").
    • Each video is recorded at 1920x1080 and 30 fps.

 

  • The images and annotations extracted from each video are stored in a separate directory named after the video, such as "fc_3000".
  • The image data is stored in a directory named "Images" directly under each of these directories.
    • Each image is a JPEG file obtained by sampling the video every 10, 15, or 20 frames (see the table below for the interval used for each video), with the image size changed to 640x640. The frame number is included in the file name.
  • The annotation data is stored in the directory "Annotation" directly under each of these directories, in two forms: "Json" and "Text". "Json" contains a JSON file, and "Text" contains files that correspond one-to-one with the image files. The annotation file is named "video_frame_number.txt" if the image is named "video_frame_number.jpg". The information under "Json" and the information under "Text" are essentially the same.
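The layout described above can be walked with a short script. The following is a minimal sketch, not part of the dataset itself: the function names video_names and pair_images_with_text are hypothetical, the root directory is whatever location the dataset was unpacked to, and the ".jpg" extension is taken from the file name example above. Images are paired with text annotations by file stem only, so no assumption is made about how the frame number is formatted.

```python
from pathlib import Path

SCENES = ("fc", "sc")                    # food court, store in shopping center
HEIGHT_NAMES = ("3000", "4000", "5000")  # height names for 3m, 4m, 5m


def video_names():
    """Return the six '<scene name>_<height name>' names, e.g. 'fc_3000'."""
    return [f"{scene}_{height}" for scene in SCENES for height in HEIGHT_NAMES]


def pair_images_with_text(root, video_name):
    """Pair each image with the text annotation file that has the same stem.

    Returns a list of (image_path, annotation_path) tuples. Images without a
    matching .txt file are skipped rather than treated as an error.
    """
    base = Path(root) / video_name
    text_dir = base / "Annotation" / "Text"
    pairs = []
    for image_path in sorted((base / "Images").glob("*.jpg")):
        annotation_path = text_dir / (image_path.stem + ".txt")
        if annotation_path.is_file():
            pairs.append((image_path, annotation_path))
    return pairs
```

For example, pair_images_with_text("dataset", "fc_3000") would list the image and annotation pairs for the food court scene at 3m, assuming the dataset was unpacked into a directory called "dataset".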

Directory Structure

dataset
├── Video
│   ├── fc_3000.mp4
│   ├── fc_4000.mp4
│   ├── fc_5000.mp4
│   ├── sc_3000.mp4
│   ├── sc_4000.mp4
│   └── sc_5000.mp4
├── fc_3000
│   ├── Annotation
│   └── Images
├── fc_4000
│   ├── Annotation
│   └── Images
├── fc_5000
│   ├── Annotation
│   └── Images
├── sc_3000
│   ├── Annotation
│   └── Images
├── sc_4000
│   ├── Annotation
│   └── Images
└── sc_5000
    ├── Annotation
    └── Images

*See the table below for details on each video and image data.


Annotation file description

  • Each bounding box is represented by 14 values: [cx, cy, w, h, angle, lux, luy, rux, ruy, rbx, rby, lbx, lby, class]

    cx, cy      Center coordinates of the bounding box, with the upper left corner of the image as (0,0)
    w, h        Width and height of the bounding box
    angle       Clockwise rotation angle (in degrees) from the upward vertical axis, range -180 to 180
    lux, luy    Upper left corner coordinates of the bounding box
    rux, ruy    Upper right corner coordinates of the bounding box
    rbx, rby    Lower right corner coordinates of the bounding box
    lbx, lby    Lower left corner coordinates of the bounding box
    class       "stand person" or "sit person"

  • The order of the attributes in the text format: cx cy w h angle lux luy rux ruy rbx rby lbx lby class
  • JSON format: the annotations for each video are stored in a single JSON file, in a format that roughly conforms to the MS COCO format.
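As an illustration of the text format, the sketch below parses one annotation line and recomputes the four corners from the center, size, and angle. It rests on several assumptions that should be checked against the actual files: the values are separated by whitespace, the class name is written without quotes and may contain a space, and the stored corners are obtained by rotating the axis-aligned box clockwise about its center in image coordinates (x to the right, y downward). The names RotatedBox, parse_annotation_line, and corners_from_center are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RotatedBox:
    cx: float
    cy: float
    w: float
    h: float
    angle: float    # degrees, clockwise from the upward vertical axis
    corners: tuple  # ((lux, luy), (rux, ruy), (rbx, rby), (lbx, lby))
    label: str      # "stand person" or "sit person"


def parse_annotation_line(line):
    """Parse one whitespace-separated annotation line into a RotatedBox."""
    # Split at most 13 times so a class name with a space stays in one piece.
    fields = line.strip().split(None, 13)
    if len(fields) != 14:
        raise ValueError(f"expected 14 values, got {len(fields)}")
    numbers = [float(value) for value in fields[:13]]
    cx, cy, w, h, angle = numbers[:5]
    corners = tuple((numbers[i], numbers[i + 1]) for i in range(5, 13, 2))
    return RotatedBox(cx, cy, w, h, angle, corners, fields[13])


def corners_from_center(cx, cy, w, h, angle_deg):
    """Rotate the axis-aligned corners clockwise about the box center.

    Assumption: with y pointing downward, a rotation that looks clockwise on
    screen uses the standard rotation matrix [[cos, -sin], [sin, cos]].
    Returns corners in the order upper left, upper right, lower right, lower left.
    """
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return tuple(
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    )
```

Comparing corners_from_center(box.cx, box.cy, box.w, box.h, box.angle) with box.corners on a few real lines is a quick way to confirm or reject the assumed angle convention before relying on it.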


*See the table below for details on each video and image data.

Video name   Number of total frames   Number of annotated frames (frame interval)           Video frame resolution (FPS)   Image resolution
fc_3000      18097                    1200 (15)                                             1920×1080 (30)                 640×640
fc_4000      15262                    1500 (10)                                             1920×1080 (30)                 640×640
fc_5000      18420                    1700 (10)                                             1920×1080 (30)                 640×640
sc_3000      15101                    1000 (20 for frames 1-10000, 10 from frame 10001)     1920×1080 (30)                 640×640
sc_4000      13770                    1000 (10)                                             1920×1080 (30)                 640×640
sc_5000      14369                    1000 (10)                                             1920×1080 (30)                 640×640
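The figures in the table can be used for simple sanity checks after downloading. The sketch below copies the frame counts from the table and derives each video's duration from the stated 30 fps. The names TOTAL_FRAMES, ANNOTATED_FRAMES, and duration_seconds are hypothetical and only serve this illustration.

```python
FPS = 30  # all six videos are recorded at 30 fps

# Total frame counts copied from the table above.
TOTAL_FRAMES = {
    "fc_3000": 18097,
    "fc_4000": 15262,
    "fc_5000": 18420,
    "sc_3000": 15101,
    "sc_4000": 13770,
    "sc_5000": 14369,
}

# Numbers of annotated frames copied from the table above.
ANNOTATED_FRAMES = {
    "fc_3000": 1200,
    "fc_4000": 1500,
    "fc_5000": 1700,
    "sc_3000": 1000,
    "sc_4000": 1000,
    "sc_5000": 1000,
}


def duration_seconds(video_name, fps=FPS):
    """Return the video duration in seconds (total frames divided by fps)."""
    return TOTAL_FRAMES[video_name] / fps


if __name__ == "__main__":
    for name in TOTAL_FRAMES:
        print(f"{name}: {duration_seconds(name):.1f} s, "
              f"{ANNOTATED_FRAMES[name]} annotated frames")
    print("annotated frames in total:", sum(ANNOTATED_FRAMES.values()))
```

Counting the files under each "Images" directory and comparing the result with ANNOTATED_FRAMES is one way to confirm that a download is complete.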

