Custom dataset training with PoseC3D
I reviewed all the relevant pages on implementing PoseC3D with my custom dataset and am confused about which annotation format should be prepared.
I have a 4-class dataset for anomaly action recognition in videos. The annotation file for my dataset is a text file in the following format:
```
path/to/class_folder/train/1/arhstrjadefh.mp4 1
path/to/class_folder/train/1/ksitldguidjd.mp4 1
................................................
path/to/class_folder/train/2/dfgnkureae.mp4 2
path/to/class_folder/train/2/qgsjgldube.mp4 2
................................................
path/to/class_folder/train/3/adrwetrshjty.mp4 3
path/to/class_folder/train/3/ldjsrqggejhd.mp4 3
................................................
path/to/class_folder/train/4/jlshginrtdfhi.mp4 4
path/to/class_folder/train/4/jlshginrtdfhi.mp4 4
```
I have val and test annotations as well.
This abides by the VideoDataset rules given in Tutorial 3: Adding New Dataset.
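For reference, such a list plugs into a config roughly like this (a sketch: the paths are placeholders and `train_pipeline` is defined elsewhere in the config):

```python
# VideoDataset consuming the file list above (placeholder paths).
data = dict(
    train=dict(
        type='VideoDataset',
        ann_file='path/to/train_list.txt',
        data_prefix='',
        pipeline=train_pipeline))
```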
Now, I would like to train PoseC3D on my own dataset; however, The Format of PoseC3D Annotations says that the dataset should be annotated as follows:
each item is a dictionary that is the skeleton annotation of one video:

- `keypoint`: The keypoint coordinates, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case) x 2 (x, y coordinates).
- `keypoint_score`: The keypoint confidence scores, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case).
- `frame_dir`: The corresponding video name.
- `label`: The action category.
- `img_shape`: The image shape of each frame.
- `original_shape`: Same as above.
- `total_frames`: The temporal length of the video.
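For concreteness, a single item would apparently be built like this (a sketch with made-up shapes and values; nothing here is from a real dataset):

```python
import numpy as np

# One skeleton annotation dict in the format described above
# (placeholder video name, label, shapes, and zero/one-filled arrays).
num_person, num_frame, num_keypoint = 1, 120, 17

anno = dict(
    frame_dir='some_video_from_my_dataset',  # the video name
    label=1,                                 # the action category
    img_shape=(1080, 1920),                  # (height, width) of each frame
    original_shape=(1080, 1920),             # same as above
    total_frames=num_frame,                  # temporal length T
    # N x T x K x 2: (x, y) coordinates per person, frame, keypoint
    keypoint=np.zeros((num_person, num_frame, num_keypoint, 2), dtype=np.float32),
    # N x T x K: confidence score per keypoint
    keypoint_score=np.ones((num_person, num_frame, num_keypoint), dtype=np.float32))
```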
I am confused here. If I want a model that takes raw video as input and outputs an action category, do I have to have skeleton annotations for each video? If so, how do I prepare the skeleton annotations to train PoseC3D?
While reading the PoseC3D paper, I understood that the model only requires videos as input, generates the skeleton information within the model itself, and subsequently recognizes actions. Why, then, do we need to provide skeleton information when training?
Hi @kennymckormick. Let me outline what I have gathered from my reading so far. According to `gym_train.pkl`, its format is as shown there, but `gym_train.pkl` contains annotations of the FineGYM dataset. Since I want to train my own dataset with the PoseC3D (skeleton-based) model, I first need my own, so-called `custom_dataset_train.pkl` (and `custom_dataset_val.pkl`) annotation files, right? To achieve that, I may use `ntu_pose_extraction.py` as shown in Prepare Annotations, like this:

```
python ntu_pose_extraction.py some_video_from_my_dataset.mp4 some_video_from_my_dataset.pkl
```

Doing so for every single video in the train and val splits will give me `.pkl` files, which I will then gather into a single list to train (and validate) PoseC3D's skeleton framework on my dataset. Am I right up to this point?
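If so, I imagine the gathering step would look roughly like this (a sketch, assuming each run of `ntu_pose_extraction.py` dumps a single annotation dict; all paths are placeholders):

```python
# Collect per-video annotation dicts into one list-of-dicts pickle
# that a PoseDataset-style loader can consume (placeholder paths).
import glob
import pickle

annos = []
for path in sorted(glob.glob('my_dataset_pkls/train/*.pkl')):
    with open(path, 'rb') as f:
        annos.append(pickle.load(f))  # one skeleton annotation dict per video

with open('custom_dataset_train.pkl', 'wb') as f:
    pickle.dump(annos, f)
```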
Assuming the above is true, I may use the following script (with a little change) for training, as shown in PoseC3D/Train:

```
python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py --work-dir work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoint --validate --test-best --gpus 2 --seed 0 --deterministic
```

Lastly, before running the above script, I need to modify the annotation-file variables and `num_classes=4`, `pretrained=None` (when training from scratch; `pretrained='torchvision://resnet50'` when fine-tuning), with training-schedule alterations as needed, in `configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py`. Does this all make sense to you? Thank you 😃
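Concretely, I imagine the config edits would look something like this (a sketch of just the fields to change; the paths are placeholders for my own annotation files):

```python
# Fields to adjust in slowonly_r50_u48_240e_ntu120_xsub_keypoint.py
# (sketch: placeholder paths, 4 classes instead of 120).
model = dict(
    backbone=dict(pretrained=None),  # or 'torchvision://resnet50' to fine-tune
    cls_head=dict(num_classes=4))    # my 4 anomaly-action classes
ann_file_train = 'data/posec3d/custom_dataset_train.pkl'
ann_file_val = 'data/posec3d/custom_dataset_val.pkl'
```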
That’s just a trick for NTU120-XSub training: there are far fewer training samples for the later 60 classes than for the first 60, so we sample those videos more frequently. For custom training, you can set class_prob to 1 for all classes, or to any values depending on your needs.
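For example, for a 4-class custom dataset, uniform sampling could look like this (a sketch; match the exact `class_prob` form to the `PoseDataset` signature in your mmaction2 version, a dict keyed by label is assumed here):

```python
# Uniform per-class sampling probability for 4 custom classes
# (contrast with the NTU120 config, which oversamples the later 60 classes).
class_prob = {i: 1.0 for i in range(4)}
```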