Custom dataset training with PoseC3D
I reviewed all the relevant pages on implementing PoseC3D with my custom dataset and am confused about which annotation format should be prepared.
I have a 4-class dataset for anomaly action recognition in videos. The annotation file for my dataset is a text file in the following format:
```
path/to/class_folder/train/1/arhstrjadefh.mp4 1
path/to/class_folder/train/1/ksitldguidjd.mp4 1
................................................
path/to/class_folder/train/2/dfgnkureae.mp4 2
path/to/class_folder/train/2/qgsjgldube.mp4 2
................................................
path/to/class_folder/train/3/adrwetrshjty.mp4 3
path/to/class_folder/train/3/ldjsrqggejhd.mp4 3
................................................
path/to/class_folder/train/4/jlshginrtdfhi.mp4 4
path/to/class_folder/train/4/jlshginrtdfhi.mp4 4
```
I have val and test annotations as well.
This abides by the VideoDataset rules given in Tutorial 3: Adding New Dataset.
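For reference, such a list plugs into a config roughly like this (a sketch: the paths are placeholders and `train_pipeline` is defined elsewhere in the config):

```python
# VideoDataset consuming the file list above (placeholder paths).
data = dict(
    train=dict(
        type='VideoDataset',
        ann_file='path/to/train_list.txt',
        data_prefix='',
        pipeline=train_pipeline))
```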
Now, I would like to train PoseC3D on my own dataset; however, The Format of PoseC3D Annotations says that the dataset should be annotated as follows:
each item is a dictionary that is the skeleton annotation of one video:

- `keypoint`: The keypoint coordinates, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case) x 2 (x, y coordinates).
- `keypoint_score`: The keypoint confidence scores, a numpy array of shape N (#person) x T (temporal length) x K (#keypoints, 17 in our case).
- `frame_dir`: The corresponding video name.
- `label`: The action category.
- `img_shape`: The image shape of each frame.
- `original_shape`: Same as above.
- `total_frames`: The temporal length of the video.
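For concreteness, a single item would apparently be built like this (a sketch with made-up shapes and values; nothing here is from a real dataset):

```python
import numpy as np

# One skeleton annotation dict in the format described above
# (placeholder video name, label, shapes, and zero/one-filled arrays).
num_person, num_frame, num_keypoint = 1, 120, 17

anno = dict(
    frame_dir='some_video_from_my_dataset',  # the video name
    label=1,                                 # the action category
    img_shape=(1080, 1920),                  # (height, width) of each frame
    original_shape=(1080, 1920),             # same as above
    total_frames=num_frame,                  # temporal length T
    # N x T x K x 2: (x, y) coordinates per person, frame, keypoint
    keypoint=np.zeros((num_person, num_frame, num_keypoint, 2), dtype=np.float32),
    # N x T x K: confidence score per keypoint
    keypoint_score=np.ones((num_person, num_frame, num_keypoint), dtype=np.float32))
```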
I am confused here. If I want a model that takes raw video as input and outputs an action category, do I have to have skeleton annotations for each video? If so, how do I prepare the skeleton annotations to train PoseC3D?
While reading the PoseC3D paper, I understood that the model only requires videos as input, generates the skeleton information within the model itself, and subsequently recognizes actions. Why, then, do we need to provide skeleton information when training?
Hi @kennymckormick. Let me outline what I have gathered from my reading so far. According to `gym_train.pkl`, its format is as shown there, but `gym_train.pkl` contains annotations of the FineGYM dataset. Since I want to train my own dataset with the PoseC3D (skeleton-based) model, I first need my own, so-called `custom_dataset_train.pkl` (and `custom_dataset_val.pkl`) annotation files, right? To achieve that, I may use `ntu_pose_extraction.py` as shown in Prepare Annotations, like this:

```
python ntu_pose_extraction.py some_video_from_my_dataset.mp4 some_video_from_my_dataset.pkl
```

Doing so for every single video in the train and val splits will give me `.pkl` files, which I will then gather into a single list to train (and validate) PoseC3D's skeleton framework on my dataset. Am I right up to this point?
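If so, I imagine the gathering step would look roughly like this (a sketch, assuming each run of `ntu_pose_extraction.py` dumps a single annotation dict; all paths are placeholders):

```python
# Collect per-video annotation dicts into one list-of-dicts pickle
# that a PoseDataset-style loader can consume (placeholder paths).
import glob
import pickle

annos = []
for path in sorted(glob.glob('my_dataset_pkls/train/*.pkl')):
    with open(path, 'rb') as f:
        annos.append(pickle.load(f))  # one skeleton annotation dict per video

with open('custom_dataset_train.pkl', 'wb') as f:
    pickle.dump(annos, f)
```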
Assuming the above is true, I may use the following script (with a little change) for training, as shown in PoseC3D/Train:

```
python tools/train.py configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py --work-dir work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoint --validate --test-best --gpus 2 --seed 0 --deterministic
```

Lastly, before running the above script, I need to modify the annotation-file variables and `num_classes=4`, `pretrained=None` (when training from scratch; `pretrained='torchvision://resnet50'` when fine-tuning), with training-schedule alterations as needed, in `configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py`. Does this all make sense to you? Thank you 😃
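Concretely, I imagine the config edits would look something like this (a sketch of just the fields to change; the paths are placeholders for my own annotation files):

```python
# Fields to adjust in slowonly_r50_u48_240e_ntu120_xsub_keypoint.py
# (sketch: placeholder paths, 4 classes instead of 120).
model = dict(
    backbone=dict(pretrained=None),  # or 'torchvision://resnet50' to fine-tune
    cls_head=dict(num_classes=4))    # my 4 anomaly-action classes
ann_file_train = 'data/posec3d/custom_dataset_train.pkl'
ann_file_val = 'data/posec3d/custom_dataset_val.pkl'
```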
That’s just a trick for NTU120-XSub training: there are far fewer training samples for the later 60 classes than for the first 60, so we sample those videos more frequently. For custom training, you can set class_prob to 1 for all classes, or to any values depending on your needs.
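For example, for a 4-class custom dataset, uniform sampling could look like this (a sketch; match the exact `class_prob` form to the `PoseDataset` signature in your mmaction2 version, a dict keyed by label is assumed here):

```python
# Uniform per-class sampling probability for 4 custom classes
# (contrast with the NTU120 config, which oversamples the later 60 classes).
class_prob = {i: 1.0 for i in range(4)}
```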