Failing to train Pointnet2 segmentation on custom dataset
❓ Questions & Help
Hello,
I am quite new to point cloud learning. I have done some tutorials in pytorch_geometric, but now I have encountered something I can't quite understand, so I appreciate your help on this. I have large point cloud maps that I use for robot navigation; the maps are generated and labeled in simulation. I want to train networks to segment drivable and non-drivable regions. I created a Dataset for this purpose on my fork, named uneven_ground_dataset.py
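(For context, such a dataset usually starts by parsing the raw point files. Below is a minimal, self-contained sketch of reading an ASCII .pcd file into a point array plus per-point labels; the `FIELDS x y z label` layout is an assumption for illustration, since real PCD headers vary, and a library loader such as open3d is normally the safer choice.)

```python
import numpy as np

def load_ascii_pcd(path):
    """Parse a minimal ASCII .pcd file into (points, labels).

    Assumes the header declares `FIELDS x y z label` and `DATA ascii`.
    This is a simplification for illustration -- real PCD files can be
    binary and carry different fields.
    """
    with open(path) as f:
        lines = f.readlines()
    # Skip the header: everything up to and including the DATA line.
    start = next(i for i, ln in enumerate(lines) if ln.startswith("DATA")) + 1
    rows = np.loadtxt(lines[start:], dtype=np.float64, ndmin=2)
    points = rows[:, :3].astype(np.float32)  # x, y, z coordinates
    labels = rows[:, 3].astype(np.int64)     # per-point class id
    return points, labels
```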
I also modified pointnet2_segmentation.py accordingly.
When I start training I encounter the following problem:
ros2-foxy@ros2foxy-Lenovo-ideapad-700-15ISK:~/pytorch_geometric$ python3 examples/pointnet2_segmentation.py
mm Intializing UnevenGroundDataset dataset
download function is void, makesure data is locally availabe and under provided root folder
Traceback (most recent call last):
File "examples/pointnet2_segmentation.py", line 125, in <module>
train()
File "examples/pointnet2_segmentation.py", line 86, in train
out = model(data)
File "/home/ros2-foxy/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "examples/pointnet2_segmentation.py", line 58, in forward
sa1_out = self.sa1_module(*sa0_out)
File "/home/ros2-foxy/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ros2-foxy/pytorch_geometric/examples/pointnet2_classification.py", line 21, in forward
row, col = radius(pos, pos[idx], self.r, batch, batch[idx],
File "/usr/local/lib/python3.8/dist-packages/torch_geometric-1.6.3-py3.8.egg/torch_geometric/nn/pool/__init__.py", line 173, in radius
return torch_cluster.radius(x, y, r, batch_x, batch_y, max_num_neighbors,
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/home/ros2-foxy/.local/lib/python3.8/site-packages/torch_cluster/radius.py", line 53, in radius
if batch_x is not None:
assert x.size(0) == batch_x.numel()
batch_size = int(batch_x.max()) + 1
~~~ <--- HERE
deg = x.new_zeros(batch_size, dtype=torch.long)
RuntimeError: CUDA error: the launch timed out and was terminated
ros2-foxy@ros2foxy-Lenovo-ideapad-700-15ISK:~/pytorch_geometric$
I don't have a dedicated computer for DL at the moment, so I use a minimal batch size. I searched for possible causes but could not figure out why this happens.
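(A "launch timed out" CUDA error during the `radius` neighbor search is often just the GPU watchdog killing a kernel that runs too long, which large per-sample clouds on a shared display GPU can trigger. One common mitigation, in line with the downsampling tried later in this thread, is to cap the number of points per cloud before training. Below is a minimal NumPy sketch of uniform subsampling, analogous to what `torch_geometric.transforms.FixedPoints` does; the function name is hypothetical.)

```python
import numpy as np

def subsample_points(points, labels, num_points, seed=0):
    """Uniformly subsample a cloud to a fixed size.

    Samples without replacement when the cloud is large enough,
    and with replacement otherwise, so the output size is always
    exactly `num_points`.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    idx = rng.choice(n, size=num_points, replace=(n < num_points))
    return points[idx], labels[idx]
```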
I have a few .pcd files and I could provide them if you want to reproduce the issue.
Thank you very much for your time.
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
Using the training data during inference should also yield 0.92 accuracy. If it does not do so, there might be some differences in the code regarding training and inference computation, e.g., induced by BatchNorm or Dropout.
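(The train/eval difference mentioned above can be seen directly: Dropout randomly zeroes activations in training mode but becomes the identity after `model.eval()`. A minimal sketch with plain PyTorch, assuming only `torch` is installed:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

layer.train()      # training mode: elements are randomly zeroed,
train_out = layer(x)  # survivors are scaled by 1 / (1 - p) = 2.0

layer.eval()       # inference mode: Dropout is the identity
eval_out = layer(x)
```

This is why evaluating without switching the model to eval mode (or vice versa) can make train and test behavior diverge even on identical data.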
I downsampled the cloud and I get 0.92 accuracy at the last epoch of the training phase. However, when testing (with identical data) the network cannot predict anything. The model is overfitting, but is it normal that I get no predictions at all on identical data?