DetectNet_v2 TensorRT License Plate Detection (LPDNet)
Hi,
I have built a TensorRT engine from the model downloadable here:
tlt-converter -k nvidia_tlt -d 3,480,640 -p image_input,1x3x480x640,4x3x480x640,16x3x480x640 usa_pruned.etlt -t fp16 -e lpd_engine.trt
The model is based on DetectNet_v2. Has anyone managed to get this or another DetectNet_v2 model working with TensorRT in Python?
Here is the code I have so far:
import os
import time
import cv2
#import matplotlib.pyplot as plt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import pdb
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def load_engine(trt_runtime, engine_path):
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine


# Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
def allocate_buffers(engine, batch_size=1):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"input: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
            print(f"output: shape:{engine.get_binding_shape(binding)} dtype:{engine.get_binding_dtype(binding)}")
    return inputs, outputs, bindings, stream


def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(
        batch_size=batch_size, bindings=bindings, stream_handle=stream.handle
    )
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# TensorRT logger singleton.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_engine_path = "lpd.trt"
trt_runtime = trt.Runtime(TRT_LOGGER)
trt_engine = load_engine(trt_runtime, trt_engine_path)
# An execution context is needed for inference.
context = trt_engine.create_execution_context()
# This allocates memory for network inputs/outputs on both CPU and GPU.
inputs, outputs, bindings, stream = allocate_buffers(trt_engine)
image = cv2.imread("car.jpg")
image = cv2.resize(image, (640, 480))
np.copyto(inputs[0].host, image.ravel())
outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
test1 = np.reshape(outputs[0], (4, 30, 40))
test2 = np.reshape(outputs[1], (1, 30, 40))
print(outputs)
Exporting the TensorRT engine and then running this code works; I am just unable to interpret the output. The output is supposedly a 40x30x12 bbox coordinate tensor and a 40x30x3 class confidence tensor. So I reshape it into those dimensions, but the bbox coordinate tensor contains decimal values that I can't convert to pixel coordinates.
I hope this is relevant to the repo - I think it could be a nice addition once it works.
As to postprocessing, refer again to the DetectNet code in the jetson-inference repo: https://github.com/dusty-nv/jetson-inference/blob/19ed62150b3e9499bad2ed6be1960dd38002bb7d/c/detectNet.cpp#L813-L861
Output tensor 0 should be “conf”, while output tensor 1 “bbox”. So I think you should do something like:
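A rough, untested sketch of that reshape, continuing from the script above. The 30x40 grid comes from the shapes discussed here; num_classes would be 3 for the multi-class detectors or 1 for LPDNet, and you should check the binding shapes printed by allocate_buffers() to confirm which output is the confidence tensor and which is the bbox tensor:
num_classes = 1
grid_h, grid_w = 30, 40
# Reshape the flat output buffers into per-class grids.
confs = np.reshape(outputs[0], (num_classes, grid_h, grid_w))
bboxes = np.reshape(outputs[1], (4 * num_classes, grid_h, grid_w))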
Where “confs” are the confidence scores of the 3 classes for each potential detection, and “bboxes” are the coordinates of the potential detections. More specifically, “bboxes” are “(x1, y1, x2, y2)” (or “(Left, Top, Right, Bottom)”) coordinates ranging from 0.0 to 1.0. You’ll need to multiply them by the image width and height, e.g. 640 and 480, to get pixel coordinates on the original image.
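For example, a hedged continuation of the sketch above that turns the grids into pixel boxes; the 0.5 threshold is an arbitrary placeholder, and it assumes the bbox channels are grouped four per class:
conf_thresh = 0.5          # arbitrary example threshold
img_w, img_h = 640, 480    # network input size used above
for c in range(num_classes):
    # Grid cells where this class's confidence exceeds the threshold.
    ys, xs = np.where(confs[c] > conf_thresh)
    for y, x in zip(ys, xs):
        # Scale normalized (x1, y1, x2, y2) to pixel coordinates.
        x1 = bboxes[4 * c + 0, y, x] * img_w   # Left
        y1 = bboxes[4 * c + 1, y, x] * img_h   # Top
        x2 = bboxes[4 * c + 2, y, x] * img_w   # Right
        y2 = bboxes[4 * c + 3, y, x] * img_h   # Bottom
        print(f"class {c}: ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}) conf={confs[c, y, x]:.2f}")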
I hope this helps and you’ll be able to fix the code by yourself.
Yeah, I also noticed that, but the dimensions of the outputs only make sense the way they are now.
If the model is for TensorRT, does that mean it will work with jetson-inference? From the model's documentation:
But thank you for your time. In the meantime, I've managed to train a YOLOv4 license plate detector that works sufficiently well, and it will work with this library.