How is a PyTorch model converted to TensorRT?


Overview of the conversion steps
Prepare the model definition file (.py file)

Prepare the trained weights file (.pth or .pth.tar)

Install onnx and onnxruntime

Convert the trained model to .onnx format

Install TensorRT

Environment parameters


ubuntu-18.04
PyTorch-1.8.1
onnx-1.9.0
onnxruntime-1.7.2
cuda-11.1
cudnn-8.2.0
TensorRT-7.2.3.4
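
The steps below are sensitive to the exact combination of versions, so it can help to record what is actually installed before starting. The following is a minimal sketch that uses only the standard library. The helper name installed_versions and the distribution names passed to it are assumptions for illustration; the pip distribution name can differ from the import name.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    names = ["torch", "onnx", "onnxruntime", "pycuda", "nvidia-tensorrt"]
    for name, version in installed_versions(names).items():
        print("{:<16} {}".format(name, version or "not installed"))
```

Running it inside the environment you intend to use (for example the Anaconda environment) shows whether the versions match the list above.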

PyTorch to ONNX
Step 1: Install ONNX and ONNX Runtime
The installation method I found online uses pip:


pip install onnx
pip install onnxruntime

If you use an Anaconda environment, you can also install them with conda:


conda install -c conda-forge onnx
conda install -c conda-forge onnxruntime

Step 2: Install netron
netron visualizes the network structure, which is convenient for debugging.


pip install netron

Step 3: Convert PyTorch to ONNX
Once the installation is complete, the model can be converted with the code below.


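# -*- coding: utf-8 -*-
import onnx
# Import onnx before torch; otherwise a segmentation fault may occur.
import torch
import torchvision

from model import Net

# args and checkpoint_path must be defined for your own model.
net = Net(args).cuda()  # build the network
checkpoint = torch.load(checkpoint_path)
net.load_state_dict(checkpoint['state_dict'])  # load the trained weights
net.eval()  # export in inference mode
print("Model and weights LOADED successfully")

export_onnx_file = './net.onnx'
torch.onnx.export(net,
                  torch.randn(1, 1, 224, 224, device='cuda'),  # dummy input
                  export_onnx_file,
                  verbose=False,  # set to True to print the exported graph
                  # input node name; the extra names in the list label the learnable parameters
                  input_names=["inputs"] + ["params_%d" % i for i in range(120)],
                  output_names=["outputs"],  # output node name
                  opset_version=10,  # ONNX operator set; depends on the PyTorch version
                  do_constant_folding=True,
                  dynamic_axes={"inputs": {0: "batch_size", 2: "h", 3: "w"},
                                "outputs": {0: "batch_size"}})

onnx_model = onnx.load(export_onnx_file)  # load the exported ONNX model
onnx.checker.check_model(onnx_model)  # check that the model is well formed
print(onnx.helper.printable_graph(onnx_model.graph))  # print the ONNX graph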

dynamic_axes specifies which input and output dimensions are variable. Here the batch size of both the input and the output is dynamic, and dimensions 2 and 3 of the input (height and width) are dynamic as well.
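
To make the structure of dynamic_axes concrete, here is a small sketch that shows which dimensions of the dummy input become symbolic. It maps each tensor name to a dictionary of {axis index: symbolic name}. The helper symbolic_shape is hypothetical and written only for illustration; it is not part of torch.onnx.

```python
def symbolic_shape(static_shape, axes):
    """Replace the dimensions listed in axes by their symbolic names."""
    shape = list(static_shape)
    for axis, name in axes.items():
        if not 0 <= axis < len(shape):
            raise ValueError(
                "axis {} is out of range for shape {}".format(axis, static_shape)
            )
        shape[axis] = name
    return shape


dynamic_axes = {
    "inputs": {0: "batch_size", 2: "h", 3: "w"},
    "outputs": {0: "batch_size"},
}

# The dummy input used for the export has shape (1, 1, 224, 224).
print(symbolic_shape((1, 1, 224, 224), dynamic_axes["inputs"]))
# ['batch_size', 1, 'h', 'w']
```

Only the channel dimension of the input stays fixed; every other axis can vary at inference time.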
Step 4: Verify the ONNX model
Visualize the ONNX model and test that it runs correctly with the code below.


import netron
import onnxruntime
import numpy as np
from PIL import Image
import cv2

netron.start('./net.onnx')  # open the model graph in the browser
# test_image_path must point to your own test image.
test_image = np.asarray(Image.open(test_image_path).convert('L'), dtype='float32') / 255.
test_image = cv2.resize(test_image, (224, 224), interpolation=cv2.INTER_CUBIC)
test_image = test_image[np.newaxis, np.newaxis, :, :]  # HW -> NCHW
session = onnxruntime.InferenceSession('./net.onnx')
outputs = session.run(None, {"inputs": test_image})
print(len(outputs))
print(outputs[0].shape)
# Post-process outputs[0] as your task requires.
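
Printing the output shape only shows that the model runs. To check that it runs correctly, one option is to compare the ONNX Runtime output with the PyTorch output for the same input. The sketch below assumes both outputs are available as arrays; compare_outputs and its tolerances are illustrative choices, not values from the original workflow, and the demo uses synthetic data.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (match, max_abs_diff) for two arrays of the same shape."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(
            "shape mismatch: {} vs {}".format(reference.shape, candidate.shape)
        )
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    match = bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))
    return match, max_abs_diff


# Synthetic stand-ins for the PyTorch output and the ONNX Runtime output.
rng = np.random.default_rng(0)
ref = rng.standard_normal((1, 2, 224, 224)).astype(np.float32)
ok, diff = compare_outputs(ref, ref + 1e-6)
print(ok, diff)
```

In practice the first argument would be the PyTorch result moved to the CPU as a NumPy array, and the second would be outputs[0] from the session above.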

ONNX to TensorRT
Step 1: Download the TensorRT installation package from NVIDIA: https://developer.nvidia.com/tensorrt
Choose the package that matches your CUDA version. I chose TensorRT 7.2.3 and downloaded it locally.


cd download_path
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.1-trt7.2.3.4-ga-20210226_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt

I consulted NVIDIA's official installation tutorial: https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#install
Because the TensorRT Python API may be needed, PyCUDA has to be installed first:


pip install 'pycuda<2021.1'

If you run into any problems, refer to the official instructions.
If you use Python 3.x, run the following installation:


sudo apt-get install python3-libnvinfer-dev

If you need ONNX GraphSurgeon or want to use its Python module, run the following command:


sudo apt-get install onnx-graphsurgeon

Verify that the installation succeeded:


dpkg -l | grep TensorRT

If the output lists the TensorRT packages, the installation succeeded.
Problem: at this point, import tensorrt in Python failed with ModuleNotFoundError: No module named 'tensorrt'.
From what I found online, TensorRT installed through dpkg goes into the system Python by default, not into the Python of the Anaconda environment. The system default Python is 3.6 while Anaconda uses 3.8.8, so exposing the package through export PYTHONPATH can run into a Python version mismatch.
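
A quick way to see which interpreter is running and whether it can find a given module is sketched below, using only the standard library. The helper module_visible is a hypothetical name chosen for this example.

```python
import importlib.util
import sys


def module_visible(module_name):
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("interpreter:", sys.executable)
print("version    :", "{}.{}.{}".format(*sys.version_info[:3]))
print("tensorrt importable:", module_visible("tensorrt"))
```

If the interpreter path points into the Anaconda environment and tensorrt is reported as not importable, the package was installed for a different Python.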
So I searched again for how to install TensorRT in an Anaconda environment:


pip3 install --upgrade setuptools pip
pip install nvidia-pyindex
pip install nvidia-tensorrt

Check that Python in the Anaconda environment can import tensorrt:


import tensorrt
print(tensorrt.__version__)
# Output: 8.0.0.3

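Step 2: Convert ONNX to TensorRT
A word of warning first: at this step I hit the error AttributeError: 'tensorrt.tensorrt.Builder' object has no attribute 'max_workspace_size'. What I found online attributed it to a bug in version 8.0.0.3 and recommended rolling back to 7.2.3.4. (More precisely, TensorRT 8 removed this attribute from Builder in favor of the builder config, so code written against the 7.x Builder API breaks.)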
emmm…


pip uninstall nvidia-tensorrt  # remove version 8.0.0.3
pip install nvidia-tensorrt==7.2.* --index-url https://pypi.ngc.nvidia.com  # install version 7.2.3.4
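
Rolling back is one way out. Another is to write the workspace setting so that it tolerates several API generations. The sketch below is an assumption-laden illustration, not the method used in this article: set_workspace_size is a hypothetical helper, and the set_memory_pool_limit branch reflects my understanding of newer TensorRT releases (8.4 and later). The demo runs against stand-in objects, not real TensorRT classes.

```python
from types import SimpleNamespace


def set_workspace_size(config, builder, size_bytes, trt_module=None):
    """Set the workspace limit with whichever API is available.

    Returns the name of the API that was used.
    """
    if trt_module is not None and hasattr(config, "set_memory_pool_limit"):
        # Newer releases: the workspace is one of several memory pools.
        config.set_memory_pool_limit(
            trt_module.MemoryPoolType.WORKSPACE, size_bytes
        )
        return "config.set_memory_pool_limit"
    if hasattr(config, "max_workspace_size"):
        # Builder-config attribute, as used in this article's conversion code.
        config.max_workspace_size = size_bytes
        return "config.max_workspace_size"
    if hasattr(builder, "max_workspace_size"):
        # Oldest style: attribute on the builder itself.
        builder.max_workspace_size = size_bytes
        return "builder.max_workspace_size"
    raise AttributeError("no known workspace-size API found")


# Stand-in objects (not real TensorRT classes) mimicking a 7.x-style config.
fake_config = SimpleNamespace(max_workspace_size=0)
fake_builder = SimpleNamespace()
print(set_workspace_size(fake_config, fake_builder, 1 << 30))
# config.max_workspace_size
```

With a real installation the call would look like set_workspace_size(config, builder, 1 << 30, trt_module=trt), where trt is the imported tensorrt module.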

Conversion code


import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import torch
import time
from PIL import Image
import cv2, os
import torchvision
import numpy as np
from scipy.special import softmax

### Adapt get_img_np_nchw and postprocess_the_outputs to your own model

TRT_LOGGER = trt.Logger()

def get_img_np_nchw(img_path):
    img = Image.open(img_path).convert('L')
    img = np.asarray(img, dtype='float32')
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = img / 255.
    img = img[np.newaxis, np.newaxis]  # HW -> NCHW
    return img

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        """host_mem: CPU (host) memory, device_mem: GPU (device) memory
        """
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="", fp16_mode=False, int8_mode=False, save_engine=False):
    """
    params max_batch_size:   maximum batch size used at inference
    params onnx_file_path:   path to the ONNX file
    params engine_file_path: path of the serialized engine to load or save
    params fp16_mode:        whether to build with FP16 precision
    params int8_mode:        whether to build with INT8 precision
    params save_engine:      whether to save the built engine to disk
    returns:                 an ICudaEngine
    """
    # If a serialized engine already exists, load it instead of building a new cudaEngine
    if os.path.exists(engine_file_path):
        print("Reading engine from file: {}".format(engine_file_path))
        with open(engine_file_path, 'rb') as f, \
            trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())  # deserialize
    else:  # build the cudaEngine from the ONNX file

        # Create the builder from the logger;
        # the builder then creates the INetworkDefinition
        explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        # In TensorRT 7.0, the ONNX parser only supports full-dimensions mode, meaning that your network definition must be created with the explicitBatch flag set. For more information, see Working With Dynamic Shapes.

        with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network,  \
            trt.OnnxParser(network, TRT_LOGGER) as parser, \
            builder.create_builder_config() as config:  # the ONNX parser is bound to the network and fills it when parsing
            profile = builder.create_optimization_profile()
            profile.set_shape("inputs", (1, 1, 224, 224), (1, 1, 224, 224), (1, 1, 224, 224))
            config.add_optimization_profile(profile)

            config.max_workspace_size = 1 << 30  # maximum GPU workspace (1 GiB) for building the ICudaEngine
            builder.max_batch_size = max_batch_size  # maximum batch size used at inference
            if fp16_mode:
                # build_engine(network, config) reads precision flags from config
                config.set_flag(trt.BuilderFlag.FP16)

            if int8_mode:
                # To be updated
                raise NotImplementedError

            # Parse the ONNX file and fill the network
            if not os.path.exists(onnx_file_path):
                quit("ONNX file {} not found!".format(onnx_file_path))
            print('loading onnx file from path {} ...'.format(onnx_file_path))
            # with open(onnx_file_path, 'rb') as model:  # alternative: read the file yourself
            #     print("Beginning onnx file parsing")
            #     parser.parse(model.read())  # parse the ONNX bytes
            # parse_from_file reads and parses the ONNX file; it returns False on failure
            if not parser.parse_from_file(onnx_file_path):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError("Failed to parse ONNX file {}".format(onnx_file_path))

            print("Completed parsing of onnx file")
            # The network is now populated; use the builder to create the CudaEngine
            print("Building an engine from file {}; this may take a while...".format(onnx_file_path))

            #################
            # import pdb;pdb.set_trace()
            print(network.get_layer(network.num_layers-1).get_output(0).shape)
            # network.mark_output(network.get_layer(network.num_layers -1).get_output(0))
            engine = builder.build_engine(network, config)  # note: network is the populated INetworkDefinition
            print("Completed creating Engine")
            if save_engine:  # save the engine for later reuse
                with open(engine_file_path, 'wb') as f:
                    f.write(engine.serialize())  # serialize
            return engine

def do_inference(context, bindings, inputs, outputs, stream):
    # Transfer data from CPU to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference. The engine is built with EXPLICIT_BATCH, so use the v2 call, which takes no batch size.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]

def postprocess_the_outputs(outputs, shape_of_output):
    outputs = outputs.reshape(*shape_of_output)
    out = np.argmax(softmax(outputs, axis=1)[0, ...], axis=0)
    # import pdb;pdb.set_trace()
    return out

# Run inference with TensorRT
onnx_model_path = './net.onnx'  # the file exported in the PyTorch to ONNX step
max_batch_size = 1
# These two modes depend on the hardware
fp16_mode = False
int8_mode = False
trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
# Build an engine
engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode, save_engine=True)
# Create the context for this engine
context = engine.create_execution_context()
# Allocate buffers for input and output
inputs, outputs, bindings, stream = allocate_buffers(engine)  # inputs/outputs hold host and device buffers

# Do inference
# img_path must point to your own test image.
img_np_nchw = get_img_np_nchw(img_path)
inputs[0].host = img_np_nchw.reshape(-1)
shape_of_output = (max_batch_size, 2, 224, 224)

# inputs[1].host = ... for multiple input
t1 = time.time()
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)  # numpy data
t2 = time.time()
feat = postprocess_the_outputs(trt_outputs[0], shape_of_output)

print('TensorRT ok')
print("Inference time with the TensorRT engine: {}".format(t2-t1))
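
The timing above measures a single call, which typically includes one-time start-up cost. A more stable figure can be obtained by running a few untimed warm-up calls and averaging over several repeats. The sketch below is one way to do that with the standard library; time_inference and its default counts are illustrative assumptions.

```python
import time


def time_inference(run_once, warmup=5, repeats=20):
    """Return the mean wall-clock seconds per call of run_once()."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats


# Demo with a trivial stand-in workload instead of a real engine.
calls = []
mean_s = time_inference(lambda: calls.append(1), warmup=2, repeats=3)
print(len(calls))  # 5 calls: 2 warm-up + 3 timed
```

With the engine above, run_once could be a lambda that calls do_inference with the same arguments. Because do_inference synchronizes the stream before returning, each timed call covers the complete transfer and execution.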

When I converted following the method in the article at https://wiki.tiker.net/PyCuda/Installation/Linux/#step-1-download-and-unpack-pycuda, the following error was reported:
(Error screenshot not preserved.)

Originally I converted with the code from that link and later modified it. With the conversion code in this article there should be no problem.
The modified part is:


with trt.Builder(TRT_LOGGER) as builder, \
    builder.create_network(explicit_batch) as network, \
    trt.OnnxParser(network, TRT_LOGGER) as parser, \
    builder.create_builder_config() as config:  # the ONNX parser is bound to the network and fills it when parsing
    profile = builder.create_optimization_profile()
    profile.set_shape("inputs", (1, 1, 224, 224), (1, 1, 224, 224), (1, 1, 224, 224))
    config.add_optimization_profile(profile)

    config.max_workspace_size = 1 << 30  # maximum GPU workspace (1 GiB) for building the ICudaEngine
    engine = builder.build_engine(network, config)

Modify or add the corresponding lines in the linked code and the problem should go away.
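
The optimization profile in the modified code passes three shapes to profile.set_shape: the minimum, the optimum and the maximum input shape. Here all three are (1, 1, 224, 224), which pins the dynamic ONNX input to a single shape. If you want a real range, each dimension has to satisfy min <= opt <= max. The sketch below checks that before the shapes are handed to TensorRT; check_profile_shapes is a hypothetical helper and the example range is made up for illustration.

```python
def check_profile_shapes(min_shape, opt_shape, max_shape):
    """Return the three shapes if min <= opt <= max holds on every axis."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min, opt and max shapes must have the same rank")
    for axis, (lo, opt, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not lo <= opt <= hi:
            raise ValueError(
                "axis {}: need min <= opt <= max, got {}, {}, {}".format(
                    axis, lo, opt, hi
                )
            )
    return min_shape, opt_shape, max_shape


# The fixed profile used in this article.
fixed = check_profile_shapes((1, 1, 224, 224), (1, 1, 224, 224), (1, 1, 224, 224))

# A made-up dynamic range: batch 1 to 8, spatial size 224 to 512.
ranged = check_profile_shapes((1, 1, 224, 224), (4, 1, 224, 224), (8, 1, 512, 512))
print(ranged)
```

The returned tuple can be unpacked directly into the call, for example profile.set_shape("inputs", *ranged).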


You may freely share or copy this text, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
