Question: I am converting a simple PyTorch model that includes the `grid_sample` operator to ONNX and then to TensorRT. Both conversions succeed, but inference is notably slow in both runtimes. With ONNX Runtime's CUDAExecutionProvider I get the warning "Some nodes were not assigned to the preferred execution providers, which may or may not have a negative impact on performance," and in TensorRT I observed that the `grid_sample` operator runs on the CPU. Is there a problem with my conversion to ONNX, or could I be using an incorrect package version?

Further information: torch 1.12.1+cu113
Replies: 1 comment
@ruolinsss thank you for reporting this issue. A CUDA implementation of GridSample will be added to onnxruntime.