RE: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

"Xiong, Jianxin" <jianxin.xiong@xxxxxxxxx> · Tue, 24 Nov 2020 18:45:06 +0000

> -----Original Message-----
> From: Daniel Vetter <daniel@xxxxxxxx>
> Sent: Tuesday, November 24, 2020 7:17 AM
> To: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Xiong, Jianxin <jianxin.xiong@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; dri-
> devel@xxxxxxxxxxxxxxxxxxxxx; Doug Ledford <dledford@xxxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>; Christian Koenig
> <christian.koenig@xxxxxxx>
> Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 23, 2020 at 02:05:04PM -0400, Jason Gunthorpe wrote:
> > On Mon, Nov 23, 2020 at 09:53:02AM -0800, Jianxin Xiong wrote:
> >
> > > +cdef class DmaBuf:
> > > +    def __init__(self, size, unit=0):
> > > +        """
> > > +        Allocate DmaBuf object from a GPU device. This is done through the
> > > +        DRI device interface (/dev/dri/card*). Usually this
> > > +requires the
> 
> Please use /dev/dri/renderD* instead. That's the interface meant for unpriviledged rendering access. card* is the legacy interface with
> backwards compat galore, don't use.
> 
> Specifically if you do this on a gpu which also has display (maybe some testing on a local developer machine, no idea ...) then you mess with
> compositors and stuff.
> 
> Also wherever you copied this from, please also educate those teams that using /dev/dri/card* for rendering stuff is a Bad Idea (tm)

/dev/dri/renderD* is not always available (e.g. for many iGPUs) and doesn't support
mode setting commands (including dumb_buf). The original intention here is to
have something to support the new tests added, not for general compute. 

> 
> > > +        effective user id being root or being a member of the 'video' group.
> > > +        :param size: The size (in number of bytes) of the buffer.
> > > +        :param unit: The unit number of the GPU to allocate the buffer from.
> > > +        :return: The newly created DmaBuf object on success.
> > > +        """
> > > +        self.dmabuf_mrs = weakref.WeakSet()
> > > +        self.dri_fd = open('/dev/dri/card'+str(unit), O_RDWR)
> > > +
> > > +        args = bytearray(32)
> > > +        pack_into('=iiiiiiq', args, 0, 1, size, 8, 0, 0, 0, 0)
> > > +        ioctl(self.dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, args)
> > > +        a, b, c, d, self.handle, e, self.size = unpack('=iiiiiiq',
> > > + args)
> 
> Yeah no, don't allocate render buffers with create_dumb. Every time this comes up I'm wondering whether we should just completely
> disable dma-buf operations on these. Dumb buffers are explicitly only for software rendering for display purposes when the gpu userspace
> stack isn't fully running yet, aka boot splash.
> 
> And yes I know there's endless amounts of abuse of that stuff floating around, especially on arm-soc/android systems.

One alternative is to use the GEM_CREATE method which can be done via the renderD*
device, but the command is vendor specific, so the logic is a little bit more complex. 

> 
> > > +
> > > +        args = bytearray(12)
> > > +        pack_into('=iii', args, 0, self.handle, O_RDWR, 0)
> > > +        ioctl(self.dri_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, args)
> > > +        a, b, self.fd = unpack('=iii', args)
> > > +
> > > +        args = bytearray(16)
> > > +        pack_into('=iiq', args, 0, self.handle, 0, 0)
> > > +        ioctl(self.dri_fd, DRM_IOCTL_MODE_MAP_DUMB, args);
> > > +        a, b, self.map_offset = unpack('=iiq', args);
> >
> > Wow, OK
> >
> > Is it worth using ctypes here instead? Can you at least add a comment
> > before each pack specifying the 'struct XXX' this is following?
> >
> > Does this work with normal Intel GPUs, like in a Laptop? AMD too?
> >
> > Christian, I would be very happy to hear from you that this entire
> > work is good for AMD as well
> 
> I think the smallest generic interface for allocating gpu buffers which are more useful than the stuff you get from CREATE_DUMB is gbm.
> That's used by compositors to get bare metal opengl going on linux. Ofc Android has gralloc for the same purpose, and cros has minigbm
> (which isn't the same as gbm at all). So not so cool.

Again, would the "renderD* + GEM_CREATE" combination be an acceptable alternative? 
That would be much simpler than going with gbm and less dependency in setting up
the testing evrionment.

> 
> The other generic option is using vulkan, which works directly on bare metal (without a compositor or anything running), and is cross vendor.
> So cool, except not used for compute, which is generally the thing you want if you have an rdma card.
> 
> Both gbm-egl/opengl and vulkan have extensions to hand you a dma-buf back, properly.
> 
> Compute is the worst, because opencl is widely considered a mistake (maybe opencl 3 is better, but nvidia is stuck on 1.2). The actually used
> stuff is cuda (nvidia-only), rocm (amd-only) and now with intel also playing we have xe (intel-only).
> 
> It's pretty glorious :-/
> 
> Also I think we discussed this already, but for actual p2p the intel patches aren't in upstream yet. We have some internally, but with very
> broken locking (in the process of getting fixed up, but it's taking time).
> 
> Cheers, Daniel
> 
> > Edward should look through this, but I'm glad to see something like
> > this
> >
> > Thanks,
> > Jason
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch