Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface

Christoph Lameter <cl@xxxxxxxxx> · Mon, 25 Jul 2016 11:30:48 -0500 (CDT)

On Thu, 21 Jul 2016, Jason Gunthorpe wrote:

> > Ok why would strace check a filehandle in the first place? The descriptor
> > is the filehandle and you can simply find the operation that created that
> > file descriptor to find the device it refers to.
>
> strace is stateless and can attach to a running process, it can't
> watch for open() to figure things out. This is also why it doesn't
> inspect the filehandle...

Well, you can still look up the file handle in /proc/pid/.... if you want
that. Not sure why you are so focused on this.

> We don't *need* strace to work, but it sure would be nice :|

It *is* nice. And it works fine for devices. Let's ensure that devices
are used in a standard way in the IB subsystem so that we can take full
advantage of the syscall infrastructure and the standard system calls.

> > We could easily do that following naming conventions for partitions or so.
> > Why would doing so damage the API capabilities? Seems that they are
> > sufficiently screwed up already. Cleaning that up could help quite a bit.
>
> The current API is problematic because we try to both be like netdev
> in that all devices are accessible (rdma_cm) and at the same time have
> individual per-device chardevs (uverbs0).

Device? uverbs is not a device. A particular ConnectX-3 attached to the
PCI bus is. And it should follow established naming conventions etc. Please,
let's drop the crap that is there now. If we used the notion of a device
the way it was designed, we would have fewer issues.

> So, if you want to move fully to the per-char-dev model then I think
> we'd give up the global netdev like behaviors, things like
> listen(0.0.0.0) and output route selection, and so forth. I doubt there
> is any support for that.

Can the standard listen() syscall be made to work over
InfiniBand devices? Maybe that would be best?

I think in general one does the connection initiation via TCP/IP
regardless... So really InfiniBand only matters as the
underlying transport over which we have imposed IP semantics via IPoIB.

> If we go the other way to a full netdev-like module then we give up
> fine grained (currently mildly broken) file system permissions.

Maybe go with device semantics rather than a full netdev, because this is
not a classic packet-based network.

> You haven't explained how we can mesh the rdma_cm, netdev-like
> listen(0.0.0.0) type semantics, continue to implement multi-port APM
> functionality, share PDs across ports, etc, etc. These are all the
> actual things done today that break when we drop the multiplexors.

I am not *the* expert on this. Frankly, this whole RDMA request stuff
is not that interesting. The basic thing that the RDMA API needs to do for
my use case is fast messaging that bypasses the kernel. And having a gazillion
special ioctls on the side is not that productive. Can we please reuse
the standard system calls and ioctls as much as possible?

No idea what you mean by multiport "APMs". There is an obvious way to
aggregate devices: create a new one, as is done in the storage subsystem.

Sharing PDs? That is a single address space using multiple devices.
It would be natural to share them in that case, since they are bound more
to the memory layout of a single process than to the devices.
So PDs could be per process instead of per device.

> This isn't a simple API that is 1:1 tied to a single physical object,
> it is a sprawling thing with lots of built-in cross-device semantics. :(

Yes, please simplify this sprawl as much as possible. Follow standard
conventions instead of reinventing things like device aggregation.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html