Re: [MAINTAINERS SUMMIT] Device Passthrough Considered Harmful?

On Mon, 2024-07-08 at 15:26 -0700, Dan Williams wrote:
> Early in my Linux career there was palpable concern around Linux
> being locked out of future computing platforms by hardware vendors
> who did not provide open drivers, or even documentation for their
> hardware. For the hardware vendors that did participate upstream,
> maintainers used code acceptance to influence them towards common
> Linux commands and cross-vendor cooperation.

Firstly, could I say "passthrough" seems to be the wrong word here.
When I see it I think of device passthrough from host to VM (SR-IOV and
the like), which is becoming the bedrock of the virtualization world.
However, this proposal seems to be more about user space device drivers
and fat firmware cards.

> The internalized lesson from those days was: "Be wary of vendors
> pushing 'do anything you want and get away with it' passthrough
> tunnels. Demand open documentation of all interfaces."
> 
> Present day realities and discussions merit revisiting that lesson:
> 
> 1/ The truth of the matter is that until the Kernel Lockdown facility
>    arrived, device vendors *had* an unfettered passthrough tunnel via
>    userspace driver mechanisms like /dev/mem and pci-sysfs. The
>    presence of those facilities did not appear to injure the ascension
>    of Linux.
> 
> 2/ Device passthrough, kernel passing opaque payloads, is already
>    taken for granted in many subsystems. USB and HID have "raw"
>    interfaces, EFI variables provide platform-specific configuration,
>    and the oft-cited examples of SCSI and NVME that provide facilities
>    to marshal any command payload whether mainline maintainers think
>    the functionality is a good idea or not. In the case of NVME, the
>    specification continues to evolve despite this Linux bypass.

Time was decades ago Oracle demanded raw access to the SCSI device
because they claimed it was easier for customers and faster for them if
they just talked to devices in their native protocol and got all of the
annoying kernel filesystems and page cache out of the way.  Fast
forward to today and database vendors largely use filesystems thanks to
the evolution of interfaces (direct I/O) that support what they want to
do and the huge annoyance for customers of having to manage huge
numbers of unidentifiable raw devices.

For NVMe and net we do have SPDK and DPDK.  What I find is that people
tend to use them for niche use cases (like the NVMe KV command set) or
obscure network routers.  Even though both claim to get the kernel out
of the way and do stuff "way faster", the difficulty they create by
bypassing everything is quite a high burden.

In the early days of USB security tokens we had the huge problem of
everyone inventing their own interface.  Then they realised this was
unsustainable and came up with CTAP, but it's just a unified way for
user space applications to talk to FIDO tokens over raw USB ... is this
a problem?

> 
> 3/ The practice of requiring Linux commands to wrap all device
>    commands does not appear to have accelerated upstream participation
>    in the CXL subsystem. I.e. CXL, in contrast to NVME, relegates
>    passthrough to a build-time debug option. Some vendors are even
>    shipping vendor specific firmware update facilities even though
>    mainline has support for the CXL standard firmware update mechanism.
> 
>    With the impending arrival of CXL switch devices wanting to share
>    mailbox handling code with the CXL core, the prohibition of
>    device-specific commands is going to generate significant upstream
>    work to wrap all that in Linux commands with little perceivable
>    long term benefit to the subsystem.
> 
> CXL and RDMA are also foreshadowing conflicts across subsystems. It
> is not difficult to imagine a future CXL or RDMA device that supports
> mem, block, net, and drm/accel functionality. Which subsystem's
> device-command policy applies to such a thing?

We already have that today: pretty much every device protocol looks a
bit network-like and has an Over Ethernet or Over RDMA equivalent.

What all of the prior passthroughs taught us is that if the use case
is big enough it will get pulled into the kernel and the kernel will
usually manage it better (DB users).  If it remains a niche use case it
will likely remain out of the kernel, but we won't be hurt by it (NVMe
KV protocol), and sometimes it doesn't really matter and the device
manufacturers will sort it out on their own (USB tokens).

> Enter the fwctl proposal [1]. From the CXL subsystem perspective it
> looks like a long-term solution to the problem of managing
> expectations between hardware vendors and mainline subsystems. It
> disclaims support for the fast-path (data-plane) and is targeted at
> the long tail of slow-path (config/debug plane) device-specific
> operations that are often uninteresting to mainline. It sets
> expectations that the device must advertise the effect of all
> commands so that the kernel can deploy reasonable Kernel Lockdown
> policy, or otherwise require CAP_SYS_RAWIO for commands that may
> affect user-data. It sets common expectations for device designers,
> distribution maintainers, and kernel developers. It is complementary
> to the Linux-command path for operations that need deeper kernel
> coordination.

This proposal does look to me more like a tool for configuring highly
malleable fat firmware (or really mini-OS) offload-type devices (like
intelligent network cards) to interact correctly and be easier to
debug.  Every cloud vendor effectively has their own one of these, so I
think the problem isn't going away, and trying to bring some order to
it looks like a potentially good idea.

> The upstream discussion has yielded the full spectrum of positions on
> device specific functionality, and it is a topic that needs cross-
> kernel consensus as hardware increasingly spans cross-subsystem
> concerns. Please consider it for a Maintainers Summit discussion.

I'm with Greg on this ... can you point to some of the contrary
positions?

Regards,

James




