Re: [MAINTAINERS SUMMIT] Device Passthrough Considered Harmful?

Ricardo Ribalda Delgado <ricardo.ribalda@xxxxxxxxx> · Mon, 22 Jul 2024 13:56:11 +0200

Hi Laurent

On Mon, Jul 22, 2024 at 1:18 PM Laurent Pinchart
<laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Ricardo,
>
> On Mon, Jul 22, 2024 at 12:42:52PM +0200, Ricardo Ribalda Delgado wrote:
> > On Sun, Jul 21, 2024 at 9:25 PM Laurent Pinchart wrote:
> > > On Tue, Jul 09, 2024 at 03:15:13PM -0700, Dan Williams wrote:
> > > > James Bottomley wrote:
> > > > > > The upstream discussion has yielded the full spectrum of positions on
> > > > > > device specific functionality, and it is a topic that needs cross-
> > > > > > kernel consensus as hardware increasingly spans cross-subsystem
> > > > > > concerns. Please consider it for a Maintainers Summit discussion.
> > > > >
> > > > > I'm with Greg on this ... can you point to some of the contrary
> > > > > positions?
> > > >
> > > > This thread has that discussion:
> > > >
> > > > http://lore.kernel.org/0-v1-9912f1a11620+2a-fwctl_jgg@xxxxxxxxxx
> > > >
> > > > I do not want to speak for others on the saliency of their points, all I
> > > > can say is that the contrary positions have so far not moved me to drop
> > > > consideration of fwctl for CXL.
> > > >
> > > > Where CXL has a Command Effects Log that is a reasonable protocol for
> > > > making decisions about opaque command codes, and that CXL already has a
> > > > few years of experience with the commands that *do* need a Linux-command
> > > > wrapper.
> > > >
> > > > Some open questions from that thread are: what does it mean for the fate
> > > > of a proposal if one subsystem Acks the ABI and another Naks it for a
> > > > device that crosses subsystem functionality? Would a cynical hardware
> > > > response just lead to plumbing an NVME admin queue, or CXL mailbox to
> > > > get device-specific commands past another subsystem's objection?
> > >
> > > My default answer would be to trust the maintainers of the relevant
> > > subsystems (or try to convince them when you disagree :-)). Not only
> > > should they know the technical implications best, they should also have
> > > a good view of the whole vertical stack, and the implications of
> > > pass-through for their ecosystem. This may result in a single NAK
> > > overriding ACKs, but we could also try to find technical solutions when
> > > we face such issues, to enforce different sets of rules for the
> > > different functions of a device.
> > >
> > > Subsystem hopping is something we've recently noticed for camera ISPs,
> > > where a vendor wanted to move from V4L2 to DRM. Technical reasons for
> > > doing so were given, and they were (in my opinion) rather excuses. The
> > > unspoken real (again in my opinion) reason was to avoid documenting the
> > > firmware interface and ship userspace binary blobs with no way for free
> > > software to use all the device's features. That's something we have been
> > > fighting against for years, trying to convince vendors that they can
> > > provide better and more open camera support without the world
> > > collapsing, with increasing success recently. Saying amen to
> > > pass-through in this case would be a huge step back that would hurt
> > > users and the whole ecosystem in the short and long term.
> >
> > In my view, DRM is a more suitable model for complex ISPs than V4L2:
>
> I know we disagree on this topic :-) I'm sure we'll continue the
> conversation, but I think the technical discussion likely belongs to a
> different mail thread.
>
> > - Userspace Complexity: ISPs demand a highly complex and evolving API,
> > similar to Vulkan or OpenGL. Applications typically need a framework
> > like libcamera to utilize ISPs effectively, much like Mesa for
> > graphics cards.
> >
> > - Lack of Standardization: There's no universal standard for ISPs;
> > each vendor implements unique features and usage patterns. DRM
> > addresses this through vendor-specific IOCTLs.
> >
> > - Proprietary Architectures: Vendors often don't fully disclose their
> > hardware architectures. DRM cleverly only necessitates a Mesa
> > implementation, not comprehensive documentation.
>
> This point isn't technical and is more on-topic for this mail thread.
>
> V4L2 doesn't require hundreds of pages of comprehensive documentation in
> text form. An open-source userspace implementation that covers the
> feature set exposed by the driver is acceptable in place of
> documentation (provided, of course, that the userspace code wouldn't be
> deliberately obfuscated). This is similar in spirit to the rule for GPU
> DRM drivers.

In DRM, vendors typically define a custom IOCTL per driver to pass
command buffers.
Only the command buffer structure and a Mesa implementation that uses
that command buffer to support the standard features are required.

In V4L2, custom IOCTLs are discouraged. Arbitrary command buffers cannot
be passed from userspace; they are typically built by the driver from
a strictly checked struct.
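
To make the difference concrete, here is a minimal userspace sketch in C.
It assumes a made-up ISP with a 10-bit white-balance gain and a tiny
command ring; the struct isp_wb_params, hw_ring and both functions are
hypothetical and do not mirror any real driver. In the V4L2-style path
the driver encodes the hardware command itself from a validated struct;
in the DRM-style path userspace supplies the raw command words and the
driver only checks that they fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ISP white-balance control, modelled on the V4L2 approach:
 * every field is documented and range-checked by the driver. */
struct isp_wb_params {
	uint16_t gain_r; /* fixed point, 256 == 1.0x */
	uint16_t gain_g;
	uint16_t gain_b;
};

#define WB_GAIN_MAX 1023u /* gains are 10-bit in this made-up hardware */
#define RING_WORDS  16u   /* size of the made-up hardware command ring */

static uint32_t hw_ring[RING_WORDS];

/* V4L2-style path: userspace hands over a documented struct, and the
 * driver itself encodes the hardware command after validating it. */
static int v4l2_style_set_wb(const struct isp_wb_params *p)
{
	assert(p != NULL);
	if (p->gain_r > WB_GAIN_MAX || p->gain_g > WB_GAIN_MAX ||
	    p->gain_b > WB_GAIN_MAX)
		return -EINVAL;

	/* Only the driver knows (and controls) the command encoding. */
	hw_ring[0] = ((uint32_t)p->gain_r << 20) |
		     ((uint32_t)p->gain_g << 10) | p->gain_b;
	return 0;
}

/* DRM-style path: userspace builds the command words itself (e.g. in a
 * Mesa-like library) and the driver only checks that they fit. */
static int drm_style_submit(const uint32_t *cmds, size_t n_words)
{
	assert(cmds != NULL);
	if (n_words > RING_WORDS)
		return -EINVAL;

	/* The words reach the hardware without being decoded. */
	memcpy(hw_ring, cmds, n_words * sizeof(*cmds));
	return 0;
}
```

In the first path, a feature the driver does not know about simply has
no way to be expressed; in the second, anything the hardware accepts can
be sent, whether or not the open-source userspace uses it.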

>
> > Our current approach of pushing back against vendors, instead of
> > seeking compromise, has resulted in the vast majority of the market
> > (99% if not more) relying on out-of-tree drivers. This leaves users
> > with no options for utilizing their cameras outside of Android.
> >
> > DRM allows a hybrid model, where:
> > - Open Source Foundation: Standard use cases are covered by a fully
> > open-source stack.
> > - Vendor Differentiation: Vendors retain the freedom to implement
> > proprietary features (e.g., automatic makeup) as closed source.
>
> V4L2 does as well, you can implement all kinds of closed-source ISP
> control algorithms in userspace, as long as there's an open-source
> implementation that exercises the same hardware features. A good analogy

Is it really mandatory to have an open-source 3A algorithm? I thought
defining the inputs and outputs of the algorithm was good enough.
AFAIK, for some time there was no open-source ipu3 algorithm, and yet
the driver was already upstream.

> for people less familiar with ISPs is shader compilers, GPU vendors are
> free to ship closed-source implementations that include more
> optimizations, as long as the open-source, less optimized implementation
> covers the same GPU ISA, so that open-source developers can also work on
> optimizing it.

I believe a more accurate description is that in V4L2 we expect all
the registers, the device architecture and its behaviour to be
documented and accessed through standard IOCTLs. Anything not documented
cannot be accessed by userspace.

In DRM, the concern is that there is a fully open-source
implementation that the user can use. Vendors have custom IOCTLs and
can offer proprietary software for some use cases.

>
> Thinking that DRM would offer a free pass-through path compared to V4L2
> doesn't seem realistic to me. Both subsystems will have similar rules.

DRM does indeed allow vendors to pass arbitrary command buffers, which
are then sent to the hardware. We cannot do that in V4L2.

I might be wrong, but GPU drivers do not deeply inspect the command
buffers to make sure that they do not use any feature not covered by
Mesa.
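
For illustration only, the sketch below contrasts a shallow size check
with the kind of deep inspection described above. It assumes a made-up
packet format (opcode in bits 31..24 of a header word, payload length in
bits 23..16) and a hypothetical allowlist of opcodes that the open-source
userspace exercises; it is not how any particular GPU driver validates
its buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up packet format: header word = opcode (bits 31..24) |
 * payload length in words (bits 23..16); payload words follow. */
#define PKT_OPCODE(h) ((uint8_t)((h) >> 24))
#define PKT_LEN(h)    ((size_t)(((h) >> 16) & 0xffu))

/* Hypothetical allowlist: opcodes the open-source userspace uses. */
static const uint8_t allowed_opcodes[] = { 0x01, 0x02, 0x10 };

static bool opcode_allowed(uint8_t op)
{
	for (size_t i = 0;
	     i < sizeof(allowed_opcodes) / sizeof(allowed_opcodes[0]); i++)
		if (allowed_opcodes[i] == op)
			return true;
	return false;
}

/* Shallow check: only the overall buffer size is validated. */
static bool shallow_check(size_t n_words, size_t max_words)
{
	return n_words <= max_words;
}

/* Deep check: decode every packet and reject unknown opcodes or
 * packets whose payload runs past the end of the buffer. */
static bool deep_check(const uint32_t *cmds, size_t n_words)
{
	size_t i = 0;

	assert(cmds != NULL || n_words == 0);
	while (i < n_words) {
		uint32_t hdr = cmds[i];
		size_t len = PKT_LEN(hdr);

		if (!opcode_allowed(PKT_OPCODE(hdr)))
			return false;
		if (len > n_words - i - 1)
			return false; /* truncated packet */
		i += 1 + len;
	}
	return true;
}
```

A buffer using an opcode outside the allowlist passes the shallow check
but is rejected by the deep one, which is the gap between "fits in the
ring" and "only uses features covered by the open-source stack".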

>
> > This approach would allow billions of users to access their hardware
> > more securely and with in-tree driver support. Our current stubborn
> > pursuit of an idealistic goal has already negatively impacted both
> > users and the ecosystem.
> >
> > The late wins, in my opinion, cannot scale to the consumer market, and
> > Linux will remain a niche market for ISPs.
> >
> > If such a hybrid model goes against Linux goals, this is something
> > that should be agreed upon by the whole community, so we have the same
> > criteria for all subsystems.
>
> --
> Regards,
>
> Laurent Pinchart

-- 
Ricardo Ribalda