Re: [PATCH 0/8] Introduce fwctl subsystem

On Wed, Jun 05, 2024 at 09:56:14PM -0700, Dan Williams wrote:

> > If people come and say we need X and the maintainer says no, they
> > don't just give up and stop doing X, they go and do X anyhow out of
> > tree. This has become especially true now that the center of business
> > activity in server-Linux is driven by the hyperscale crowd that don't
> > care much about upstream.
> 
> "...don't care much about upstream...". This could be a whole separate
> thread unto itself.

Heh, it is a topic, but perhaps not one for polite company :)

> > Linux maintainers don't actually have the power to force the industry
> > to do things, though people do keep trying.. Maintainers can only
> > lead, and productive leading is not done with a NO.
> > 
> > You will start to see this pain in maybe 5-10 years if CXL starts to
> > be something deployed in an enterprise RedHat/Dell/etc sort of
> > environment. Then that missing X becomes a critical issue because it
> > turns out the hyperscale folks long since figured out it is really
> > important but didn't do anything to enable it upstream.
> 
> This matches other feedback I have heard recently. Yes, distros hate
> contending with every vendor's userspace toolkit, that was the
> original

I'm not sure that is 100% true. Sure nobody likes that you have to
type 'abc X' and 'def Y' to do a similar thing, but from a distro
perspective if abc and def are both open sourced and packaged in the
distro it is still a far better outcome than users doing OOT drivers
and binary-only tools.

E.g. one of the long-standing main Mellanox tools that is being ported
to fwctl is open source and in all distros:

 https://rpmfind.net/linux/rpm2html/search.php?query=mstflint

Projects have already experimented building tooling on top of it to
make a more cross-vendor experience in some areas.

In my view it is wrong to think the kernel is the only place we can
make generic things or that allowing userspace to see the raw device
interface immediately means fragmentation and chaos. The industry is
more robust than that. Giving people working in userspace room to
invent their own solutions is actually helpful to driving some
commonality. There are already soft targets in K8S that people
need to fit into, if the first few steps are with abc/def tools and
that brings us to an eventual true commonality, then great.

> distro feedback motivating CONFIG_CXL_MEM_RAW_COMMANDS to have a poison
> pill of WARN() on use. However, allowing more vendor commands is
> preferable to contending with vendor out-of-tree drivers that likely
> help keep the enterprise-distro-kernel stable-ABI train rolling. In
> other words, legalize it in order to centrally regulate it.

I also liked Jakub's idea of putting a taint in for things that were
likely to have an impact on support and debug, I included that concept
in fwctl.

> > >   Effects Log". In that "trust Command Effects" scenario the kernel still
> > >   has no idea what the command is actually doing, but it can at least
> > >   assert that the device does not claim that the command changes the
> > >   contents of system-memory. Now, you might say, "the device can just
> > >   lie", but that betrays a conceit of the kernel restriction. A device
> > >   could lie that a Linux wrapped command when passed certain payloads does
> > >   not in turn proxy to a restricted command.
> > 
> > Yeah, we have to trust the device. If the device is hostile toward the
> > OS then there are already big problems. We need to allow for
> > unintentional defects in the devices, but we don't need to be
> > paranoid.
> > 
> > IMHO a command effects report, in conjunction with a robust OS centric
> > definition is something we can trust in.
> 
> So this is where I want to start and see if we can bridge the trust gap.
> 
> I am warming to your assertion that there is a wide array of
> vendor-specific configuration and debug that are not an efficient use of
> upstream's time to wrap in a shared Linux ABI. I want to explore fwctl
> for CXL for that use case, I personally don't want to marshal a Linux
> command to each vendor's slightly different backend CXL toggles.

Personally I think this idea to marshal/unmarshal everything in the
kernel is often misguided. If it is a truly obvious and actually shared
multi-vendor capability then by all means go and do it.

But if you are spending weeks/months fighting about uAPI because all
the vendors are so different that it isn't obvious what is "generic",
then you've probably already lost. The very worst outcome is a
per-device uAPI masquerading as an obfuscated "generic" uAPI that
wasted ages of people's time to argue out.

> At the same time, I also agree with the contention that a "do anything
> you want and get away with it" tunnel invites shenanigans from folks
> that may not care about the long term health of the Linux kernel vs
> their short term interests.

IMHO this is disproven by history. The above mstflint I linked to is
as old as mlx5 HW, it runs today over PCI config space and an OOT
driver. Where is the real damage to the long term health of Linux or
the ecosystem?

Like I said before, I think there is a difference between DRM wanting a
Vulkan stack and doing some device specific
configuration/debugging. One has vastly more open source value than
the other.

> So my questions to try to understand the specific sticking points more
> are:
> 
> 1/ Can you think of a Command Effect that the device could enumerate to
> address the specific shenanigans that netdev is worried about?

Nothing comes to mind..

> In other words if every command a device enables has the stated
> effect of "Configuration Change after Reset" does that cut out a
> significant portion of the concern?

Related to configuration - one of Saeed's original ideas was to
implement a devlink command to set the configurables in the flash in a
way that mlx5 could implement all of its options, ideally with
configurables discovered dynamically from the running device. This LPC
presentation was so aggressively rejected by Jakub that Saeed abandoned
it. In the discussion it was clear Jakub is requesting to review and
possibly reject every configurable.

On this topic, unfortunately, I don't see any technical middle ground
between "netdev is the gatekeeper for all FLASH configurables" and
"devices can be fully configured regardless of their design".

> 2/ About the "what if the device lies?" question. We can't revert code
> that used to work, but we can definitely work with enterprise distros to
> turn off fwctl where there is concern it may lead or is leading to
> shenanigans. 

Security is the one place where Linus has tolerated userspace
regressions. In this specific case I documented (or at least that was
the intent) there would be regression consequences to breaking the
security rules. Commands can be retroactively restricted to higher CAP
levels and rejected from lockdown if the device attracts a CVE.

IMHO the ecosystem is strongly motivated to take security seriously
these days, I am not so worried.

> So, document what each subsystem's stance towards fwctl is,
> like maybe a distro only wants fwctl to front publicly documented vendor
> commands, or maybe private vendor commands ok, but only with a
> constrained set of Command Effects (I potentially see CXL here). 

I wouldn't say subsystem here, but technology. I think it is
reasonable that a CXL fwctl driver have some kconfig tunables like you
already have. This idea works a lot better if the underlying thing is
already standards based.

Linux subsystem isn't a meaningful concept for a multi-function device
like mlx5 and others.

Thanks,
Jason



