Re: [PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023-11-16 4:17 am, Jason Gunthorpe wrote:
On Wed, Nov 15, 2023 at 08:23:54PM +0000, Robin Murphy wrote:
On 2023-11-15 3:36 pm, Jason Gunthorpe wrote:
On Wed, Nov 15, 2023 at 03:22:09PM +0000, Robin Murphy wrote:
On 2023-11-15 2:05 pm, Jason Gunthorpe wrote:
[Several people have tested this now, so it is something that should sit in
linux-next for a while]

What's the aim here? This is obviously far, far too much for a
stable fix,

To fix the locking bug and ugly abuse of dev->iommu?

Fixing the locking can be achieved by fixing the locking, as I have now
demonstrated.

Obviously. I rejected that right away because of how incredibly
wrongly layered and hacky it is to do something like that.

What, and dressing up the fundamental layering violation by baking it even further into the API flow, while still not actually fixing it or any of its *other* symptoms, is somehow better?

Ultimately, this series is still basically doing the same thing my patch does - extending the scope of the existing iommu_probe_device_lock hack to cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple and obvious and look like one, and being easy to remove again is an obvious bonus too.

I haven't seen patches or an outline on what you have in mind though?

In my view I would like to get rid of of_xlate(), at a minimum. It is
a micro-optimization I don't think we need. I see a pretty
straightforward path to get there from here.

Micro-optimisation!? OK, I think I have to say it. Please stop trying to
rewrite code you don't understand.

I understand it fine. The list of (fwnode_handle, of_phandle_args)
tuples doesn't change between when of_xlate is callled and when probe
is called. Probe can have the same list. As best I can tell the extra
ops avoids maybe some memory allocation, maybe an extra iteration.

What it does do is screw up alot of the drivers that seem to want to
allocate the per-device data in of_xlate and make it convoluted and
memory leaking buggy on error paths.

So, I would move toward having the driver's probe invoke a helper like:

    iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);

Which generates the same list of (fwnode_handle, of_phandle_args) that
was passed to of_xlate today, but is ordered sensibly within the
sequence of probe for what many drivers seem to want to do.

Grep for of_xlate. It is a standard and well-understood callback pattern for a subsystem to parse a common DT binding and pass a driver-specific specifier to a driver to interpret. Or maybe you just have a peculiar definition of what you think "micro-optimisation" means? :/

So, it is not so much that that the idea of of_xlate goes away, but
the specific op->of_xlate does, it gets shifted into a helper that
invokes the same function in a more logical spot.

I'm curious how you imagine an IOMMU driver's ->probe function could be called *before* parsing the firmware to work out what, if any, IOMMU, and thus driver, a device is associated with. Unless you think we should have the horrible perf model of passing the device to *every* registered ->probe callback in turn until someone claims it. And then every driver has to have identical boilerplate to go off and parse the generic "iommus" binding... which is the whole damn reason for *not* going down that route and instead using an of_xlate mechanism in the first place.

The per-device data can be allocated at the top of probe and passed
through args to fix the lifetime bugs.

It is pretty simple to do.

I believe the kids these days would say "Say you don't understand the code without saying you don't understand the code."

Most of this series constitutes a giant sweeping redesign of a whole bunch
of internal machinery to permit it to be used concurrently, where that
concurrency should still not exist in the first place because the thing that
allows it to happen also causes other problems like groups being broken.
Once the real problem is fixed there will be no need for any of this, and at
worst some of it will then actually get in the way.

Not quite. This decouples two unrelated things into seperate
concerns. It is not so much about the concurrency but removing the
abuse of dev->iommu by code that has no need to touch it at all.

Sorry, the "abuse" of storing IOMMU-API-specific data in the place we intentionally created to consolidate all the IOMMU-API-specific data into? Yes, there is an issue with the circumstances in which this data is sometimes accessed, but as I'm starting to tire of repeating, that issue fundamentally dates back to 2017, and the implications were unfortunately overlooked when dev->iommu was later introduced and fwspec moved into it (since the non-DT probing paths still worked as originally designed). Pretending that dev->iommu is the issue here is missing the point.

Decoupling makes moving code around easier since the relationships are
easier to reason about.

Again with the odd definitions of "easier". You know what I think is easy? Having a thing be in the obvious place where it should be (but used in the way that was intended). What I would consider objectively less easy is having a thing sometimes be there but sometimes be somewhere else with loads more API machinery to juggle between the two. Especially when once again, that machinery is itself prone to new bugs.

Once again you've got hung up on one particular detail of one symptom of the *real* issue, so although I can see and follow your chain of reasoning, the fact that it starts from the wrong place makes it not particularly useful in the bigger picture.

You can still allocated a fwnode, populate it, and do the rest of the
flow under a probe function just fine.
I feel like I've explained it many times already, but what needs to happen
is for the firmware parsing and of_xlate stage to be initiated by
__iommu_probe_device() itself.

Yes, OK I see. I don't see a problem, I think this still a good
improvement even in that world it is undesirable to use dev->iommu as
a temporary, even if the locking can work.

ever allowed to get it landed...) which gets to the state of
expecting to

Repost it? Rc1 is out and you need to add one hunk to the new user
domain creation in iommufd.

Well yeah, I'm trying to get that rebase finished (hence why I'm finding broken IOMMUFD selftests), but as always I'm also busy with a lot of other non-upstream things, and every time I have managed to do it so far this year it's ended up being blocked by conflicting changes, so I reserve my optimism...

start from a fwspec. Then it's a case of shuffling around what's currently
in the bus_type dma_configure methods such that point is where the fwspec is
created as well, and the driver-probe-time work is almost removed except for
still deferring if a device is waiting for its IOMMU instance (since that
instance turning up and registering will retrigger the rest itself). And
there at last, a trivial lifecycle and access pattern for dev->iommu (with
the overlapping bits of iommu_fwspec finally able to be squashed as well),
and finally an end to 8 long and unfortunate years of calling things in the
wrong order in ways they were never supposed to be.

Having just gone through this all in detail I don't think it is as
entirely straightforward as this, the open coded callers to
of_dma_configure() are not going to be so nice to unravel.

I've only had this turning over in the back of my mind for about the last 4 years now, so I think I have a good understanding of what to expect, thanks.

Robin.




[Index of Archives]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux