On Fri, 10 Feb 2023 12:57:40 +0000,
Johan Hovold <johan@xxxxxxxxxx> wrote:
> 
> On Fri, Feb 10, 2023 at 11:38:58AM +0000, Marc Zyngier wrote:
> > On Fri, 10 Feb 2023 09:56:03 +0000,
> > Johan Hovold <johan@xxxxxxxxxx> wrote:
> > > 
> > > On Thu, Feb 09, 2023 at 04:00:55PM +0000, Marc Zyngier wrote:
> > > > On Thu, 09 Feb 2023 13:23:23 +0000,
> > > > Johan Hovold <johan+linaro@xxxxxxxxxx> wrote:
> 
> > > I went back and forth over that a bit, but decided to only use
> > > domain->root->mutex in paths that can be called for hierarchical
> > > domains (i.e. the "shared code paths" mentioned above).
> > > 
> > > Using it in paths that are clearly only called for non-hierarchical
> > > domains where domain->root == domain felt a bit lazy.
> > 
> > My concern here is that as this code gets further refactored, it may
> > become much harder to reason about what is the correct level of
> > locking.
> 
> Yeah, that's conceivable.
> 
> > > The counter argument is of course that using domain->root->lock
> > > allows people to think less about the code they are changing, but
> > > that's not necessarily always a good thing.
> > 
> > Eventually, non-hierarchical domains should simply die and be
> > replaced with a single level hierarchy. Having a unified locking in
> > place will definitely make the required work clearer.
> > 
> > > Also note that the lockdep asserts in the revmap helpers would
> > > catch anyone using domain->mutex where they should not (i.e. using
> > > domain->mutex for a hierarchical domain).
> > 
> > Lockdep is great, but lockdep is a runtime thing. It doesn't help
> > reasoning about what gets locked when changing this code.
> 
> Contributors are expected to test their changes with lockdep enabled,
> right?
> 
> But sure, using domain->root->mutex throughout may prevent people
> from getting this wrong.
> 
> I'll update this for v6.
> 
> > > > > @@ -1132,6 +1147,7 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
> > > > >  	else
> > > > >  		domain = irq_domain_create_tree(fwnode, ops, host_data);
> > > > >  	if (domain) {
> > > > > +		domain->root = parent->root;
> > > > >  		domain->parent = parent;
> > > > >  		domain->flags |= flags;
> > > > 
> > > > So we still have a bug here, as we have published a domain that
> > > > we keep updating. A parallel probing could find it in the
> > > > interval and do something completely wrong.
> > > 
> > > Indeed we do, even if device links should make this harder to hit
> > > these days.
> > > 
> > > > Splitting the work would help, as per the following patch.
> > > 
> > > Looks good to me. Do you want to submit that as a patch that I'll
> > > rebase on or should I submit it as part of a v6?
> > 
> > Just take it directly.
> 
> Ok, thanks.
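As an aside, the split under discussion amounts to making sure the
domain is fully set up before it becomes visible to lookups. A rough
sketch of the shape of it, with placeholder helper names rather than
the actual patch:

struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
					       unsigned int flags,
					       unsigned int size,
					       struct fwnode_handle *fwnode,
					       const struct irq_domain_ops *ops,
					       void *host_data)
{
	struct irq_domain *domain;

	/* Allocate only; do not add the domain to the global list yet. */
	domain = __irq_domain_create(fwnode, size, size, 0, ops, host_data);
	if (!domain)
		return NULL;

	/* Finish initialisation while the domain is still private. */
	domain->root = parent->root;
	domain->parent = parent;
	domain->flags |= flags | IRQ_DOMAIN_FLAG_HIERARCHY;

	/* Only now expose it to irq_find_matching_fwspec() and friends. */
	__irq_domain_publish(domain);

	return domain;
}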
> I guess this turns the "Use irq_domain_create_hierarchy()" patches
> into fixes that should be backported as well.

Maybe. Backports are not my immediate concern.

> But note that your proposed diff may not be sufficient to prevent
> lookups from racing with domain registration generally. Many drivers
> still update the bus token after the domain has been added (and
> apparently some still set flags after creating hierarchies, I just
> noticed, e.g. amd_iommu_create_irq_domain).

The bus token should only rarely be a problem, as it is often set on
an intermediate level which isn't directly looked up by anything else.
And if it did happen, it would probably result in the domain not being
found.

Flags, on the other hand, are more problematic. But I consider this a
driver bug which should be fixed independently.

> It seems we'd need to expose a separate allocation and registration
> interface, or at least pass in the bus token to a new combined
> interface.

Potentially, yes. But this could come later down the line. I'm more
concerned about getting this series into -next, as the merge window is
fast approaching.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
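For illustration, a combined interface that takes the bus token (and
flags) up front, so that a domain is never published and then
retagged, could look roughly like this. The names below are
hypothetical and do not exist in the kernel as-is:

/* Hypothetical: all lookup-relevant state is set before publication. */
struct irq_domain_params {
	struct fwnode_handle		*fwnode;
	unsigned int			size;
	unsigned int			flags;
	enum irq_domain_bus_token	bus_token;
	const struct irq_domain_ops	*ops;
	void				*host_data;
};

struct irq_domain *irq_domain_create_full(const struct irq_domain_params *p)
{
	struct irq_domain *domain;

	/* Allocation step: the domain is not yet visible to lookups. */
	domain = __irq_domain_create(p->fwnode, p->size, p->size, 0,
				     p->ops, p->host_data);
	if (!domain)
		return NULL;

	domain->flags |= p->flags;
	domain->bus_token = p->bus_token;

	/* Registration step: publish the fully initialised domain. */
	__irq_domain_publish(domain);

	return domain;
}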