Re: [RFC PATCH] of: device: Support 2nd sources of probeable but undiscoverable devices

Rob Herring <robh+dt@xxxxxxxxxx> · Fri, 22 Sep 2023 09:14:15 -0500

On Thu, Sep 21, 2023 at 12:26 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Support for multiple "equivalent" sources for components (also known
> as second sourcing components) is a standard practice that helps keep
> cost down and also makes sure that if one component is unavailable due
> to a shortage that we don't need to stop production for the whole
> product.
>
> Some components are very easy to second source. eMMC, for instance, is
> fully discoverable and probeable so you can stuff a wide variety of
> similar eMMC chips on your board and things will work without a hitch.
>
> Some components are more difficult to second source, specifically
> because it's difficult for software to probe what component is present
> on any given board. In cases like this software is provided
> supplementary information to help it, like a GPIO strap or a SKU ID
> programmed into an EEPROM. This helpful information can allow the
> bootloader to select a different device tree. The various different
> "SKUs" of different Chromebooks are examples of this.
>
> Some components are somewhere in between. These in-between components
> are the subject of this patch. Specifically, these components are
> easily "probeable" but not easily "discoverable".
>
> A good example of a probeable but undiscoverable device is an
> i2c-connected touchscreen or trackpad. Two separate components may be
> electrically compatible with each other and may have compatible power
> sequencing requirements but may require different software. If
> software is told about the different possible components (because it
> can't discover them), it can safely probe them to figure out which
> ones are present.
>
> On systems using device tree, if we want to tell the OS about all of
> the different components we need to list them all in the device
> tree. This leads to a problem. The multiple sources for components
> likely use the same resources (GPIOs, interrupts, regulators). If the
> OS tries to probe all of these components at the same time then it
> will detect a resource conflict and that's a fatal error.
>
> The fact that Linux can't handle these probeable but undiscoverable
> devices well has had a few consequences:
> 1. In some cases, we've abandoned the idea of second sourcing
>    components for a given board, which increases cost / generates
>    manufacturing headaches.
> 2. In some cases, we've been forced to add some sort of strapping /
>    EEPROM to indicate which component is present. This adds difficulty
>    to manufacturing / refurb processes.
> 3. In some cases, we've managed to make things work by the skin of our
>    teeth through slightly hacky solutions. Specifically, if we remove
>    the "pinctrl" entry from the various options then it won't
>    conflict. Regulators inherently can have more than one consumer, so
>    as long as there are no GPIOs involved in power sequencing and
>    probing devices then things can work. This is how
>    "sc8280xp-lenovo-thinkpad-x13s" works and also how
>    "mt8173-elm-hana" works.
>
> Let's attempt to do something better. Specifically, we'll allow
> tagging nodes in the device tree as mutually exclusive from one
> another. This says that only one of the components in this group is
> present on any given board. To make it concrete, in my proposal this
> looks like:
>
>   / {
>     tp_ex_group: trackpad-exclusion-group {
>     };

Interesting way to just get a unique identifier. But it could be any
phandle not used by another group. So just point all the devices in a
group to one of the devices in the group.
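
Roughly something like the below. This is an untested sketch that reuses
the node names and the proposed "mutual-exclusion-group" property from
your example; picking tp1 as the group identifier is an arbitrary choice
on my part, and the elided properties stay elided:

```dts
&i2c_bus {
	tp1: trackpad@10 {
		reg = <0x10>;
		/* ... remaining properties as in the original example ... */
		mutual-exclusion-group = <&tp1>;	/* the group is named by tp1 itself */
	};
	tp2: trackpad@20 {
		reg = <0x20>;
		/* ... */
		mutual-exclusion-group = <&tp1>;
	};
	tp3: trackpad@30 {
		reg = <0x30>;
		/* ... */
		mutual-exclusion-group = <&tp1>;
	};
};
```

That drops the empty trackpad-exclusion-group node entirely, since the
phandle of any one member is already unique to the group.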

>   };
>
>   &i2c_bus {
>     tp1: trackpad@10 {
>       ...
>       mutual-exclusion-group = <&tp_ex_group>;
>     };
>     tp2: trackpad@20 {
>       ...
>       mutual-exclusion-group = <&tp_ex_group>;
>     };
>     tp3: trackpad@30 {
>       ...
>       mutual-exclusion-group = <&tp_ex_group>;
>     };
>   };
>
> In Linux, we can make things work by simply only probing one of the
> devices in the group at a time. We can make a mutex per group and
> enforce locking that mutex around probe. If the first device that gets
> the mutex fails to probe then it won't try again. If it succeeds then
> it will acquire the shared resources and future devices (which we know
> can't be present) will fail to get the shared resources. Future
> patches could quiet down errors about failing to acquire shared
> resources or failing to probe if a device is in a
> mutual-exclusion-group.

This seems like overkill to me. Do we really need groups and a mutex
for each group? What's the worst case? 2-3 groups of 2-3 devices?
Instead, what about extending "status" with another value, say
"fail-needs-probe"? (fail-xxx is a documented value.) Currently, the
kernel would just ignore nodes with that status. Then we can process
those nodes separately, one by one. You may just have to change
"status" via a changeset, as some buses (I2C and SPI, IIRC) already
have some support for new devices showing up with overlays. I'm not
really a fan of adding the probe mutex and would prefer that we
serialize this just by controlling "status". The challenge at that
level is knowing if/when you have probed, especially if we have to
wait on modules to load. But if we must serialize with a mutex, then
with 1 group it could be a global mutex and a 1-bit flag in struct
device instead.
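
To make the "status" idea concrete, here is an untested sketch using the
nodes from your example. "fail-needs-probe" is only the value I'm
floating above, not an existing binding, and the elided properties stay
elided:

```dts
&i2c_bus {
	tp1: trackpad@10 {
		reg = <0x10>;
		/* ... remaining properties as in the original example ... */
		status = "fail-needs-probe";	/* proposed value; ignored by the kernel today */
	};
	tp2: trackpad@20 {
		reg = <0x20>;
		/* ... */
		status = "fail-needs-probe";
	};
	tp3: trackpad@30 {
		reg = <0x30>;
		/* ... */
		status = "fail-needs-probe";
	};
};
```

Since none of these nodes would be instantiated as devices at boot,
nothing claims the shared GPIOs or interrupts up front. Whatever code
walks the candidates could then try them one at a time and flip
"status" with a changeset for the one that responds.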

Rob