On Thu, Sep 21, 2023 at 12:26 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Support for multiple "equivalent" sources for components (also known
> as second sourcing components) is a standard practice that helps keep
> cost down and also makes sure that if one component is unavailable due
> to a shortage that we don't need to stop production for the whole
> product.
>
> Some components are very easy to second source. eMMC, for instance, is
> fully discoverable and probable so you can stuff a wide variety of
> similar eMMC chips on your board and things will work without a hitch.
>
> Some components are more difficult to second source, specifically
> because it's difficult for software to probe what component is present
> on any given board. In cases like this software is provided
> supplementary information to help it, like a GPIO strap or a SKU ID
> programmed into an EEPROM. This helpful information can allow the
> bootloader to select a different device tree. The various different
> "SKUs" of different Chromebooks are examples of this.
>
> Some components are somewhere in between. These in-between components
> are the subject of this patch. Specifically, these components are
> easily "probeable" but not easily "discoverable".
>
> A good example of a probeable but undiscoverable device is an
> i2c-connected touchscreen or trackpad. Two separate components may be
> electrically compatible with each other and may have compatible power
> sequencing requirements but may require different software. If
> software is told about the different possible components (because it
> can't discover them), it can safely probe them to figure out which
> ones are present.
>
> On systems using device tree, if we want to tell the OS about all of
> the different components we need to list them all in the device
> tree. This leads to a problem. The multiple sources for components
> likely use the same resources (GPIOs, interrupts, regulators).
> If the OS tries to probe all of these components at the same time then
> it will detect a resource conflict and that's a fatal error.
>
> The fact that Linux can't handle these probeable but undiscoverable
> devices well has had a few consequences:
> 1. In some cases, we've abandoned the idea of second sourcing
>    components for a given board, which increases cost / generates
>    manufacturing headaches.
> 2. In some cases, we've been forced to add some sort of strapping /
>    EEPROM to indicate which component is present. This adds difficulty
>    to manufacturing / refurb processes.
> 3. In some cases, we've managed to make things work by the skin of our
>    teeth through slightly hacky solutions. Specifically, if we remove
>    the "pinctrl" entry from the various options then it won't
>    conflict. Regulators inherently can have more than one consumer, so
>    as long as there are no GPIOs involved in power sequencing and
>    probing devices then things can work. This is how
>    "sc8280xp-lenovo-thinkpad-x13s" works and also how
>    "mt8173-elm-hana" works.
>
> Let's attempt to do something better. Specifically, we'll allow
> tagging nodes in the device tree as mutually exclusive from one
> another. This says that only one of the components in this group is
> present on any given board. To make it concrete, in my proposal this
> looks like:
>
> / {
>   tp_ex_group: trackpad-exclusion-group {
>   };

Interesting way to just get a unique identifier. But it could be any
phandle not used by another group. So just point all the devices in a
group to one of the devices in the group.

> };
>
> &i2c_bus {
>   tp1: trackpad@10 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
>   tp2: trackpad@20 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
>   tp3: trackpad@30 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
> };
>
> In Linux, we can make things work by simply only probing one of the
> devices in the group at a time.
> We can make a mutex per group and enforce locking that mutex around
> probe. If the first device that gets the mutex fails to probe then it
> won't try again. If it succeeds then it will acquire the shared
> resources and future devices (which we know can't be present) will
> fail to get the shared resources. Future patches could quiet down
> errors about failing to acquire shared resources or failing to probe
> if a device is in a mutual-exclusion-group.

This seems like overkill to me. Do we really need groups and a mutex
for each group? Worst case is what? 2-3 groups of 2-3 devices?

Instead, what about extending "status" with another value
("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
kernel would just ignore nodes with that status. Then we can process
those nodes separately 1-by-1. You may just have to change "status"
via a changeset as there's already some support in some buses (I2C,
SPI IIRC) for new devices showing up with overlays.

I'm not really a fan of adding the probe mutex and would prefer if we
can serialize this with just controlling "status". The challenge at
that level is knowing if/when you have probed especially if we have to
wait on modules to load. But if we must serialize with a mutex, with 1
group it could be a global mutex and a 1 bit flag in struct device
instead.

Rob
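
For reference, a rough sketch of how the status-based alternative might
look when applied to the trackpad nodes from the quoted example. The
"fail-needs-probe" string is only the value floated above, not an
established binding, and the comments stand in for the real properties:

```dts
&i2c_bus {
	tp1: trackpad@10 {
		/* compatible, reg, interrupts, supplies as in the existing node */
		status = "fail-needs-probe";	/* assumed value; kernel skips the node today */
	};
	tp2: trackpad@20 {
		/* shares GPIOs, interrupts and regulators with tp1 */
		status = "fail-needs-probe";
	};
	tp3: trackpad@30 {
		/* shares GPIOs, interrupts and regulators with tp1 */
		status = "fail-needs-probe";
	};
};
```

Under this sketch no exclusion-group node or per-node phandle is needed.
The candidates would be walked one at a time, and "status" would be
changed (for instance via a changeset) only on the one that actually
responds.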