On 12/6/19 5:40 PM, Segher Boessenkool wrote:
> Hi,
>
> On Thu, Dec 05, 2019 at 07:37:24PM -0600, Frank Rowand wrote:
>> On 12/3/19 12:35 PM, Segher Boessenkool wrote:
>>> Btw.  Some OFs mangle the phandles some way, to make it easier to
>>> catch people using it as an address (and similarly, mangle ihandles
>>> differently, so you catch confusion between ihandles and phandles as
>>> well).  Like a simple xor, with some odd number preferably.  You
>>> should assume *nothing* about phandles, they are opaque identifiers.
>>
>> For arm32 machines that use dtc to generate the devicetree, which is a
>> very large user base, we certainly can make assumptions about phandles.
>
> I was talking about OF.  Phandles are explicitly defined to be opaque
> tokens.  If there is an extra meaning to them in flattened device trees,
> well, the kernel should then only depend on that there, not for more
> general phandles.  Where is this documented btw?

And dtc generated devicetrees are a huge proportion of the OF systems.

It is not documented.

As an aside, overlays also depend upon the current dtc implementation.
If an extremely large value is used for a phandle then overlay
application will fail.

>> Especially because the complaints about the overhead of phandle based
>> lookups have been voiced by users of this specific set of machines.
>>
>> For systems with a devicetree that does not follow the assumptions, the
>> phandle cache should not measurably increase the overhead of phandle
>> based lookups.
>
> It's an extra memory access and extra code to execute, for not much gain
> (if anything).  While with a reasonable hash function it will be good
> for everyone.
>
>> If you have measurements of a system where implementing the phandle
>> cache increased the overhead,
>
> Are you seriously saying you think this code can run in zero time and
> space on most systems?

No.  I made no such claim.  Note the additional words in the following
sentences.

>> and the additional overhead is a concern
>> (such as significantly increasing boot time) then please share that
>> information with us.  Otherwise this is just a theoretical exercise.
>
> The point is that this code could be easily beneficial for most (or all)
> users, not just those that use dtc-constructed device trees.  It is

The point is that the cache was implemented to solve a specific problem
for certain specific systems.  There had been a few reports of various
machines having the same issue, but finally someone measured a
**significant** improvement in boot time for a specific machine.  The
boot time with the cache was **measured** to be much shorter.  The boot
time for all systems with a dtc generated devicetree is expected to be
faster as well.

No one responded to the implementation, when it was proposed, with a
**measurement** showing increased boot time.  A concern about using
more memory was raised and discussed, with at least one feature added
as a result (freeing the cache in late init if modules are not
enabled).

Being "beneficial for most (or all) users" has to be balanced against
whether the change would remove the benefit for the systems that the
feature was originally implemented to help.  There was no performance
data supplied to answer this question.  (Though eventually Rob did
some measurements of the impact on hash efficiency for such a system.)

> completely obvious that having a worse cache hash function results in
> many more lookups.  Whether that results in something expressed as
> milliseconds on tiny systems or microseconds on bigger systems is
> completely beside the point.

There was no performance data accompanying the proposed change that
started this thread.  There was no data showing whether the systems
that this feature was created for would suffer.  There was no data
showing that the boot time of the pseries systems would improve.

There was no assertion made that too much memory was being used by the
cache (there was an implied assertion that a large percentage of the
memory used for the cache was unused, and thus that the performance
benefit of the cache could be improved by changing to a hash instead
of a mask).
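To make that mask-versus-hash distinction concrete, here is a rough
standalone sketch.  It is illustrative only, not the in-kernel phandle
cache code; the cache size, the multiplicative constant, and the example
phandle values are all made up for the demonstration.

/*
 * Illustrative sketch only -- not the in-kernel phandle cache.
 *
 * Compares two ways of picking a slot in a fixed-size cache:
 *   - a plain mask, which is nearly free and works well when phandles
 *     are small consecutive integers (as dtc allocates them),
 *   - a multiplicative hash, which costs a little more per lookup but
 *     spreads sparse or mangled phandle values across the cache.
 */
#include <stdint.h>
#include <stdio.h>

#define CACHE_ENTRIES	128u	/* power of two, so the mask works */

static unsigned int mask_slot(uint32_t phandle)
{
	return phandle & (CACHE_ENTRIES - 1);
}

static unsigned int hash_slot(uint32_t phandle)
{
	/* hash_32()-style multiply by a 32-bit golden-ratio constant */
	return (phandle * 0x61C88647u) >> (32 - 7);	/* 2^7 == 128 */
}

int main(void)
{
	/* dtc-style dense phandles vs. sparse/mangled ones */
	uint32_t dense[]  = { 1, 2, 3, 4, 5 };
	uint32_t sparse[] = { 0x10001, 0x20001, 0x30001, 0x40001, 0x50001 };
	unsigned int i;

	for (i = 0; i < 5; i++)
		printf("dense %u:      mask slot %3u   hash slot %3u\n",
		       (unsigned int)dense[i],
		       mask_slot(dense[i]), hash_slot(dense[i]));
	for (i = 0; i < 5; i++)
		printf("sparse 0x%x: mask slot %3u   hash slot %3u\n",
		       (unsigned int)sparse[i],
		       mask_slot(sparse[i]), hash_slot(sparse[i]));
	return 0;
}

With dense dtc-allocated phandles (1, 2, 3, ...) the mask alone already
gives each phandle its own slot, so the hash buys nothing there.  The
hash only starts to matter when phandle values are sparse or mangled:
all five sparse values above collide in mask slot 1, but land in
different slots with the hash.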
We had rejected creating a cache for several years until finally some
solid data was provided showing an actual need for it.

It is not a question of "milliseconds on tiny systems or microseconds
on bigger systems".  I agree with that.  But it does matter whether the
performance impact of the various implementations is large enough to
either solve a problem or create a problem.

On the other hand, if the amount of memory used by the cache is a
problem (which is _not_ what was asserted by the patch submitter) then
we can have a conversation about how to resolve that.

-Frank

>
>
> Segher
>