On Fri, Sep 01, 2023 at 09:52:54AM +0200, Andreas Herrmann wrote:
> On Fri, Sep 01, 2023 at 08:50:31AM +0200, Andreas Herrmann wrote:
> > On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
> > > Hi,
> >
> > Hi Ricardo,
> >
> > > This is v3 of a patchset to set the number of cache leaves
> > > independently for each CPU. v1 and v2 can be found here [1] and
> > > here [2].
> >
> > I am on CC of your patch set and glanced through it. Long ago I
> > touched related code, but now I am not up to date enough to do a
> > qualified review in this area; first I would have to look into the
> > documentation to refresh my memory, etc.
> >
> > I have not seen (or it escaped me) any information that this was
> > tested on a variety of machines that might be affected by this
> > change, and there are no Tested-by tags.
> >
> > Even if changes look simple and reasonable, they can cause issues.
> >
> > Thus, from my POV it would be good to have some information on what
> > tests were done. I am not asking you to test on all possible
> > systems; just knowing which system(s) were used for functional
> > testing is of value.
>
> Doing a good review -- trying to catch every flaw -- is really hard,
> especially when you are not actively doing development work in an
> area.
>
> For example, see
>
> commit e33b93650fc5 ("blk-iocost: Pass gendisk to ioc_refresh_params")
> [Breno Leitao <leitao@xxxxxxxxxx>, Tue Feb 28 03:16:54 2023 -0800]
>
> This fixes commit
>
> ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more
> useful") [Christoph Hellwig <hch@xxxxxx>, Fri Feb 3 16:03:54 2023
> +0100]
>
> I had reviewed the latter (see
> https://marc.info/?i=Y8plg6OAa4lrnyZZ@suselix) and the entire patch
> series. I compared the original code with the patch, walked through
> every single hunk of the diff, and tried to think it through. The
> changes made sense to me. Then came the bug report(s), and I felt
> that I had failed tremendously. To err is human.
>
> What this shows (and it is already known): with every patch, new
> errors are potentially introduced into the kernel. Functional and
> higher-level testing can help to spot them before a kernel is
> deployed in the field.
>
> At a higher level, this proves another thing: Linux kernel
> development is a transparent example of a peer-review process.
>
> In our scientific age it is often postulated that peer review is the
> way to go [1] and that it more or less guarantees that what is
> published, or rolled out, is reasonable and "works".
>
> The above example shows that this process will not catch all flaws
> and that proper, transparent, and reliable tests are required before
> anything is deployed in the field.
>
> This is true for every branch of science.
>
> If it is purposely not done, something is fishy.
>
> [1] Some even state that it is "the only way to go" and that
> everything figured out without a peer-review process is false and
> cannot be trusted. Of course, this is a false statement.

Hi Andreas,

Agreed. Testing is important. For the specific case of these patches,
I booted CONFIG_PREEMPT_RT and !CONFIG_PREEMPT_RT kernels. Then I

  a) Ensured that the splat reported in commit 5944ce092b97
     ("arch_topology: Build cacheinfo from primary CPU") was not
     observed.
  b) Ensured that /sys/devices/system/cpu/cpuX/cache is present.
  c) Ensured that the contents of /sys/devices/system/cpu/cpuX/cache
     are the same before and after my patches (a sketch of such a
     comparison follows below).
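For reference, a minimal sketch of one way to do the before/after
comparison in (c). It assumes the standard sysfs cacheinfo layout and
is only illustrative, not the exact tooling used for the tests above:

  #!/usr/bin/env python3
  # Dump every attribute under /sys/devices/system/cpu/cpu*/cache/index*/
  # so the output can be saved to a file and diffed across boots
  # (unpatched vs. patched kernel).
  import glob
  import os

  for index_dir in sorted(glob.glob(
          "/sys/devices/system/cpu/cpu*/cache/index*")):
      for attr in sorted(os.listdir(index_dir)):
          path = os.path.join(index_dir, attr)
          if not os.path.isfile(path):
              continue  # skip subdirectories such as power/
          try:
              with open(path) as f:
                  value = f.read().strip()
          except OSError:
              value = "<unreadable>"  # some attributes may not be readable
          print(path, value)

Running this once per kernel and diffing the two outputs makes any
change in attributes such as level, type, size, or shared_cpu_map
immediately visible.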
I tested on the following systems:

  * Intel Alder Lake
  * Intel Meteor Lake
  * 2-socket Intel Ice Lake server
  * 2-socket Intel Cascade Lake server
  * 2-socket Intel Skylake server
  * 4-socket Intel Broadwell server
  * 2-socket Intel Haswell server
  * 2-socket AMD Rome server
  * 2-socket AMD Milan server

Thanks and BR,
Ricardo