On Thu, 10 Sep 2020 12:20:34 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

> Hi everybody,
>
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.
>
> "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
> back when they were still part of s390x-tools. They were later replaced
> [5] by the variants in linux-utils. For example, RHEL6 and RHEL7 contain
> lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux
> on s390x [4].
>
> "phys_device" was added with sysfs support for memory hotplug in commit
> 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions")
> in 2005. It always returned 0.
>
> s390x started returning something != 0 on some setups (if sclp.rzm is
> set by HW) in 2010 via commit 57b552ba0b2f("memory hotplug/s390: set
> phys_device").
>
> For s390x, it allowed for identifying which memory block devices belong
> to the same memory increment (RZM). Only if all memory block devices
> comprising a single memory increment were offline, the memory could
> actually be removed in the hypervisor.
>
> Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
> memory_block_size_bytes() function") in 2013 a memory block devices
> spans at least one memory increment - which is why the interface isn't
> really helpful/used anymore (except by old lsmem/chmem tools).

Correct, so I do not see any problem for s390 with removing / changing
that for the upstream kernel.

BTW, that commit also gave some relief on the scaling issue, at least
for s390. With increasing total memory size, we also have increasing
increment and thus memory block size. Of course, that also has some
limitations, IIRC max. 1 GB increment size, but still better than the
256 MB default size.

>
> There were once RFC patches to make use of it in ACPI, but it could be
> solved using different interfaces [1].
>
>
> While I'd love to rip it out completely, I think it would break old
> lsmem/chmem completely - and I assume that's not acceptable. I was
> wondering what would be considered safe to do now/in the future:
>
> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> s390x). This will make old lsmem/chmem behave differently after
> switching to a new kernel, like if sclp.rzm would not be set by HW -
> AFAIU, it will assume all memory is in a single memory increment. Do we
> care?

No, at least not until that kernel change would be backported to some
old distribution level where we still use lsmem/chmem from s390-tools.
Given that this is just some clean-up w/o any functional benefit, and
hopefully w/o any negative impact, I think we can safely assume that no
distributor will do that "just for fun".

Even if there would be good reasons for backports, then I guess we also
have good reasons for backporting / switching to the util-linux version
of lsmem / chmem for such distribution levels. Alternatively, adjust the
s390-tools lsmem / chmem there.

But I would rather "rip it out completely" than just return 0. You'd
need some lsmem / chmem changes anyway, at least in case this would
ever be backported.

> 2. Restrict it to s390x only. It always returned 0 on other
> architectures, I was not able to find any user.
>
> I think 2 should be safe to do (never used on other archs). I do wonder
> what the feelings are about 1.

Please don't add any s390-specific workarounds here, that does not
really sound like a clean-up, rather the opposite.

That being said, I do not really see the benefit of this change at
all. As Michal mentioned, there really should be some more fundamental
change. And from the rest of this thread, it also seems that phys_device
usage might not be the biggest issue here.
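
For illustration only, here is a rough sketch of how a userspace tool
could have used phys_device in the way the quoted text describes: group
memory block devices by their phys_device value, then report the groups
whose blocks are all offline. This is not the s390-tools lsmem/chmem
code. The function names are made up, and the sketch assumes that
phys_device holds a decimal integer and that the per-block "state" file
reads "offline" for an offline block.

```python
import os
import re
from collections import defaultdict

SYSFS_MEMORY = "/sys/devices/system/memory"


def _read(path):
    """Return the stripped contents of a small sysfs attribute file."""
    with open(path) as f:
        return f.read().strip()


def group_blocks_by_phys_device(root=SYSFS_MEMORY):
    """Map each phys_device value to the sorted block numbers reporting it.

    Hypothetical helper; assumes phys_device is printed as a decimal integer.
    """
    groups = defaultdict(list)
    for name in os.listdir(root):
        match = re.fullmatch(r"memory(\d+)", name)
        if not match:
            continue  # skip non-block entries such as block_size_bytes
        try:
            phys = int(_read(os.path.join(root, name, "phys_device")))
        except (OSError, ValueError):
            continue  # attribute missing or unparsable: ignore this block
        groups[phys].append(int(match.group(1)))
    return {phys: sorted(blocks) for phys, blocks in groups.items()}


def fully_offline_groups(root=SYSFS_MEMORY):
    """Return the phys_device values whose blocks are all offline.

    Hypothetical helper; assumes the "state" attribute reads "offline".
    """
    result = []
    for phys, blocks in group_blocks_by_phys_device(root).items():
        states = [_read(os.path.join(root, f"memory{n}", "state")) for n in blocks]
        if all(state == "offline" for state in states):
            result.append(phys)
    return sorted(result)
```

With phys_device always returning 0 (option 1), every block would land
in a single group, which matches the "all memory is in a single memory
increment" behaviour described in the quoted text.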