On Sat, Aug 26, 2023 at 1:01 AM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > On Fri, Aug 25, 2023 at 03:51:28PM -0700, Evan Green wrote: > > On Fri, Aug 25, 2023 at 1:16 AM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > > > > > On Thu, Aug 24, 2023 at 03:06:53PM -0700, Evan Green wrote: > > > > On Thu, Aug 24, 2023 at 10:29 AM Conor Dooley <conor@xxxxxxxxxx> wrote: > > > ... > > > > > Do you want to have this new thing in cpuinfo tell the user "this hart > > > > > has xyz extensions that are supported by a kernel, but maybe not this > > > > > kernel" or to tell the user "this hart has xyz extensions that are > > > > > supported by this kernel"? Your text above says "understood by the > > > > > kernel", but I think that's a poor definition that needs to be improved > > > > > to spell out exactly what you mean. IOW does "understood" mean the > > > > > kernel will parse them into a structure, or does it mean "yes you can > > > > > use this extension on this particular hart". > > > > > > > > I'm imagining /proc/cpuinfo being closer to "the CPU has it and the > > > > kernel at least vaguely understands it, but may not have full support > > > > for it enabled". I'm assuming /proc/cpuinfo is mostly used by 1) > > > > humans wanting to know if they have hardware support for a feature, > > > > and 2) administrators collecting telemetry to manage fleets (ie do I > > > > have any hardware deployed that supports X). > > > > > > Is (2) a special case of (1)? (I want to make sure I understand all the > > > cases.) > > > > More or less, yes. In bucket two are also folks wondering things like > > "are all these crash reports I'm getting specific to machines with X". > > > > > > > > > Programmers looking to > > > > see "is the kernel support for this feature ready right now" would > > > > ideally not be parsing /proc/cpuinfo text, as more direct mechanisms > > > > like specific hwprobe bits for "am I fully ready to go" would be > > > > easier to work with. Feel free to yell at me if this overall vision > > > > seems flawed. > > > > > > > > I tried to look to see if there was consensus among the other > > > > architectures. Aarch64 seems to go with "supported and fully enabled", > > > > as their cpu_has_feature() directly tests elf_hwcap, and elements in > > > > arm64_elf_hwcaps[] are Kconfig gated. X86 is complicated, but IIRC is > > > > more along the lines of "hardware has it". They have two macros, > > > > cpu_has() for "raw capability" and cpu_feature_enabled() for "kernel > > > > can do it too", and they use cpu_has() for /proc/cpuinfo flags. > > > > > > > > > > I'd lean more towards the way AArch64 goes, because, unless /proc/cpuinfo > > > is just a blind regurgitation of an isa string from DT / ACPI, then the > > > kernel must at least know something about it. Advertising a feature which > > > is known, but also known not to work, seems odd to me. So my vote is that > > > only features which are present and enabled in the kernel or present and > > > not necessary to be enabled in the kernel in order for userspace or > > > virtual machines to use be advertised in /proc/cpuinfo. > > > > > > We still have SMBIOS (dmidecode) to blindly dump what the hardware > > > supports for cases (1) and (2) above. > > > > Yeah, there's an argument to be made for that. My worry is it's a > > difficult line to hold. The bar you're really trying to describe (or > > at least what people might take away from it) is "if it's listed here > > then it's fully ready to be used in userspace". But then things get > > squishy when there are additional ways to control the use of the > > feature (prctls() in init to turn it on, usermode policy to turn it > > off, security doodads that disable it, etc). I'm assuming nobody wants > > a version of /proc/cpuinfo that changes depending on which process is > > asking. So then the line would have to be more carefully described as > > "well, the hardware can do it, and the kernel COULD do it under some > > circumstances, but YMMV", which ends up falling somewhat short of the > > original goal. In my mind keeping /proc/cpuinfo as close to "here's > > what the hardware can do" seems like a more defensible position. > > -Evan > > I agree with that. I was actually even trying to say the same thing, > but only by bringing up virtual machines. Once we decide we'll expose > extensions to VMs, whether or not the host kernel enables them, then > none of the other host kernel configurations matter with respect to > advertising the feature, since the guest kernel may have a completely > different set of configurations. My head spins a little when I try to picture a feature which 1) requires kernel support to use, 2) has that kernel support turned off in the host kernel, but 3) is passed down into guest kernels. Generally though, I agree that trying to tie the guarantees of features in /proc/cpuinfo too much to software gets confusing when viewed through the double lens of virtualization. > > So I think we should only be filtering out extensions that are disabled > because they're broken (have a detected erratum), have been "hidden" > (have a kernel command line allowing them to be treated as if not > present), or cannot be used at all due to missing accompanying hardware > descriptions (such as block size info for CBO extensions). In all cases, > I presume we'd wire checks up in riscv_isa_extension_check() and no > checks would be gated on Kconfigs or anything else. And, since > /proc/cpuinfo gets its list from the bitmap that's already filtered by > riscv_isa_extension_check(), then, long story short, we're good to go :-) > > But maybe we can try to spell that policy out a bit more in > Documentation/riscv/uabi.rst. Right, that sounds reasonable to me, and is consistent with the behavior we already have. With this documentation change I was only trying to define the lower bound, rather than the complete definition for every case. In other words, seeing a feature in cpuinfo guarantees only that the hardware (or virtualized hardware) supports the feature, but that's all the language says. So for instance NOT seeing a feature in cpuinfo doesn't necessarily mean the hardware doesn't support it. Software turning it off for the reasons you describe IMO doesn't contradict what's written here. I was planning to leave that tacit, but if you have suggestions on how to spell that out I'd take them. -Evan