On Fri, Dec 6, 2024, at 12:23, Ferry Toth wrote:
> On 04-12-2024 at 19:55, Andy Shevchenko wrote:
>>
>> It's all other way around (from SW point of view). For unknown reasons
>> Intel decided to release only 32-bit SW and it became the only thing
>> that was heavily tested (despite misunderstanding by some developers
>> that pointed finger to the HW without researching the issue that
>> appears to be purely software in a few cases) _that_ time. Starting
>> ca. 2017 I enabled 64-bit for Merrifield and from then it's being used
>> by both 32- and 64-bit builds.
>>
>> I'm totally fine to drop 32-bit defaults for Merrifield/Moorefield,
>> but let's hear Ferry who might/may still have a use case for that.
>
> Due to the design of SLM I found (and it is also documented in Intel's
> HW documentation) that there is a penalty introduced when executing
> certain instructions in 64b mode. The one I found is crc32di, running
> slower than 2 crc32si in series. Then there are other instructions that
> seem to run faster in 64b mode.
>
> And there is of course the usual limited memory space that could benefit
> from 32b mode. I never tried the mixed (x86_32?) mode. But I am building
> and testing both i686 and x86_64 for each Edison image.

Hi Ferry,

Thanks a lot for the detailed reply; this is exactly the kind of
information I was hoping to get out of my series, in particular since
we have many of the same tradeoffs on low-end 64-bit Arm platforms,
and I've been trying to push users toward running 64-bit kernels on
those.

I generally think that it makes a lot of sense to run 32-bit userspace
on memory-limited devices, in particular those with less than 512MB,
and it's often still useful on devices with 1GB.
Running a 32-bit kernel, however, is usually not worth it if you can
avoid it. With 1GB of RAM you definitely run into limits, either from
needing HIGHMEM (with CONFIG_VMSPLIT_3G, the kernel's lowmem mapping
cannot cover all of RAM) or in user address space (with any other
CONFIG_VMSPLIT_* option), in addition to 32-bit kernels simply being
less well maintained and missing security features. Using a 64-bit
kernel with CONFIG_COMPAT for 32-bit userspace tends to be the best
combination for a large number of embedded workloads.

As a rough estimate on Arm hardware, I found that a 64-bit kernel
tends to use close to twice the amount of RAM for itself (vmlinux,
slab caches, page tables, mem_map[]) compared to a 32-bit kernel, but
this should be no more than 10-20% of the total RAM for sensible
workloads, since most of the interesting work happens in userland.
I expect the numbers to be similar for x86, but have not looked in
detail.

In userspace there is more variation depending on the type of
application: the base system has a similar 2x ratio, but once you get
into data-intensive tasks (file server, networking, image/video
processing, ...) the overhead of 64-bit userspace is lower, because
the size of the actual data is the same in both modes.

For the specific case of the crc32di instruction, I suspect the
in-kernel version can be trivially changed along these lines

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 52c5d47ef5a1..60b9b3cab679 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -60,10 +60,10 @@ static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len
 {
 	unsigned int iquotient = len / SCALE_F;
 	unsigned int iremainder = len % SCALE_F;
-	unsigned long *ptmp = (unsigned long *)p;
+	unsigned int *ptmp = (unsigned int *)p;
 
 	while (iquotient--) {
-		asm(CRC32_INST
+		asm("crc32l %1, %0"
 		    : "+r" (crc) : "rm" (*ptmp));
 		ptmp++;
 	}

to get you the faster version (SCALE_F, which that file defines as
sizeof(unsigned long), would also have to drop to 4 so that the loop
still covers the whole buffer), plus some form of configurability to
make sure other CPUs still get the crc32q version by default.
> I think that should at minimum be useful to catch 32b errors in the
> kernel in certain areas (shared with other 32b archs). So, I would
> prefer 32b support for this platform to continue.

I can certainly see this both ways. On the one hand, I do care a lot
about 32-bit Arm platforms and appreciate the help in finding issues on
32-bit kernels. On the other hand, I really don't want anyone to waste
time testing something that should never be used in practice, or to
keep a feature in the kernel only for the purpose of regression testing
that same feature. The platform is also special enough that I don't see
testing it in 32-bit mode as particularly helpful to others, and it's
unlikely to catch bugs that testing in KVM won't.

Testing your 32-bit userland with a 64-bit kernel would of course be
helpful, to ensure it keeps working for anyone who had been using a
32-bit kernel and userspace if we drop 32-bit kernel support for it.

One related idea that I've discussed before is to have 32-bit kernels
refuse to boot on 64-bit hardware and instead print the URL of a wiki
page explaining all of the above. There would probably have to be a
whitelist of platforms that are buggy in 64-bit mode, and a command
line option to revert to the previous behavior to allow testing.

Arnd