On Fri, Apr 15, 2022 at 03:10:51PM -0700, Linus Torvalds wrote:
> Adding PeterZ and Borislav (who seem to be the last ones to have
> worked on the copy and clear_page stuff respectively) and the x86
> maintainers in case somebody gets the urge to just fix this.

I guess if enough people ask and keep asking, some people at least try
to move...

> Because memory clearing should be faster than copying, and the thing
> that makes copying fast is that FSRM and ERMS logic (the whole
> "manually unrolled copy" is hopefully mostly a thing of the past and
> we can consider it legacy)

So I did give it a look and it seems to me, if we want to do the
alternatives thing here, it will have to look something like
arch/x86/lib/copy_user_64.S.

I.e., the current __clear_user() will have to become the "handle_tail"
thing there which deals with uncopied rest-bytes at the end and the new
fsrm/erms/rep_good variants will then be alternative_call_2 or _3.

The fsrm thing will have only the handle_tail thing at the end when size
!= 0.

The others - erms and rep_good - will have to check for sizes smaller
than, say a cacheline, and for those call the handle_tail thing directly
instead of going into a REP loop. The current __clear_user() is still a
lot better than that copy_user_generic_unrolled() abomination. And it's
not like old CPUs would get any perf penalty - they'll simply use the
same code.

And then you need the labels for _ASM_EXTABLE_UA() exception handling.

Anyway, something along those lines.

And then we'll need to benchmark this on a bunch of current machines to
make sure there's no funny surprises, perf-wise.

I can get cracking on this but I would advise people not to hold their
breaths. :)

Unless someone has a better idea or is itching to get hands dirty
her-/himself.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
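
For illustration only, the dispatch described above could be modeled in
plain user-space C roughly as below. This is a sketch, not kernel code:
the names clear_user_sketch(), rep_clear(), the clear_variant enum and
the 64-byte CACHELINE threshold are all made up for the example,
memset() merely stands in for a REP STOS sequence, and faults (and thus
the _ASM_EXTABLE_UA() fixups) are not modeled at all, so the "bytes not
cleared" return value is always 0 here.

```c
#include <stddef.h>
#include <string.h>

/* Assumed threshold; the mail only says "say a cacheline". */
#define CACHELINE 64

/* Hypothetical selector standing in for what alternatives would patch in. */
enum clear_variant { CLEAR_ORIG, CLEAR_REP_GOOD, CLEAR_ERMS, CLEAR_FSRM };

/*
 * Stand-in for the current __clear_user() used as tail handler:
 * a simple byte loop. Returns the number of bytes NOT cleared,
 * which is always 0 in this model because nothing can fault.
 */
static size_t handle_tail(unsigned char *dst, size_t len)
{
	while (len) {
		*dst++ = 0;
		len--;
	}
	return len;
}

/*
 * Stand-in for a REP STOS based clear. memset() is only a model;
 * a real variant could stop early on a fault and report a remainder.
 */
static size_t rep_clear(unsigned char *dst, size_t len)
{
	memset(dst, 0, len);
	return 0;
}

/* Returns the number of bytes that could not be cleared. */
size_t clear_user_sketch(enum clear_variant v, unsigned char *dst, size_t len)
{
	size_t rem;

	switch (v) {
	case CLEAR_FSRM:
		/* Fast short REP: no size check up front. */
		break;
	case CLEAR_ERMS:
	case CLEAR_REP_GOOD:
		/* Small sizes skip the REP path entirely. */
		if (len < CACHELINE)
			return handle_tail(dst, len);
		break;
	default:
		/* Old CPUs keep using the existing code. */
		return handle_tail(dst, len);
	}

	rem = rep_clear(dst, len);
	if (rem)
		/* Only reached when the REP path left bytes behind. */
		rem = handle_tail(dst + (len - rem), rem);
	return rem;
}
```

In this model the tail handler is the only code that older CPUs ever
run, which matches the point that they would see no perf penalty.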