On Mon, May 23, 2022 at 09:12:32AM -0600, Jens Axboe wrote:

> > There's several more, AFAICS (cifs, ceph, fuse, gfs2)... The check in
> > /dev/fuse turned out to be fine - it's only using primitives, so we
> > can pass ITER_UBUF ones there. mm/shmem.c check... similar, but I
> > really wonder if x86 clear_user() really sucks worse than
> > copy_to_user() from zero page...
>
> Yep, not surprised if it isn't complete, I just tackled the ones I
> found. I do like the idea of having a generic check for that rather
> than implicit knowledge about which iter types may contain user memory.
>
> I haven't looked at clear_user() vs copy_to_user() from the zero page.
> But should be trivial to benchmark and profile. I'll try and do that
> when I find some time.

FWIW, having looked at __clear_user() in arch/x86/lib/usercopy_64.c...
I'm not at all surprised. It should be parallel to memset_64.S; as it
is, we have
	loop with one 64bit store per iteration + loop with 8bit stores
	for the tail, with no attempts to align anything
vs.
	rep stosb if CPU has optimized rep stosb; otherwise
	rep stosq + rep stosb on CPUs that don't suck at rep sto* in
	general; otherwise
	align, then do loop with 8*64bit stores, then loop with 64bit
	stores, then loop with 8bit stores

Shouldn't be hard to do uaccess parallel for that. Reads from /dev/zero,
if nothing else, would benefit... Might be worth doing what sparc does,
with shared asm for both - it's an out-of-line code anyway, with most of
the payload trivially shared...
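
For illustration only, here is a rough userspace C sketch of the last
fallback described above (align, then a loop with 8*64bit stores, then a
loop with 64bit stores, then a loop with 8bit stores). The names
zero_aligned_unrolled() and store64_zero() are made up for this example.
It is not the kernel's __clear_user() or memset_64.S: it has no uaccess
fault handling, and it does not cover the rep stosb or rep stosq
variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store one zeroed 64-bit word at p.
 * A fixed-size memcpy() sidesteps aliasing concerns; compilers
 * typically turn it into a single 64-bit store.
 */
static inline void store64_zero(unsigned char *p)
{
	const uint64_t zero = 0;

	memcpy(p, &zero, sizeof(zero));
}

/*
 * Hypothetical userspace sketch of "align, then 8*64bit stores,
 * then 64bit stores, then 8bit stores". Not kernel code.
 */
static void zero_aligned_unrolled(void *dst, size_t len)
{
	unsigned char *p = dst;

	assert(dst != NULL || len == 0);

	/* 1. Byte stores until p is 8-byte aligned or len runs out. */
	while (len && ((uintptr_t)p & 7)) {
		*p++ = 0;
		len--;
	}

	/* 2. Main loop: eight 64-bit stores (64 bytes) per iteration. */
	while (len >= 64) {
		store64_zero(p);
		store64_zero(p + 8);
		store64_zero(p + 16);
		store64_zero(p + 24);
		store64_zero(p + 32);
		store64_zero(p + 40);
		store64_zero(p + 48);
		store64_zero(p + 56);
		p += 64;
		len -= 64;
	}

	/* 3. One 64-bit store per iteration for what is left. */
	while (len >= 8) {
		store64_zero(p);
		p += 8;
		len -= 8;
	}

	/* 4. Byte stores for the tail (fewer than 8 bytes). */
	while (len) {
		*p++ = 0;
		len--;
	}
}
```

The existing __clear_user() loop described above corresponds roughly to
steps 3 and 4 alone, without the alignment in step 1 or the unrolled
loop in step 2.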