On Thu, Jan 25, 2018 at 9:42 AM, David Laight <David.Laight@xxxxxxxxxx> wrote:
> From: Dmitry Vyukov [mailto:dvyukov@xxxxxxxxxx]
>> Sent: 25 January 2018 08:33
>>
>> On Wed, Jan 24, 2018 at 6:52 PM, Linus Torvalds
>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Wed, Jan 24, 2018 at 12:54 AM, Rasmus Villemoes
>> > <rasmus.villemoes@xxxxxxxxx> wrote:
>> >>
>> >> I see something similar, but at the 30->31 transition, and the
>> >> branch-misses remain at 1-3% for higher values, until 42 where it drops
>> >> back to 0%. Anyway, I highly doubt we do a lot of string copies of
>> >> strings longer then 32.
>> >
>> > So I really dislike that microbenchmark, because it just has the same
>> > length all the time. Which is very wrong, and makes the benchmark
>> > pointless. A big part of this all is branch mispredicts, you shouldn't
>> > just hand it the pattern on a plate.
>> >
>> > Anyway, the reason I really dislike the patch is not because I think
>> > strscpy() is all that important, but I *do* think that the
>> > word-at-a-time thing is conceptually something we do care about, and I
>> > hate removing it just because of KASAN not understanding it.
>> >
>> > So I'd *much* rather have some way to tell KASAN that word-at-a-time
>> > is going on. Because that approach definitely makes a difference in
>> > other places.
>>
>>
>> The other option was to use READ_ONCE_NOCHECK(). Not sure if the "read
>> once" part will affect codegen here, though.
>> But if word-at-a-time thing is conceptually something we do care
>> about, we could also introduce something like READ_PARTIALLY_VALID(),
>> which would check that at least first byte of the read is valid and
>> that it does not cross heap block boundary (but outside of KASAN is a
>> normal read).
>
> The first byte might not have been written either.
> For example, doing a strlen() on a misaligned string you might read
> the aligned word containing the first byte and adjust the value so
> that the initial byte(s) are not zero.
> After scanning for a zero byte the length would be corrected.

Was the first byte at least kmalloc-ed? That's what KASAN checks, it
does not care about "written".
KMSAN can detect uses of uninit data, but that's another story.
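
To make the strlen() case above concrete, here is a rough userspace
sketch of what I understand David to be describing. It is not the
kernel's word-at-a-time code: wordwise_strlen(), has_zero_byte() and
load_word() are names made up for illustration, and the whole thing
assumes that every aligned 8-byte word overlapping the string is
readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_ONES  ((uint64_t)0x0101010101010101ULL)
#define WORD_HIGHS ((uint64_t)0x8080808080808080ULL)

/* Non-zero iff at least one byte of w is zero. */
static int has_zero_byte(uint64_t w)
{
	return ((w - WORD_ONES) & ~w & WORD_HIGHS) != 0;
}

/* Load one word through memcpy() to stay clear of aliasing problems. */
static uint64_t load_word(const unsigned char *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/*
 * Hypothetical helper, illustration only. Assumption: the caller
 * guarantees that each aligned 8-byte word touching the string can be
 * read, including the bytes in front of s in the first word.
 */
static size_t wordwise_strlen(const char *s)
{
	const unsigned char *start = (const unsigned char *)s;
	size_t offset = (uintptr_t)s % sizeof(uint64_t);
	const unsigned char *p = start - offset;	/* aligned word holding s[0] */
	unsigned char bytes[sizeof(uint64_t)];
	uint64_t w;
	size_t i;

	assert(s != NULL);

	/* First read covers 'offset' bytes that are not part of the string. */
	w = load_word(p);

	/* Adjust the value so that the initial byte(s) are not zero. */
	memcpy(bytes, &w, sizeof(w));
	for (i = 0; i < offset; i++)
		bytes[i] = 0xff;
	memcpy(&w, bytes, sizeof(w));

	/* Scan a word at a time until some byte is zero. */
	while (!has_zero_byte(w)) {
		p += sizeof(uint64_t);
		w = load_word(p);
	}

	/* Find the zero byte inside the word, then correct for the offset. */
	memcpy(bytes, &w, sizeof(w));
	for (i = 0; bytes[i] != 0; i++)
		;
	return (size_t)((p + i) - start);
}
```

The very first load_word() reads bytes in front of s, and its result is
used only after those bytes have been overwritten with 0xff. Whether the
bytes were ever written therefore does not affect the result; whether
they belong to the same allocation is the separate question I am asking
above.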