On Thu, Jan 18, 2018 at 09:26:25AM -0800, Linus Torvalds wrote:
> On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov
> <kirill@xxxxxxxxxxxxx> wrote:
> >
> > I can't say I fully grasp how 'diff' got this value and how it leads to both
> > checks being false.
>
> I think the problem is that page difference when they are in different sections.
>
> When you do
>
>     pte_page(*pvmw->pte) - pvmw->page
>
> then the compiler takes the pointer difference, and then divides by
> the size of "struct page" to get an index.
>
> But - and this is important - it does so knowing that the division it
> does will have no modulus: the two 'struct page *' pointers are really
> in the same array, and they really are 'n*sizeof(struct page)' apart
> for some 'n'.
>
> That means that the compiler can optimize the division. In fact, for
> this case, gcc will generate
>
>     subl %ebx, %eax
>     sarl $3, %eax
>     imull $-858993459, %eax, %eax
>
> because 'struct page' is 40 bytes in size, and that magic sequence
> happens to divide by 40 (first divide by 8, then that magical "imull"
> will divide by 5 *IFF* the thing is evenly divisible by 5 (and not too
> big - but the shift guarantees that).
>
> Basically, it's a magic trick, because real divides are very
> expensive, but you can fake them more quickly if you can limit the
> input domain.
>
> But what does it mean if the two "struct page *" are not in the same
> array, and the two arrays were allocated not aligned exactly 40 bytes
> away, but some random number of pages away?
>
> You get *COMPLETE*GARBAGE* when you do the above optimized divide.
> Suddenly the divide had a modulus (because the base of the two arrays
> weren't 40-byte aligned), and the "trick" doesn't work.
>
> So that's why you can't do pointer diffs between two arrays. Not
> because you can't subtract the two pointers, but because the
> *division* part of the C pointer diff rules leads to issues.

Thanks a lot for the explanation!

I wonder if this may be a problem in other places?

For instance, perf uses the address of a mutex to determine the lock ordering.
See mutex_lock_double(). The mutex is embedded in struct perf_event_context,
which is allocated with kzalloc(), so I don't see how we can presume that
alignment is consistent between them.

I don't think it's the only example in the kernel. Are we just lucky?

-- 
 Kirill A. Shutemov
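
As an aside on the quoted explanation, the divide-by-40 sequence can be modeled in user-space C. This is only a sketch under stated assumptions: exact_div40(), plain_div40(), FAKE_PAGE_SIZE and INV5_MOD_2_32 are hypothetical names, not kernel code, and the functions take non-negative byte distances rather than real 'struct page *' pointers (subtracting pointers into different arrays is undefined behaviour in C, so that is not done here).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 40-byte 'struct page' in the quoted mail. */
#define FAKE_PAGE_SIZE 40u

/*
 * Multiplicative inverse of 5 modulo 2^32: 5 * 0xCCCCCCCD wraps to 1.
 * Read as a signed 32-bit value it is -858993459, the imull constant.
 */
#define INV5_MOD_2_32 0xCCCCCCCDu

/*
 * Models "sarl $3; imull $-858993459" for a non-negative byte distance.
 * The result is the true quotient only if byte_diff is a multiple of 40.
 */
uint32_t exact_div40(uint32_t byte_diff)
{
    uint32_t eighths = byte_diff >> 3;  /* divide by 8 */
    return eighths * INV5_MOD_2_32;     /* "divide" by 5; wraps modulo 2^32 */
}

/* Ordinary truncating division, for comparison. */
uint32_t plain_div40(uint32_t byte_diff)
{
    return byte_diff / FAKE_PAGE_SIZE;
}
```

For a distance that is a multiple of 40, such as 400, exact_div40() agrees with plain_div40() and returns 10. For 4096 (two hypothetical arrays whose bases are one 4 KiB page apart), 4096 >> 3 is 512, which is not a multiple of 5, so the multiply wraps to an unrelated value instead of 102.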