Re: [mm 4.15-rc8] Random oopses under memory pressure.

"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> · Fri, 19 Jan 2018 02:49:55 +0300

On Thu, Jan 18, 2018 at 09:26:25AM -0800, Linus Torvalds wrote:
> On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov
> <kirill@xxxxxxxxxxxxx> wrote:
> >
> > I can't say I fully grasp how 'diff' got this value and how it leads to both
> > checks being false.
> 
> I think the problem is taking the page difference when the pages are in
> different sections.
> 
> When you do
> 
>      pte_page(*pvmw->pte) - pvmw->page
> 
> then the compiler takes the pointer difference, and then divides by
> the size of "struct page" to get an index.
> 
> But - and this is important - it does so knowing that the division it
> does will have no modulus: the two 'struct page *' pointers are really
> in the same array, and they really are 'n*sizeof(struct page)' apart
> for some 'n'.
> 
> That means that the compiler can optimize the division. In fact, for
> this case, gcc will generate
> 
>         subl    %ebx, %eax
>         sarl    $3, %eax
>         imull   $-858993459, %eax, %eax
> 
> because 'struct page' is 40 bytes in size, and that magic sequence
> happens to divide by 40 (first divide by 8, then that magical "imull"
> will divide by 5 *IFF* the thing is evenly divisible by 5 (and not too
> big - but the shift guarantees that)).
> 
> Basically, it's a magic trick, because real divides are very
> expensive, but you can fake them more quickly if you can limit the
> input domain.
> 
> But what does it mean if the two "struct page *" are not in the same
> array, and the two arrays were not allocated an exact multiple of 40
> bytes apart, but some random number of pages apart?
> 
> You get *COMPLETE*GARBAGE* when you do the above optimized divide.
> Suddenly the divide had a modulus (because the bases of the two arrays
> weren't a multiple of 40 bytes apart), and the "trick" doesn't work.
> 
> So that's why you can't do pointer diffs between two arrays. Not
> because you can't subtract the two pointers, but because the
> *division* part of the C pointer diff rules leads to issues.

Thanks a lot for the explanation!
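
The effect is easy to reproduce in user space. Below is a minimal sketch,
not kernel code: it assumes a 40-byte struct page as in the 32-bit build
quoted above, and exact_div40() is a hypothetical helper that mimics the
sarl/imull sequence on a raw byte difference.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sizeof(struct page) == 40, as in the 32-bit build quoted above. */
#define STRUCT_PAGE_SIZE 40

/*
 * 0xCCCCCCCD has the same bit pattern as -858993459 and is the
 * multiplicative inverse of 5 modulo 2^32.
 */
#define INV5 UINT32_C(0xCCCCCCCD)

_Static_assert((uint32_t)(UINT32_C(5) * INV5) == 1,
               "INV5 must invert 5 modulo 2^32");

/*
 * Hypothetical helper mimicking the "sarl $3; imull" sequence on a byte
 * difference. It relies on arithmetic right shift of signed values and
 * two's complement conversion, both of which gcc and clang provide.
 */
static int32_t exact_div40(int32_t byte_diff)
{
	int32_t eighth = byte_diff >> 3;          /* sarl $3: divide by 8 */
	uint32_t quot = (uint32_t)eighth * INV5;  /* imull: exact divide by 5 */

	return (int32_t)quot;
}
```

For byte differences that are a multiple of 40, exact_div40() agrees with
ordinary division. For a difference of 4096 bytes (one 4 KiB page, which
is not a multiple of 40), it returns a value unrelated to 4096 / 40 = 102.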

I wonder if this may be a problem in other places?

For instance, perf uses the address of a mutex to determine the lock
ordering. See mutex_lock_double(). The mutex is embedded into struct
perf_event_context, which is allocated with kzalloc(), so I don't see how
we can presume that alignment is consistent between them.

I don't think it's the only example in the kernel. Are we just lucky?

-- 
 Kirill A. Shutemov
