Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 15 Aug 2013 11:00:03 -0700

On Thu, Aug 15, 2013 at 5:02 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@xxxxxxxxxxxxxx> wrote:
>>
>> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
>> Unfortunately this does _not resolve_ my issue (too good to be true) :-(
>
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? But even
> then it looks quite unlikely.
>
> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

Ok, so I've slept on it, and here's my current thinking.

The bugs in the TLB handling were all about missing or confused
updates to the TLB range, and the thing is, they were missing or
confused because you had to do really confusing things, and remember
to set the range properly.

And that is because the interface is horrible.

This patch tries to fix the interface instead of trying to patch up
the individual places that *should* set the range some particular way.
Sadly, that means that I had to change the calling convention for
"tlb_gather_mmu()", so the patch is larger than I'd like. But it's all
very straightforward:

 (1) instead of passing "fullmm" to tlb_gather_mmu(), pass the
start/end address.

     A range of 0 to ~0ul implies "fullmm", and we calculate that with
"!(start | (end+1))"

 (2) Because access to start/end now becomes an internal API, the
patch makes *all* TLB gather implementations do this.

     So I added start/end fields to the tlb_gather structure as necessary.

     Note that some architectures already had "start_range/end_range"
values, and I left those alone (because the new start/end might work a
bit differently), but it's very possible that those could be removed,
and they'd just use the "generic" start/end values. I'm cc'ing the
arch list to see what the reaction to this all is.

 (3) I removed all the other games with start/end, because now
start/end is _always_ valid.

     Notably, if any caller of "tlb_flush_mmu()" forgets to update the
start/end fields (like I think the hugetlb case did), it is no longer
a bug. The start/end will have been set up by the initialization of
TLB gather, so we're all good.

 (4) The ONE exception to (3) is the zap_pte_range() case in
mm/memory.c, which used to do all the special start/end games, and now
instead just updates start/end to be the "chunk" it just worked on
before flushing the TLB, and the "rest of the area" afterwards.

Even that special (4) case is simpler now, imho, exactly because
start/end is a valid range at all points (it used to be that it wasn't
a valid range the first time, since it wasn't set up initially). So
now that code in case (4) makes more sense, but more importantly, now
it should be just an optimization - we *could* have dropped all the
start/end updates, but then we'd just ask the TLB to be flushed for
the whole original range every time.

Anyway. I've booted this, and I'm writing this email with a kernel
running this, BUT:

 - I have not compile-tested anything but x86-64, so the non-generic
TLB gather changes are all just done blindly. They are very
straightforward, but still..

 - I have no idea whether this will fix the problem Ben sees, but I
feel happier about the code, because now any place that forgets to set
up start/end will work just fine, because they are always valid. Ben,
please test. I'm worried that the problem you see is something even
more fundamentally wrong with the whole "oops, must flush in the
middle" logic, but I'm _hoping_ this fixes it.

 - This patch is against current git, so to apply you need to have
that commit e6c495a96ce0 cherry-picked to older kernels first. But
other than that I don't think this code has changed, so it should
apply cleanly.

Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...

                          Linus
Attachment:
patch.diff

Description: Binary data