On Thu, Aug 15, 2013 at 5:02 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@xxxxxxxxxxxxxx> wrote: >> >> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10. >> Unfortunately this does _not resolve_ my issue (too good to be true) :-( > > Ho humm. I've found at least one other bug, but that one only affects > hugepages. Do you perhaps have transparent hugepages enabled? But even > then it looks quite unlikely. > > I'll think about this some more. I'm not happy with how that > particular whole TLB flushing hack was done, but I need to sleep on > this. Ok, so I've slept on it, and here's my current thinking. The bugs in the TLB handling were all about missing or confused updates to the TLB range, and the thing is, they were missing or confused because you had to do really confusing things, and remember to set the range properly. And that is because the interface is horrible. This patch tries to fix the interface instead of trying to patch up the individual places that *should* set the range some particular way. Sadly, that means that I had to change the calling convention for "tlb_gather_mmu()", so the patch is larger than I'd like. But it's all very straightforward: (1) instead of passing "fullmm" to tlb_gather_mmu(), pass the start/end address. A range of 0 to ~0ul implies "fullmm", and we calculate that with "!(start | (end+1))" (2) Because access to start/end now becomes an internal API, the patch makes *all* TLB gather implementations do this. So I added start/end fields to the tlb_gather structure as necessary. Note that some architectures already had "start_range/end_range" values, and I left those alone (because the new start/end might work a bit differently), but it's very possible that those could be removed, and they'd just use the "generic" start/end values. I'm cc'ing the arch list to see what the reaction to this all is. (3) I removed all the other games with start/end, because now start/end is _always_ valid. Notably, if any caller of "tlb_flush_mmu()" forgets to update the start/end fields (like I think the hugetlb case did), it is no longer a bug. The start/end will have been set up by the initialization of TLB gather, so we're all good. (4) The ONE exception to (3) is the zap_pte_range() case in mm/memory.c, which used to do all the special start/end games, and now instead just updates start/end to be the "chunk" it just worked on before flushing the TLB, and the "rest of the area" afterwards. Even that special (4) case is simpler now, imho, exactly because start/end is a valid range at all points (it used to be that it wasn't a valid range the first time, since it wasn't set up initially). So now that code in case (4) makes more sense, but more importantly, now it should be just an optimization - we *could* have dropped all the start/end updates, but then we'd just ask the TLB to be flushed for the whole original range every time. Anyway. I've booted this, and I'm writing this email with a kernel running this, BUT: - I have not compile-tested anything but x86-64, so the non-generic TLB gather changes are all just done blindly. They are very straightforward, but still.. - I have no idea whether this will fix the problem Ben sees, but I feel happier about the code, because now any place that forgets to set up start/end will work just fine, because they are always valid. Ben, please test. I'm worried that the problem you see is something even more fundamentally wrong with the whole "oops, must flush in the middle" logic, but I'm _hoping_ this fixes it. - This patch is against current git, so to apply you need to have that commit e6c495a96ce0 cherry-picked to older kernels first. But other than that I don't think this code has changed, so it should apply cleanly. Comments? Especially s390, ARM, ia64, sh and um that I edited blindly... Linus
Attachment:
patch.diff
Description: Binary data