Adding more people, and linux-mm.

On Fri, Aug 5, 2016 at 10:13 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>
> Bisected to nowhere :(
>
> Anyone has an idea ?

How easy is it for you to reproduce? It must be *fairly* easy since
you tried bisecting it, and presumably all the ones you marked "bad"
really are reliably bad.

Which means that I would expect that since the bisect failed, some of
the "good" ones (particularly at the end) might really be bad, just
didn't have the time/load to reproduce.

So maybe you could re-test the good ones for a bit longer? Trust all
the ones you've marked bad (and presumably trust v4.7 itself as good),
and re-try the bisection.

That said, it looks pretty bleak. If you don't trust any of the good
ones, you'd start out with

    git bisect start
    git bisect bad 5bbea66bf8d9ba898abbe5499f06998a993364f6
    git bisect good v4.7

and that's still almost 6000 commits.

So let's narrow it down by looking at the details:

> [ 32.666450] BUG: Bad page state in process swapper/0 pfn:1fd945c
> [ 32.672542] page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0
> [ 32.680823] flags: 0x1000000000000000()
> [ 32.684655] page dumped because: nonzero mapcount
> ...
> [ 43.477693] BUG: Bad page state in process S05containers pfn:1ff02a3
> [ 43.484417] page:ffffea007fc0a8c0 count:0 mapcount:-511 mapping: (null) index:0x0
> [ 43.492737] flags: 0x1000000000000000()
> [ 43.496602] page dumped because: nonzero mapcount

Hmm. The _mapcount field is a union with other fields, but that number
doesn't make sense for any of the other fields.

So it's almost certainly related to "PAGE_KMEMCG_MAPCOUNT_VALUE". So
something presumably mapped such a page into an address space, and
incremented the number. That should never have happened, of course.

Oh. Actually, I guess it *is* PAGE_KMEMCG_MAPCOUNT_VALUE, and what
happens is that __page_mapcount() returns "_mapcount+1", so no other
increment needed.
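
[Editorial illustration, not part of the original mail.] The arithmetic
in the paragraph above can be checked with a minimal userspace sketch.
It assumes that PAGE_KMEMCG_MAPCOUNT_VALUE is -512 in the tree being
tested (quoted from memory of include/linux/page-flags.h, so verify it
against the actual source), and it uses a hypothetical struct fake_page
and reported_mapcount() helper in place of the real struct page and
__page_mapcount().

```c
/* ASSUMPTION: value as remembered from include/linux/page-flags.h
 * around v4.8-rc1; check the real tree before relying on it. */
#define PAGE_KMEMCG_MAPCOUNT_VALUE (-512)

/* HYPOTHETICAL stand-in for struct page: only the field of interest.
 * The kernel stores this as an atomic_t inside a union. */
struct fake_page {
    int _mapcount;
};

/* Models the "+1" described in the mail: an ordinary unmapped page
 * stores -1 in _mapcount, so the reported mapcount is _mapcount + 1. */
int reported_mapcount(const struct fake_page *page)
{
    return page->_mapcount + 1;
}
```

Under that assumption, a page whose _mapcount holds
PAGE_KMEMCG_MAPCOUNT_VALUE is reported with mapcount -511, which is the
value shown in both traces, with no extra increment required.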
The fact that one of the traces comes from tlb_flush_mmu_free() still
does mean that it has been mapped into a VM, though.

Unrelated to that, the "flags" field has bit 60 set, which is
presumably just part of the zone/node/section number. Maybe the
page_debug() code should print out that information too, not just the
flag bit names?

Anyway, the PAGE_KMEMCG_MAPCOUNT_VALUE connection makes me blame
Vladimir Davydov and commit 4949148ad433. Maybe you could center your
testing around that one (ie rather than bisection, try the immediate
parent, and then that commit).

And maybe the page mapping code could have some debug code for "am I
mapping a page that has a mapcount < -1", and alert people to that? To
more easily find the path that triggers this?

Vladimir?

              Linus
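
[Editorial illustration, not part of the original mail.] The debug
check suggested above might look roughly like the following userspace
sketch. Everything in it is hypothetical: struct demo_page,
mapcount_looks_special() and demo_map_page() are made-up names, not
kernel APIs, and it assumes the "mapcount < -1" test applies to the raw
_mapcount field, where -1 means "not mapped". A real kernel version
would presumably use something like a WARN-style macro in the mapping
path instead of fprintf().

```c
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for struct page. */
struct demo_page {
    int _mapcount;   /* -1 means "not mapped" for an ordinary page */
};

/* True when the raw field holds something below the normal floor of
 * -1, i.e. it is being used as a special marker, not a map count. */
bool mapcount_looks_special(const struct demo_page *page)
{
    return page->_mapcount < -1;
}

/* HYPOTHETICAL mapping path: refuse and complain instead of silently
 * incrementing a marker value. Returns 0 on success, -1 on refusal. */
int demo_map_page(struct demo_page *page)
{
    if (mapcount_looks_special(page)) {
        fprintf(stderr, "refusing to map page with _mapcount %d\n",
                page->_mapcount);
        return -1;
    }
    page->_mapcount++;
    return 0;
}
```

With a check of this shape, the warning fires at the moment the page is
mapped, so the report points at the offending path rather than at the
later free.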