On Tue, Dec 13, 2016 at 11:34 AM, Nikolay Borisov
<n.borisov.lkml@xxxxxxxxx> wrote:
>
>
> On 13.12.2016 20:51, Eric W. Biederman wrote:
>> Nikolay Borisov <n.borisov.lkml@xxxxxxxxx> writes:
>>
>>> So this thing resurfaced again and I took a hard look into the code but
>>> couldn't find anything suspicious. So the allocating and freeing
>>> contexts leads me to believe it's the 'tbl' pointer that is being
>>> corrupted. The only thing which I do with it is to increase it by two.
>>>
>>> Perhaps some liveness issues.
>>
>> To me it feels like a double free somewhere. Like we call dec_ucount
>> and thus put_ucount multiple times in a way that goes to 0.
>>
>> Perhaps there is a peculiarity in the existing code which allows the
>> count to go to zero which we don't notice because we don't free anything
>> when the count goes to zero today.
>>
>> Perhaps there is some subtle semantic mismatch between your conversion
>> and the inotify code.
>>
>> I don't know if you made a subtle misreading of the code, or if
>> there is an existing bug that your changes took from harmless to
>> problematic, but the evidence is overwhelming that something
>> is going wrong and it is your patch that brings it out.
>>
>> If it helps the openvz folks apparently reproduced this with the criu
>> regression tests and the appropriate kernel debug options, and confirmed
>> the failure was your patch.
>
> Great but I think I missed this conversation, care to send relevant
> threads? I'd like to get to the bottom of this and have it merged?
>
> @openvz guys - if you care to shout with more details I'd love to work
> on getting this fixed!

Hi Nikolay,

We execute CRIU tests for linux-next and a few days ago they triggered a
kernel bug:
http://www.spinics.net/lists/linux-mm/msg118204.html

If you want to execute these tests to reproduce the bug, you need to do
these steps:

$ apt-get install gcc make protobuf-c-compiler libprotobuf-c0-dev libaio-dev \
    libprotobuf-dev protobuf-compiler python-ipaddr libcap-dev \
    libnl-3-dev gdb bash python-protobuf
$ git clone https://github.com/xemul/criu.git
$ cd criu
$ make
$ python test/zdtm.py run -a -p 4

Here is the config file which we use to compile the kernel:
https://github.com/avagin/criu-jenkins-digitalocean/blob/master/jenkins-scripts/config

I recommend booting the kernel with slub_debug=FZ.

Don't hesitate to ask me if you have any questions.

Thanks,
Andrei

>
>>
>> The current state of play is that I would love to merge this if we can
>> track down this issue. I dropped this from my tree before I sent my pull
>> request to Linus so there is no emergency to get this fixed.
>>
>> Eric
>>
>>
> _______________________________________________
> Containers mailing list
> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/containers