On Thu, Jun 06, 2013 at 02:12:37PM -0400, Dave Jones wrote:
> On Thu, Jun 06, 2013 at 12:30:31PM +1000, Michael Ellerman wrote:
> > Hi folks,
> >
> > Has anyone else seen trinity appear to control-c itself?
> >
> > I've seen this a few times now, and I'm _really_ sure I didn't control-c
> > this one manually. It was running overnight in a screen session.
> >
> > Tail of output is:
> >
> > [13645] [120] mbind(start=0, len=0, mode=0, nmask=0x1fffb9cf0010, maxnode=0x80000, flags=0x4000) [13655] [182] add_key(_type=0x7fffffff00001001, _description=0x7fffffff00001009, _payload=0x1ffffe2d0000, plen=0, ringid=0x1000000000) = -1 (Bad address)
> > = -1 (Invalid argument)
> > [13655] [183] getrusage(who=0x8010658612110214, ru=0xc000000000000000) [13645] [121] symlink(oldname="/proc/87/task/87/wchan", newn[...]
> > [remaining child output is interleaved with fuzzed byte strings and unreadable, apart from several "child N exiting" lines]
> > child 13536 exiting
> > child 13613 exiting
> > [3791] Bailing main loop. Exit reason: ctrl-c
>
> I've seen it in the past, but not since last summer when I merged
> commit dbad5389a1d5d413e533a85f914f3eeef03a3ebe
>
> I wonder if there's some other way we're sending signals to child pids that
> currently isn't marked AVOID.

Looks like it.

I added some instrumentation to the kernel in the signal sending path,
here's a call trace of the process (trinity-child5) doing the sending:

[c0000002ea663640] [c000000000099f5c] .do_send_sig_info+0x5c/0xa0
[c0000002ea6636f0] [c0000000001f578c] .send_sigio_to_task+0x1cc/0x370
[c0000002ea663830] [c0000000001f6498] .send_sigio+0xd8/0x230
[c0000002ea6638f0] [c0000000001f6730] .kill_fasync+0x140/0x380
[c0000002ea6639a0] [c0000000001eb44c] .pipe_write+0x3ec/0x610
[c0000002ea663ac0] [c0000000001df75c] .do_sync_readv_writev+0x9c/0x120
[c0000002ea663c20] [c0000000001e1320] .do_readv_writev+0xf0/0x320
[c0000002ea663d80] [c0000000001e1790] .SyS_writev+0x60/0xe0
[c0000002ea663e30] [c000000000009e60] syscall_exit+0x0/0x98

Looking at the code for send_sigio_to_task(), at the very top it says:

	/*
	 * F_SETSIG can change ->signum lockless in parallel, make
	 * sure we read it once and use the same value throughout.
	 */
	int signum = ACCESS_ONCE(fown->signum);

So despite it being called send_sigio_to_task() it's really
send_fown_signum_to_task() and can potentially send any signal.

And I see that trinity does fuzz F_SETSIG, and it looks like it's just
passing a rand32() as the signal number argument.

So that looks like it to me. I'll try sanitising the arg to F_SETSIG to
avoid SIGINT and run for a while.

cheers
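
For illustration, the sanitising step could look something like the sketch below. This is not trinity's actual code: the helper name sanitise_setsig_arg() is hypothetical, and replacing SIGINT with 0 is just one possible choice (fcntl(2) documents a zero F_SETSIG argument as "send the default SIGIO"). Every other value, including invalid ones, is passed through unchanged so the fuzzer keeps its coverage.

```c
#include <signal.h>

/*
 * Hypothetical helper (not taken from trinity): filter a fuzzed value
 * before it is used as the F_SETSIG argument, so that a later write to
 * the fd cannot raise SIGINT in the owning process.
 */
static unsigned long sanitise_setsig_arg(unsigned long fuzzed)
{
	/* Assumption: SIGINT is the only signal we need to avoid here. */
	if (fuzzed == (unsigned long)SIGINT)
		return 0;	/* 0 makes the kernel fall back to SIGIO */

	/* Leave everything else alone, valid or not. */
	return fuzzed;
}
```

A caller would then pass the filtered value as the third fcntl() argument, e.g. fcntl(fd, F_SETSIG, sanitise_setsig_arg(rand32())).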