On Wed, Nov 25, 2015 at 4:31 PM, Andrey Ryabinin <ryabinin.a.a@xxxxxxxxx> wrote: > 2015-11-25 18:02 GMT+03:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>: >> On Wed, Nov 25, 2015 at 03:43:54PM +0100, Peter Zijlstra wrote: >>> On Mon, Sep 21, 2015 at 04:01:41PM -0400, Tejun Heo wrote: >>> > So, the only way the patch could have caused the above is if someone >>> > who isn't the task itself is writing to the bitfields while the task >>> > is running. Looking through the fields, ->sched_reset_on_fork seems a >>> > bit suspicious. __sched_setscheduler() looks like it can modify the >>> > bit while the target task is running. Peter, am I misreading the >>> > code? >>> >>> Nope, that's quite possible. Looks like we need to break up those >>> bitfields a bit. All the scheduler ones should be serialized by >>> scheduler locks, but the others are fair game. >> >> Maybe something like so; but my brain is a complete mess today. >> >> --- >> include/linux/sched.h | 11 ++++++----- >> 1 file changed, 6 insertions(+), 5 deletions(-) >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index f425aac63317..b474e0f05327 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -1455,14 +1455,15 @@ struct task_struct { >> /* Used for emulating ABI behavior of previous Linux versions */ >> unsigned int personality; >> >> - unsigned in_execve:1; /* Tell the LSMs that the process is doing an >> - * execve */ >> - unsigned in_iowait:1; >> - >> - /* Revert to default priority/policy when forking */ >> + /* scheduler bits, serialized by scheduler locks */ >> unsigned sched_reset_on_fork:1; >> unsigned sched_contributes_to_load:1; >> unsigned sched_migrated:1; >> + unsigned __padding_sched:29; > > AFAIK the order of bit fields is implementation defined, so GCC could > sort all these bits as it wants. > You could use unnamed zero-widht bit-field to force padding: > > unsigned :0; //force aligment to the next boundary. > >> + >> + /* unserialized, strictly 'current' */ >> + unsigned in_execve:1; /* bit to tell LSMs we're in execve */ >> + unsigned in_iowait:1; >> #ifdef CONFIG_MEMCG >> unsigned memcg_may_oom:1; >> #endif >> I've gathered some evidence that in my case the guilty bit is sched_reset_on_fork: https://groups.google.com/d/msg/syzkaller/o8VqvYNEu_I/I0pXGx79DQAJ This patch should help. -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html