Hugh Dickins wrote:
> On Tue, 5 Aug 2008, Balbir Singh wrote:
>> Hugh Dickins wrote:
>> [snip]
>>> BUG: unable to handle kernel paging request at 6b6b6b8b
>>> IP: [<7817078f>] memrlimit_cgroup_uncharge_as+0x18/0x29
>>> Pid: 22500, comm: swapoff Not tainted (2.6.26-rc8-mm1 #7)
>>>  [<78161323>] ? exit_mmap+0xaf/0x133
>>>  [<781226b1>] ? mmput+0x4c/0xba
>>>  [<78165ce3>] ? try_to_unuse+0x20b/0x3f5
>>>  [<78371534>] ? _spin_unlock+0x22/0x3c
>>>  [<7816636a>] ? sys_swapoff+0x17b/0x37c
>>>  [<78102d95>] ? sysenter_past_esp+0x6a/0xa5
>> I am unable to reproduce the problem,
>
> Me neither, I've spent many hours trying 2.6.27-rc1-mm1 and then
> back to 2.6.26-rc8-mm1.  But I've been SO stupid: saw it originally
> on one machine with SLAB_DEBUG=y, have been trying since mostly on
> another with SLUB_DEBUG=y, but never thought to boot with
> slub_debug=P,task_struct until now.
>

Unfortunately, I've not tried on 32 bit and not at all with SLAB_DEBUG=y. I'll
give the latter a trial run and see what I get.

>> but I do have an initial hypothesis
>>
>> CPU0                                    CPU1
>>                                         try_to_unuse
>> task 1 starts exiting                   look at mm = task1->mm
>> ..                                      increment mm_users
>> task 1 exits
>> mm->owner needs to be updated, but
>> no new owner is found
>> (mm_users > 1, but no other task
>> has task->mm = task1->mm)
>> mm_update_next_owner() leaves
>>
>> grace period
>>                                         user count drops, call mmput(mm)
>> task 1 freed
>>                                         dereferencing mm->owner fails
>
> Yes, that looks right to me: seems obvious now.  I don't think your
> careful alternation of CPU0/1 events at the end matters: the swapoff
> CPU simply dereferences mm->owner after that task has gone.
>
> (That's a shame, I'd always hoped that mm->owner->comm was going to
> be good for use in mm messages, even when tearing down the mm.)
>

The problem we have is that tasks are independent of mm_structs (in some ways)
and are associated almost like a database associates two entities through keys.

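To make the hypothesised sequence concrete, here is a minimal userspace sketch
in C. It is not kernel code: struct task, struct mm, task_exit() and
owner_is_stale() are hypothetical stand-ins invented for illustration, and a
"freed" flag takes the place of actually freeing the task, so that the stale
pointer can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's structures. */
struct mm;

struct task {
    const char *comm;
    bool freed;          /* models the task_struct having been freed */
    struct mm *mm;
};

struct mm {
    int mm_users;        /* references held on the address space */
    struct task *owner;  /* models mm->owner */
};

/*
 * Models the exiting task: look for another task that still uses the
 * same mm and hand ownership to it. If none exists (the only other
 * reference is a non-task user such as swapoff), owner is left
 * untouched, which is the situation the hypothesis describes.
 */
static void task_exit(struct task *t, struct task **all, size_t n)
{
    struct mm *mm = t->mm;

    assert(mm != NULL);
    t->mm = NULL;

    if (mm->owner == t) {
        for (size_t i = 0; i < n; i++) {
            if (all[i] != t && all[i]->mm == mm) {
                mm->owner = all[i];
                break;
            }
        }
        /* No candidate found: mm->owner still points at t. */
    }

    mm->mm_users--;      /* the exiting task drops its reference */
    t->freed = true;     /* stands in for the task being freed */
}

/* True when mm->owner points at a task that has already gone away. */
static bool owner_is_stale(const struct mm *mm)
{
    return mm->owner != NULL && mm->owner->freed;
}
```

With a single task owning the mm and one extra reference taken on behalf of
swapoff, calling task_exit() leaves mm_users at 1 and owner_is_stale()
returning true. That is the state in which a later mmput() would follow
mm->owner into freed memory.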
>> I do have a potential solution in mind, but I want to make sure my
>> hypothesis is correct.
>
> It seems wrong that memrlimit_cgroup_uncharge_as should be called
> after mm->owner may have been changed, even if it's to something safe.
> But I forget the mm/task exit details, surely they're tricky.
>

The fix would be to uncharge when a new owner can no longer be found (I have
yet to code and test it, though).

> By the way, is the ordering in mm_update_next_owner the best?
> Would there be less movement if it searched amongst siblings before
> it searched amongst children?  Ought it to make a first pass trying
> to stay within the same cgroup?

Yes, we need to make a first pass at keeping it in the same cgroup. You might
be right about the sibling optimization.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
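
For illustration only, the search order discussed above could be sketched in
userspace C as follows. This is not the existing mm_update_next_owner(): the
types and the names scan() and pick_next_owner() are hypothetical, cgroups are
reduced to an integer id, and the sibling and children lists are plain arrays.
The first pass accepts only candidates in the exiting owner's cgroup, the
second accepts any cgroup, and each pass looks at siblings before children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models; NOT the kernel's mm_struct or task_struct. */
struct mm_model { int id; };

struct task_model {
    int cgroup;                    /* cgroup reduced to an integer id */
    const struct mm_model *mm;     /* address space the task uses */
};

/* Return the first task in list that uses mm, optionally same cgroup only. */
static const struct task_model *
scan(const struct task_model *const *list, size_t n,
     const struct mm_model *mm, int cgroup, int match_cgroup)
{
    for (size_t i = 0; i < n; i++) {
        const struct task_model *t = list[i];

        if (t->mm != mm)
            continue;
        if (match_cgroup && t->cgroup != cgroup)
            continue;
        return t;
    }
    return NULL;
}

/*
 * Pass 1: stay within the same cgroup. Pass 2: accept any cgroup.
 * Within each pass, siblings are searched before children.
 */
static const struct task_model *
pick_next_owner(const struct mm_model *mm, int cgroup,
                const struct task_model *const *siblings, size_t ns,
                const struct task_model *const *children, size_t nc)
{
    assert(mm != NULL);

    for (int match = 1; match >= 0; match--) {
        const struct task_model *t;

        t = scan(siblings, ns, mm, cgroup, match);
        if (t)
            return t;
        t = scan(children, nc, mm, cgroup, match);
        if (t)
            return t;
    }
    /* No task uses this mm any more. */
    return NULL;
}
```

A NULL return corresponds to the case where no new owner can be found, which
is the point at which the proposed fix would uncharge.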