On Thu, Feb 17, 2011 at 8:36 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > Le jeudi 17 février 2011 à 08:13 -0800, Linus Torvalds a écrit : >> >> Nope, that's roughly what I did to (in addition to doing all the .ko >> files and checking for 0xe68 too). Which made me worry that the 0x1e68 >> offset is actually just the stack offset at some random code-path (it >> would stay constant for a particular kernel if there is only one way >> to reach that code, and it's always reached through some stable >> non-irq entrypoint). >> >> People do use on-stack lists, and if you do it wrong I could imagine a >> stale list entry still pointing to the stack later. And while >> INIT_LIST_HEAD() is one pattern to get that "two consecutive words >> pointing to themselves", so is doing a "list_del()" on the last list >> entry that the head points to. >> >> So _if_ somebody has a list_head on the stack, and leaves a stale list >> entry pointing to it, and then later on, when the stack has been >> released that stale list entry is deleted with "list_del()", you'd see >> the same memory corruption pattern. But I'm not aware of any new code >> that would do anything like that. >> >> So I'm stumped, which is why I'm just hoping that extra debugging >> options would catch it closer to the place where it actually occurs. >> The "2kB allocation with a nice compile-time structure offset" sounded >> like _such_ a great way to catch it, but it clearly doesn't :( >> >> > > Hmm, this rings a bell here. > > Unfortunately I have to run so cannot check right now. > > Please take a look at commit 443457242beb6716b43db4d (net: factorize > sync-rcu call in unregister_netdevice_many) > > CC David and Octavian > > dev_close_many() can apparently return with an non empty list Uhhuh. That does look scary. This would also explain why so few people see it, and why it's often close to exit. That __dev_close() looks very scary. When it does static int __dev_close(struct net_device *dev) { LIST_HEAD(single); list_add(&dev->unreg_list, &single); return __dev_close_many(&single); } it leaves that "dev->unreg_list" entry on the on-stack "single" list. "dev_close()" does the same. So if "dev->unreg_list" is _ever_ touched afterwards (without being re-initialized), you've got list corruption. And it does look like default_device_exit_batch() does that by doing a "unregister_netdevice_queue(dev, &dev_kill_list);" which in turn does "list_move_tail(&dev->unreg_list, head);" which is basically just an optimized list_del+list_add. I haven't looked through the cases all that closely, so I might be missing something that re-initializes the queue. But it does look likely, and would explain why it's seen after a suspend (that takes down the networking), and I presume Eric has been doing various network device actions too, no? Even if there is some guarantee that "dev->unreg_list" is never used afterwards, I would _still_ strongly suggest that nobody ever leave random pending on-stack list entries around when the function (that owns the stack) exits. So at a minimum, you'd do something like the attached. TOTALLY UNTESTED PATCH! And I repeat: I don't know the code. I just know "that looks damn scary". [ Btw, that also shows another problem: "list_move()" doesn't trigger the debugging checks when it does the __list_del(). So CONFIG_DEBUG_LIST would never have caught the fact that the "list_move()" was done on a list-entry that didn't have valid list pointers any more. ] Linus
net/core/dev.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 8e726cb..a18c164 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1280,10 +1280,13 @@ static int __dev_close_many(struct list_head *head) static int __dev_close(struct net_device *dev) { + int retval; LIST_HEAD(single); list_add(&dev->unreg_list, &single); - return __dev_close_many(&single); + retval = __dev_close_many(&single); + list_del(&single); + return retval; } int dev_close_many(struct list_head *head) @@ -1325,7 +1328,7 @@ int dev_close(struct net_device *dev) list_add(&dev->unreg_list, &single); dev_close_many(&single); - + list_del(&single); return 0; } EXPORT_SYMBOL(dev_close);