On Fri, May 23, 2014 at 5:46 PM, DX Cui <rijcos@xxxxxxxxx> wrote:
> Hi all,
> I'm debugging a strange memory hotplug issue on CentOS 6.5 (2.6.32-431.17.1.el6):
> when a chunk of memory is hot-added, it seems the kernel *occasionally* can send
> a MEMORY ADD event to the udev daemon before the kernel actually creates the
> sysfs file 'state'!
> As a result, udev can't reliably bring new memory online with this udev rule:
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}="online"
>
> Please see the end of the mail for the strace log of udevd when I run udevd
> manually:
>
> When udevd gets a MEMORY ADD event for
> /sys/devices/system/memory/memory23, it tries to write "online" to
> /sys/devices/system/memory/memory23/state, but the file hasn't been created by
> the kernel yet. In this case, when I manually check the file at once with ls, it has
> been created, and I can manually echo online into it to bring it online correctly.
>
> Please note: this bad behavior of the kernel is only occasional, which may imply
> there is a race condition somewhere?
>
> BTW, it looks like the issue doesn't exist in 3.10+ kernels. Is this a known issue
> already fixed in newer kernels?

Hi all,
I think I found the root cause: when memory hotplug was introduced in 2005:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3947be1969a9ce455ec30f60ef51efb10e4323d1
there was a race condition in:

+static int add_memory_block(unsigned long node_id, struct mem_section *section,
+		unsigned long state, int phys_device)
+{
...
+	ret = register_memory(mem, section, NULL);
+	if (!ret)
+		ret = mem_create_simple_file(mem, phys_index);
+	if (!ret)
+		ret = mem_create_simple_file(mem, state);

Here, add_memory_block() first invokes
register_memory() -> sysdev_register() -> sysdev_add() ->
kobject_uevent(&sysdev->kobj, KOBJ_ADD), which notifies the udev daemon,
and only then invokes mem_create_simple_file().
If execution is preempted between the two steps, the issue I reported in the
previous mail can happen: udevd handles the ADD event before the 'state' file
exists.

Luckily, a commit in 2013 fixed this issue, apparently unintentionally:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96b2c0fc8e74a615888e2bedfe55b439aa4695e1

It looks like the new "register_memory() -> ... -> device_add()" path creates
the sysfs files before notifying udev, which is the correct order.

It would be great if you could confirm my analysis. :-)

-- DX
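
P.S. To make the two orderings concrete, here is a minimal userspace sketch.
It is not kernel code: struct fake_memory_block, fake_udev_handle_add(),
add_block_notify_first() and add_block_create_first() are hypothetical names
invented for illustration, and the "udev" handler is called synchronously to
model the worst case, where udevd reacts to the ADD event before the kernel
gets to create the 'state' file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory block device; not the kernel's struct. */
struct fake_memory_block {
	bool state_file_exists;	/* models the sysfs 'state' attribute */
	bool online;		/* set once "online" has been written */
};

/*
 * Models udevd reacting to the ADD event at once by writing "online"
 * to 'state'. Returns false if the file does not exist yet, which is
 * where a real open() would fail with ENOENT.
 */
static bool fake_udev_handle_add(struct fake_memory_block *mem)
{
	assert(mem != NULL);
	if (!mem->state_file_exists)
		return false;
	mem->online = true;
	return true;
}

/* Ordering described for the 2005 code: notify first, create the file second. */
static bool add_block_notify_first(struct fake_memory_block *mem)
{
	bool handled = fake_udev_handle_add(mem);	/* event sent and handled */

	mem->state_file_exists = true;			/* file appears too late */
	return handled;
}

/* Ordering described for the device_add() path: create first, notify second. */
static bool add_block_create_first(struct fake_memory_block *mem)
{
	mem->state_file_exists = true;
	return fake_udev_handle_add(mem);
}
```

With a zero-initialized block, add_block_notify_first() returns false and
leaves the block offline even though the file exists afterwards, which matches
what I saw with ls right after the failure; add_block_create_first() returns
true and the block ends up online.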