Re: memory hot-add: the kernel can notify udev daemon before creating the sys file state?

On Fri, May 23, 2014 at 5:46 PM, DX Cui <rijcos@xxxxxxxxx> wrote:
> Hi all,
> I'm debugging a strange memory hotplug issue on CentOS 6.5 (2.6.32-431.17.1.el6):
> when a chunk of memory is hot-added, it seems the kernel *occasionally* can send
> a MEMORY ADD event to the udev daemon before the kernel actually creates the
> sys file 'state'!
> As a result, udev can't reliably make new memory online by this udev rule:
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}="online"
>
> Please see the end of the mail for the strace log of udevd when I run udevd
> manually:
>
> When udevd gets a MEMORY ADD event for
> /sys/devices/system/memory/memory23, it tries to write "online" to
> /sys/devices/system/memory/memory23/state, but the file hasn't been created by
> the kernel yet. In this case, when I manually check the file at once with ls, it has
> been created, and I can manually echo online into it to make it online correctly.
>
> Please note: this bad behavior of the kernel is only occasional, which may imply
> there is a race condition somewhere?
>
> BTW, it looks like the issue doesn't exist in 3.10+ kernels. Is this a known issue
> already fixed in new kernels?

Hi all,
I think I found the root cause. When memory hotplug was introduced in 2005:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3947be1969a9ce455ec30f60ef51efb10e4323d1
there was already a race condition in add_memory_block():

+static int add_memory_block(unsigned long node_id, struct mem_section *section,
+                            unsigned long state, int phys_device)
+{
...
+ ret = register_memory(mem, section, NULL);
+ if (!ret)
+        ret = mem_create_simple_file(mem, phys_index);
+ if (!ret)
+        ret = mem_create_simple_file(mem, state);

Here add_memory_block() first invokes register_memory() ->
sysdev_register() -> sysdev_add() ->
kobject_uevent(&sysdev->kobj, KOBJ_ADD), which notifies the udev daemon,
and only afterwards invokes mem_create_simple_file() to create the
'phys_index' and 'state' files. If the current execution is preempted
between the two steps, udevd can handle the ADD event before 'state'
exists, which is the issue I reported in the previous mail.
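
To make the ordering problem concrete, here is a toy userspace model of
it. This is only a sketch, not kernel code: every name in it (toy_block,
toy_listener, toy_register_*) is made up for illustration, and calling the
listener synchronously at the point of notification stands in for the
worst case where udevd runs before the kernel creates the file.

```c
/* Toy userspace model of the notify/create ordering; NOT kernel code.
 * All names below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>

struct toy_block {
    bool state_file_exists;  /* stands in for the sysfs 'state' file */
    bool onlined;            /* set if the listener's write succeeded */
};

/* Stands in for udevd handling the ADD event: it can only write
 * "online" if the 'state' file already exists. */
void toy_listener(struct toy_block *b)
{
    if (b->state_file_exists)
        b->onlined = true;
}

/* Old order: send the event first, create the file afterwards. */
bool toy_register_notify_first(struct toy_block *b)
{
    toy_listener(b);              /* ~ kobject_uevent(..., KOBJ_ADD) */
    b->state_file_exists = true;  /* ~ mem_create_simple_file(mem, state) */
    return b->onlined;
}

/* Safe order: create the file first, then send the event. */
bool toy_register_create_first(struct toy_block *b)
{
    b->state_file_exists = true;
    toy_listener(b);
    return b->onlined;
}

/* Small self-check showing the difference between the two orders. */
void toy_self_check(void)
{
    struct toy_block old_order = { false, false };
    struct toy_block new_order = { false, false };

    assert(!toy_register_notify_first(&old_order)); /* write was lost */
    assert(toy_register_create_first(&new_order));  /* write succeeded */
}
```

In the real kernel the listener is not called synchronously, so the old
order only fails when udevd happens to win the race, which would match
the "occasional" behavior I saw.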

Luckily, a commit in 2013 fixed this issue, apparently without intending to:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96b2c0fc8e74a615888e2bedfe55b439aa4695e1

It looks like the new "register_memory() -> ... -> device_add()" path has the
correct order: the sysfs files are created before udev is notified.
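
For kernels that still have the old ordering, userspace could in
principle tolerate the window by retrying the write for a short time. The
following is only a sketch under that assumption: write_attr_with_retry()
is a hypothetical helper of my own, not part of udev or the kernel, and
the retry count and delay are arbitrary.

```c
/* Hypothetical userspace helper; not part of udev or the kernel. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write 'value' to 'path', retrying only while the file does not exist
 * yet (ENOENT). Returns 0 on success, -1 on failure. */
int write_attr_with_retry(const char *path, const char *value,
                          int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000,
                              (delay_ms % 1000) * 1000000L };
    size_t len;

    assert(path != NULL && value != NULL);
    len = strlen(value);

    for (int i = 0; i < max_tries; i++) {
        int fd = open(path, O_WRONLY);
        if (fd >= 0) {
            ssize_t n = write(fd, value, len);
            close(fd);
            return (n == (ssize_t)len) ? 0 : -1;
        }
        if (errno != ENOENT)
            return -1;            /* other errors: retrying will not help */
        nanosleep(&delay, NULL);  /* file not there yet: wait, then retry */
    }
    return -1;                    /* file never appeared */
}
```

Such a helper could be started from a udev RUN+= rule with the path of the
block's 'state' file, e.g. write_attr_with_retry(path, "online", 50, 10),
instead of relying on ATTR{state}="online" alone. I have not tested this
on the affected kernel.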

It would be great if you can confirm my analysis. :-)

 -- DX
