Oscar Salvador <osalvador@xxxxxxx> writes: > On Fri, Mar 11, 2022 at 01:06:26PM +0800, Huang, Ying wrote: >> Oscar Salvador <osalvador@xxxxxxx> writes: >> > -static int __init migrate_on_reclaim_init(void) >> > -{ >> > - int ret; >> > - >> > node_demotion = kmalloc_array(nr_node_ids, >> > sizeof(struct demotion_nodes), >> > GFP_KERNEL); >> > WARN_ON(!node_demotion); >> > >> > - ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline", >> > - NULL, migration_offline_cpu); >> > /* >> > - * In the unlikely case that this fails, the automatic >> > - * migration targets may become suboptimal for nodes >> > - * where N_CPU changes. With such a small impact in a >> > - * rare case, do not bother trying to do anything special. >> > + * At this point, all numa nodes with memory/CPus have their state >> > + * properly set, so we can build the demotion order now. >> > */ >> > - WARN_ON(ret < 0); >> > - ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online", >> > - migration_online_cpu, NULL); >> > - WARN_ON(ret < 0); >> > - >> > + set_migration_target_nodes(); >> >> If my understanding were correct, we should enclose >> set_migration_target_nodes() here with cpus_read_lock(). And add some >> comment before set_migration_target_nodes() for this. I don't know >> whether the locking order is right. > > Oh, I see that cpuhp_setup_state() holds the cpu-hotplug lock while > calling in, so yeah, we might want to hold in there. > > The thing is, not long ago we found out that we could have ACPI events > like memory-hotplug operations at boot stage [1], so I guess it is > safe to assume we could also have cpu-hotplug operations at that stage > as well, and so we want to hold cpus_read_lock() just to be on the safe > side. > > But, unless I am missing something, that does not apply to > set_migration_target_nodes() being called from a callback, > as the callback (somewhere up the chain) already holds that lock. > e.g: (_cpu_up takes cpus_write_lock()) and the same for the down > operation. > > So, to sum it up, we only need the cpus_read_lock() in > migrate_on_reclaim_init(). Yes. That is what I want to say. Sorry for confusing. >> > hotplug_memory_notifier(migrate_on_reclaim_callback, 100); >> >> And we should register the notifier before calling set_migration_target_nodes()? > > I cannot made my mind here. > The primary reason I placed the call before registering the notifier is > because the original code called set_migration_target_nodes() before > doing so: > > <-- > ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online", > migration_online_cpu, NULL); > WARN_ON(ret < 0); > > hotplug_memory_notifier(migrate_on_reclaim_callback, 100); > --> > > I thought about following the same line. Why do you think it should be > called afterwards? > > I am not really sure whether it has a different impact depending on the > order. > Note that memory-hotplug acpi events can happen at boot time, so by the > time we register the memory_hotplug notifier, we can have some hotplug > memory coming in, and so we call set_migration_target_nodes(). > > But that is fine, and I cannot see a difference shufling the order > of them. > Do you see a problem in there? Per my understanding, the race condition as follows may be possible in theory, CPU1 CPU2 ---- ---- set_migration_target_nodes() <-- // a new node is hotplugged, and missed hotplug_memory_notifier() During boot, this may be impossible in practice. But I still think it's good to make the order correct in general. And it's not hard to do that. Best Regards, Huang, Ying > [1] https://patchwork.kernel.org/project/linux-mm/patch/20200915094143.79181-3-ldufour@xxxxxxxxxxxxx/