on 2010-3-9 5:46, David Rientjes wrote:
[snip]
>> Considering that the change of task->mems_allowed is not frequent, in this
>> patch I use two variables as a tag to indicate whether task->mems_allowed
>> needs to be updated or not. And before setting the tag, cpuset caches the
>> new mask of every task at its task_struct.
>>
>
> So what exactly is the benefit of 58568d2 from last June that caused this
> issue to begin with?  It seems like this entire patchset is a revert of
> that commit.  So why shouldn't we just revert that one commit and then add
> the locking and updating necessary for configs where
> MAX_NUMNODES > BITS_PER_LONG on top?

I worried about the consistency of task->mempolicy with task->mems_allowed
for configs where MAX_NUMNODES <= BITS_PER_LONG. The problem I worried
about is the following: when the kernel allocator allocates pages for a
task, it accesses task->mempolicy first to get the allowed node, and then
checks whether that node is allowed by task->mems_allowed. But without
this patch, ->mempolicy and ->mems_allowed are not updated at the same
time, so the kernel allocator may see inconsistent information: for
example, the allocator gets the allowed node from the old mempolicy, but
checks that node against the new mems_allowed, which doesn't intersect the
old mempolicy. So I made this patchset.
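To make the race concrete, here is a standalone userspace model of the two
allocator reads (just an illustration, not the kernel code; the struct,
cpuset_rebind() and the bitmask layout are made-up stand-ins):

/*
 * Standalone model of the allocator reading ->mempolicy and
 * ->mems_allowed non-atomically while cpuset rebinds the task.
 * Build with: gcc -o race race.c
 */
#include <stdio.h>

struct task {
	unsigned long mempolicy;	/* bitmask of preferred nodes */
	unsigned long mems_allowed;	/* bitmask of allowed nodes */
};

/* cpuset side: rebind the policy and the allowed mask together */
static void cpuset_rebind(struct task *t, unsigned long new_mask)
{
	t->mempolicy = new_mask;
	t->mems_allowed = new_mask;
}

int main(void)
{
	struct task t = { .mempolicy = 0x1, .mems_allowed = 0x1 }; /* node 0 */

	/* allocator step 1: pick a node from the (old) mempolicy */
	unsigned long node = t.mempolicy;

	/* cpuset moves the task to node 1 between the two reads */
	cpuset_rebind(&t, 0x2);

	/* allocator step 2: check the node against the (new) mems_allowed */
	if (!(node & t.mems_allowed))
		printf("inconsistent: node %#lx not in mems_allowed %#lx\n",
		       node, t.mems_allowed);
	return 0;
}

With this patchset, the task itself copies the cached mask and rebinds its
mempolicy together in its own context, so the allocator never sees a mix
of old and new values.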
>> +/**
>> + * cpuset_update_task_mems_allowed - update task memory placement
>> + *
>> + * If the current task's mems_allowed_for_update and mempolicy_for_update are
>> + * changed by cpuset behind our backs, update current->mems_allowed,
>> + * mems_generation and task NUMA mempolicy to the new value.
>> + *
>> + * Call WITHOUT mems_lock held.
>> + *
>> + * This routine is needed to update the per-task mems_allowed and mempolicy
>> + * within the task's context, when it is trying to allocate memory.
>> + */
>> +static __always_inline void cpuset_update_task_mems_allowed(void)
>> +{
>> +	struct task_struct *tsk = current;
>> +	unsigned long flags;
>> +
>> +	if (unlikely(tsk->mems_generation != tsk->mems_generation_for_update)) {
>> +		task_mems_lock_irqsave(tsk, flags);
>> +		tsk->mems_allowed = tsk->mems_allowed_for_update;
>> +		tsk->mems_generation = tsk->mems_generation_for_update;
>> +		task_mems_unlock_irqrestore(tsk, flags);
>
> By this synchronization, you're guaranteeing that no other kernel code
> ever reads tsk->mems_allowed when tsk != current?  Otherwise, you're
> simply protecting the store to tsk->mems_allowed here and not serializing
> on the loads that can return empty nodemasks.

I guarantee that no other kernel code changes tsk->mems_allowed when
tsk != current, so every task can safely read its own tsk->mems_allowed
without a lock. I will use mems_lock to protect it when another task
reads it.

>> +	/* Protection of ->mems_allowed_for_update */
>> +	spinlock_t mems_lock;
>> +	/*
>> +	 * This variable (mems_allowed_for_update) is just used for caching
>> +	 * memory placement information.
>> +	 *
>> +	 * ->mems_allowed is used by the kernel allocator.
>> +	 */
>> +	nodemask_t mems_allowed_for_update;	/* Protected by mems_lock */
>
> Another nodemask_t in struct task_struct for this?  And for all configs,
> including those that can do atomic updates to mems_allowed?

Yes, for all configs.

>> +
>> +	/*
>> +	 * Increment this integer every time ->mems_allowed_for_update is
>> +	 * changed by cpuset. A task can compare this number with
>> +	 * mems_generation, and if they are not the same,
>> +	 * mems_allowed_for_update has changed and ->mems_allowed must be
>> +	 * updated. In this way, tasks can avoid having to lock and reload
>> +	 * mems_allowed_for_update unless it has changed.
>> +	 */
>> +	int mems_generation_for_update;
>> +	/*
>> +	 * After updating mems_allowed, set mems_generation to
>> +	 * mems_generation_for_update.
>> +	 */
>> +	int mems_generation;
>
> I don't see why you need two mems_generation numbers, one should belong in
> the task's cpuset.  Then you can compare tsk->mems_generation to
> task_cs(tsk)->mems_generation at cpuset_update_task_memory_state() if you
> set tsk->mems_generation = task_cs(tsk)->mems_generation on
> cpuset_attach() or update_nodemask().

In that case, we would have to use rcu_read_lock() to protect the access
to the task's cpuset, and performance would slow down, even though the
overhead of an RCU read-side critical section is very small.

Thanks!
Miao
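P.S. In case it helps review, below is a minimal userspace sketch of the
generation-tag pattern (a pthread mutex stands in for the spinlock; this
is a simplified illustration, not the actual patch):

/*
 * Sketch of the generation-tag update scheme.
 * Build with: gcc -o gen gen.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

struct task {
	pthread_mutex_t mems_lock;	/* protects the _for_update fields */
	unsigned long mems_allowed;	/* read locklessly by the task itself */
	unsigned long mems_allowed_for_update;
	int mems_generation;
	int mems_generation_for_update;
};

/* cpuset side: cache the new mask and bump the tag */
static void cpuset_change_task_nodemask(struct task *t, unsigned long newmask)
{
	pthread_mutex_lock(&t->mems_lock);
	t->mems_allowed_for_update = newmask;
	t->mems_generation_for_update++;
	pthread_mutex_unlock(&t->mems_lock);
}

/* task side, called by the task itself on the allocation path */
static void update_task_mems_allowed(struct task *t)
{
	if (t->mems_generation != t->mems_generation_for_update) {
		pthread_mutex_lock(&t->mems_lock);
		t->mems_allowed = t->mems_allowed_for_update;
		t->mems_generation = t->mems_generation_for_update;
		pthread_mutex_unlock(&t->mems_lock);
		/* the NUMA mempolicy would be rebound here as well */
	}
}

int main(void)
{
	struct task t = {
		.mems_lock = PTHREAD_MUTEX_INITIALIZER,
		.mems_allowed = 0x1,
		.mems_allowed_for_update = 0x1,
	};

	cpuset_change_task_nodemask(&t, 0x2);	/* cpuset moves the task */
	update_task_mems_allowed(&t);	/* task syncs on its next allocation */
	printf("mems_allowed is now %#lx\n", t.mems_allowed);
	return 0;
}

The common case is the unlocked generation comparison; the lock is only
taken when cpuset has actually changed the cached mask.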