On 01/05/2011 02:11 PM, Vasileios Karakasis wrote:
>
>
> On 01/05/2011 12:20 AM, Andi Kleen wrote:
>>> Hi,
>>>
>>> I am sending you the updated patch (against the latest 2.0.6 version). I
>>> call numa_police_memory_int() only for the newly allocated pages, when
>>> the area is expanded. I also added a numa_realloc_onnode() function in
>>> the same fashion as that of the numa_alloc_onnode(), which sets a
>>> specific memory binding. I pass the MPOL_MF_MOVE flag to mbind(), but I
>>> am not sure if this is worth it, since the call becomes too slow even
>>> in the case of no page migration. Without the MPOL_MF_MOVE flag, of
>>> course, if the policy changes between realloc's, previously allocated
>>> pages won't be affected.
>>
>> Thinking about it more police_* is likely still the wrong semantics.
>> That will always set the current policy.
>>
>> But the user more likely wants the same policy the original
>> mapping had, right?
>
> I agree with that. In my use case at least, I start with an
> alloc_on_node() and keep realloc'ing assuming all new pages will be
> allocated on the node I specified. Of course, this questions more the
> existence of a realloc_onnode() function, since its functionality
> overlaps with that of migrating/moving pages. So adopting these
> semantics, I think we can drop the numa_realloc_onnode().
>
>>
>> This could be implemented by calling get_mempolicy() on the old
>> mapping with MPOL_F_ADDR and setting it on the new pages in
>> the new mapping.
>>
>
> I will come up with a patch in the next few days.

Peeking inside the mremap() source, I can see that the kernel already
does this, i.e., mremap() preserves the policy of the original vm area.
The problem is when the user has not specified a binding for the
original mapping (default policy), in which case copying explicitly the
policy from the old to the new pages won't work either; the new pages
will still have MPOL_DEFAULT.
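
A minimal sketch of how the policy inheritance described above could be checked from user space. It is not part of the patch under discussion. policy_of() and grow_and_compare() are hypothetical helper names; the raw get_mempolicy system call is used so the sketch builds without libnuma's <numaif.h>, and the MPOL_F_ADDR fallback value is the one from <linux/mempolicy.h>. Both lengths are assumed to be multiples of the page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)    /* value from <linux/mempolicy.h> */
#endif

/* Hypothetical helper: return the policy mode of the mapping that
 * contains addr, or -1 if the system call fails (for example when it
 * is not permitted in the current environment). */
static int policy_of(void *addr)
{
    int mode = -1;

    if (syscall(SYS_get_mempolicy, &mode, NULL, 0UL, addr,
                (unsigned long)MPOL_F_ADDR) != 0)
        return -1;
    return mode;
}

/* Hypothetical helper: map old_len bytes, grow the mapping to new_len
 * with mremap(), and report the policy mode of the first old page
 * (*before) and of the first newly added page (*after).
 * Returns 0 on success, -1 if any step fails. */
static int grow_and_compare(size_t old_len, size_t new_len,
                            int *before, int *after)
{
    char *p, *q;

    p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = policy_of(p);

    q = mremap(p, old_len, new_len, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, old_len);
        return -1;
    }

    /* q + old_len is the first page that did not exist before. */
    *after = policy_of(q + old_len);

    munmap(q, new_len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Calling grow_and_compare() with one page and four pages on a mapping that was never bound should report the same mode for both, which is the MPOL_DEFAULT case mentioned above: the policy is carried over, but it says nothing about which node the new pages end up on.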
So realloc() cannot guarantee that the new pages will be allocated on
the same node as the preceding alloc(), unless there is a way to obtain
the actual node on which the pages of the original allocation were
placed. In my opinion, this isn't a real problem, because even a plain
numa_alloc() using the default policy cannot guarantee that the pages
will be allocated on the node of the calling cpu: what if the task is
migrated to a cpu on a different node while it is touching (i.e.,
allocating) the pages in police_memory_int()? However, if the user
calls one of the functions that call mbind(), e.g., alloc_onnode(),
then just mremap() will work fine.

>
>> -Andi
>>
>>
>
> Regards,

--
V.K.
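
On the question of obtaining the actual node of an existing page: as far as the get_mempolicy(2) man page describes it, passing MPOL_F_NODE together with MPOL_F_ADDR returns the ID of the node on which the page at that address currently resides. The sketch below is an illustration only, not something proposed in this thread. node_of_page() and touch_and_locate() are hypothetical names, and the flag fallback values are taken from <linux/mempolicy.h>.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)    /* value from <linux/mempolicy.h> */
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)    /* value from <linux/mempolicy.h> */
#endif

/* Hypothetical helper: return the node that currently holds the page
 * containing addr, or -1 if the system call fails. */
static int node_of_page(void *addr)
{
    int node = -1;

    if (syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
                (unsigned long)(MPOL_F_NODE | MPOL_F_ADDR)) != 0)
        return -1;
    return node;
}

/* Hypothetical helper: allocate one anonymous page, touch it so that
 * it is actually backed by memory, and return the node it landed on
 * (or -1 on failure). */
static int touch_and_locate(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *p;
    int node;

    p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 1;                   /* first touch allocates the page */
    node = node_of_page(p);

    munmap(p, (size_t)pg);
    return node;
}
```

The answer only describes where the page is at the time of the call. A realloc that used it to bind the new pages would still have to decide what to do when the old pages are spread over several nodes.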