Re: realloc function

On 01/05/2011 02:11 PM, Vasileios Karakasis wrote:
> 
> 
> On 01/05/2011 12:20 AM, Andi Kleen wrote:
>>> Hi,
>>>
>>> I am sending you the updated patch (against the latest 2.0.6 version). I
>>> call numa_police_memory_int() only for the newly allocated pages, when
>>> the area is expanded. I also added a numa_realloc_onnode() function in
>>> the same fashion as that of the numa_alloc_onnode(), which sets a
>>> specific memory binding. I pass the MPOL_MF_MOVE flag to mbind(), but I
>>> am not sure if this is worth it, since the call becomes too slow even
>>> in the case of no page migration. Without the MPOL_MF_MOVE flag, of
>>> course, if the policy changes between realloc's, previously allocated
>>> pages won't be affected.
>>
>> Thinking about it more police_* is likely still the wrong semantics.
>> That will always set the current policy.
>>
>> But the user more likely wants the same policy the original
>> mapping had, right?
> 
> I agree with that. In my use case at least, I start with a
> numa_alloc_onnode() and keep realloc'ing, assuming all new pages will be
> allocated on the node I specified. Of course, this calls into question
> the need for a realloc_onnode() function, since its functionality
> overlaps with that of migrating/moving pages. So, adopting these
> semantics, I think we can drop numa_realloc_onnode().
> 
>>
>> This could be implemented by calling get_mempolicy() on the old
>> mapping with MPOL_F_ADDR and setting it on the new pages in
>> the new mapping.
>>
> 
> I will come up with a patch in the next few days.
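
For reference, the approach suggested above could be sketched roughly as
follows. This is only an illustration, not the patch itself: the helper
name copy_policy() and the SKETCH_MAXNODE limit are made up for the
example, the flag value is defined locally instead of being taken from
<numaif.h>, and raw syscalls are used so that the sketch does not depend
on the libnuma wrappers. new_addr is assumed to be page-aligned, as
mbind() requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value as in <numaif.h>; defined here to avoid needing libnuma. */
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/* Assumed upper bound on the number of node bits (hypothetical). */
#define SKETCH_MAXNODE 1024UL

/*
 * Hypothetical helper: read the policy that governs old_addr and apply
 * it to [new_addr, new_addr + new_len). Returns 0 on success and -1 if
 * either system call fails (errno is left set by the failing call).
 */
static int copy_policy(void *old_addr, void *new_addr, size_t new_len)
{
    unsigned long mask[SKETCH_MAXNODE / (8 * sizeof(unsigned long))] = {0};
    int mode = 0;

    assert(old_addr != NULL && new_addr != NULL && new_len > 0);

    /* Policy (mode and node mask) of the mapping containing old_addr. */
    if (syscall(SYS_get_mempolicy, &mode, mask, SKETCH_MAXNODE,
                old_addr, (unsigned long)MPOL_F_ADDR) != 0)
        return -1;

    /* Apply the same mode and node mask to the new range, no flags. */
    if (syscall(SYS_mbind, new_addr, (unsigned long)new_len,
                (unsigned long)(unsigned int)mode, mask,
                SKETCH_MAXNODE, 0UL) != 0)
        return -1;

    return 0;
}
```

A realloc-style function would call something like
copy_policy(old, new_tail, added_len) right after expanding the area and
before touching the new pages.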

Peeking inside the mremap() source, I can see that the kernel already
does this, i.e., mremap() preserves the policy of the original vm area.
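
To make the mechanism concrete, here is a minimal sketch of expanding an
anonymous mapping with mremap(). The helper name grow_mapping() is
hypothetical and is not part of libnuma; the sketch only shows the
resize call itself and relies on the kernel behaviour described above
for the memory policy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mremap() and MREMAP_MAYMOVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: resize a mapping previously obtained from mmap().
 * The kernel may move the mapping, so the caller must use the returned
 * address. Returns NULL on failure, in which case the old mapping is
 * left untouched.
 */
static void *grow_mapping(void *old, size_t old_len, size_t new_len)
{
    assert(old != NULL && new_len > 0);

    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

The existing contents are preserved across the call, and only the pages
beyond old_len are new.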

The problem arises when the user has not specified a binding for the
original mapping (default policy). In that case, explicitly copying the
policy from the old pages to the new ones won't help either; the new
pages will still have MPOL_DEFAULT. So realloc() cannot guarantee that
the new pages will be allocated on the same node as those of the
preceding alloc(), unless there is a way to obtain the actual node on
which the pages of the original allocation were placed. In my opinion,
this isn't a real problem, because even a simple numa_alloc() using the
default policy cannot guarantee that the pages will be allocated on the
node of the calling CPU: what if the task is migrated to a CPU on a
different node while touching (i.e., allocating) the pages in
numa_police_memory_int()?
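
As for obtaining the actual node of an already allocated page, one
possibility might be get_mempolicy() with both MPOL_F_NODE and
MPOL_F_ADDR set, which reports the node backing a given address. The
sketch below is only an assumption about how this could be used here:
node_of_page() is a made-up name, the flag values are defined locally
rather than taken from <numaif.h>, and the raw syscall is used to avoid
depending on libnuma.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same values as in <numaif.h>; defined here to avoid needing libnuma. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/*
 * Hypothetical helper: return the ID of the node that backs the page
 * containing addr, or -1 if the kernel cannot tell us (for example, no
 * NUMA support or the call is not permitted).
 */
static int node_of_page(void *addr)
{
    int node = -1;

    assert(addr != NULL);

    /* With MPOL_F_NODE | MPOL_F_ADDR the node ID is stored in 'node'. */
    if (syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
                (unsigned long)(MPOL_F_NODE | MPOL_F_ADDR)) != 0)
        return -1;

    return node;
}
```

Even so, this reports only where one particular page ended up; different
pages of the same default-policy allocation may sit on different nodes.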

However, if the user calls one of the functions that call mbind(), e.g.,
numa_alloc_onnode(), then a plain mremap() will work fine.


> 
>> -Andi
>>
>>
> 
> Regards,

-- 
V.K.
