Re: [BUG] potential hugetlb css refcounting issues

Hello Mike,

I really appreciate the quick reply.

Mike Kravetz wrote:
> There have been other hugetlb cgroup fixes since 5.10.  I do not believe
> they are related to the underflow issue you have seen.  Just FYI.

Yes, I am aware. I did my best to review all the recent changes that
have not been backported to 5.10 and couldn't find anything related. I
also tried cherry-picking a couple of fixes just in case, but the bug
did not go away.

> However, when a vma is split both resulting vmas would be 'owners' of
> private mapping reserves without incrementing the refcount which would
> lead to the underflow you describe.

Indeed, and I do know that the programs running on my reproducer
machines split vmas.
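
To make the split scenario concrete, here is a toy userspace model of
it. This is only a sketch: every type and helper below (toy_css,
toy_vma, toy_reserve, toy_split, toy_close) is hypothetical and merely
stands in for the kernel structures. It assumes the cgroup itself holds
one base reference and that the reservation takes one more.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not the kernel's. */
struct toy_css {
    long refcnt;
};

struct toy_vma {
    struct toy_css *css;   /* cgroup charged for the reservation */
    bool resv_owner;       /* stands in for "owner of the private reserves" */
};

/* Reserving takes a single css reference for the whole mapping. */
static void toy_reserve(struct toy_vma *vma, struct toy_css *css)
{
    assert(!vma->resv_owner);   /* a mapping is reserved only once */
    css->refcnt++;
    vma->css = css;
    vma->resv_owner = true;
}

/* Splitting copies the vma, owner flag included, without a new reference. */
static struct toy_vma toy_split(const struct toy_vma *vma)
{
    return *vma;
}

/* Closing an owner vma drops the reference it believes it holds. */
static void toy_close(struct toy_vma *vma)
{
    if (vma->resv_owner)
        vma->css->refcnt--;
}

/* Returns the refcount left after reserve, split, and closing both halves. */
static long toy_split_scenario(void)
{
    struct toy_css css = { .refcnt = 1 };   /* assumed base reference */
    struct toy_vma a = { 0 };
    struct toy_vma b;

    toy_reserve(&a, &css);   /* refcnt: 2 */
    b = toy_split(&a);       /* refcnt: 2, but now two "owners" */
    toy_close(&a);           /* refcnt: 1 */
    toy_close(&b);           /* refcnt: 0, one put too many */
    return css.refcnt;
}
```

In this model the base reference is consumed by the second close, which
is the same kind of imbalance as the underflow described above.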

>> 2. After 08cf9faf75580, __free_huge_page() decrements the css
>> refcount for _each_ page unconditionally by calling
>> hugetlb_cgroup_uncharge_page_rsvd().  But a per-page reference count
>> is only taken *per page* outside the reserve case in
>> alloc_huge_page() (i.e hugetlb_cgroup_charge_cgroup_rsvd() is called
>> only if deferred_reserve is true).  In the reserve case, there is
>> only one css reference linked to the resv map (taken in
>> hugetlb_reserve_pages()).  This also leads to an underflow of the
>> counter.  A similar scheme to HPageRestoreReserve can be used to
>> track which pages were allocated in the deferred_reserve case and
>> call hugetlb_cgroup_uncharge_page_rsvd() only for these during
>> freeing.

> I am not sure about the above analysis.  It is true that
> hugetlb_cgroup_uncharge_page_rsvd is called unconditionally in
> free_huge_page.  However, IIUC hugetlb_cgroup_uncharge_page_rsvd will
> only decrement the css refcount if there is a non-NULL hugetlb_cgroup
> pointer in the page.  And, the pointer in the page would only be set
> in the 'deferred_reserve' path of alloc_huge_page.  Unless I am
> missing something, they seem to balance.

Now that you have explained it, I am pretty sure that you're right and
I was wrong.
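
As a toy illustration of why the two paths balance (again a sketch with
hypothetical names such as sketch_css, sketch_page, sketch_alloc and
sketch_free, not the kernel code): the reference is taken only when the
pointer is stored in the page, and it is dropped only when that pointer
is non-NULL at free time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's. */
struct sketch_css {
    long refcnt;
};

struct sketch_page {
    struct sketch_css *rsvd_css;   /* set only on the deferred_reserve path */
};

/* Allocation: take a reference and record it only if deferred_reserve. */
static void sketch_alloc(struct sketch_page *page, struct sketch_css *css,
                         bool deferred_reserve)
{
    page->rsvd_css = NULL;
    if (deferred_reserve) {
        css->refcnt++;
        page->rsvd_css = css;
    }
}

/* Free: called unconditionally, but a no-op when no pointer was stored. */
static void sketch_free(struct sketch_page *page)
{
    if (page->rsvd_css == NULL)
        return;
    page->rsvd_css->refcnt--;
    page->rsvd_css = NULL;
}

/* Returns the refcount after one deferred and one reserved page cycle. */
static long sketch_balance_scenario(void)
{
    struct sketch_css css = { .refcnt = 1 };   /* assumed base reference */
    struct sketch_page pages[2];

    sketch_alloc(&pages[0], &css, true);    /* refcnt: 2 */
    sketch_alloc(&pages[1], &css, false);   /* refcnt: 2, nothing recorded */
    sketch_free(&pages[0]);                 /* refcnt: 1 */
    sketch_free(&pages[1]);                 /* refcnt: 1, no-op */
    return css.refcnt;
}
```

Freeing both pages leaves the count where it started, so the
unconditional call on free does not by itself cause an underflow in
this model.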

I'll confirm that I can't reproduce without my change for 2.

Thank you,

Guillaume.

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>


