Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 16-12-16 09:22:02, Vegard Nossum wrote:
> Reference counting bugs are hard to debug by their nature since the actual
> manifestation of one can occur very far from where the error is introduced
> (e.g. a missing get() only manifest as a use-after-free when the reference
> count prematurely drops to 0, which could be arbitrarily long after where
> the get() should have happened if there are other users). I wrote this patch
> to try to track down a suspected 'mm_struct' reference counting bug.

I definitely agree that hunting these bugs is a royal PITA, no question
about that. I am just wondering whether this has been motivated by any
particular bug recently. I do not seem to remember any such an issue for
quite some time.

> The basic idea is to keep track of all references, not just with a reference
> counter, but with an actual reference _list_. Whenever you get() or put() a
> reference, you also add or remove yourself, respectively, from the reference
> list. This really helps debugging because (for example) you always put a
> specific reference, meaning that if that reference was not yours to put, you
> will notice it immediately (rather than when the reference counter goes to 0
> and you still have an active reference).

But who is the owner of the reference? A function/task? It is not all
that uncommon to take an mm reference from one context and release it
from a different one. But I might be missing your point here.

> The main interface is in <linux/mm_ref_types.h> and <linux/mm_ref.h>, while
> the implementation lives in mm/mm_ref.c. Since 'struct mm_struct' has both
> ->mm_users and ->mm_count, we introduce helpers for both of them, but use
> the same data structure for each (struct mm_ref). The low-level rules (i.e.
> the ones we have to follow, but which nobody else should really have to
> care about since they use the higher-level interface) are:
> 
>  - after incrementing ->mm_count you also have to call get_mm_ref()
> 
>  - before decrementing ->mm_count you also have to call put_mm_ref()
> 
>  - after incrementing ->mm_users you also have to call get_mm_users_ref()
> 
>  - before decrementing ->mm_users you also have to call put_mm_users_ref()
> 
> The rules that most of the rest of the kernel will care about are:
> 
>  - functions that acquire and return a mm_struct should take a
>    'struct mm_ref *' which it can pass on to mmget()/mmgrab()/etc.
> 
>  - functions that release an mm_struct passed as a parameter should also
>    take a 'struct mm_ref *' which it can pass on to mmput()/mmdrop()/etc.
> 
>  - any function that temporarily acquires a mm_struct reference should
>    use MM_REF() to define an on-stack reference and pass it on to
>    mmget()/mmput()/mmgrab()/mmdrop()/etc.
> 
>  - any structure that holds an mm_struct pointer must also include a
>    'struct mm_ref' member; when the mm_struct pointer is modified you
>    would typically also call mmget()/mmgrab()/mmput()/mmdrop() and they
>    should be called with this mm_ref
> 
>  - you can convert (for example) an on-stack reference to an in-struct
>    reference using move_mm_ref(). This is semantically equivalent to
>    (atomically) taking the new reference and dropping the old one, but
>    doesn't actually need to modify the reference count

This all sounds way too intrusive to me so I am not really sure this is
something we really want. A nice thing for debugging for sure but I am
somehow skeptical whether it is really worth it considering how many
those ref. count bugs we've had.

[...]
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]