Re: [PATCH 2/2] mm: add a field to store names for private anonymous memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 12, 2013 at 1:51 PM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
> On Fri, Jul 12, 2013 at 2:49 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> On Fri, Jul 12, 2013 at 11:40:44AM +0200, Ingo Molnar wrote:
>>> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> > On Fri, Jul 12, 2013 at 11:15:06AM +0200, Ingo Molnar wrote:
>>> > >
>>> > > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>> > >
>>> > > > We need those files anyway.. The current proposal is that the entire VMA
>>> > > > has a single userspace pointer in it. Or rather a 64bit value.
>>> > >
>>> > > Yes but accessible via /proc/<PID>/mem or so?
>>> >
>>> > *shudder*.. yes. But you're again opening two files. The only advantage
>>> > of this over userspace writing its own files is that the kernel cleans
>>> > things up for you.
>>>
>>> Opening of the files only occurs in the instrumentation case, which is
>>> rare. But temporary files would be forced upon the regular usecase when no
>>> instrumentation goes on.
>>
>> Well, Colin didn't describe the intended use, but I can imagine a case where
>> its not all that rare. System health monitors might frequently want to update
>> this.
>>
>>> > However from what I understood android runs apps as individual users,
>>> > and I think we can do per user tmpfs mounts. So app dies, user exits,
>>> > mount goes *poof*.
>>>
>>> Yes, user-space could be smarter about temporary files.
>>>
>>> Just like big banks could be less risk happy.
>>>
>>> Yet the reality is that if left alone both apps and banks mess up, I don't
>>> think libertarianism works for policy: we are better off offering a
>>> framework that is simple, robust, self-contained, low risk and hard to
>>> mess up?
>>
>> Fair enough; but I still want Colin to tell me why he can't do this in
>> userspace. And what all he wants to go do with this information etc.
>>
>> He's basically not told us much at all.
>
> I covered it a little in the thread on the previous version of the
> patch, but I'll try to give more detail (and include it in a patch
> stack description if I post another version).
>
> In many userspace applications, and especially in VM based
> applications like Android uses heavily, there are multiple different
> allocators in use.  At a minimum there is libc malloc and the stack,
> and in many cases there are libc malloc, the stack, direct syscalls to
> mmap anonymous memory, and multiple VM heaps (one for small objects,
> one for big objects, etc.).  Each of these layers usually has its own
> tools to inspect its usage; malloc by compiling a debug version, the
> VM through heap inspection tools, and for direct syscalls there is
> usually no way to track them.
>
> On Android we heavily use a set of tools that use an extended version
> of the logic covered in Documentation/vm/pagemap.txt to walk all pages
> mapped in userspace and slice their usage by process, shared (COW) vs.
> unique mappings, backing, etc.  This can account for real physical
> memory usage even in cases like fork without exec (which Android uses
> heavily to share as many private COW pages as possible between
> processes), Kernel SamePage Merging, and clean zero pages.  It
> produces a measurement of the pages that only exist in that process
> (USS, for unique), and a measurement of the physical memory usage of
> that process with the cost of shared pages being evenly split between
> processes that share them (PSS).  We need the feature to be efficient
> enough to be left on at all times because app developers and end users
> can use similar tools exposed through system reports and bugreports to
> determine the memory usage of apps
>
> If all anonymous memory is indistinguishable then figuring out the
> real physical memory usage of each heap requires either a pagemap
> walking tool that can understand the heap debugging of every layer, or
> for every layer's heap debugging tools to implement the pagemap
> walking logic, in which case it is hard to get a consistent view of
> memory across the whole system.
>
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes.  It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace.  Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace.  It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
>
> This feature is considered critical enough that Dalvik (Android's VM)
> uses ashmem, which is effectively deleted tmpfs files, solely to name
> their heaps.   I'd like to get rid of as much ashmem use within
> Android as possible, with an eye towards deprecating it.  ashmem heaps
> work reasonably well for a VM, which is likely to want a single
> contiguous region of address space that it will manage on its own, but
> falls apart for malloc, which often wants small kernel-allocated
> address space regions that may or may not merge with adjacent regions.
>  Blindly using ashmem/deleted tmpfs files instead of anonymous mmaps
> in malloc doubled the number of vmas in our main system process and
> was worse for the GLBenchmark process.
>
> As a concrete example of its usefulness (which should not be
> considered the extent of its usefulness, it's just what I happened to
> be looking at), I was recently tracking down why we were seeing many
> dirty private pages that were all zeroes being merged by KSM.  Using a
> mixture of ashmem naming and an early version of this patch, I could
> slice the the number of KSM merged pages per process and per heap,
> which then told me which heap debugging tools I should use to find who
> was dirtying large regions of zeroes.

Peter, any thoughts on this?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]