Re: [RFC PATCH 0/4] Support vranges on files

Minchan Kim <minchan@xxxxxxxxxx> · Thu, 4 Apr 2013 15:55:09 +0900

Hey John,

First of all, I should confess I only glanced at your code, and
several questions popped up. If I missed something, please slap me.

On Wed, Apr 03, 2013 at 04:52:19PM -0700, John Stultz wrote:
> This patchset is against Minchan's vrange work here:
> 	https://lkml.org/lkml/2013/3/12/105
> 
> Extending it to support volatile ranges on files. In effect
> providing the same functionality of my earlier file based
> volatile range patches on-top of Minchan's anonymous volatile
> range work.
> 
> Volatile ranges on files are different than on anonymous memory,
> because the volatility state can be shared between multiple
> applications. This makes storing the volatile ranges exclusively
> in the mm_struct (or in vmas as in Minchan's earlier work)
> inappropriate.
> 
> The patchset starts with some minor cleanup.
> 
> Then we introduce the idea of a vrange_root, which provides an
> interval-tree root and a lock to protect the tree. This structure
> can then be stored in the mm_struct or in an address_space. Then the
> same infrastructure can be used to manage volatile ranges on both
> anonymous and file backed memory.

Thanks for the above two patches. It is a nice cleanup.

> 
> Next we introduce a parallel fvrange() syscall for creating
> volatile ranges directly against files.

Okay. It seems you want to replace the ashmem interface with fvrange.
I doubt we need to consume a new system call slot for that. Can't we
add an "int fd" argument to the vrange system call instead of
reinventing the wheel?

> 
> And finally, we change the range purging logic to be able to
> handle both anonymous and file volatile ranges.

Okay. Then what are the semantics of file-vrange?

There is a file F. Process A has mapped some part of the file into
its address space. Then process B calls fvrange on the same part.
As far as I can see from your code, it purges the range even though
process A is still using it. Right? Is that your intention? Probably
not.

Let's define fvrange's semantics to be the same as anon-vrange's.
If any process is still using the range as non-volatile, we
shouldn't purge it at all.

So your [4/4] should atomically check all processes that have mapped
the page. You could do it with i_mmap_mutex and vrange_lock
and fold the logic into try_to_discard_vpage.
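
As a rough illustration of that check, here is a user-space model, not
kernel code. The names struct mapper and can_discard_page are made up
for this sketch and do not exist in the patchset; the real walk would
go through the rmap with the locks mentioned above held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process that maps the page. */
struct mapper {
	bool range_is_volatile;	/* has this process marked the range volatile? */
};

/*
 * Return true only if every mapper has marked the covering range
 * volatile. A page with no mappers at all is treated as discardable.
 * In the kernel this would have to run under i_mmap_mutex and the
 * vrange lock so the answer cannot change during the walk.
 */
static bool can_discard_page(const struct mapper *mappers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!mappers[i].range_is_volatile)
			return false;	/* someone still uses it as non-volatile */
	}
	return true;
}
```

With this rule, process B marking the range volatile is not enough to
purge it while process A still holds it as non-volatile.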

> 
> Now there are some quirks still to be resolved with the approach
> used here. The biggest one being the vrange() call can't be used to
> create volatile ranges against mmapped files. Instead only the

Why?

> fvrange() can be used to create file backed volatile ranges.

I couldn't understand your point. It would be better for me to
explain my idea first; then you can point out what I am missing.
See below.

> 
> This could be overcome by iterating across all the process VMAs to
> determine if they're anonymous or file based, and if file-based,
> create a VMA sized volatile range on the mapping pointed to by the
> VMA.

That is needed only when we start to discard pages. Simply put, it
belongs to the reclaim path, NOT the system call path, so it's not a
problem.

> 
> But this would have downsides, as Minchan has been clear that he wants
> to optimize the vrange() calls so that it is very cheap to create and
> destroy volatile ranges. Having simple per-process ranges be created
> means we don't have to iterate across the vmas in the range to
> determine if they're anonymous or file backed. Instead the current
> vrange() code just creates per process ranges (which may or may not
> cover mmapped file data), but will only purge anonymous pages in
> that range. This keeps the vrange() call cheap.

Right.

> 
> Additionally, just creating or destroying a single range is very
> simple to do, and requires a fixed amount of memory known up front.
> Thus we can allocate needed data prior to making any modifications.
> 
> But if we were to create a range that crosses anonymous and file
> backed pages, it must create or destroy multiple per-process or
> per-file ranges. This could require an unknown number of allocations,

This is the part where I fail to follow your point.

> opening the possibility of getting an ENOMEM half-way through the
> operation, leaving the volatile range partially created or destroyed.
> 
> So to keep this simple for this first pass, for now we have two
> syscalls for two types of volatile ranges.

My idea is as follows:

        vrange(fd, start, len, mode, behavior)

A) fd = 0

1) system call context - the vrange system call registers a new
   vrange in the mm_struct.
2) Add the new vrange to the LRU
3) reclaim context - walk the rmap to confirm all processes have
   marked the range volatile -> discard

B) fd = 1

1) system call context - the vrange system call registers a new
   vrange in the address_space
2) Add the new vrange to the LRU
3) reclaim context - walk the rmap to confirm all processes have
   marked the range volatile -> discard

What's the problem in this logic?
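
To make step 1 of cases A and B concrete, here is a minimal user-space
sketch. It only models picking the vrange_root; the LRU and reclaim
steps are left out. mm_model, mapping_model, vrange_pick_root and
vrange_register are hypothetical names for illustration, and passing a
NULL mapping to mean "no file given" is an assumption of this sketch,
not something defined in the patches.

```c
#include <stddef.h>

/* Stand-in for the interval-tree root plus lock; only counts ranges here. */
struct vrange_root {
	unsigned int nr_ranges;
};

/* Hypothetical stand-ins for mm_struct and address_space. */
struct mm_model {
	struct vrange_root vroot;
};

struct mapping_model {
	struct vrange_root vroot;
};

/*
 * Case A (no file): use the per-process root in the mm.
 * Case B (file given): use the shared root in the file's mapping.
 */
static struct vrange_root *vrange_pick_root(struct mm_model *mm,
					    struct mapping_model *mapping)
{
	return mapping ? &mapping->vroot : &mm->vroot;
}

/* Registering a range is the same operation on either root. */
static void vrange_register(struct vrange_root *root)
{
	root->nr_ranges++;
}
```

The point is that one entry point can serve both cases, because only
the choice of root differs in the system call path.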

> 
> Let me know if you have any thoughts or comments. I'm sure there's
> plenty of room for improvement here.
> 
> In the meantime I'll be playing with some different approaches to
> try to handle single volatile ranges that cross file and anonymous
> vmas.
> 
> The entire queue, both Minchan's changes and mine can be found here:
> git://git.linaro.org/people/jstultz/android-dev.git dev/vrange-minchan
> 
> thanks
> -john
> 
-- 
Kind regards,
Minchan Kim
