Re: [PATCH RFC] hugetlbfs 'noautofill' mount option

On 5/2/17 4:43 PM, Dave Hansen wrote:

> On 05/02/2017 04:34 PM, Prakash Sangappa wrote:
>> Similarly, a madvise() option also requires an additional system call by
>> every process mapping the file; this is considered an overhead for the
>> database.
> How long-lived are these processes?  For a database, I'd assume that
> this would happen a single time, or a single time per mmap() at process
> startup time.  Such a syscall would be doing something on the order of
> taking mmap_sem, walking the VMA tree, setting a bit per VMA, and
> unlocking.  That's a pretty cheap one-time cost...

Plus a call into the filesystem (a_ops?) to check whether the underlying
filesystem supports not filling holes on mapped access, before setting the
bit per VMA. Although the overhead may not be that bad.

Database processes can exit and new ones can be started, for instance,
depending on database activity.
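
For reference, the per-process cost being discussed would look roughly like the sketch below: one mmap() followed by one madvise() on the same range. This is only an illustration. MADV_NOAUTOFILL is merely proposed in this thread and does not exist, so the advice value is passed in as a parameter, and the names map_with_advice() and example_usage() are made up here. The usage function maps an ordinary temporary file with MADV_NORMAL as a stand-in.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of `fd` as a shared mapping and apply one madvise() call
 * to it. `advice` would be the hypothetical MADV_NOAUTOFILL if such an
 * advice type were ever added; no such constant exists today.
 * Returns the mapping, or NULL with errno set by the failing call.
 */
void *map_with_advice(int fd, size_t len, int advice)
{
    assert(len > 0);

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* This is the one extra system call per process and per mapping. */
    if (madvise(addr, len, advice) != 0) {
        int saved = errno;
        munmap(addr, len);
        errno = saved;
        return NULL;
    }
    return addr;
}

/*
 * Usage on an ordinary temporary file, with MADV_NORMAL standing in for the
 * proposed advice. Returns 0 on success, -1 on failure.
 */
int example_usage(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    int rc = -1;
    if (ftruncate(fileno(f), (off_t)len) == 0) {
        void *p = map_with_advice(fileno(f), len, MADV_NORMAL);
        if (p != NULL) {
            munmap(p, len);
            rc = 0;
        }
    }
    fclose(f);
    return rc;
}
```

Every newly started database process would have to repeat both calls for each mapping it creates, which is the overhead in question.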


>> If we do consider a new madvise() option, will it be acceptable
>> since this will be specifically for hugetlbfs file mappings?
> Ideally, it would be something that is *not* specifically for hugetlbfs.
> MADV_NOAUTOFILL, for instance, could be defined to SIGSEGV whenever
> memory is touched that was not populated with MADV_WILLNEED, mlock(), etc...

If this is a generic advice type, the necessary support will have to be
implemented in the various filesystems which can support this.

The proposed behavior for 'noautofill' was to not fill holes in files (like
sparse files). In the page fault path, mm would not know whether the mmapped
address on which the fault occurred is over a hole in the file, or whether
the page is just not available in the page cache. The underlying filesystem
would be called, and it would determine whether it is a hole; that is where
it would fail and not fill the hole, if this support is added. Normally,
filesystems which support sparse files (holes in files) automatically fill
the hole when it is accessed. Then there is the issue of filesystem block
size and page size. If the block size is smaller than the page size, it could
mean that noautofill would only work if the hole size is equal to, or a
multiple of, the page size?
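
To make the block size versus page size question concrete, here is a small sketch of the arithmetic only. It assumes a hole is described by a byte offset and a length and that the page size is a power of two; struct range and whole_pages_in_hole() are names made up for this example and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A byte range [start, start + len) within a file. */
struct range {
    uint64_t start;
    uint64_t len;
};

/*
 * Shrink `hole` to the largest sub-range that consists only of whole pages.
 * Returns false if the hole does not contain even one complete page, in
 * which case *out is left untouched. `page_size` must be a power of two.
 * Overflow of start + len is not handled in this sketch.
 */
bool whole_pages_in_hole(struct range hole, uint64_t page_size,
                         struct range *out)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t end = hole.start + hole.len;
    /* Round the start up and the end down to page boundaries. */
    uint64_t first = (hole.start + page_size - 1) & ~(page_size - 1);
    uint64_t last = end & ~(page_size - 1);

    if (first >= last)
        return false;

    out->start = first;
    out->len = last - first;
    return true;
}
```

With 1 KiB blocks and 4 KiB pages, a 1 KiB hole at offset 1024 contains no whole page, so a fault on that page could not be refused purely on the grounds of being over a hole. A hole that is already aligned to the page size is returned unchanged.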

In the case of hugetlbfs it is much more straightforward, since this
filesystem is not like normal filesystems and the file sizes are multiples
of the huge page size. The hole will be a multiple of the huge page size.
For this reason, should the advice then be specific to hugetlbfs?



>> If so, would a new flag to the mmap() call itself be acceptable, which
>> would define the proposed behavior? That way no additional system calls
>> need to be made.
> I don't feel super strongly about it, but I guess an mmap() flag could
> work too.


Same goes with the mmap call, if it is a generic flag.

-Prakash.



