On Sep 15, 2014, at 12:51 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> In ext4, we currently use the page cache to store the allocation
> bitmaps. The pages are associated with an internal, in-memory inode
> which is located in EXT4_SB(sb)->s_buddy_cache. Since the pages can be
> reconstructed at will, either by reading them from disk (in the case of
> the actual allocation bitmap), or by calculating the buddy bitmap from
> the allocation bitmap, normally we allow the VM to eject the pages as
> necessary.
>
> For a specialty use case, I've been requested to have an optional mode
> where the on-disk bitmaps are pinned into memory; this is a situation
> where the file system size is known in advance, and the user is willing
> to trade off the locked-down memory for the latency gains required by
> this use case.

As discussed in http://lists.openwall.net/linux-ext4/2013/03/25/15 the
bitmap pages were being evicted under memory pressure even when they
were in active use. That turned out to be an MM problem and not an ext4
problem in the end, and was fixed in commit c53954a092d in 3.11, in case
you are running an older kernel.

There was also a discussion on whether we were making all of the right
calls to mark_page_accessed() in the ext4 code to ensure that these
bitmaps were being kept at the hot end of the LRU.

> It seems that the simplest way to do that is to use mlock_vma_page()
> when the file system is first mounted, and then use munlock_vma_page()
> when the file system is unmounted. However, these functions are in
> mm/internal.h, so I figured I'd better ask permission before using
> them. Does this sound like a sane way to do things?
>
> The other approach would be to keep an elevated refcount on the pages in
> question, but it seemed it would be more efficient to use the mlock
> facility since that keeps the pages on an unevictable list.

It doesn't seem unreasonable to just grab an extra refcount on the pages
when they are first loaded. However, the memory usage may be fairly high
(32MB per 1TB of disk), so this definitely can't be used in the general
case, and it would be nice to make sure that ext4 is already doing the
right thing to keep these important pages in cache.

The other option is to improve the in-memory description of free blocks
and use an extent map or rbtree to handle this instead of bitmaps. That
may also speed up allocation in general, but is a lot more work...

> Does using the mlock/munlock_vma_page() functions make sense? Any
> pitfalls I should worry about? Note that these pages are never mapped
> into userspace, so there is no associated vma; fortunately the functions
> don't take a vma argument, their name notwithstanding.....
>
> Thanks,
>
> - Ted

Cheers, Andreas