On Wed, Aug 7, 2013 at 6:40 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Mon 05-08-13 12:43:58, Andy Lutomirski wrote:
>> My application fallocates and mmaps (shared, writable) a lot (several
>> GB) of data at startup. Those mappings are mlocked, and they live on
>> ext4. The first write to any given page is slow because
>> ext4_da_get_block_prep can block. This means that, to get decent
>> performance, I need to write something to all of these pages at
>> startup. This, in turn, causes a giant IO storm as several GB of
>> zeros get pointlessly written to disk.
>>
>> This series is an attempt to add madvise(..., MADV_WILLWRITE) to
>> signal to the kernel that I will eventually write to the referenced
>> pages. It should cause any expensive operations that happen on the
>> first write to happen immediately, but it should not result in
>> dirtying the pages.
>>
>> madvise(addr, len, MADV_WILLWRITE) returns the number of bytes that
>> the operation succeeded on or a negative error code if there was an
>> actual failure. A return value of zero signifies that the kernel
>> doesn't know how to "willwrite" the range and that userspace should
>> implement a fallback.
>>
>> For now, it only works on shared writable ext4 mappings. Eventually
>> it should support other filesystems as well as private pages (it
>> should COW the pages but not cause swap IO) and anonymous pages (it
>> should COW the zero page if applicable).
>>
>> The implementation leaves much to be desired. In particular, it
>> generates dirty buffer heads on a clean page, and this scares me.
>>
>> Thoughts?
> One question before I look at the patches: Why don't you use fallocate()
> in your application? The functionality you require seems to be pretty
> similar to it - writing to an already allocated block is usually quick.

I do use fallocate, and, IIRC, the problem was worse before I added
the fallocate call.

This could be argued to be a filesystem problem -- perhaps
page_mkwrite should never block.
I don't expect that to be fixed any time soon (if ever).

--Andy
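
For concreteness, here is a rough userspace sketch of the pattern discussed in this thread: reserve the blocks, map the file shared and writable, ask the kernel to prepare the range, and fall back to touching every page for whatever part it did not handle. This is only an illustration under stated assumptions, not code from the patch series. MADV_WILLWRITE is the flag proposed here and is not expected to be defined by installed headers, so the #ifdef branch normally compiles out and only the fallback runs. The return-value handling in that branch follows the semantics described in the quoted proposal (bytes handled, zero meaning "use a fallback"). The names map_prepared and prefault_by_touching are hypothetical, posix_fallocate stands in for the fallocate call, and the mlock step is left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fallback: rewrite one byte per page with its current value.
 * This forces the write fault up front, but it also dirties every page,
 * which is the writeback storm the proposal is trying to avoid.
 */
static void prefault_by_touching(volatile unsigned char *p, size_t len,
                                 size_t page)
{
    for (size_t off = 0; off < len; off += page)
        p[off] = p[off];
}

/*
 * Hypothetical helper: map len bytes of fd (shared, writable) and make the
 * first write to each page cheap. Returns the mapping, or MAP_FAILED with
 * errno set. The caller may mlock() the result afterwards.
 */
static void *map_prepared(int fd, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t done = 0;

    /* Reserve the blocks. posix_fallocate returns an errno value. */
    int err = posix_fallocate(fd, 0, (off_t)len);
    if (err != 0) {
        errno = err;
        return MAP_FAILED;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

#ifdef MADV_WILLWRITE
    /*
     * ASSUMPTION: proposed semantics only. A positive result is the number
     * of bytes prepared; zero or an error means userspace must fall back.
     */
    long r = madvise(p, len, MADV_WILLWRITE);
    if (r > 0)
        done = (size_t)r < len ? (size_t)r : len;
#endif

    /* Touch whatever the kernel did not prepare. */
    if (done < len)
        prefault_by_touching((unsigned char *)p + done, len - done, page);

    return p;
}
```

On a kernel without the proposed flag this degrades to exactly the touch-every-page workaround described above, so it does not avoid the IO storm by itself; it only shows where the new call would slot in.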