On Sun, Aug 14, 2016 at 01:20:12AM -0600, Andreas Dilger wrote:
> On Aug 12, 2016, at 12:37 PM, Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >
> > Here's a stabilized version of my patchset, which is intended to bring
> > huge pages to ext4.
> >
> > The basics are the same as with tmpfs[1], which is in Linus' tree now, and
> > ext4 is built on top of it. The main difference is that we need to handle
> > read out from and write-back to backing storage.
> >
> > Head page links buffers for the whole huge page. Dirty/writeback tracking
> > happens on a per-hugepage level.
> >
> > We read out the whole huge page at once. This required bumping
> > BIO_MAX_PAGES to not less than HPAGE_PMD_NR. I defined BIO_MAX_PAGES to
> > HPAGE_PMD_NR if huge pagecache is enabled.
> >
> > On split_huge_page() we need to free buffers before splitting the page.
> > Page buffers take an additional pin on the page and can be a vector to mess
> > with the page during split. We want to avoid this.
> > If try_to_free_buffers() fails, split_huge_page() would return -EBUSY.
> >
> > Readahead doesn't play well with huge pages: 128k max readahead window,
> > assumptions about page size, PageReadahead() to track hit/miss. I've got it
> > to allocate huge pages, but it doesn't provide any readahead as such.
> > I don't know how to do this right. It's not clear at this point if we
> > really need readahead with huge pages. I guess it's good enough for now.
>
> Typically read-ahead is a loss if you are able to get large allocations on
> disk, since you can get at least seek_rate * chunk_size throughput from the
> disks even with random IO at that size. With 1MB allocations and 7200
> RPM drives this works out to be about 150MB/s, which is close to the
> throughput of these drives already.

I'm worried not so much about throughput as about latency spikes once we
cross huge page boundaries. We can get a cache miss where we would have had
a hit with small pages.

--
 Kirill A. Shutemov
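
For reference, the arithmetic behind the numbers in this thread can be sketched as below. This is only a back-of-the-envelope illustration, not anything from the patchset. The 4 KiB base page and 2 MiB PMD-sized huge page are assumptions (typical for x86-64), and the figure of roughly 150 seeks per second is simply what the quoted 1MB and 150MB/s numbers imply. All function names are hypothetical.

```python
# Back-of-the-envelope numbers for the readahead discussion.
# Assumptions (not stated in the thread): 4 KiB base pages and 2 MiB
# PMD-sized huge pages, as on x86-64; about 150 random seeks per second,
# which is what the quoted "1MB allocations -> about 150MB/s" implies.

PAGE_SIZE = 4 * 1024                # assumed base page size in bytes
HUGE_PAGE_SIZE = 2 * 1024 * 1024    # assumed huge page size in bytes
READAHEAD_WINDOW = 128 * 1024       # "128k max readahead window" from the thread


def random_io_throughput_mb_s(seeks_per_s, chunk_mb):
    """Estimate seek_rate * chunk_size, in MB/s, for random IO."""
    return seeks_per_s * chunk_mb


def pages_per_huge_page(huge=HUGE_PAGE_SIZE, base=PAGE_SIZE):
    """Number of base pages covered by one huge page."""
    return huge // base


def readahead_windows_per_huge_page(huge=HUGE_PAGE_SIZE, window=READAHEAD_WINDOW):
    """How many maximum-size readahead windows fit in one huge page."""
    return huge // window


if __name__ == "__main__":
    print("throughput MB/s:", random_io_throughput_mb_s(150, 1))
    print("base pages per huge page:", pages_per_huge_page())
    print("readahead windows per huge page:", readahead_windows_per_huge_page())
```

Under these assumptions a single huge page spans many maximum-size readahead windows, which is one way to see why the existing window logic does not map cleanly onto huge pages.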