On Tue, 2 Dec 2014, Dave Chinner wrote:
> On Mon, Dec 01, 2014 at 04:12:03PM -0800, Sage Weil wrote:
> > On Tue, 2 Dec 2014, Dave Chinner wrote:
> > > What behaviour are you wanting for a journal file? It sounds like
> > > you want it to behave like a wandering log: automatically allocating
> > > its next block wherever the previous write of any kind occurred?
> >
> > Precisely. Well, as long as it is adjacent to *some* other scheduled
> > write, it would save us a seek. The real question, I guess, is whether
> > there is an XFS allocation mode that makes no attempt to avoid
> > fragmentation for the file and that chooses something adjacent to other
> > small, newly-written data during delayed allocation.
>
> Ok, so what is the most common underlying storage you need to
> optimise for? Is it raid5/6 where a small write will trigger a
> larger RMW cycle and so proximity rather than exact adjacency
> matters, or is it raid 0/1/jbod where exact adjacency is the only
> way to avoid a seek?

The common case is a single raw disk.

> I suspect that we can play certain tricks to trigger unaligned,
> discontiguous allocation (i.e. no target allocation block), but the
> question is whether we can determine sufficient
> allocation/writeback context to enable delayed allocation to make
> sensible "next written block" decisions.

Yeah.

> > It's a circular file, usually a few GB in size, written sequentially with
> > a range of small to large (block-aligned) write sizes, and (for all
> > intents and purposes) is never read. We periodically overwrite the first
> > block with recent start and end pointers and other metadata.
>
> Ok, so it's just another typical WAL file. ;)

Nothing to lose sleep over if this mode doesn't already exist, but I
expect a fair number of applications could make use of this.

FWIW, while I am already distracting you from useful things, I suspect
(batched) aio_fsync would be a bigger win for us and probably a smaller
investment of effort.
:)

sage
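
For readers unfamiliar with the access pattern, the journal described above (a circular file, block-aligned sequential writes, first block periodically overwritten with start and end pointers) can be sketched roughly as follows. This is an illustrative toy, not Ceph's journal code: the 4096-byte block size, the two-field header layout, and the `CircularJournal` class and its method names are all assumptions made for the example, and trimming of old entries is not modelled.

```python
import os
import struct

BLOCK = 4096                    # assumed block size (not stated in the thread)
HEADER = struct.Struct("<QQ")   # hypothetical header: start and end offsets


class CircularJournal:
    """Toy circular journal: block 0 is the header, the rest is the ring."""

    def __init__(self, path, size):
        assert size % BLOCK == 0 and size >= 2 * BLOCK
        self.size = size
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)
        self.start = BLOCK      # oldest live byte (never advanced in this toy)
        self.end = BLOCK        # offset of the next write
        self.write_header()

    def write_header(self):
        # The periodic overwrite of the first block with the current pointers.
        block = HEADER.pack(self.start, self.end).ljust(BLOCK, b"\0")
        os.pwrite(self.fd, block, 0)

    def append(self, payload):
        # Pad every entry to a whole number of blocks (block-aligned writes).
        length = -(-len(payload) // BLOCK) * BLOCK
        assert length <= self.size - BLOCK
        # Wrap to the first data block if the entry would run past the end.
        if self.end + length > self.size:
            self.end = BLOCK
        offset = self.end
        os.pwrite(self.fd, payload.ljust(length, b"\0"), offset)
        self.end = offset + length
        return offset

    def close(self):
        os.fsync(self.fd)
        os.close(self.fd)
```

Each `append` is a sequential write, but every `write_header` call forces a seek back to offset 0, and on a single raw disk the journal writes also compete with unrelated small writes elsewhere on the device; that is the seek cost the allocation question in this thread is about.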