On Mon, Dec 01, 2014 at 05:24:46PM -0800, Sage Weil wrote:
> On Tue, 2 Dec 2014, Dave Chinner wrote:
> > On Mon, Dec 01, 2014 at 04:12:03PM -0800, Sage Weil wrote:
> > > On Tue, 2 Dec 2014, Dave Chinner wrote:
> > > > What behaviour are you wanting for a journal file? It sounds like
> > > > you want it to behave like a wandering log: automatically allocating
> > > > its next block wherever the previous write of any kind occurred?
> > >
> > > Precisely. Well, as long as it is adjacent to *some* other scheduled
> > > write, it would save us a seek. The real question, I guess, is whether
> > > there is an XFS allocation mode that makes no attempt to avoid
> > > fragmentation for the file and that chooses something adjacent to other
> > > small, newly-written data during delayed allocation.
> >
> > Ok, so what is the most common underlying storage you need to
> > optimise for? Is it raid5/6 where a small write will trigger a
> > larger RMW cycle and so proximity rather than exact adjacency
> > matters, or is it raid 0/1/jbod where exact adjacency is the only
> > way to avoid a seek?
>
> The common case is a single raw disk.

Ok, so it's an exact match that is really required. I'll have a think
about it.

> > > It's a circular file, usually a few GB in size, written sequentially with
> > > a range of small to large (block-aligned) write sizes, and (for all
> > > intents and purposes) is never read. We periodically overwrite the first
> > > block with recent start and end pointers and other metadata.
> >
> > Ok, so it's just another typical WAL file. ;)
>
> Nothing to lose sleep over if this mode doesn't already exist, but I
> expect a fair number of applications could make use of this.
>
> FWIW, while I am already distracting you from useful things, I suspect
> (batched) aio_fsync would be a bigger win for us and probably a smaller
> investment of effort.
> :)

If you want to test a patch that provides a basic implementation of
aio_fsync:

http://oss.sgi.com/archives/xfs/2014-06/msg00214.html

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
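
For anyone following the thread, here is a minimal sketch of the journal
layout described in the quoted text: a fixed-size circular file whose first
block holds start and end pointers, with block-aligned sequential appends
that wrap at the end of the file. It is illustration only and not Ceph's
journal code. The class name CircularJournal, the 4096-byte block size and
the two-field header format are all assumptions. It ignores trimming (a
wrap may overwrite live data), assumes a POSIX system for os.pwrite, and
says nothing about where XFS actually places the blocks.

```python
import os
import struct

BLOCK = 4096                     # assumed block size, not taken from the thread
HEADER = struct.Struct("<QQ")    # hypothetical header: (start, end) byte offsets


class CircularJournal:
    """Toy circular journal: block 0 is the header, the rest is a ring."""

    def __init__(self, path, size):
        if size % BLOCK or size < 2 * BLOCK:
            raise ValueError("size must be a multiple of BLOCK, >= 2 blocks")
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)
        self.size = size
        self.start = BLOCK       # oldest live byte (never advanced in this toy)
        self.end = BLOCK         # offset of the next append
        self.write_header()

    def write_header(self):
        # Periodically overwrite the first block with the current pointers.
        block = HEADER.pack(self.start, self.end).ljust(BLOCK, b"\0")
        os.pwrite(self.fd, block, 0)

    def append(self, payload):
        # Pad to whole blocks so that every write is block-aligned.
        nblocks = max(1, -(-len(payload) // BLOCK))
        data = payload.ljust(nblocks * BLOCK, b"\0")
        if len(data) > self.size - BLOCK:
            raise ValueError("entry larger than the ring")
        if self.end + len(data) > self.size:
            self.end = BLOCK     # wrap back to the first data block
        offset = self.end
        os.pwrite(self.fd, data, offset)
        self.end = offset + len(data)
        return offset

    def commit(self):
        # Synchronous flush; the batched aio_fsync idea would replace this.
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

A caller would append() entries, call write_header() every so often, and
commit() whenever durability is needed. That commit step is the fsync cost
the aio_fsync suggestion is aimed at.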