Re: file journal fadvise

On Tue, 2 Dec 2014, Dave Chinner wrote:
> On Mon, Dec 01, 2014 at 04:12:03PM -0800, Sage Weil wrote:
> > On Tue, 2 Dec 2014, Dave Chinner wrote:
> > > What behaviour are you wanting for a journal file? It sounds like
> > > you want it to behave like a wandering log: automatically allocating
> > > its next block wherever the previous write of any kind occurred?
> > 
> > Precisely.  Well, as long as it is adjacent to *some* other scheduled 
> > write, it would save us a seek.  The real question, I guess, is whether 
> > there is an XFS allocation mode that makes no attempt to avoid 
> > fragmentation for the file and that chooses something adjacent to other 
> > small, newly-written data during delayed allocation.
> 
> Ok, so what is the most common underlying storage you need to
> optimise for? Is it raid5/6 where a small write will trigger a
> larger RMW cycle and so proximity rather than exact adjacency
> matters, or is it raid 0/1/jbod where exact adjacency is the only
> way to avoid a seek?

The common case is a single raw disk.
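
To make the single-disk argument concrete, here is a hypothetical toy model in C (not XFS code, and not how any real allocator works). It assumes every write that does not land on the block immediately after the previous write of any kind costs one seek, then compares a journal kept in its own fixed region against a journal whose next block "wanders" to the first free block after the most recent write. The names `toy_disk`, `first_free_from` and `simulate` are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model (not XFS code): a disk of NBLOCKS blocks where any
 * write that does not land on the block right after the previous write of
 * any kind costs one seek. */
#define NBLOCKS 64

struct toy_disk {
    bool used[NBLOCKS];
    long head;  /* last block written, -1 before the first write */
    int seeks;  /* writes not adjacent to the previous write */
};

static void toy_write(struct toy_disk *d, long blk)
{
    assert(blk >= 0 && blk < NBLOCKS && !d->used[blk]);
    if (d->head >= 0 && blk != d->head + 1)
        d->seeks++;
    d->used[blk] = true;
    d->head = blk;
}

/* First free block at or after 'from', wrapping around; -1 if the disk is full. */
static long first_free_from(const struct toy_disk *d, long from)
{
    for (long i = 0; i < NBLOCKS; i++) {
        long b = (from + i) % NBLOCKS;
        if (!d->used[b])
            return b;
    }
    return -1;
}

/* Each round writes one data block and then one journal block.
 * adjacent == false: the journal lives in its own region starting mid-disk.
 * adjacent == true:  each journal block goes to the first free block after
 *                    the most recent write, "wandering log" style. */
static int simulate(bool adjacent, int rounds)
{
    struct toy_disk d = { .head = -1 };
    long data_cursor = 0, journal_cursor = NBLOCKS / 2;

    for (int r = 0; r < rounds; r++) {
        long db = first_free_from(&d, data_cursor);
        toy_write(&d, db);
        data_cursor = db + 1;

        long jb = first_free_from(&d, adjacent ? d.head + 1 : journal_cursor);
        toy_write(&d, jb);
        journal_cursor = jb + 1;
    }
    return d.seeks;
}
```

With 8 rounds the fixed-region journal turns every one of the 15 transitions between writes into a seek, while the wandering placement keeps all 16 writes contiguous. Real disks, elevators and RAID geometries make this far less clean, which is exactly the allocation/writeback-context question raised in the thread.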

> I suspect that we can play certain tricks to trigger unaligned,
> discontiguous allocation (i.e. no target allocation block), but the
> question is whether we can determine sufficient
> allocation/writeback context to enable delayed allocation to make
> sensible "next written block" decisions.

Yeah.

> > It's a circular file, usually a few GB in size, written sequentially with 
> > a range of small to large (block-aligned) write sizes, and (for all 
> > intents and purposes) is never read.  We periodically overwrite the first 
> > block with recent start and end pointers and other metadata.
> 
> Ok, so it's just another typical WAL file. ;)
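
For readers unfamiliar with this kind of WAL layout, the following is a minimal C sketch of a circular journal file as described above: a header in block 0 holding start and end pointers, followed by a ring written sequentially with block-aligned entries. The header struct, the 4096-byte block size, the wrap-to-block-1 rule and every function name are assumptions made for illustration; this is not Ceph's actual FileJournal format, and it omits fsync and any check against overrunning live entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout for illustration (not Ceph's FileJournal format):
 * block 0 holds a header with start/end pointers, and the remaining blocks
 * form a ring that is written sequentially with block-aligned entries. */
#define JBLOCK 4096

struct jheader {
    uint64_t start; /* byte offset of the oldest live entry */
    uint64_t end;   /* byte offset where the next entry goes */
    uint64_t size;  /* total journal size in bytes, header block included */
};

struct journal {
    int fd;
    struct jheader h;
};

/* Overwrite block 0 with the current start/end pointers. */
static int journal_write_header(struct journal *j)
{
    char buf[JBLOCK];
    memset(buf, 0, sizeof buf);
    memcpy(buf, &j->h, sizeof j->h);
    return pwrite(j->fd, buf, JBLOCK, 0) == JBLOCK ? 0 : -1;
}

static int journal_create(struct journal *j, const char *path, uint64_t size)
{
    assert(size >= 2 * JBLOCK && size % JBLOCK == 0);
    j->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (j->fd < 0)
        return -1;
    j->h.start = j->h.end = JBLOCK; /* empty ring starts after the header */
    j->h.size = size;
    return journal_write_header(j);
}

/* Append one block-aligned entry. If it does not fit before the end of the
 * file, wrap to the first block after the header. This sketch does not check
 * whether the write overruns h.start; a real journal must trim or block. */
static int journal_append(struct journal *j, const void *data, size_t len)
{
    assert(len > 0 && len % JBLOCK == 0 && len <= j->h.size - JBLOCK);
    if (j->h.end + len > j->h.size)
        j->h.end = JBLOCK;
    if (pwrite(j->fd, data, len, (off_t)j->h.end) != (ssize_t)len)
        return -1;
    j->h.end += len;
    return 0;
}
```

Each append is a sequential write within the ring, but the periodic `journal_write_header` call always returns to block 0, so on a single raw disk the header update is a guaranteed seek unless the filesystem places things cleverly.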

Nothing to lose sleep over if this mode doesn't already exist, but I 
expect a fair number of applications could make use of this.

FWIW, while I am already distracting you from useful things, I suspect 
(batched) aio_fsync would be a bigger win for us and probably a smaller 
investment of effort.  :)
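
As a rough illustration of the interface shape only, here is a sketch using the existing POSIX `aio_fsync()` call: several buffered writes are issued, then a single asynchronous `O_DSYNC` flush request covers all of them, and the caller polls for completion. The helper name `write_batch_then_aio_fsync` is hypothetical. Note that glibc implements POSIX AIO in user space with helper threads (older glibc also needs `-lrt`), so this does not show whether a kernel-level batched aio_fsync would pay off on any given filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n buffered blocks of len bytes starting at off,
 * then issue ONE asynchronous O_DSYNC flush that covers all of them. */
static int write_batch_then_aio_fsync(int fd, const char *const *bufs,
                                      size_t n, size_t len, off_t off)
{
    assert(n > 0 && len > 0);
    for (size_t i = 0; i < n; i++)
        if (pwrite(fd, bufs[i], len, off + (off_t)(i * len)) != (ssize_t)len)
            return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* poll instead of a signal */
    if (aio_fsync(O_DSYNC, &cb) != 0)
        return -1;

    /* A real caller would go do other work here; this sketch just waits. */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    return aio_return(&cb) == 0 ? 0 : -1;
}
```

The design point is that one flush request amortizes the cost of making many small journal writes durable, instead of paying a synchronous fsync per write.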

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



