Re: Storage, File Systems and Data Scrubbing


 



This was very helpful, thanks. However, I'm still trying to reconcile this with something that Sage mentioned a while back on a similar topic. Apparently you can disable the journal if you're using btrfs. Is that possible because btrfs itself takes care of things like atomic object writes and updates to the OSD metadata?


-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Sage Weil
Sent: Thursday, July 11, 2013 8:39 PM
To: Mark Nelson
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Turning off ceph journaling with xfs ?

 

Note that you *can* disable the journal if you use btrfs, but your write latency will tend to be pretty terrible.  This is only viable for bulk-storage use cases where throughput trumps all and latency is not an issue at all (it may be seconds).

 

We are planning on eliminating the double-write for at least large writes when using btrfs by cloning data out of the journal and into the target file.  This is not a hugely complex task (although it is non-trivial) but it hasn't made it to the top of the priority list yet.

 

sage



On Mon, Aug 26, 2013 at 4:05 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
ceph-osd builds a transactional interface on top of the usual posix
operations so that we can do things like atomically perform an object
write and update the osd metadata.  The current implementation
requires our own journal and some metadata ordering (which is provided
by the backing filesystem's own journal) to implement our own atomic
operations.  It's true that in some cases you might be able to get
away with having the client replay the operation (which we do anyway
for other reasons), but that wouldn't be enough to ensure consistency
of the filesystem's own internal structures.  It also wouldn't be
enough to ensure that the OSD's internal structures remain consistent
in the case of a crash.  Also, if the client is unavailable to do the
replay, you'd have a problem.

In summary, it's actually really hard to detect partial/corrupted
writes after a crash without journaling of some form.
-Sam
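
To make the point about journaling and atomic "object write plus metadata update" concrete, here is a minimal sketch of a write-ahead journal in Python. It is a toy under stated assumptions, not how ceph-osd is actually implemented: the names TinyJournal, apply_txn and replay, the one-line "crc32 payload" record format, and the ".meta" sidecar file are all hypothetical and chosen only for illustration. The idea is that a transaction is made durable in the journal first, then applied; after a crash, intact records are re-applied, and a torn or corrupt trailing record is detected by its checksum and ignored.

```python
import json
import os
import zlib


class TinyJournal:
    """Toy write-ahead journal: one record per line, '<crc32 hex> <json>'."""

    def __init__(self, path):
        self.path = path

    def append(self, txn):
        # Serialize the whole transaction (data + metadata) as one record.
        payload = json.dumps(txn, sort_keys=True).encode("utf-8")
        record = b"%08x %s\n" % (zlib.crc32(payload), payload)
        with open(self.path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # record must be durable before it is applied

    def records(self):
        """Yield intact transactions; stop at the first torn or corrupt one."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    return  # torn write: record was cut off mid-line
                crc_hex, _, payload = line[:-1].partition(b" ")
                try:
                    intact = int(crc_hex, 16) == zlib.crc32(payload)
                except ValueError:
                    intact = False
                if not intact:
                    return  # checksum mismatch: treat as never committed
                yield json.loads(payload)


def apply_txn(store_dir, txn):
    """Apply a transaction idempotently: rewrite the object, then its metadata.

    Simplification: a real store would also fsync the directory and use
    rename() so each file is replaced atomically.
    """
    obj_path = os.path.join(store_dir, txn["name"])
    with open(obj_path, "w", encoding="utf-8") as f:
        f.write(txn["data"])
        f.flush()
        os.fsync(f.fileno())
    with open(obj_path + ".meta", "w", encoding="utf-8") as f:
        json.dump({"version": txn["version"], "size": len(txn["data"])}, f)
        f.flush()
        os.fsync(f.fileno())


def replay(journal, store_dir):
    """After a crash, re-apply every intact journal record. Returns the count."""
    count = 0
    for txn in journal.records():
        apply_txn(store_dir, txn)
        count += 1
    return count
```

In normal operation a writer would call journal.append(txn) and then apply_txn(store_dir, txn); on startup after a crash it would call replay(journal, store_dir). Because applying a record simply overwrites the object and its metadata, replaying a transaction that was already half-applied is harmless, which is what lets the pair of updates behave as one atomic operation. Without the journal there is no record to compare against, so a half-written object is indistinguishable from a valid one.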


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
