On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> On Tue, Nov 25, 2014 at 07:10:26AM -0800, Sage Weil wrote:
> > On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> > > Hello,
> > > as far as I can tell, Ceph does not make any guarantee that reads from
> > > an object return what was actually written to it. In other words, it
> > > does not check data integrity (except during deep scrub, once every
> > > few days). Given that BTRFS is not production-ready and not many
> > > people run Ceph on top of ZFS, the only option for some sort of
> > > integrity guarantee is to enable the "filestore sloppy crc" option.
> > > Unfortunately the docs aren't too clear on this matter and "filestore
> > > sloppy crc" is not even documented, which is odd considering it has
> > > been merged since Emperor.
> > >
> > > Getting back to my actual question - what is the state of "filestore
> > > sloppy crc"? Does anyone actually use it in production? Are there any
> > > considerations one should make before enabling it? Is it safe to
> > > enable it on an existing cluster?
> >
> > We enable it in our automated QA, but do not know of anyone using it in
> > production and have not recommended it for that. It is not intended to
> > be particularly fast and we didn't thoroughly analyze the xattr size
> > implications on the file systems people may run on. Also note that it
> > simply fails (crashes) the OSD when it detects an error and has no
> > integration with scrub, which makes it not particularly friendly.
>
> We have run some initial tests of sloppy crc on our dev cluster and the
> performance hit was in fact negligible (on SSD). We also noticed the
> crashing behavior on a bad CRC, but I would still prefer the OSD to crash
> rather than serve corrupted data to the client. So far we only had to
> modify the upstart script to stop respawning the OSD after a few crashes,
> so we can detect the CRC error and let clients fail over to another OSD.
>
> About the xattr size limitations - as I understand it, no such
> limitations apply when using omap? Besides, with the default settings of
> a 64k CRC block and a 4M object size, only 64 additional metadata entries
> for CRCs would be required.

I suspect it won't break in that scenario (especially since we haven't seen
problems in QA). It definitely isn't tested with non-default striping
options, where those limits may be blown through. Use with caution.

> > Note that I am working on a related patch set that will keep a
> > persistent checksum of the entire object and will interact directly
> > with deep scrubs. It will not be as fine-grained, but it is intended
> > for production use and will cover the bulk of data that sits unmodified
> > at rest for extended periods.
>
> When is it planned to release this feature? Will it be included as a
> point release to Giant, or should we expect it in Hammer?

It is targeted for hammer and unlikely to be backported.

sage

> >
> > sage

> --
> Tomasz Kuzemko
> tomasz.kuzemko@xxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
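
For reference, enabling the feature amounts to a small ceph.conf change on
each OSD host. A minimal sketch, assuming the option names "filestore
sloppy crc" and "filestore sloppy crc block size" (the 64k block size
default discussed above) - verify the options shipped with your release
before relying on them:

    [osd]
    # Enable per-write CRC tracking in the FileStore. Remember that a
    # detected mismatch crashes the OSD rather than failing the read.
    filestore sloppy crc = true
    # CRC block size in bytes; 65536 (64k) is the default mentioned above,
    # which works out to 64 CRC entries for a default 4M object.
    filestore sloppy crc block size = 65536

The OSDs need a restart to pick up the change, and data written before the
option was enabled presumably has no CRCs until it is rewritten.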
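
The upstart tweak Tomasz describes can be done with a standard "respawn
limit" stanza. A minimal sketch, assuming the stock Ubuntu job lives at
/etc/init/ceph-osd.conf; the limit values are arbitrary examples:

    # /etc/init/ceph-osd.conf (excerpt, hypothetical values)
    # Respawn the OSD if it dies, but give up after 3 crashes within
    # 1800 seconds, so a persistent CRC failure is surfaced to the admin
    # instead of the OSD flapping while clients keep hitting it.
    respawn
    respawn limit 3 1800

Once the limit is hit the OSD stays down and clients fail over to another
OSD, which is the behavior described above.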