On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> On Tue, Nov 25, 2014 at 07:10:26AM -0800, Sage Weil wrote:
> > On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> > > Hello,
> > > as far as I can tell, Ceph does not make any guarantee that reads from
> > > an object return what was actually written to it. In other words, it
> > > does not check data integrity (except during deep scrub, once every
> > > few days). Given that BTRFS is not production-ready and not many
> > > people run Ceph on top of ZFS, the only option for some sort of
> > > integrity guarantee is to enable the "filestore sloppy crc" option.
> > > Unfortunately the docs aren't too clear on this matter and "filestore
> > > sloppy crc" is not even documented, which is odd considering it has
> > > been merged since Emperor.
> > >
> > > Getting back to my actual question - what is the state of "filestore
> > > sloppy crc"? Does anyone actually use it in production? Are there any
> > > considerations one should make before enabling it? Is it safe to
> > > enable it on an existing cluster?
> >
> > We enable it in our automated QA, but do not know of anyone using it in
> > production and have not recommended it for that. It is not intended to
> > be particularly fast and we didn't thoroughly analyze the xattr size
> > implications on the file systems people may run on. Also note that it
> > simply fails (crashes) the OSD when it detects an error and has no
> > integration with scrub, which makes it not particularly friendly.
>
> We have run some initial tests of sloppy crc on our dev cluster and the
> performance hit was in fact negligible (on SSD). We also noticed the
> crashing behavior on a bad CRC, but I would still prefer the OSD to crash
> rather than serve corrupted data to the client. So far we only had to
> modify the upstart script to stop respawning the OSD after a few crashes,
> so we can detect the CRC error and let clients fail over to another OSD.
>
> About the xattr size limitations - as I understand it, no such
> limitations apply when using omap? Besides, with the default settings of
> a 64k CRC block and a 4M object size, only 64 additional metadata entries
> for CRCs would be required.

I suspect it won't break in that scenario (especially since we haven't seen
problems in QA). It definitely isn't tested with non-default striping
options, where those limits may be blown through. Use with caution.

> > Note that I am working on a related patch set that will keep a
> > persistent checksum of the entire object and will interact directly
> > with deep scrubs. It will not be as fine-grained, but it is intended
> > for production use and will cover the bulk of data that sits unmodified
> > at rest for extended periods.
>
> When is it planned to release this feature? Will it be included as a
> point release to Giant, or should we expect it in Hammer?

It is targeted for hammer and unlikely to be backported.

sage

> >
> > sage

> --
> Tomasz Kuzemko
> tomasz.kuzemko@xxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
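
For reference, enabling the feature amounts to a small ceph.conf change on
each OSD host. A minimal sketch, assuming the option names "filestore
sloppy crc" and "filestore sloppy crc block size" (the 64k block size
default discussed above) - verify the options shipped with your release
before relying on them:

    [osd]
    # Enable per-write CRC tracking in the FileStore. Remember that a
    # detected mismatch crashes the OSD rather than failing the read.
    filestore sloppy crc = true
    # CRC block size in bytes; 65536 (64k) is the default mentioned above,
    # which works out to 64 CRC entries for a default 4M object.
    filestore sloppy crc block size = 65536

The OSDs need a restart to pick up the change, and data written before the
option was enabled presumably has no CRCs until it is rewritten.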
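
The upstart tweak Tomasz describes can be done with a standard "respawn
limit" stanza. A minimal sketch, assuming the stock Ubuntu job lives at
/etc/init/ceph-osd.conf; the limit values are arbitrary examples:

    # /etc/init/ceph-osd.conf (excerpt, hypothetical values)
    # Respawn the OSD if it dies, but give up after 3 crashes within
    # 1800 seconds, so a persistent CRC failure is surfaced to the admin
    # instead of the OSD flapping while clients keep hitting it.
    respawn
    respawn limit 3 1800

Once the limit is hit the OSD stays down and clients fail over to another
OSD, which is the behavior described above.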