Scrubbing for RocksDB

Hi list,

We were wondering if and how the consistency of the BlueStore OSD journals (RocksDB) is checked.

Our cluster runs Luminous (12.2.2), and we migrated all our FileStore OSDs to BlueStore a couple of months ago. During that process we placed each RocksDB on a separate partition on a RAID1 consisting of two SSDs. The cluster has been healthy; we deep-scrub the whole cluster once a week without any errors.
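To give an idea of the layout: each OSD was created with its block.db on one of the partitions of the SSD RAID1, roughly like this (device names are just examples, not our actual devices):

---cut here---
# example only: BlueStore OSD with the data on an HDD and the
# RocksDB (block.db) on a partition of the SSD RAID1 (md126)
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/md126p2
---cut here---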

Then we decided to restructure the disk layout on one of the hosts, because we didn't want that RAID of SSDs anymore. So we failed one disk (diskB), wiped it and assigned a new volume group to it, now containing one logical volume per OSD. We started the journal migration as described in [1] by copying the data from diskA (the degraded RAID1) to diskB (LVM) with dd. The first journal migration worked like a charm, but for the next four partitions dd reported errors like these:

---cut here---
FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sense Key : Medium Error [current]
Add. Sense: Read retries exhausted
CDB: Read(10) 28 00 0a 08 8b a0 00 04 00 00
blk_update_request: critical medium error, dev sdk, sector 168332406
Buffer I/O error on dev md126p6, logical block 1363854, async page read
---cut here---

Four of the six partitions reported these errors; a look at smartctl confirmed that this SSD is corrupt and has non-recoverable errors. That's why we had to rebuild the affected OSDs from scratch, but at least without rearranging the whole cluster (also mentioned in [1]).
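For reference, the per-partition copy and the disk check were nothing special; roughly what we ran (device and LV names are placeholders, not our actual layout):

---cut here---
# copy one journal partition from the degraded RAID1 to the new LV
# (device and LV names are placeholders)
dd if=/dev/md126p6 of=/dev/vg_journal/db_osd6 bs=1M status=progress

# check the suspicious SSD; the error log showed the
# non-recoverable read errors
smartctl -a /dev/sdk
---cut here---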

So my question is: why can't I find anything in the Ceph logs about this? Scrubbing and deep-scrubbing only check the PGs on the data device for consistency, but what about the journal? Is there any tool we haven't found yet, or any mechanism that would detect such an I/O error? Of course, it's possible that the affected blocks on the corrupt partitions hadn't been updated for some time, but IMHO there should be something to check the journal's consistency and report it in the Ceph logs, something like a journal scrub, maybe.
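For completeness: there is an offline fsck in ceph-bluestore-tool, but it requires the OSD to be stopped, so it doesn't really cover the running-cluster case I'm asking about. Roughly (OSD ID and path are just examples):

---cut here---
# offline consistency check of one BlueStore OSD (OSD ID is an example)
systemctl stop ceph-osd@6
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6
systemctl start ceph-osd@6
---cut here---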

Has someone experienced similar issues and can shed some light on this? Any insights would be very helpful.

Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024913.html

--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

        Vorsitzende des Aufsichtsrates: Angelika Mozdzen
          Sitz und Registergericht: Hamburg, HRB 90934
                  Vorstand: Jens-U. Mozdzen
                   USt-IdNr. DE 814 013 983



