Hi list,
we are wondering whether, and how, the consistency of BlueStore OSD
journals (in our case the RocksDB partitions) is checked.
Our cluster runs Luminous (12.2.2), and we migrated all our
FileStore OSDs to BlueStore a couple of months ago. During that
process we placed each RocksDB on a separate partition on a RAID1
consisting of two SSDs. The cluster was healthy; we deep-scrub the
whole cluster once a week and had not seen any errors.
Then we decided to restructure the disk layout on one of the hosts
because we no longer wanted that RAID of SSDs. So we failed one disk
(diskB), wiped it and created a new volume group on it, containing one
logical volume per OSD. We started the journal migration as described
in [1] by copying the data from diskA (degraded RAID1) to diskB (LVM)
with dd. The first journal migration worked like a charm, but for the
next four partitions the dd command reported errors like these:
---cut here---
FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sense Key : Medium Error [current]
Add. Sense: Read retries exhausted
CDB: Read(10) 28 00 0a 08 8b a0 00 04 00 00
blk_update_request: critical medium error, dev sdk, sector 168332406
Buffer I/O error on dev md126p6, logical block 1363854, async page read
---cut here---
Four of the six partitions reported these errors, and a look at
smartctl confirmed that the remaining SSD (diskA) is failing and has
non-recoverable errors. We therefore had to rebuild the affected OSDs
from scratch, but at least without rearranging the whole cluster (also
mentioned in [1]).
So my question is: why can't I find anything in the Ceph logs about
this? Scrubbing and deep-scrubbing only check the PGs on the data
device for consistency, but what about the journal? Is there a tool
we haven't found yet, or any mechanism that would detect such an I/O
error? Of course it is possible that the affected blocks on the
failing partitions hadn't been updated for some time, but IMHO there
should be something that checks the journal's consistency and reports
problems in the Ceph logs, something like a journal-scrub, maybe.
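To illustrate the kind of check we have in mind, here is a minimal
sketch (not an existing Ceph tool) that reads a partition from start
to end and records every range where the kernel returns an I/O error.
The function name scan_for_read_errors and the 1 MiB chunk size are
our own assumptions. It uses plain buffered reads, so blocks already
in the page cache may not hit the disk; a stricter check would need
O_DIRECT with aligned buffers.

```python
import errno
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, bad_ranges).

    bad_ranges is a list of (offset, length) tuples for chunks whose
    read failed with EIO. Hypothetical helper, not part of Ceph.
    """
    bad_ranges = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end yields the size for regular files and
        # for block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # permission problems etc. are not media errors
                # Remember the unreadable chunk and skip past it.
                bad_ranges.append((offset, length))
                offset += length
                continue
            if not data:
                break  # unexpected end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_ranges
```

Run as root against a RocksDB partition, this would have listed the
unreadable ranges before we started copying with dd. It only detects
media errors, not logical corruption of the data itself.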
Has anyone experienced similar issues and can shed some light on
this? Any insights would be very helpful.
Regards,
Eugen
[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024913.html
--
Eugen Block voice : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail : eblock@xxxxxx
Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983