Re: 12.2.7 + osd skip data digest + bluestore + I/O errors

`ceph versions` -- are you sure all the OSDs are running 12.2.7?

osd_skip_data_digest = true is supposed to skip the data CRC checks during reads.
But maybe the cache-tiering I/O path is different and checks the CRC anyway?
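
A quick sketch of the check I mean (osd.71 is just the OSD from your logs below):

# ceph versions
# ceph osd versions
# ceph tell osd.71 version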

-- dan


On Tue, Jul 24, 2018 at 3:01 PM SCHAER Frederic <frederic.schaer@xxxxxx> wrote:
>
> Hi,
>
>
>
> I read the 12.2.7 upgrade notes, and set “osd skip data digest = true” before I started upgrading from 12.2.6 on my Bluestore-only cluster.
>
> As far as I can tell, all my OSDs were restarted during the upgrade and all have the option enabled:
>
>
>
> This is what I see for a specific OSD taken at random:
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.68.asok config show|grep data_digest
>
>     "osd_skip_data_digest": "true",
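>
> For completeness, a small sketch (assuming the default /var/run/ceph admin-socket paths) that checks the running value on every OSD of a host:
>
> # for s in /var/run/ceph/ceph-osd.*.asok; do echo -n "$s: "; ceph --admin-daemon $s config get osd_skip_data_digest; done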
>
>
>
> This is what I see when I try to inject the skip-data-digest option with injectargs:
>
>
>
> # ceph tell osd.* injectargs '--osd_skip_data_digest=true' 2>&1|head
>
> osd.0: osd_skip_data_digest = 'true' (not observed, change may require restart)
>
> osd.1: osd_skip_data_digest = 'true' (not observed, change may require restart)
>
> osd.2: osd_skip_data_digest = 'true' (not observed, change may require restart)
>
> osd.3: osd_skip_data_digest = 'true' (not observed, change may require restart)
>
> (…)
>
>
>
> This has been like that since I upgraded to 12.2.7.
>
> I read in the release notes that the skip_data_digest option should be sufficient to ignore the 12.2.6 corruptions, and that objects should auto-heal on rewrite…
>
>
>
> However…
>
>
>
> My config:
>
> - tiering with an SSD hot storage tier
>
> - HDDs for cold storage
>
>
>
> And… I get I/O errors on some VMs when running commands as simple as “yum check-update”.
>
>
>
> The qemu/kvm/libvirt logs (in /var/log/libvirt/qemu) show me this:
>
>
>
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
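>
> To see which guest disk is affected, something along these lines can be run on the hypervisor (the domain name is a placeholder):
>
> # virsh domblkerror <domain>
> # grep 'I/O error' /var/log/libvirt/qemu/*.log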
>
>
>
> In the ceph logs, I can see these errors:
>
>
>
> 2018-07-24 11:17:56.420391 osd.71 [ERR] 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429936 osd.71 [ERR] 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
>
>
>
> (yes, my cluster is seen as healthy)
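>
> (i.e. nothing is reported by the usual health checks, e.g.:)
>
> # ceph -s
> # ceph health detail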
>
>
>
> On the affected OSDs, I can see these errors:
>
>
>
> 2018-07-24 11:17:56.420349 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.420388 7f034642a700 -1 log_channel(cluster) log [ERR] : 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.420395 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected promote error (5) Input/output error
>
> 2018-07-24 11:17:56.429900 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429934 7f034642a700 -1 log_channel(cluster) log [ERR] : 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429939 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected promote error (5) Input/output error
>
>
>
> And… I don’t know how to recover from that.
>
> Pool #1 is my SSD cache tier, hence pg 1.23 is on the SSD side.
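>
> That can be cross-checked with something like this (pg 1.23 and the acting set [71,101,74] come from the logs above):
>
> # ceph osd pool ls detail
> # ceph pg map 1.23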
>
>
>
> I’ve tried setting the cache pool to “readforward” despite the “not well supported” warning, and the VMs immediately started working again (no more I/O errors).
>
> But with no SSD tiering, that’s not really useful.
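>
> For reference, the cache-mode switches look roughly like this (the cache pool name is a placeholder; readforward needs the extra flag because of the “not well supported” warning):
>
> # ceph osd tier cache-mode <cache-pool> readforward --yes-i-really-mean-it
> # ceph osd tier cache-mode <cache-pool> writeback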
>
>
>
> As soon as I set the cache tier back to writeback, I got those I/O errors again… (not on the yum command, but in the meantime I had stopped osd.71 and marked it out, then back in, to check it with badblocks just in case…)
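>
> A sketch of that osd.71 check (the device path for badblocks is a placeholder; badblocks was run in read-only mode):
>
> # ceph osd out 71
> # systemctl stop ceph-osd@71
> # badblocks -sv /dev/sdX
> # systemctl start ceph-osd@71
> # ceph osd in 71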
>
> I still have to find a way to reproduce the I/O error on an affected host so I can debug/fix this issue further…
>
>
>
> Any ideas?
>
>
>
> Thanks && regards
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



