Hi again,

Now with all OSDs restarted, I'm getting:

    health: HEALTH_ERR
            777 scrub errors
            Possible data damage: 36 pgs inconsistent
    (...)
    pgs:    4764 active+clean
            36   active+clean+inconsistent

But from what I've read so far, this is expected and should auto-heal when the objects are overwritten - fingers crossed, as pg repair or scrub doesn't seem to help.

New errors in the ceph logs include lines like the following, which I also hope/presume are expected - I still have posts to read on this list about omap and those errors:

2018-07-25 10:20:00.106227 osd.66 osd.66 192.54.207.75:6826/2430367 12 : cluster [ERR] 11.288 shard 207: soid 11:1155c332:::rbd_data.207dce238e1f29.0000000000000527:head data_digest 0xc8997a5b != data_digest 0x2ca15853 from auth oi 11:1155c332:::rbd_data.207dce238e1f29.0000000000000527:head(182554'240410 client.6084296.0:48463693 dirty|data_digest|omap_digest s 4194304 uv 49429318 dd 2ca15853 od ffffffff alloc_hint [0 0 0])
2018-07-25 10:20:00.106230 osd.66 osd.66 192.54.207.75:6826/2430367 13 : cluster [ERR] 11.288 soid 11:1155c332:::rbd_data.207dce238e1f29.0000000000000527:head: failed to pick suitable auth object

But never mind: with the SSD cache tier in writeback, I just saw the same error again - on one VM only, for now (lots of these):

2018-07-25 10:15:19.841746 osd.101 osd.101 192.54.207.206:6859/3392654 116 : cluster [ERR] 1.20 copy from 1:06dd6812:::rbd_data.194b8c238e1f29.00000000000007a3:head to 1:06dd6812:::rbd_data.194b8c238e1f29.00000000000007a3:head data digest 0x27451e3c != source 0x12c05014

(osd.101 is an SSD from the cache pool)

=> yum update => I/O error => set the tier pool to forward => yum update starts.

Weird, but if that only happens on this host, I can cope with it (I have 780+ scrub errors to handle now :/ )

And just to be sure ;)

[root@ceph10 ~]# ceph --admin-daemon /var/run/ceph/*osd*101* version
{"version":"12.2.7","release":"luminous","release_type":"stable"}

On the good side: this update is forcing us to dive into ceph internals - we'll be more ceph-aware tonight than we were this morning ;)

Cheers
Fred
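A minimal sketch of how the inconsistent PGs above can be enumerated and inspected - PG 11.288 is taken from the log excerpt, and "rbd" as the name of the backing data pool is only an assumption:

# ceph health detail | grep inconsistent                    # list the 36 inconsistent PGs
# rados list-inconsistent-pg rbd                            # same thing, per pool (pool name assumed)
# rados list-inconsistent-obj 11.288 --format=json-pretty   # which objects/shards disagree, and on what
# ceph pg deep-scrub 11.288                                 # re-scrub a single PG
# ceph pg repair 11.288                                     # ask the primary to repair it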
-----Original Message-----
From: SCHAER Frederic
Sent: Wednesday, July 25, 2018 09:57
To: 'Dan van der Ster' <dan@xxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: RE: 12.2.7 + osd skip data digest + bluestore + I/O errors

Hi Dan,

Just checked again: arggghhh...

# grep AUTO_RESTART /etc/sysconfig/ceph
CEPH_AUTO_RESTART_ON_UPGRADE=no

So no :'(

The RPMs were upgraded, but the OSDs were not restarted as I thought. Or at least not restarted with the new 12.2.7 binaries (and since the skip-digest option was already set on the running 12.2.6 OSDs, I guess the 12.2.6 OSDs simply did not understand it).

I just restarted all of the OSDs: I will check the behavior again and report here - thanks for pointing me in the right direction!

Fred
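A minimal sketch of the restart-and-verify step that follows from the above, assuming a stock systemd/RPM deployment (unit and target names are the defaults):

# systemctl restart ceph-osd.target   # on each OSD host: restart every OSD under the new binaries
# ceph versions                       # cluster-wide summary; every OSD should now report 12.2.7
# ceph tell osd.* version             # or ask each OSD daemon individually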
-----Original Message-----
From: Dan van der Ster [mailto:dan@xxxxxxxxxxxxxx]
Sent: Tuesday, July 24, 2018 16:50
To: SCHAER Frederic <frederic.schaer@xxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Re: 12.2.7 + osd skip data digest + bluestore + I/O errors

`ceph versions` -- you're sure all the osds are running 12.2.7?

osd_skip_data_digest = true is supposed to skip any crc checks during reads. But maybe the cache tiering IO path is different and checks the crc anyway?

-- dan

On Tue, Jul 24, 2018 at 3:01 PM SCHAER Frederic <frederic.schaer@xxxxxx> wrote:
>
> Hi,
>
> I read the 12.2.7 upgrade notes, and set “osd skip data digest = true” before I started upgrading from 12.2.6 on my Bluestore-only cluster.
>
> As far as I can tell, my OSDs all got restarted during the upgrade and all got the option enabled.
>
> This is what I see for a specific OSD taken at random:
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.68.asok config show | grep data_digest
>     "osd_skip_data_digest": "true",
>
> This is what I see when I try to injectargs the skip-data-digest option:
>
> # ceph tell osd.* injectargs '--osd_skip_data_digest=true' 2>&1 | head
> osd.0: osd_skip_data_digest = 'true' (not observed, change may require restart)
> osd.1: osd_skip_data_digest = 'true' (not observed, change may require restart)
> osd.2: osd_skip_data_digest = 'true' (not observed, change may require restart)
> osd.3: osd_skip_data_digest = 'true' (not observed, change may require restart)
> (…)
>
> It has been like that since I upgraded to 12.2.7.
>
> I read in the release notes that the skip_data_digest option should be sufficient to ignore the 12.2.6 corruptions and that objects should auto-heal on rewrite…
>
> However…
>
> My config:
> - tiering with an SSD hot storage tier
> - HDDs for cold storage
>
> And… I get I/O errors on some VMs when running commands as simple as “yum check-update”.
>
> The qemu/kvm/libvirt logs (in /var/log/libvirt/qemu) show me these:
>
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>
> In the ceph logs, I can see these errors:
>
> 2018-07-24 11:17:56.420391 osd.71 [ERR] 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
> 2018-07-24 11:17:56.429936 osd.71 [ERR] 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
>
> (yes, my cluster is seen as healthy)
>
> On the affected OSDs, I can see these errors:
>
> 2018-07-24 11:17:56.420349 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data digest 0x3bb26e16 != source 0xec476c54
> 2018-07-24 11:17:56.420388 7f034642a700 -1 log_channel(cluster) log [ERR] : 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
> 2018-07-24 11:17:56.420395 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected promote error (5) Input/output error
> 2018-07-24 11:17:56.429900 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data digest 0x3bb26e16 != source 0xec476c54
> 2018-07-24 11:17:56.429934 7f034642a700 -1 log_channel(cluster) log [ERR] : 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head to 1:c590b9d7:::rbd_data.1920e2238e1f29.00000000000000e7:head data digest 0x3bb26e16 != source 0xec476c54
> 2018-07-24 11:17:56.429939 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected promote error (5) Input/output error
>
> And… I don't know how to recover from that.
>
> Pool #1 is my SSD cache tier, hence pg 1.23 is on the SSD side.
>
> I've tried setting the cache pool to “readforward” despite the “not well supported” warning and could immediately get working VMs back (no more I/O errors). But with no SSD tiering, that's not really useful.
>
> As soon as I set the cache tier back to writeback, I got those I/O errors again… (not on the yum command, but in the meantime I had stopped and marked out, then unset out, osd.71 to check it with badblocks, just in case…)
>
> I still have to find out how to reproduce the I/O error on an affected host so I can debug/fix that issue further…
>
> Any ideas?
>
> Thanks && regards
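For reference, a minimal sketch of the cache-mode switches discussed in this thread, assuming the cache tier (pool #1) is named "ssd-cache" - the pool name is an assumption, and luminous asks for --yes-i-really-mean-it on the modes it considers not well supported:

# ceph osd tier cache-mode ssd-cache readforward --yes-i-really-mean-it   # redirect reads to the base tier, keep caching writes
# ceph osd tier cache-mode ssd-cache forward --yes-i-really-mean-it       # redirect all client requests to the base tier
# ceph osd tier cache-mode ssd-cache writeback                            # back to normal writeback caching
# ceph osd pool ls detail | grep cache_mode                               # confirm the currently active mode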