Hi Adam,

Are your osds bluestore or filestore?

-- dan

On Fri, Jul 13, 2018 at 7:38 AM Adam Tygart <mozes@xxxxxxx> wrote:
>
> I've hit this today with an upgrade to 12.2.6 on my backup cluster.
> Unfortunately there were issues with the logs (in that the files
> weren't writable) until after the issue struck.
>
> 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log
> [ERR] : 5.255 full-object read crc 0x4e97b4e != expected 0x6cfe829d on
> 5:aa448500:::500.00000000:head
>
> It is a backup cluster and I can keep it around or blow away the data
> (in this instance) as needed for testing purposes.
>
> --
> Adam
>
> On Thu, Jul 12, 2018 at 10:39 AM, Alessandro De Salvo
> <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
> > Some progress, and more pain...
> >
> > I was able to recover the 200.00000000 object using the ceph-objectstore-tool
> > from one of the OSDs (all identical copies), but trying to re-inject it just
> > with rados put gave no error, while the get was still giving the same I/O
> > error. So the solution was to rm the object and then put it again; that worked.
> >
> > However, after restarting one of the MDSes and setting it to repaired, I've
> > hit another, similar problem:
> >
> > 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] :
> > error reading table object 'mds0_inotable' -5 ((5) Input/output error)
> >
> > Can I safely try to do the same as for object 200.00000000? Should I check
> > something before trying it? Again, checking the copies of the object, they
> > have identical md5sums on all the replicas.
> >
> > Thanks,
> >
> > Alessandro
> >
> > On 12/07/18 16:46, Alessandro De Salvo wrote:
> >
> > Unfortunately yes, all the OSDs were restarted a few times, but no change.
> >
> > Thanks,
> >
> > Alessandro
> >
> > On 12/07/18 15:55, Paul Emmerich wrote:
> >
> > This might seem like a stupid suggestion, but: have you tried restarting the
> > OSDs?
> >
> > I've also encountered some random CRC errors that only showed up when trying
> > to read an object, but not on scrubbing, and that magically disappeared after
> > restarting the OSD.
> >
> > However, in my case it was clearly related to
> > https://tracker.ceph.com/issues/22464, which doesn't
> > seem to be the issue here.
> >
> > Paul
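For reference, a minimal sketch of the rm-and-put recovery Alessandro describes
above, assuming the pool, PG and OSD ids quoted in this thread (cephfs_metadata,
pg 10.14, osd.23) and default OSD paths; the OSD must be stopped while
ceph-objectstore-tool is used, and the exact object spec and options can differ
per release, so treat this as an illustration rather than a verified procedure:

    # extract a copy of the object from one OSD holding it (hypothetical paths)
    systemctl stop ceph-osd@23
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 --pgid 10.14 \
        200.00000000 get-bytes > /tmp/200.00000000
    # (if the bare object name is not accepted, use the JSON spec printed by --op list)
    systemctl start ceph-osd@23

    # a plain put over the bad object did not clear the error in this thread,
    # so remove it first, then re-inject the extracted copy and verify the read
    rados -p cephfs_metadata rm 200.00000000
    rados -p cephfs_metadata put 200.00000000 /tmp/200.00000000
    rados -p cephfs_metadata get 200.00000000 /tmp/200.00000000.check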
> >
> > 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> > <Alessandro.DeSalvo@xxxxxxxxxxxxx>:
> >>
> >> On 12/07/18 11:20, Alessandro De Salvo wrote:
> >>>
> >>> On 12/07/18 10:58, Dan van der Ster wrote:
> >>>>
> >>>> On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfarnum@xxxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> >>>>> <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> OK, I found where the object is:
> >>>>>>
> >>>>>> ceph osd map cephfs_metadata 200.00000000
> >>>>>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg
> >>>>>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
> >>>>>>
> >>>>>> So, looking at the logs of osds 23, 35 and 18, I do in fact see:
> >>>>>>
> >>>>>> osd.23:
> >>>>>>
> >>>>>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
> >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> >>>>>> 10:292cf221:::200.00000000:head
> >>>>>>
> >>>>>> osd.35:
> >>>>>>
> >>>>>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
> >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> >>>>>> 10:292cf221:::200.00000000:head
> >>>>>>
> >>>>>> osd.18:
> >>>>>>
> >>>>>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
> >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> >>>>>> 10:292cf221:::200.00000000:head
> >>>>>>
> >>>>>> So, basically the same error everywhere.
> >>>>>>
> >>>>>> I'm trying to issue a repair of pg 10.14, but I'm not sure whether it
> >>>>>> will help.
> >>>>>>
> >>>>>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
> >>>>>> no disk problems anywhere. No relevant errors in the syslogs either; the
> >>>>>> hosts are just fine. I cannot exclude an error on the RAID controllers,
> >>>>>> but 2 of the OSDs with 10.14 are on one SAN system and one is on a
> >>>>>> different one, so I would tend to exclude that both had (silent) errors
> >>>>>> at the same time.
> >>>>>
> >>>>> That's fairly distressing. At this point I'd probably try extracting
> >>>>> the object using ceph-objectstore-tool and seeing if it decodes properly
> >>>>> as an mds journal. If it does, you might risk just putting it back in
> >>>>> place to overwrite the crc.
> >>>>>
> >>>> Wouldn't it be easier to scrub repair the PG to fix the crc?
> >>>
> >>> This is what I already instructed the cluster to do, a deep scrub, but
> >>> I'm not sure it can repair in the case where all replicas are bad, as
> >>> seems to be the case here.
> >>
> >> I finally managed (with the help of Dan) to perform the deep-scrub on pg
> >> 10.14, but the deep scrub did not detect anything wrong. Trying to repair
> >> 10.14 also has no effect.
> >> Still, when trying to access the object I get this in the OSDs:
> >>
> >> 2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR]
> >> : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> >> 10:292cf221:::200.00000000:head
> >>
> >> Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds
> >> like a bug.
> >> Can I force the repair somehow?
> >> Thanks,
> >>
> >> Alessandro
> >>
> >>>
> >>>> Alessandro, did you already try a deep-scrub on pg 10.14?
> >>>
> >>> I'm waiting for the cluster to do that; I sent the request earlier this
> >>> morning.
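Alessandro asks above how to force the repair; for reference, a short sketch of
the standard commands for driving a scrub/repair of a single PG by hand and
inspecting the result, using pg 10.14 from this thread (nothing here is specific
to this incident):

    ceph pg deep-scrub 10.14
    # once the scrub has run, list whatever it flagged (if anything)
    rados list-inconsistent-obj 10.14 --format=json-pretty
    ceph pg repair 10.14
    ceph health detail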
> >>>
> >>>> I expect
> >>>> it'll show an inconsistent object. Though, I'm unsure if repair will
> >>>> correct the crc, given that in this case *all* replicas have a bad crc.
> >>>
> >>> Exactly, this is what I wonder too.
> >>> Cheers,
> >>>
> >>> Alessandro
> >>>
> >>>>
> >>>> --Dan
> >>>>
> >>>>> However, I'm also quite curious how it ended up that way, with a
> >>>>> checksum mismatch but identical data (and identical checksums!) across
> >>>>> the three replicas. Have you previously done some kind of scrub repair
> >>>>> on the metadata pool? Did the PG perhaps get backfilled due to cluster
> >>>>> changes?
> >>>>> -Greg
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Alessandro
> >>>>>>
> >>>>>> On 11/07/18 18:56, John Spray wrote:
> >>>>>>>
> >>>>>>> On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
> >>>>>>> <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> in fact I get an I/O error by hand too:
> >>>>>>>>
> >>>>>>>> rados get -p cephfs_metadata 200.00000000 200.00000000
> >>>>>>>> error getting cephfs_metadata/200.00000000: (5) Input/output error
> >>>>>>>
> >>>>>>> Next step would be to look for corresponding errors in your OSD
> >>>>>>> logs and system logs, and possibly also check things like the SMART
> >>>>>>> counters on your hard drives for possible root causes.
> >>>>>>>
> >>>>>>> John
> >>>>>>>
> >>>>>>>> Can this be recovered somehow?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Alessandro
> >>>>>>>>
> >>>>>>>> On 11/07/18 18:33, John Spray wrote:
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
> >>>>>>>>> <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
> >>>>>>>>>> marked as damaged. Trying to restart the instances only results in
> >>>>>>>>>> standby MDSes. We currently have 2 filesystems active, with 2 MDSes
> >>>>>>>>>> each.
> >>>>>>>>>>
> >>>>>>>>>> I found the following error messages in the mon:
> >>>>>>>>>>
> >>>>>>>>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
> >>>>>>>>>> mds.1 <node2_IP>:6800/830539001 down:damaged
> >>>>>>>>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
> >>>>>>>>>>
> >>>>>>>>>> Whenever I try to force the repaired state with ceph mds repaired
> >>>>>>>>>> <fs_name>:<rank> I get something like this in the MDS logs:
> >>>>>>>>>>
> >>>>>>>>>> 2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
> >>>>>>>>>> error getting journal off disk
> >>>>>>>>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
> >>>>>>>>>> [ERR] : Error recovering journal 0x201: (5) Input/output error
> >>>>>>>>>
> >>>>>>>>> An EIO reading the journal header is pretty scary. The MDS itself
> >>>>>>>>> probably can't tell you much more about this: you need to dig down
> >>>>>>>>> into the RADOS layer. Try reading the 200.00000000 object (that
> >>>>>>>>> happens to be the rank 0 journal header; every CephFS filesystem
> >>>>>>>>> should have one) using the `rados` command line tool.
> >>>>>>>>>
> >>>>>>>>> John
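A minimal sketch of the check John suggests, assuming the metadata pool is named
cephfs_metadata as elsewhere in this thread:

    # find which PG and OSDs hold the rank-0 journal header
    ceph osd map cephfs_metadata 200.00000000
    # try to read the object directly; an EIO here confirms the problem is in
    # the RADOS layer rather than in the MDS itself
    rados -p cephfs_metadata stat 200.00000000
    rados -p cephfs_metadata get 200.00000000 /tmp/200.00000000.header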
> >>>>>>>>>
> >>>>>>>>>> Any attempt at running the journal export results in errors like
> >>>>>>>>>> this one:
> >>>>>>>>>>
> >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
> >>>>>>>>>> Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
> >>>>>>>>>> Header 200.00000000 is unreadable
> >>>>>>>>>>
> >>>>>>>>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
> >>>>>>>>>> readable, attempt object-by-object dump with `rados`
> >>>>>>>>>>
> >>>>>>>>>> The same happens for recover_dentries:
> >>>>>>>>>>
> >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> >>>>>>>>>> Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
> >>>>>>>>>> 200.00000000 is unreadable
> >>>>>>>>>> Errors: 0
> >>>>>>>>>>
> >>>>>>>>>> Is there something I could try in order to get the cluster back?
> >>>>>>>>>>
> >>>>>>>>>> I was able to dump the contents of the metadata pool with rados export
> >>>>>>>>>> -p cephfs_metadata <filename> and I'm currently trying the procedure
> >>>>>>>>>> described in
> >>>>>>>>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
> >>>>>>>>>> but I'm not sure if it will work, as it's apparently doing nothing at
> >>>>>>>>>> the moment (maybe it's just very slow).
> >>>>>>>>>>
> >>>>>>>>>> Any help is appreciated, thanks!
> >>>>>>>>>>
> >>>>>>>>>> Alessandro
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
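Once the 200.00000000 header object is readable again, a rough sketch of
re-checking the journal and clearing the damaged state, reusing only the
filesystem name and commands already quoted in this thread (cephfs, rank 0);
an illustration, not a verified recovery procedure:

    cephfs-journal-tool --rank=cephfs:0 journal inspect
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
    # mark the rank as repaired so a standby MDS can take it over again
    ceph mds repaired cephfs:0
    ceph fs status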