On Wed, Jul 11, 2018 at 11:40 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Mon, Jun 25, 2018 at 12:34 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> On Fri, Jun 22, 2018 at 10:44 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >
>> > On Fri, Jun 22, 2018 at 6:22 AM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
>> >>
>> >> From http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ :
>> >>
>> >> "Now 1 knows that these objects exist, but there is no live ceph-osd who has a copy. In this case, IO to those objects will block, and the cluster will hope that the failed node comes back soon; this is assumed to be preferable to returning an IO error to the user."
>> >
>> > This is definitely the default and the way I recommend you run a cluster. But do keep in mind sometimes other layers in your stack have their own timeouts and will start throwing errors if the Ceph library doesn't return an IO quickly enough. :)
>>
>> Right, that's understood. This is the nice behaviour of virtio-blk vs
>> virtio-scsi: the latter has a timeout but blk blocks forever.
>> On 5000 attached volumes we saw around 12 of these IO errors, and this
>> was the first time in 5 years of upgrades that an IO error happened...
>
> Did you ever get more info about this? An unexpected EIO return-to-clients turned up on the mailing list today (http://tracker.ceph.com/issues/24875), but in a brief poke around I didn't see anything about missing objects doing so.

Not really.

We understood *why* we had flapping osds following the upgrade -- it was due to us having 'mon osd report timeout = 60' (default 900), a setting we had kept from jewel as a workaround for some strange network issues in our data centre. It turns out that in luminous this setting is ultra dangerous -- the osds no longer report pgstats back to the mon, so the mon starts marking osds down every 60s. (A rough sketch for checking and reverting that override is at the end of this reply.)

The resulting flapping led to some momentarily unfound objects, and that is when we saw the EIO on the clients.

In the days following the upgrade, deep-scrub did find a handful of inconsistent objects, e.g.

2018-06-25 20:41:18.070684 7f78580af700 -1 log_channel(cluster) log [ERR] : 4.1e0 : soid 4:078dcd53:::rbd_data.4c50bf229fbf77.0000000000011ec6:head data_digest 0xd3329392 != data_digest 0x8a882df4 from shard 143
2018-06-25 21:07:14.157514 7f78580af700 -1 log_channel(cluster) log [ERR] : 4.1e0 repair 0 missing, 1 inconsistent objects
2018-06-25 21:07:14.157952 7f78580af700 -1 log_channel(cluster) log [ERR] : 4.1e0 repair 1 errors, 1 fixed

But I didn't find any corresponding crc errors on reads from those objects before they were found to be inconsistent. And no IO errors since the upgrade... (Example commands for spotting unfound objects and for inspecting/repairing the inconsistent ones are also sketched below.)
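For anyone else who carried that jewel-era workaround into luminous, a minimal sketch of how one could check for and revert the override -- the mon id and the injectargs step are only illustrative, adjust to your own deployment and config management:

    # show what the running mon is actually using (via its admin socket)
    ceph daemon mon.$(hostname -s) config get mon_osd_report_timeout

    # put the default back at runtime, and also remove the override from ceph.conf
    ceph tell mon.* injectargs '--mon_osd_report_timeout=900'

    # if osds are already flapping, 'ceph osd set nodown' stops the mons from
    # marking them down while you sort it out (unset it again afterwards)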
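And in case it helps anyone correlate client EIOs with cluster state, this is roughly what we'd look at while the flapping is happening -- the pg id 4.1e0 is just an example here:

    # unfound objects show up in health detail for as long as they exist
    ceph health detail | grep -i unfound

    # a pg query shows which osds the pg still wants to probe for the missing copies
    ceph pg 4.1e0 query | grep -A10 might_have_unfound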
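For the scrub errors, roughly the sequence we follow to inspect and repair them (the pool name is a placeholder, and the json output varies a bit between releases):

    # list pgs with scrub errors in a pool, then the offending objects in one pg
    rados list-inconsistent-pg <pool>
    rados list-inconsistent-obj 4.1e0 --format=json-pretty

    # ask the primary to repair the pg; the 'repair ... 1 fixed' log lines above
    # are what a successful repair looks like
    ceph pg repair 4.1e0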
Alessandro's issue sounds pretty scary.

-- Dan

> -Greg
>
>>
>> -- dan
>>
>> > -Greg
>> >
>> >>
>> >> On 22.06.2018, at 16:16, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Quick question: does an IO with an unfound object result in an IO
>> >> error or should the IO block?
>> >>
>> >> During a jewel to luminous upgrade some PGs passed through a state
>> >> with unfound objects for a few seconds. And this seems to match the
>> >> times when we had a few IO errors on RBD attached volumes.
>> >>
>> >> Wondering what is the correct behaviour here...
>> >>
>> >> Cheers, Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com