On Fri, Jan 8, 2021 at 2:19 PM Gaël THEROND <gael.therond@xxxxxxxxxxxx> wrote:
>
> Hi everyone!
>
> I'm facing a weird issue with one of my Ceph clusters:
>
> OS: CentOS 8.2.2004 (Core)
> Ceph: Nautilus 14.2.11 - stable
> RBD using an erasure-coded profile (k=3, m=2)
>
> When I try to format one of my RBD images (client side), I get the
> following kernel messages multiple times, with different sector IDs:
>
> [2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
> [2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
>
> At first I suspected a faulty disk, BUT the monitoring system is not
> showing anything faulty, so I ran manual tests on all my OSDs and
> checked disk health with smartctl etc.
>
> None of them is marked unhealthy; they show no counters for faulty
> sectors, reads, or writes, and the wear level is at 99%.
>
> The only particularity of this image is that it is an 80 TB image, but
> that shouldn't be an issue, as we already use images of that size on
> another pool.
>
> If anyone has a clue how I could sort this out, I'd be more than happy.

Hi Gaël,

What command are you running to format the image?

Is it persistent? After the first formatting attempt fails, do the
following attempts fail too? Is it always the same set of sectors?

Could you please attach the output of "rbd info" for that image and the
entire kernel log from the time that image is mapped?

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
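[Editorial aside on reading the second kernel line: "discard at objno 20110336 2490368~1703936" is an (object number, intra-object offset~length) triple. As a minimal sketch of how a 512-byte sector on the mapped rbd device relates to such a pair, assuming the default 4 MiB RBD object size (the actual object size for the image is reported by "rbd info" as the order field); the helper name is hypothetical, not part of any Ceph tooling:]

```python
SECTOR_BYTES = 512  # kernel block-layer sectors are 512 bytes

def rbd_object_for_sector(sector, object_size=4 * 1024 * 1024):
    """Map a device sector to (object number, byte offset within object).

    Assumes a plain image layout with the given object size; striping
    (stripe unit/count) would change this mapping.
    """
    byte_off = sector * SECTOR_BYTES
    return byte_off // object_size, byte_off % object_size

# e.g. sector 16384 is 8 MiB into the device, i.e. the start of object 2
print(rbd_object_for_sector(16384))  # -> (2, 0)
```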