Hi Ilya,

Here is some additional information:

My cluster has three OSD nodes, each with 24 x 4 TB SSDs.

The mkfs.xfs command fails with the following error: https://pastebin.com/yTmMUtQs

I'm using the following command to format the image:

    mkfs.xfs /dev/rbd/<pool_name>/<image_name>

I hit the same problem (and the same sectors) if I target the device directly with:

    mkfs.xfs /dev/rbd<devMapID>

The client authentication caps are as follows: https://pastebin.com/UuAHRycF

Regarding your questions: yes, it is a persistent issue as soon as I try to create a large image in a newly created pool. Yes, after the first attempt, all subsequent attempts fail too. Yes, it is always the same set of sectors that fails.

The strange thing is that if I use an already existing pool and create this 80 TB image within that pool, it formats correctly.

Here is the image's "rbd info" output: https://pastebin.com/sAjnmZ4g

Here is the complete kernel log: https://pastebin.com/SNucPXZW

Thanks a lot for your answer, I hope these logs can help ^^

On Fri, Jan 8, 2021 at 9:23 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:

> On Fri, Jan 8, 2021 at 2:19 PM Gaël THEROND <gael.therond@xxxxxxxxxxxx>
> wrote:
> >
> > Hi everyone!
> >
> > I'm facing a weird issue with one of my Ceph clusters:
> >
> > OS: CentOS 8.2.2004 (Core)
> > Ceph: Nautilus 14.2.11 - stable
> > RBD using an erasure-coded profile (k=3, m=2)
> >
> > When I format one of my RBD images (client side), I get the following
> > kernel messages multiple times with different sector IDs:
> >
> > [2417011.790154] blk_update_request: I/O error, dev rbd23, sector
> > 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
> > [2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936
> > result -1
> >
> > At first I suspected a faulty disk, but the monitoring system is not
> > showing anything faulty, so I ran manual tests on all my OSDs to check
> > disk health with smartctl etc.
> >
> > None of them is marked as unhealthy; they show no counters for faulty
> > sectors, reads, or writes, and the wear level is at 99%.
> >
> > The only particularity of this image is that it is 80 TB, but that
> > shouldn't be an issue, as we already use images of that size in
> > another pool.
> >
> > If anyone has a clue how I could sort this out, I'd be more than happy.
>
> Hi Gaël,
>
> What command are you running to format the image?
>
> Is it persistent? After the first formatting attempt fails, do the
> following attempts fail too?
>
> Is it always the same set of sectors?
>
> Could you please attach the output of "rbd info" for that image and the
> entire kernel log from the time that image is mapped?
>
> Thanks,
>
>                 Ilya
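
[Editor's sketch] For reference, a minimal sketch of the workflow described in this thread, assuming an erasure-coded data pool sitting behind a replicated metadata pool; the pool and image names below are placeholders, not the actual ones from the pastebins:

    # create the 80 TB image on the replicated pool, storing its data in the EC pool
    rbd create --size 80T --data-pool my_ec_datapool my_rbd_pool/my_big_image

    # map it on the client; udev also creates the /dev/rbd/<pool_name>/<image_name> symlink
    rbd map my_rbd_pool/my_big_image

    # format it and watch the kernel log for DISCARD errors like the ones quoted above
    mkfs.xfs /dev/rbd/my_rbd_pool/my_big_image
    dmesg | tail

    # details Ilya asked for
    rbd info my_rbd_pool/my_big_image

This only makes the reproduction steps concrete; the actual pool layout, image parameters, and client caps are in the linked pastebins.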