Hi Ilya,

Here is some additional information:

My cluster has three OSD nodes, each with 24 x 4 TB SSDs.

The mkfs.xfs command fails with the following error: https://pastebin.com/yTmMUtQs

I'm using the following command to format the image:

    mkfs.xfs /dev/rbd/<pool_name>/<image_name>

I hit the same problem (and the same sectors) if I target the device directly with:

    mkfs.xfs /dev/rbd<devMapID>

The client authentication caps are as follows: https://pastebin.com/UuAHRycF

Regarding your questions: yes, it is a persistent issue as soon as I try to create a large image in a newly created pool. Yes, after the first attempt, all subsequent attempts fail too. Yes, it is always the same set of sectors that fails.

The strange thing is that if I use an already existing pool and create this 80 TB image within that pool, it formats correctly.

Here is the image's "rbd info" output: https://pastebin.com/sAjnmZ4g

Here is the complete kernel log: https://pastebin.com/SNucPXZW

Thanks a lot for your answer, I hope these logs can help ^^

On Fri, Jan 8, 2021 at 9:23 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:

> On Fri, Jan 8, 2021 at 2:19 PM Gaël THEROND <gael.therond@xxxxxxxxxxxx>
> wrote:
> >
> > Hi everyone!
> >
> > I'm facing a weird issue with one of my Ceph clusters:
> >
> > OS: CentOS 8.2.2004 (Core)
> > Ceph: Nautilus 14.2.11 - stable
> > RBD using an erasure-coded profile (k=3, m=2)
> >
> > When I format one of my RBD images (client side), I get the following
> > kernel messages multiple times with different sector IDs:
> >
> > [2417011.790154] blk_update_request: I/O error, dev rbd23, sector
> > 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
> > [2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936
> > result -1
> >
> > At first I suspected a faulty disk, but the monitoring system is not
> > showing anything faulty, so I ran manual tests on all my OSDs to check
> > disk health with smartctl etc.
> >
> > None of them is marked as unhealthy; they show no counters for faulty
> > sectors, reads, or writes, and the wear level is at 99%.
> >
> > The only particularity of this image is that it is 80 TB, but that
> > shouldn't be an issue, as we already use images of that size in
> > another pool.
> >
> > If anyone has a clue how I could sort this out, I'd be more than happy.
>
> Hi Gaël,
>
> What command are you running to format the image?
>
> Is it persistent? After the first formatting attempt fails, do the
> following attempts fail too?
>
> Is it always the same set of sectors?
>
> Could you please attach the output of "rbd info" for that image and the
> entire kernel log from the time that image is mapped?
>
> Thanks,
>
>                 Ilya
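
[Editor's sketch] For reference, a minimal sketch of the workflow described in this thread, assuming an erasure-coded data pool sitting behind a replicated metadata pool; the pool and image names below are placeholders, not the actual ones from the pastebins:

    # create the 80 TB image on the replicated pool, storing its data in the EC pool
    rbd create --size 80T --data-pool my_ec_datapool my_rbd_pool/my_big_image

    # map it on the client; udev also creates the /dev/rbd/<pool_name>/<image_name> symlink
    rbd map my_rbd_pool/my_big_image

    # format it and watch the kernel log for DISCARD errors like the ones quoted above
    mkfs.xfs /dev/rbd/my_rbd_pool/my_big_image
    dmesg | tail

    # details Ilya asked for
    rbd info my_rbd_pool/my_big_image

This only makes the reproduction steps concrete; the actual pool layout, image parameters, and client caps are in the linked pastebins.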