Once or twice a year we have a similar problem on a non-Ceph disk cluster,
where working but slow disk writes give us slow reads. We somewhat
"understand" it, since slow writes probably fill up queues and buffers.

On Thu, Mar 9, 2023 at 11:37 AM Andrej Filipcic <andrej.filipcic@xxxxxx> wrote:
>
> Thanks for the hint. I ran some short tests, all fine. I am not sure
> it's a drive issue.
>
> Some more digging: the file with bad performance has these segments:
>
> [root@afsvos01 vicepa]# hdparm --fibmap $PWD/0
>
> /vicepa/0:
>  filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
>  byte_offset   begin_LBA    end_LBA    sectors
>            0      743232    2815039    2071808
>   1060765696     3733064    5838279    2105216
>   2138636288    70841232   87586575   16745344
>  10712252416    87586576   87635727      49152
>
> Reading by segments:
>
> # dd if=0 of=/tmp/0 bs=4M status=progress count=252
> 1052770304 bytes (1.1 GB, 1004 MiB) copied, 45 s, 23.3 MB/s
> 252+0 records in
> 252+0 records out
>
> # dd if=0 of=/tmp/0 bs=4M status=progress skip=252 count=256
> 935329792 bytes (935 MB, 892 MiB) copied, 4 s, 234 MB/s
> 256+0 records in
> 256+0 records out
>
> # dd if=0 of=/tmp/0 bs=4M status=progress skip=510
> 7885291520 bytes (7.9 GB, 7.3 GiB) copied, 12 s, 657 MB/s
> 2050+0 records in
> 2050+0 records out
>
> So the first 1 GB is very slow, the second segment is faster, and the
> rest is quite fast. It is reproducible (caches were dropped before each dd).
>
> Now, the rbd is 3 TB with 256 pgs (EC 8+3). I checked with rados that the
> objects are randomly distributed across pgs, e.g.
>
> # rados --pgid 23.82 ls|grep rbd_data.20.2723bd3292f6f8
> rbd_data.20.2723bd3292f6f8.0000000000000008
> rbd_data.20.2723bd3292f6f8.000000000000000d
> rbd_data.20.2723bd3292f6f8.00000000000001cb
> rbd_data.20.2723bd3292f6f8.00000000000601b2
> rbd_data.20.2723bd3292f6f8.000000000009001b
> rbd_data.20.2723bd3292f6f8.000000000000005b
> rbd_data.20.2723bd3292f6f8.00000000000900e8
>
> where object ...05b, for example, corresponds to the first block of the
> file I am testing. That is, if my understanding of rbd is correct: I
> assume that LBA regions are mapped to consecutive rbd objects.
>
> So now I am completely confused, since the slow chunk of the file is
> still mapped to ~256 objects on different pgs....
>
> Maybe I misunderstood the whole thing.
>
> Any other hints? We will still do HDD tests on all the drives....
>
> Cheers,
> Andrej
>
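For reference, the mapping assumed above can be spelled out: with default
striping, byte N of the image lands in object floor(N / 4 MiB), and the
object name is the image's rbd_data prefix followed by that index as 16 hex
digits. Below is a minimal sketch of that arithmetic for the first (slow)
extent from the hdparm output, assuming 512-byte sectors and the filesystem
sitting directly on the rbd device (a partition offset would shift the
result slightly); the prefix and extent numbers are copied from the quoted
output and are illustrative only.

  #!/bin/bash
  # Sketch of the offset -> rbd object arithmetic (assumptions: 512-byte
  # sectors, 4 MiB objects, default striping; prefix and extent taken from
  # the hdparm/rados output quoted above).
  PREFIX=rbd_data.20.2723bd3292f6f8
  OBJ_SIZE=$((4 * 1024 * 1024))     # 4 MiB rbd objects

  begin_lba=743232                  # first (slow) extent from hdparm --fibmap
  sectors=2071808

  start=$((begin_lba * 512))        # byte offset on the rbd device
  end=$((start + sectors * 512))

  first_obj=$((start / OBJ_SIZE))
  last_obj=$(( (end - 1) / OBJ_SIZE ))

  echo "extent covers object indices $first_obj..$last_obj"
  for ((i = first_obj; i <= last_obj; i++)); do
      printf '%s.%016x\n' "$PREFIX" "$i"
  done

For that extent this gives roughly objects 0x5a through 0x157, i.e. about
250 objects, which is consistent with the observation above that the slow
first chunk still spans a couple of hundred objects on different pgs.
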
> On 3/6/23 20:25, Paul Mezzanini wrote:
> > When I have seen behavior like this, it was a dying drive. It only
> > became obvious when I did a SMART long test and got failed reads.
> > It still reported SMART OK, though, so that was a lie.
> >
> > --
> > Paul Mezzanini
> > Platform Engineer III
> > Research Computing
> >
> > Rochester Institute of Technology
> >
> > ________________________________________
> > From: Andrej Filipcic <andrej.filipcic@xxxxxx>
> > Sent: Monday, March 6, 2023 8:51 AM
> > To: ceph-users
> > Subject: rbd on EC pool with fast and extremely slow writes/reads
> >
> > Hi,
> >
> > I have a problem on one of our ceph clusters that I do not understand.
> > ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25 Gb/s NICs
> >
> > A 3 TB rbd image is on an erasure-coded 8+3 pool with 128 pgs, xfs
> > filesystem, 4 MB objects in the rbd image, mostly empty.
> >
> > I have created a bunch of 10 GB files; most of them were written at
> > 1.5 GB/s, but a few of them were really slow, ~10 MB/s, a factor of 100.
> >
> > When reading these files back, the fast-written ones are read fast,
> > ~2-2.5 GB/s, while the slowly-written ones are also extremely slow to
> > read; iotop shows between 1 and 30 MB/s reading speed.
> >
> > This does not happen at all on replicated images. There are some OSDs
> > with higher apply/commit latency, e.g. 200 ms, but there are no slow ops.
> >
> > The tests were actually done on a proxmox VM with librbd, but the same
> > happens with krbd, and on bare metal with a mounted krbd as well.
> >
> > I have tried to check all OSDs for laggy drives, but they all look
> > about the same.
> >
> > I have also copied the entire image with "rados get...", object by
> > object; the strange thing here is that most of the objects were copied
> > within 0.1-0.2 s, but quite a few took more than 1 s.
> > The cluster is quite busy with base traffic of ~1-2 GB/s, so the speeds
> > can vary due to that. But I would not expect a factor of 100 slowdown
> > for some writes/reads with rbds.
> >
> > Any clues on what might be wrong or what else to check? I have another
> > similar ceph cluster where everything looks fine.
> >
> > Best,
> > Andrej
> >
> > --
> > _____________________________________________________________
> >    prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
> >    Department of Experimental High Energy Physics - F9
> >    Jozef Stefan Institute, Jamova 39, P.o.Box 3000
> >    SI-1001 Ljubljana, Slovenia
> >    Tel.: +386-1-477-3674    Fax: +386-1-477-3166
> > -------------------------------------------------------------
>
> --
> _____________________________________________________________
>    prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
>    Department of Experimental High Energy Physics - F9
>    Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>    SI-1001 Ljubljana, Slovenia
>    Tel.: +386-1-477-3674    Fax: +386-1-477-3166
> -------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
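
As a follow-up to the per-object "rados get" timing mentioned in the original
message, a rough way to narrow this down is to time each object read
individually and print the acting OSD set for the slow ones, so slow objects
can be correlated with specific drives. This is only a sketch: the pool name
below is a placeholder for the EC data pool, the object prefix is the one
quoted in the thread, and the 1-second threshold is arbitrary.

  #!/bin/bash
  # Time each rbd data object read and show which OSDs serve the slow ones.
  # POOL is a placeholder for the EC data pool; adjust before running.
  POOL=rbd_ec_data
  PREFIX=rbd_data.20.2723bd3292f6f8

  rados -p "$POOL" ls | grep "^$PREFIX" | while read -r obj; do
      start=$(date +%s.%N)
      rados -p "$POOL" get "$obj" - > /dev/null
      end=$(date +%s.%N)
      elapsed=$(echo "$end - $start" | bc)
      # Flag anything slower than 1 s and print its pg and acting set.
      if (( $(echo "$elapsed > 1" | bc -l) )); then
          echo "SLOW: $obj ${elapsed}s"
          ceph osd map "$POOL" "$obj"
      fi
  done

If the slow objects keep landing on the same few OSDs, that would point back
at specific drives, as Paul suggested; if they are spread across many OSDs, a
single bad drive looks less likely.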