Once or twice a year we have a similar problem on a non-Ceph disk cluster,
where working but slow disk writes give us slow reads. We somewhat
"understand" it, since slow writes probably fill up queues and buffers.

On Thu, Mar 9, 2023 at 11:37 AM Andrej Filipcic <andrej.filipcic@xxxxxx> wrote:
>
> Thanks for the hint. I ran some short tests, all fine. I am not sure
> it's a drive issue.
>
> Some more digging: the file with bad performance has these segments:
>
> [root@afsvos01 vicepa]# hdparm --fibmap $PWD/0
>
> /vicepa/0:
>  filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
>  byte_offset   begin_LBA    end_LBA    sectors
>            0      743232    2815039    2071808
>   1060765696     3733064    5838279    2105216
>   2138636288    70841232   87586575   16745344
>  10712252416    87586576   87635727      49152
>
> Reading by segments:
>
> # dd if=0 of=/tmp/0 bs=4M status=progress count=252
> 1052770304 bytes (1.1 GB, 1004 MiB) copied, 45 s, 23.3 MB/s
> 252+0 records in
> 252+0 records out
>
> # dd if=0 of=/tmp/0 bs=4M status=progress skip=252 count=256
> 935329792 bytes (935 MB, 892 MiB) copied, 4 s, 234 MB/s
> 256+0 records in
> 256+0 records out
>
> # dd if=0 of=/tmp/0 bs=4M status=progress skip=510
> 7885291520 bytes (7.9 GB, 7.3 GiB) copied, 12 s, 657 MB/s
> 2050+0 records in
> 2050+0 records out
>
> So the first 1 GB is very slow, the second segment is faster, and the
> rest is quite fast. It is reproducible (caches were dropped before each dd).
>
> Now, the rbd is 3 TB with 256 pgs (EC 8+3). I checked with rados that the
> objects are randomly distributed across pgs, e.g.
>
> # rados --pgid 23.82 ls|grep rbd_data.20.2723bd3292f6f8
> rbd_data.20.2723bd3292f6f8.0000000000000008
> rbd_data.20.2723bd3292f6f8.000000000000000d
> rbd_data.20.2723bd3292f6f8.00000000000001cb
> rbd_data.20.2723bd3292f6f8.00000000000601b2
> rbd_data.20.2723bd3292f6f8.000000000009001b
> rbd_data.20.2723bd3292f6f8.000000000000005b
> rbd_data.20.2723bd3292f6f8.00000000000900e8
>
> where object ...05b, for example, corresponds to the first block of the
> file I am testing. That is, if my understanding of rbd is correct: I
> assume that LBA regions are mapped to consecutive rbd objects.
>
> So now I am completely confused, since the slow chunk of the file is
> still mapped to ~256 objects on different pgs....
>
> Maybe I misunderstood the whole thing.
>
> Any other hints? We will still do HDD tests on all the drives....
>
> Cheers,
> Andrej
>
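For reference, the mapping assumed above can be spelled out: with default
striping, byte N of the image lands in object floor(N / 4 MiB), and the
object name is the image's rbd_data prefix followed by that index as 16 hex
digits. Below is a minimal sketch of that arithmetic for the first (slow)
extent from the hdparm output, assuming 512-byte sectors and the filesystem
sitting directly on the rbd device (a partition offset would shift the
result slightly); the prefix and extent numbers are copied from the quoted
output and are illustrative only.

  #!/bin/bash
  # Sketch of the offset -> rbd object arithmetic (assumptions: 512-byte
  # sectors, 4 MiB objects, default striping; prefix and extent taken from
  # the hdparm/rados output quoted above).
  PREFIX=rbd_data.20.2723bd3292f6f8
  OBJ_SIZE=$((4 * 1024 * 1024))     # 4 MiB rbd objects

  begin_lba=743232                  # first (slow) extent from hdparm --fibmap
  sectors=2071808

  start=$((begin_lba * 512))        # byte offset on the rbd device
  end=$((start + sectors * 512))

  first_obj=$((start / OBJ_SIZE))
  last_obj=$(( (end - 1) / OBJ_SIZE ))

  echo "extent covers object indices $first_obj..$last_obj"
  for ((i = first_obj; i <= last_obj; i++)); do
      printf '%s.%016x\n' "$PREFIX" "$i"
  done

For that extent this gives roughly objects 0x5a through 0x157, i.e. about
250 objects, which is consistent with the observation above that the slow
first chunk still spans a couple of hundred objects on different pgs.
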
> On 3/6/23 20:25, Paul Mezzanini wrote:
> > When I have seen behavior like this, it was a dying drive. It only
> > became obvious when I did a SMART long test and got failed reads.
> > It still reported SMART OK, though, so that was a lie.
> >
> > --
> > Paul Mezzanini
> > Platform Engineer III
> > Research Computing
> >
> > Rochester Institute of Technology
> >
> > ________________________________________
> > From: Andrej Filipcic <andrej.filipcic@xxxxxx>
> > Sent: Monday, March 6, 2023 8:51 AM
> > To: ceph-users
> > Subject: rbd on EC pool with fast and extremely slow writes/reads
> >
> > Hi,
> >
> > I have a problem on one of our ceph clusters that I do not understand.
> > ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25 Gb/s NICs
> >
> > A 3 TB rbd image is on an erasure-coded 8+3 pool with 128 pgs, xfs
> > filesystem, 4 MB objects in the rbd image, mostly empty.
> >
> > I have created a bunch of 10 GB files; most of them were written at
> > 1.5 GB/s, but a few of them were really slow, ~10 MB/s, a factor of 100.
> >
> > When reading these files back, the fast-written ones are read fast,
> > ~2-2.5 GB/s, while the slowly-written ones are also extremely slow to
> > read; iotop shows between 1 and 30 MB/s reading speed.
> >
> > This does not happen at all on replicated images. There are some OSDs
> > with higher apply/commit latency, e.g. 200 ms, but there are no slow ops.
> >
> > The tests were actually done on a proxmox VM with librbd, but the same
> > happens with krbd, and on bare metal with a mounted krbd as well.
> >
> > I have tried to check all OSDs for laggy drives, but they all look
> > about the same.
> >
> > I have also copied the entire image with "rados get...", object by
> > object; the strange thing here is that most of the objects were copied
> > within 0.1-0.2 s, but quite a few took more than 1 s.
> > The cluster is quite busy with base traffic of ~1-2 GB/s, so the speeds
> > can vary due to that. But I would not expect a factor of 100 slowdown
> > for some writes/reads with rbds.
> >
> > Any clues on what might be wrong or what else to check? I have another
> > similar ceph cluster where everything looks fine.
> >
> > Best,
> > Andrej
> >
> > --
> > _____________________________________________________________
> >    prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
> >    Department of Experimental High Energy Physics - F9
> >    Jozef Stefan Institute, Jamova 39, P.o.Box 3000
> >    SI-1001 Ljubljana, Slovenia
> >    Tel.: +386-1-477-3674    Fax: +386-1-477-3166
> > -------------------------------------------------------------
>
> --
> _____________________________________________________________
>    prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
>    Department of Experimental High Energy Physics - F9
>    Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>    SI-1001 Ljubljana, Slovenia
>    Tel.: +386-1-477-3674    Fax: +386-1-477-3166
> -------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
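
As a follow-up to the per-object "rados get" timing mentioned in the original
message, a rough way to narrow this down is to time each object read
individually and print the acting OSD set for the slow ones, so slow objects
can be correlated with specific drives. This is only a sketch: the pool name
below is a placeholder for the EC data pool, the object prefix is the one
quoted in the thread, and the 1-second threshold is arbitrary.

  #!/bin/bash
  # Time each rbd data object read and show which OSDs serve the slow ones.
  # POOL is a placeholder for the EC data pool; adjust before running.
  POOL=rbd_ec_data
  PREFIX=rbd_data.20.2723bd3292f6f8

  rados -p "$POOL" ls | grep "^$PREFIX" | while read -r obj; do
      start=$(date +%s.%N)
      rados -p "$POOL" get "$obj" - > /dev/null
      end=$(date +%s.%N)
      elapsed=$(echo "$end - $start" | bc)
      # Flag anything slower than 1 s and print its pg and acting set.
      if (( $(echo "$elapsed > 1" | bc -l) )); then
          echo "SLOW: $obj ${elapsed}s"
          ceph osd map "$POOL" "$obj"
      fi
  done

If the slow objects keep landing on the same few OSDs, that would point back
at specific drives, as Paul suggested; if they are spread across many OSDs, a
single bad drive looks less likely.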