Re: Deep-scrub much slower than HDD speed

Marc <Marc@xxxxxxxxxxxxxxxxx> · Thu, 27 Apr 2023 08:47:40 +0000

> >
> > > The question you should ask yourself is: why do you want to
> > > change/investigate this?
> >
> > Because if scrubbing takes 10x longer due to thrashing seeks, my scrubs
> > never finish in time (the default is 1 week).
> > I end up with e.g.
> >
> > > 267 pgs not deep-scrubbed in time
> >
> > On a 38 TB cluster, if you scrub 8 MB/s on 10 disks (using only numbers
> > already divided by replication factor), you need 55 days to scrub it
> > once.
> >
> > That's 8x larger than the default scrub interval, so I'll get warnings
> > and my risk of data degradation increases.
> >
> > Also, even if I set the scrub interval 8x larger than the default, my
> > disks will still be thrashing seeks 100% of the time, affecting the
> > cluster's throughput and latency performance.
> >
> 
> Oh, I get it. Interesting. I think that if you expand the cluster in the
> future with more disks, you will spread the load and have more IOPS, and
> this will disappear.
> I am not sure you will be able to fix this other than by increasing the
> scrub interval, if you are sure it is not related to hardware.
> 
> For your reference, I have included what my disk I/O performance looks
> like when I issue a deep-scrub. You can see it reads from 2 disks here at
> ~70 MB/s, and atop shows it at 100% load. There is nothing more you can
> do here.
> 
> #ceph osd pool ls detail
> #ceph pg ls | grep '^53'
> #ceph osd tree
> #ceph pg deep-scrub 53.38
> #dstat -D sdd,sde
> 
> 
> [@~]# dstat -d -D sdd,sde,sdj
> --dsk/sdd-----dsk/sde-----dsk/sdj--
>  read  writ: read  writ: read  writ
> 2493k  177k:5086k  316k:5352k  422k
>   70M    0 :  89M    0 :   0    68k
>   78M    0 :  59M    0 :   0     0
>   68M    0 :  68M    0 :   0    28k
>   90M 4096B:  90M   80k:4096B   24k
>   76M    0 :  78M    0 :   0    12k
>   66M    0 :  64M    0 :   0    12k
>   70M    0 :  80M    0 :4096B   52k
>   77M    0 :  70M    0 :   0     0
> 
> atop:
> DSK |          sdd  | busy     97%  | read    1462  | write      4  | KiB/r    469  | KiB/w      5  | MBr/s   67.0  | MBw/s    0.0  | avq     1.01  | avio 6.59 ms  |
> DSK |          sde  | busy     64%  | read    1472  | write      4  | KiB/r    465  | KiB/w      6  | MBr/s   67.0  | MBw/s    0.0  | avq     1.01  | avio 4.32 ms  |
> DSK |          sdb  | busy      1%  | read       0  | write     82  | KiB/r      0  | KiB/w      9  | MBr/s    0.0  | MBw/s    0.1  | avq     1.30  | avio 1.29 ms  |
> 
> 
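
As a sanity check on the scrub-time estimate quoted above (38 TB, 8 MB/s, 10 disks, 55 days), here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sustained read rate; the function name scrub_days is made up for illustration. The quoted 55 days appears to match 8 MB/s taken as the aggregate rate over all 10 disks; if 8 MB/s were meant per disk, the same formula would give roughly a tenth of that.

```python
def scrub_days(data_bytes: float, bytes_per_sec: float) -> float:
    """Days needed to read data_bytes once at a sustained aggregate rate."""
    return data_bytes / bytes_per_sec / 86_400  # 86,400 seconds per day


TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes

data = 38 * TB  # cluster data, already divided by the replication factor

# Reading 1: 8 MB/s is the combined scrub rate of all 10 disks.
aggregate = scrub_days(data, 8 * MB)

# Reading 2: 8 MB/s is the scrub rate of each of the 10 disks.
per_disk = scrub_days(data, 10 * 8 * MB)

print(f"8 MB/s aggregate: {aggregate:.1f} days ({aggregate / 7:.1f}x the 1-week default)")
print(f"8 MB/s per disk:  {per_disk:.1f} days ({per_disk / 7:.1f}x the 1-week default)")
```

Under the first reading the result is about 8x the 1-week default, which is consistent with the "8x" figure in the quoted message.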

I did this on a pool with larger archived objects. When doing the same on a filesystem with repo copies (rpm files), the performance already drops:

DSK |          sdh  | busy     86%  | read    1875  | write     26  | KiB/r    254  | KiB/w      4  | MBr/s   46.7  | MBw/s    0.0  | avq     1.59  | avio 4.50 ms
DSK |          sdd  | busy     79%  | read    1598  | write     63  | KiB/r    245  | KiB/w     16  | MBr/s   38.4  | MBw/s    0.1  | avq     1.89  | avio 4.77 ms
DSK |          sdf  | busy     33%  | read    1383  | write    139  | KiB/r    357  | KiB/w      7  | MBr/s   48.3  | MBw/s    0.1  | avq     1.14  | avio 2.20 ms
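
The drop can be read straight off the atop columns: throughput is roughly (number of reads) x (KiB per read) / (sample interval). The sketch below recomputes MBr/s for one disk from each run. It assumes atop's default 10-second sample interval (the interval is not shown in the output above) and that atop's "MB" means MiB; the function name mib_per_sec is made up for illustration.

```python
def mib_per_sec(reads: int, kib_per_read: float, interval_s: float = 10.0) -> float:
    """Approximate read throughput in MiB/s from atop's 'read' and 'KiB/r' columns."""
    return reads * kib_per_read / 1024 / interval_s


# (reads per sample, KiB/r, MBr/s reported by atop), copied from the output above
samples = {
    "sdd, archive pool": (1462, 469, 67.0),
    "sdh, rpm repo fs": (1875, 254, 46.7),
}

for name, (reads, kib_r, reported) in samples.items():
    iops = reads / 10.0  # assumption: 10 s sample interval
    print(f"{name}: {iops:.0f} reads/s, "
          f"computed {mib_per_sec(reads, kib_r):.1f} MiB/s, atop reported {reported}")
```

If those assumptions hold, the rpm-repo disk is doing more reads per second than the archive-pool disk, but each read is about half the size, so the throughput ends up lower.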
