Re: optimizing recovery throughput

Hi,

On 07/19/13 07:16, Dan van der Ster wrote:
and that gives me something like this:

2013-07-18 21:22:56.546094 mon.0 128.142.142.156:6789/0 27984 : [INF]
pgmap v112308: 9464 pgs: 8129 active+clean, 398 active+remapped+wait_backfill,
3 active+recovery_wait, 933 active+remapped+backfilling,
1 active+clean+scrubbing; 15994 GB data, 55567 GB used,
1380 TB / 1434 TB avail; 11982626/151538728 degraded (7.907%);
recovering 299 o/s, 114MB/s

but immediately I start to see slow requests piling up. Trying
different combinations, I found that it's the "max active = 10"
setting that leads to the slow requests. With a 20/5 setting, there
are no slow requests, but the recovery rate doesn't increase either.

So I'm wondering whether you agree that this indicates the 10/5
setting for backfill/max active is already the limit for our cluster,
at least with the current set of test objects we have? Or am I
missing another option that should be tweaked to get more recovery
throughput?
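
I'm assuming the 10/5 and 20/5 combinations above refer to
osd_max_backfills / osd_recovery_max_active (that's a guess on my part
from the wording); for reference, this is roughly how I would change
them at runtime, with the values purely as examples:

    # inject new throttle values into all OSDs on the fly
    ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 5'

    # or set them persistently in ceph.conf under [osd]
    [osd]
      osd max backfills = 10
      osd recovery max active = 5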

This mostly looks like a 1Gb ethernet cap (114MB/s is 912Mb/s); it's
what I get with my small 2-node, 6-drive (SSD journals) cluster with a
1Gb/s cluster link, so you should be getting more out of a 10Gb/s
network. When I see more, it's because of host-to-host moves; when I
see less, it's because of client load.
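
If you want to double-check the unit conversion, it's just a factor of
8 (plain shell arithmetic, nothing Ceph-specific):

    # 114 MB/s expressed in megabits per second
    echo $((114 * 8))    # prints 912 -> ~912 Mb/s, essentially line rate for 1GbE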
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



