+ceph users

Hi,

Here is the first-cut result. I can only manage a 128TB box for now.
Summary:
------------
1. First scenario: only 4 nodes, and since it is chassis-level replication, the single node remaining on the chassis takes all the traffic. That seems to be the bottleneck, as recovery time for host-level replication on a similar setup is much lower (data not in this table).
2. In the second scenario I kept everything else the same but doubled the number of nodes per chassis. Recovery time is also halved.
3. For the third scenario, I increased the cluster data and also doubled the number of OSDs in the cluster (since each drive is now 4TB). Recovery time came down further.
4. Moved to Jewel keeping everything else the same and got further improvement, mostly because of the improved write performance in Jewel (?).
5. The last scenario is interesting. With WPQ I got better recovery speed than in any other scenario. Degraded PG % came down to 2% within 3 hours and to ~0.6% within 4 hours 15 minutes, but the last 0.6% took ~4 hours, hurting the overall recovery time.
6. In fact, this long-tail latency is hurting the overall recovery time in every other scenario as well. The related tracker I found is http://tracker.ceph.com/issues/15763 (a polling sketch follows below).
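For reference, below is a minimal shell sketch for timestamping cluster recovery state so the long tail described above shows up in a log. The 60-second poll interval and the log file name are arbitrary choices, not part of the test setup.

    #!/bin/bash
    # Poll the cluster once a minute and record a timestamped one-line
    # PG summary; 'ceph pg stat' includes the degraded object count and
    # percentage while recovery is in progress.
    while true; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(ceph pg stat)"
        sleep 60
    done | tee -a recovery_progress.log

Grepping the resulting log for "degraded" afterwards gives a rough timeline of how quickly the degraded percentage drains and where it stalls.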
Any feedback is much appreciated. We can discuss this in tomorrow's performance call if needed.

Thanks & Regards
Somnath

-----Original Message-----

Thanks Mark, I will come back to you with some data on that. This is what I am planning to run:

1. One 2X IF150 chassis with 256TB of flash each and an 8-node cluster in total (4 servers on each). Will generate ~100TB of data on the cluster.
2. Will go for host- and chassis-level replication with 2 copies (a CRUSH rule sketch for both follows this message).
3. Heavy IO will be on (different block sizes, 60% RW and 40% RR).

Hammer took me ~4 hours to complete recovery for host-level replication with a single host down, and ~12 hours with a single host down and chassis-level replication.

Bear with me till I find all the HW for this :-)

Let me know if you guys want to add something here..

Regards
Somnath
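As a rough sketch of how the host-level vs chassis-level replication in the plan above could be expressed (the rule names, pool names, and PG counts below are placeholders, and this assumes the CRUSH map already has chassis buckets containing the hosts):

    # Two replicated rules that differ only in the failure domain.
    # 'create-simple' is the rule-creation form available in Hammer/Jewel.
    ceph osd crush rule create-simple rep_host default host firstn
    ceph osd crush rule create-simple rep_chassis default chassis firstn

    # Hypothetical 2-copy pools pinned to each rule.
    ceph osd pool create bench_host 4096 4096 replicated rep_host
    ceph osd pool set bench_host size 2
    ceph osd pool create bench_chassis 4096 4096 replicated rep_chassis
    ceph osd pool set bench_chassis size 2

With size 2, a single host failure under the host rule degrades only the PGs whose replicas lived on that host, while under the chassis rule the surviving node(s) in the affected chassis end up carrying much of the recovery traffic, which matches the bottleneck seen in the first scenario.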
-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: Wednesday, May 04, 2016 8:40 AM
To: Somnath Roy; Nick Fisk; Ben England; Kyle Bader
Cc: Sage Weil; Samuel Just
Subject: Weighted Priority Queue testing

Hi Guys,

I think all of you have expressed some interest in recovery testing either now or in the past, so I wanted to get folks together to talk.

We need to get the new weighted priority queue tested to:

a) see when/how it's breaking
b) hopefully see better behavior

It's available in Jewel through a simple ceph.conf change (see the config sketch after this message):

osd_op_queue = wpq

For those of you who have run cbt recovery tests in the past, it might be worth running some new stress tests comparing:

a) jewel + wpq
b) jewel + prio queue
c) hammer

In the past I've done this under various concurrent client workloads (say large sequential or small random writes). I think Kyle has done quite a bit of this kind of testing in the recent past with Intel as well, so he might have some insights as to where we've been hurting recently.

Thanks,
Mark
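For completeness, a hedged sketch of the ceph.conf change above; the section placement and the restart step reflect typical practice rather than instructions from the thread, and the "prio" value for the older queue is my recollection and worth double-checking against your release:

    [osd]
        # Weighted priority queue (available starting with Jewel).
        osd_op_queue = wpq
        # For the "jewel + prio queue" comparison, the older prioritized
        # queue should be selectable with:
        #   osd_op_queue = prio

The op queue is chosen when the OSD starts, so as far as I know the OSDs need to be restarted (e.g. one failure domain at a time) for the setting to take effect rather than changed live with injectargs.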