  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        82        66   263.704       264  0.034511  0.157596
    2      16       194       178   355.768       448  0.144852   0.16572
    3      16       298       282   375.816       416  0.075267   0.15512
    4      16       419       403   402.835       484  0.073001  0.151483
    5      15       531       516   412.653       452   0.05382  0.153122
    6      16       652       636   423.861       480  0.045246  0.141938
    7      16       776       760   434.154       496  0.094384    0.1461
    8      16       869       853   426.377       372  0.055912  0.138176
Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_node-02_45166
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        60        44   175.943       176   0.32255  0.254007
    2      16       105        89   177.954       180  0.172481  0.225552
    3      16       168       152   202.622       252  0.192577  0.305603
    4      16       223       207   206.958       220  0.353051   0.29058
    5      15       263       248   198.362       164  0.330949  0.293684
    6      16       307       291   193.964       172  0.192487  0.289606
    7      16       354       338   193.108       188  0.288342   0.27043
    8      16       393       377   188.465       156  0.388039  0.327652
    9      16       423       407   180.855       120  0.309964  0.337049
   10      16       481       465   185.967       232  0.090552   0.32017
   11      15       482       467   169.788         8  0.053557  0.319134
   12      15       482       467   155.639         0         -  0.319134
ceph@node-02:/home/leni$ rados -p test bench 10 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        51        35   139.942       140   0.37242  0.308696
    2      16        89        73   144.524       152  0.331963  0.386161
    3      16       126       110   145.666       148   0.59192   0.40105
    4      16       161       145   144.249       140  0.170732   0.40284
    5      16       191       175   139.414       120  0.516386   0.42865
    6      16       223       207   137.515       128  0.333633  0.444945
    7      16       251       235   133.791       112  0.521015  0.453296
    8      15       286       271   135.056       144  0.188018  0.456516
    9      16       315       299   132.499       112   1.29282  0.465136
   10      16       342       326    129.75       108  0.889321  0.476148
Total time run: 10.260278
Total reads made: 342
Read size: 4194304
Bandwidth (MB/sec): 133.330
Average Latency: 0.47688
Max latency: 1.69694
Min latency: 0.028467
2014-10-22 14:20:59.142970 7ff415d7a700 0 -- 172.100.0.25:6800/44707 submit_message osd_op_reply(321 benchmark_data_node-02_13320_object320 [write 0~4194304] v9387'2846 uv2846 ondisk = 0) v6 remote, 172.100.0.22:0/1013320, failed lossy con, dropping message 0x2876900
2014-10-22 14:20:59.143126 7ff415d7a700 0 -- 172.100.0.25:6800/44707 submit_message osd_op_reply(322 benchmark_data_node-02_13320_object321 [write 0~4194304] v9387'3227 uv3227 ondisk = 0) v6 remote, 172.100.0.22:0/1013320, failed lossy con, dropping message 0x17c40c80
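For reference, the seq bench only has something to read if a previous write bench left its objects behind with --no-cleanup, so the full sequence looks roughly like this (a sketch using the same pool name and 16 concurrent ops as above):

    rados -p test bench 10 write -t 16 --no-cleanup
    rados -p test bench 10 seq -t 16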
When I connect node-05's OSDs, the cluster gets so slow that we cannot use any of the services hosted on it. We had only 128 PGs for our pool, and adding even one OSD made our cluster "unworkable", so we extended it to 1024 PGs. But now we have other problems with adding new hosts.
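For context, the PG bump itself is the usual pair of pool settings, along these lines (a generic sketch; <pool> is a placeholder, and pgp_num has to be raised to match pg_num):

    ceph osd pool set <pool> pg_num 1024
    ceph osd pool set <pool> pgp_num 1024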
On Mon, 20 Oct 2014 11:07:43 +0200 Leszek Master wrote:
> 1) If I want to use a cache tier, should I use it with SSD journaling, or
> can I get better performance using more SSD GB for the cache tier?
>
From reading what others on this ML experienced and what Robert already
pointed out, cache tiering is definitely too unpolished at this point in
time and not particularly helpful. Given the right changes and more tuning
abilities I'd expect it to be useful in the future (1-2 releases out
maybe?) though.
> 2) I've got a cluster made of 26x 900GB SAS disks with SSD journaling. The
> placement group count I've got is 1024. When I add a new OSD to the cluster,
> my VMs get I/O errors and get stuck even though I had osd_max_backfills set
> to 1. If I change the PG count from 1024 to 4096, would it be less affected
> by backfilling and recovery?
>
You're not telling us enough about your cluster by far, starting with
Ceph and OS/kernel versions.
What are your storage nodes like (all the specs: CPU, memory, network,
what type of SSDs, journal to OSD ratio, etc.)?
If your replica size is 2 (risky!) then your PG and PGP count should be
2048, with a replica of 3 your current number is fine when it comes to the
formula but it might still be better for data distribution at 2048 as well.
But changing those values from what you have already should have little
effect on your data-migration impact, as in the end the same amount of
data needs to be moved if an OSD is added or lost and your current PG
count isn't horribly wrong.
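As a quick sanity check, the usual rule of thumb works out like this for the numbers given above (a rough sketch assuming 26 OSDs; round the result up to the next power of two):

    num_osds=26
    echo $(( num_osds * 100 / 2 ))   # replica 2: 1300 -> 2048
    echo $(( num_osds * 100 / 3 ))   # replica 3:  866 -> 1024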
If your cluster is running close to capacity during normal usage (monitor
with atop!), and all the tunables are already set to lowest impact, then your
only way forward is to address its shortcomings, whatever they are (CPU,
IOPS, etc.).
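For completeness, the recovery/backfill tunables usually meant by "lowest impact" can be injected at runtime, roughly like this (a sketch; these are the commonly used minimum values, not settings confirmed for this cluster):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'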
Too high (way too high usually) PG counts will cost you in performance due
to CPU resource exhaustion caused by Ceph internal locking/protocol
overhead.
Too few PGs, on the other hand, will not only cause uneven data
distribution but ALSO cost you in performance, as that unevenness is prone
to creating hotspots.
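To put rough numbers on that for this cluster (assuming the 26 OSDs and replica 3 mentioned above, and the common target of roughly 100-200 PG copies per OSD):

    num_osds=26; replica=3
    echo $(( 1024 * replica / num_osds ))   # ~118 PG copies per OSD, reasonable
    echo $(( 4096 * replica / num_osds ))   # ~472 per OSD, well into "too high" territory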
> 3) When I was adding my last 6 drives to the cluster, I noticed that the
> recovery speed had gone from 500-1000 MB/s to 10-50 MB/s. When I restarted
> the OSDs that I was adding, the transfers went back to normal. Also, I
> noticed that when I then ran a rados benchmark, the transfers dropped to
> 0 MB/s, even a few times in a row. Restarting the OSDs I was adding, or,
> one by one, the ones already in the cluster, solved the problem. What can
> it be? In the logs there isn't anything weird. The whole cluster gets stuck
> until I restart them or even recreate the journals on them. How can I
> solve this?
>
That is very odd, maybe some of the Ceph developers have an idea or
recollection of seeing this before.
In general you will want to monitor all your cluster nodes with something
like atop in a situation like this to spot potential problems like slow
disks, CPU or network starvation, etc.
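A few commands that are handy alongside atop when the cluster stalls like this (a generic sketch, nothing specific to this setup; <id> is a placeholder OSD number):

    ceph health detail                        # which requests are slow/blocked and on which OSDs
    ceph osd perf                             # per-OSD commit/apply latencies to spot a slow disk
    ceph daemon osd.<id> dump_historic_ops    # on the OSD's node: the slowest recent ops in detail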
Christian
> Please help me.
>
> Best regards !
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/