Re: long blocking with writes on rbds

>> We are running one OSD per VM. All data is replicated across three VMs.
>
> That doesn't add up to 6 OSDs, as per your "ceph -s" output.

Yes it does. 6 VMs, 6 OSDs. Each pool is allocated to one of two 3-node sub-clusters.

> AWS c3.large supposedly comes with 2 locally attached SSDs.

It comes with as many storage devices as you want to attach. We use one to boot and one for the OSD.


> And unless you really, REALLY require different pools, you'll be much
> happier with just one, or as few as possible.

Why? In your previous message you told me that performance hinges primarily on the amount of data, not on how many pools or RBDs it is divided into. Even a 2x slowdown would not explain the terrible performance we've seen.

> The calculator and the suggestions on the documentation page suggest 512
> PGs/PGPs for 6 OSDs with a replication of 3 and a target per OSD of 200
> (double the "default", but a good idea for small clusters).
>
> That's with one pool; with 2 evenly sized pools it would be 256 per
> pool, and so forth.

This is very close to what we've done; see below. The performance problems persist.
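
For anyone following along, I assume the arithmetic behind those numbers is the usual rule of thumb from the Ceph docs: total PGs = OSDs * target per OSD / replicas, rounded up to a power of two and split across pools. A quick sketch (the function name is just for illustration):

    import math

    def suggested_pgs(osds, target_per_osd, replicas, pools=1):
        # Rule-of-thumb PG count: cluster-wide total, split evenly across
        # pools, then rounded up to the next power of two.
        total = osds * target_per_osd / replicas    # 6 * 200 / 3 = 400
        per_pool = total / pools
        return 2 ** math.ceil(math.log2(per_pool))

    print(suggested_pgs(6, 200, 3))            # 512 with one pool
    print(suggested_pgs(6, 200, 3, pools=2))   # 256 per pool with two even pools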


> Unless you have changed the crush map, a pool will be spread out amongst
> all the PGs and all the OSDs.

We have changed the crush map. Each pool is spread among three of our six OSDs. I included our crush map in my first post.
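
For context, the relevant piece looks roughly like this (a sketch with hypothetical bucket/rule names, not our literal map; that's in the first post):

    # hypothetical rule pinning a pool's replicas to one 3-host sub-cluster
    rule subcluster_a {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take subcluster_a          # root bucket holding 3 of the 6 hosts
        step chooseleaf firstn 0 type host
        step emit
    }

Each pool then gets pointed at the matching rule with something like "ceph osd pool set <pool> crush_ruleset 1".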

> With 23 pools (again, reduce this to 1 or 2 if possible) they should have
> about 23 PG/PGPs per pool (if evenly sized), not 4 as your ratio up there
> suggests.


I don't think so. We allocate pools gradually throughout the lifecycle of our application. With 250 GB of storage per replica, a 5 GB pool should be allocated a proportional number of PGs. In other words, we expect to eventually have 50 pools of 5 GB each. Given that we have 3 OSDs per pool and three replicas, the calculator says we should have 125 PGs in total, distributed evenly over the 50 pools. We rounded up to 4 PGs per pool. What's wrong with that?
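
Concretely, the split we did looks roughly like this (a sketch, taking the calculator's 125-PG total for one 3-OSD, 3-replica set as given):

    import math

    total_pgs = 125    # calculator output for one 3-OSD, 3-replica sub-cluster
    pools = 50         # eventual pool count: 250 GB / 5 GB per pool
    per_pool = total_pgs / pools                    # 2.5 PGs per pool
    rounded = 2 ** math.ceil(math.log2(per_pool))   # round up to a power of two -> 4
    print(per_pool, rounded)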

Regardless, does anyone believe that this could cause the bizarre performance issues we've seen?

> Your performance issues are most likely related to your platform, as in
> actual OSD (SSD?) speed, network speed, things unique to AWS.

This seems highly unlikely. We get very good performance without ceph: requisitioning and manipulating block devices through LVM happens instantaneously. We expect ceph to be somewhat slower because of its distributed nature, but we've seen operations block for up to an hour, which is clearly beyond the pale. Furthermore, as the performance measurements I posted show, raw read/write speed is not the bottleneck: ceph is simply waiting.

So, does anyone else have any ideas why mkfs (and other operations) takes so long?

Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
