Re: Optimal OSD count for SSDs / NVMe disks


Once we put in our cache tier, the I/O on the spindles was so low that
we just moved the journals off the SSDs onto the spindles and left the
SSD space for cache. Testing has shown that better performance can be
achieved by putting more OSDs on an NVMe disk, but you also have to
balance that against the fact that data is not distributed perfectly
evenly, so some OSDs will use more space than others. I probably
wouldn't go beyond four 100 GB partitions, but it really depends on
the number of PGs and your data distribution. Also, even with all the
data in the cache, there is still a performance penalty for the cache
tier vs. a native SSD pool, so if you are not using the tiering, move
to a straight SSD pool.
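
As a rough illustration of how those numbers interact (the 5 GB
journal size, pg_num = 1024, size = 3 replicas and 8 cache OSDs below
are assumptions for the example, not figures from your cluster):

# Back-of-the-envelope sizing helper; all inputs are illustrative.

def per_osd_data_gb(device_gb, osds_per_device, journal_gb):
    """Usable data space per OSD when its journal shares the device."""
    return device_gb / float(osds_per_device) - journal_gb

def avg_pgs_per_osd(pg_num, replicas, total_osds):
    """Average PG count per OSD; the real spread is uneven, which is
    why small partitions can fill at very different rates."""
    return pg_num * replicas / float(total_osds)

# A 400 GB NVMe carved into 4 OSDs with 5 GB journals -> ~95 GB data each
print(per_osd_data_gb(400, 4, 5))
# A pool with pg_num=1024 and size=3 over 8 cache OSDs -> 384 PGs per OSD
print(avg_pgs_per_osd(1024, 3, 8))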
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 3, 2016 at 5:01 AM, Sascha Vogt wrote:
> Hi all,
>
> we recently tried adding a cache tier to our Ceph cluster. We had 5
> spinning disks per host with a single journal NVMe disk hosting the 5
> journals (1 OSD per spinning disk). We have 4 hosts so far, so
> overall 4 NVMes host 20 journals for 20 spinning disks.
>
> As we had some space left on the NVMes, we made two additional
> partitions on each NVMe and created a 4-OSD cache tier.
>
> To our surprise, the 4-OSD cache pool was able to deliver the same
> performance as the previous 20-OSD pool while reducing the OPs on the
> spinning disks to zero, as long as the cache pool was large enough to
> hold all / most of the data (Ceph is used for very short-lived KVM
> virtual machines which do pretty heavy disk IO).
>
> As we don't need that much more storage right now, we decided to extend
> our cluster by adding 8 additional NVMe disks solely as a cache pool,
> freeing up the journal NVMes again. Now the question is: how should we
> organize the OSDs on the NVMe disks (2 per host)?
>
> As the NVMes peak at around 5-7 concurrent sequential writes (tested
> with fio), I thought about using 5 OSDs per NVMe. That would mean 10
> partitions (5 journals, 5 data). On the other hand, the NVMes are only
> 400 GB, so that would result in OSD data partitions of <80 GB
> (depending on the journal size).
>
> Would it make sense to skip the separate journal partition, leave the
> journal on the data partition itself, and limit it to a rather small
> size (let's say 1 GB or even less?), as SSDs typically don't like
> sequential writes anyway?
>
> Or, if I keep the journal and data on separate partitions, should I
> reduce the number of OSDs per disk to 3, as Ceph will most likely write
> to the journal and data in parallel and I therefore already get 6
> parallel "threads" of IO?
>
> Any feedback is highly appreciated :)
>
> Greetings
> -Sascha-
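
A minimal sketch of the sizing and parallelism arithmetic weighed
above (the 400 GB device size is from the mail; the 5 GB and 1 GB
journal sizes and the assumption of one journal plus one data write
stream per OSD are illustrative):

DEVICE_GB = 400  # size of each NVMe, as stated above

def per_osd_data_gb(osds_per_device, journal_gb):
    # Usable data space per OSD once its journal is carved from the same device.
    return DEVICE_GB / float(osds_per_device) - journal_gb

def parallel_streams(osds_per_device):
    # Assuming, as the mail suggests, that the journal and data are
    # written in parallel, each OSD contributes roughly two write streams.
    return osds_per_device * 2

for osds, journal_gb in [(5, 5), (5, 1), (3, 5)]:
    print("%d OSDs, %d GB journal: ~%.0f GB data per OSD, ~%d write streams"
          % (osds, journal_gb, per_osd_data_gb(osds, journal_gb),
             parallel_streams(osds)))

Under those assumptions, only the 3-OSD layout lands inside the 5-7
concurrent-write range measured with fio, at the price of fewer,
larger OSDs per device.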

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


