Re: Optimal OSD count for SSDs / NVMe disks

First, on your comment of:

"we found that during times where the cache pool flushed to
the storage pool client IO took a severe hit"

We found the same thing: http://blog.wadeit.io/ceph-cache-tier-performance-random-writes/
-- I don't claim it's a great write-up, and it isn't what a lot of folks are interested in, but it covers what I was after.

Great work on your fio test. However, take a look at the response time: it will naturally increase beyond 4-5 concurrent writes, which is of course what you were saying, and that is correct. Still, I think we can generally accept a slightly higher response time, so iodepth>1 is a more real-world test. Just my thoughts -- you did the right thing and tested well.
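
For what it's worth, a deeper-queue variant of that test could look like the command below (the device path is just a placeholder, and note that iodepth only takes effect with an async ioengine such as libaio; with the default synchronous engine the effective queue depth stays at 1):

fio --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio --rw=write --bs=4k \
    --numjobs=1 --iodepth=16 --runtime=60 --time_based --group_reporting \
    --name=journal-test-qd16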

Some might not like it, but I like Sebastien's journal size calculation and it has served me well:
http://slides.com/sebastienhan/ceph-performance-and-benchmarking#/24
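
The rule of thumb behind that slide is roughly journal size >= 2 * (expected throughput * filestore max sync interval). A minimal ceph.conf sketch of that, where the 500 MB/s throughput is just an assumed example figure for a fast NVMe partition:

[osd]
; journal size >= 2 * (expected throughput * filestore max sync interval)
; e.g. 2 * (500 MB/s * 5 s) = 5000 MB
osd journal size = 5000
filestore max sync interval = 5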

Cheers
Wade





On Thu, Feb 4, 2016 at 7:24 AM Sascha Vogt <sascha.vogt@xxxxxxxxx> wrote:
Hi,

On 04.02.2016 at 12:59, Wade Holler wrote:
> You referenced parallel writes for journal and data, which is the default
> for btrfs but not XFS. Now you are mentioning multiple parallel writes
> to the drive, which of course will occur.
Ah, that is good to know. So if I want to create more "parallelism" I
should use btrfs then. Thanks a lot, that's a very critical bit of
information :)

> Also, our Dell 400 GB NVMe drives do not top out at around 5-7 sequential
> writes as you mentioned. That would be 5-7 random writes from the drive's
> perspective, and the NVMe drives can do many times that.
Hm, I used the following fio benchmark from [1]:
fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
    --name=journal-test
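
(A sketch of that numjobs sweep as a loop -- same device path assumed; it writes directly to the raw disk, so only ever run it against an unused device:)

for n in 1 2 3 4 5 6 7 8; do
    fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=$n --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test-$n
done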

Our disks showed the following bandwidths (#<no> is the numjobs parameter):

#1: write: io=1992.2MB, bw=33997KB/s, iops=8499
#2: write: io=5621.6MB, bw=95940KB/s, iops=23984
#3: write: io=8062.8MB, bw=137602KB/s, iops=34400
#4: write: io=9114.1MB, bw=155545KB/s, iops=38886
#5: write: io=8860.7MB, bw=151169KB/s, iops=37792

With more jobs (I tried up to 8) the bandwidth stayed at around 150 MB/s
and around 37k IOPS, so I figured that around 5 should be the sweet spot
for the number of journals on a single disk.

> I would park it at 5-6 partitions per NVMe, journal on the same disk.
> Frequently I want more concurrent operations, rather than all-out
> throughput.
With the journal on the same partition, should I limit the journal size?
If yes, what should the limit be? Rather large or rather small?

Greetings
-Sascha-

[1]http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
