First, on your comment:
"we found that during times where the cache pool flushed to
the storage pool client IO took a severe hit"
We found the same thing: http://blog.wadeit.io/ceph-cache-tier-performance-random-writes/
I don't claim it's a great write-up, and it may not be what a lot of folks are interested in, but it covers what I was after.
Great work on your fio test. However, take a look at the response time: naturally it will increase beyond 4-5 concurrent writes, which is of course what you were saying, and that is correct. That said, I think we can generally accept a slightly higher response time, so iodepth > 1 is a more real-world test. Just my thoughts. You did the right thing and tested well.
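For example, a run along these lines (just a sketch, not from your test; the device path and queue depth are placeholders) shows the IOPS/latency trade-off at a deeper queue:

# Sketch: same 4k write pattern, but queued through libaio at iodepth=32
# instead of one synchronous write at a time. Adjust the device path.
fio --filename=/dev/nvme0n1 --direct=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=32 --ioengine=libaio --runtime=60 \
    --time_based --group_reporting --name=journal-test-qd32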
Some might not like it, but I like Sebastien's journal size calculation and it has served me well.
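For reference, the formula his sizing advice builds on (the one from the Ceph docs, quoted here from memory rather than from this thread) is journal size = 2 * (expected throughput * filestore max sync interval), where expected throughput is the smaller of the disk and network throughput. A quick back-of-the-envelope check with assumed example numbers:

# Rough journal sizing sketch, assumed example values only.
expected_throughput_mb=150   # min(disk, network) throughput in MB/s
sync_interval_s=5            # filestore max sync interval (default 5 s)
echo "$((2 * expected_throughput_mb * sync_interval_s)) MB"   # -> 1500 MB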
Cheers
Wade
On Thu, Feb 4, 2016 at 7:24 AM Sascha Vogt <sascha.vogt@xxxxxxxxx> wrote:
Hi,
On 04.02.2016 at 12:59, Wade Holler wrote:
> You referenced parallel writes for journal and data, which is the default
> for btrfs but not XFS. Now you are mentioning multiple parallel writes
> to the drive, which of course will occur.
Ah, that is good to know. So if I want to create more "parallelism" I
should use btrfs then. Thanks a lot, that's a very critical bit of
information :)
> Also, our Dell 400 GB NVMe drives do not top out at around 5-7 sequential
> writes as you mentioned. That would be 5-7 random writes from the drive's
> perspective, and the NVMe drives can do many times that.
Hm, I used the following fio bench from [1]:
fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
    --name=journal-test
Our disks showed the following bandwidths (#<no> is the numjobs parameter):
#1: write: io=1992.2MB, bw=33997KB/s, iops=8499
#2: write: io=5621.6MB, bw=95940KB/s, iops=23984
#3: write: io=8062.8MB, bw=137602KB/s, iops=34400
#4: write: io=9114.1MB, bw=155545KB/s, iops=38886
#5: write: io=8860.7MB, bw=151169KB/s, iops=37792
Also, for more jobs (I tried up to 8) bandwidth stayed at around 150 MB/s
and around 37k IOPS. So I figured that around 5 should be the sweet spot
in terms of journals on a single disk.
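A small loop along these lines (purely a sketch reusing the options above) makes that numjobs sweep easy to repeat:

for jobs in 1 2 3 4 5 6 7 8; do
    # Same 4k sync-write test as above, run once per numjobs value.
    fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=$jobs --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test-$jobs
done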
> I would park it at 5-6 partitions per NVMe, journal on the same disk.
> Frequently I want more concurrent operations, rather than all-out
> throughput.
For journals on the same disk, should I limit the journal size? If yes,
what should the limit be? Rather large or rather small?
Greetings
-Sascha-
[1]http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com