Re: Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

Why would you still be using journals when running OSDs fully on SSDs?

When using a journal the data is first written to the journal, and then that same data is (later on) written again to disk. This is done under the assumption that the time to write the journal is only a fraction of the time it takes to write to disk, and since writing data to stable storage is on the critical path, the journal brings an advantage. Now when the disk is already an SSD, I see very little to be gained from the journal; why not write the data directly to disk and forgo it?

There are advantages to a two-phase commit approach: without a journal, a write could fail halfway through with some but not all of the data written, leading to integrity issues. Also note that the journal writes are done sequentially at the block level, which should be faster than flushing to the filesystem.
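To make the two-phase idea concrete, here is a minimal sketch of journal-then-apply (illustrative Python only, not Ceph's actual journal code; the file names are made up). The record only counts once it is fully in the journal and synced, so a crash during the data write can be repaired by replaying the journal:

    # Minimal write-ahead-journal sketch (illustrative only, not Ceph code;
    # file names are made up).
    import json, os

    JOURNAL = "journal.log"
    DATA    = "data.bin"

    def journal_write(offset, payload):
        # Phase 1: append the full record to the journal and sync it.
        rec = json.dumps({"off": offset, "data": payload.hex()}) + "\n"
        with open(JOURNAL, "a") as j:
            j.write(rec)
            j.flush()
            os.fsync(j.fileno())          # record is now on stable storage
        # Phase 2: apply it to the data file.  A crash here is harmless:
        # replaying the journal simply redoes this step.
        mode = "r+b" if os.path.exists(DATA) else "w+b"
        with open(DATA, mode) as d:
            d.seek(offset)
            d.write(payload)
            d.flush()
            os.fsync(d.fileno())

    journal_write(0, b"hello world")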


--------------------------------------------------
From: "Willem Jan Withagen" <wjw@xxxxxxxxxxx>
Sent: Sunday, January 08, 2017 1:47 PM
To: "Lionel Bouton" <lionel-subscription@xxxxxxxxxxx>; "kevin parrikar" <kevin.parker092@xxxxxxxxx>
Cc: <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

On 7-1-2017 15:03, Lionel Bouton wrote:
On 07/01/2017 at 14:11, kevin parrikar wrote:
Thanks for your valuable input.
We were using these SSDs in our NAS box (Synology) and they were giving
13k IOPS for our fileserver in RAID1. We had a few spare disks which we
added to our Ceph nodes, hoping that they would give good performance,
the same as the NAS box. (I am not comparing the NAS with Ceph, just
explaining why we decided to use these SSDs.)

We don't have an S3520 or S3610 at the moment but can order one of these
to see how it performs in Ceph. We have 4x S3500 80GB handy.
If I create a 2-node cluster with 2x S3500 each and a replica count of
2, do you think it can deliver 24MB/s of 4k writes?

Probably not. See
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
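The page above measures small synchronous writes; as a rough stand-in (the real test there is done with fio against the device, this is just an illustrative Python approximation using O_DSYNC writes to a scratch file):

    # Rough 4k synchronous sequential write test on a scratch file
    # (illustrative; the linked page uses fio against the raw device).
    import os, time

    PATH, BLOCK, SECS = "scratch.bin", b"\0" * 4096, 10

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    written, start = 0, time.time()
    while time.time() - start < SECS:
        written += os.write(fd, BLOCK)    # each 4k write must reach stable storage
    os.close(fd)
    os.unlink(PATH)

    print("%.1f MB/s" % (written / (time.time() - start) / 1e6))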

According to the page above the DC S3500 reaches 39MB/s. Its capacity
isn't specified; yours are 80GB only, which is the lowest capacity I'm
aware of, and for all DC models I know of the speed goes down with the
capacity, so you will probably get less than that.
If you put both data and journal on the same device you cut your
bandwidth in half, so this would give you an average of <20MB/s per OSD
(with occasional peaks above that if you don't have a sustained 20MB/s).
With 4 OSDs and size=2, your total write bandwidth is <40MB/s. For a
single stream of data you will only get <20MB/s, though (you won't
benefit from parallel writes to the 4 OSDs and will only write to 2 at a
time).
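Spelled out as a quick back-of-the-envelope calculation (Python, using the 39MB/s figure from the page above):

    # Back-of-the-envelope version of the estimate above.
    device_mb_s = 39                # DC S3500 write speed quoted from the page
    per_osd     = device_mb_s / 2   # journal + data on the same SSD halves it
    osds, size  = 4, 2              # 4 OSDs, pool size (replicas) = 2

    aggregate = per_osd * osds / size   # every client byte is written twice
    single    = per_osd                 # one stream only writes to one PG (2 OSDs)

    print("aggregate ~%.0f MB/s, single stream ~%.0f MB/s" % (aggregate, single))
    # -> aggregate ~39 MB/s, single stream ~20 MB/s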

I'm new to this part of tuning Ceph, but I do have an architectural
question:

Why would you still be using journals when running OSDs fully on SSDs?

When using a journal the data is first written to the journal, and then
that same data is (later on) written again to disk.
This is done under the assumption that the time to write the journal is
only a fraction of the time it takes to write to disk. And since writing
data to stable storage is on the critical path, the journal brings an
advantage.

Now when the disk is already an SSD, I see very little to be gained from
the journal; why not write the data directly to disk and forgo it?
I would imagine that not using journals would cut writing time in half,
because the data is only written once. There is no loss of bandwidth on
the SSD, and internally the SSD does not have to manage double the
number of erase cycles in garbage collection once the SSD comes close to
being fully used.
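As a purely illustrative endurance estimate (the endurance and workload numbers below are made up, not the S3500's actual rating), doubling the device writes roughly halves the drive's lifetime:

    # Purely illustrative endurance estimate -- the numbers are made up,
    # not the S3500's actual specifications.
    endurance_tb   = 45     # hypothetical total-bytes-written rating, in TB
    daily_write_gb = 50     # hypothetical client writes landing on this SSD per day

    for label, amplification in (("data only", 1), ("journal + data", 2)):
        days = endurance_tb * 1000.0 / (daily_write_gb * amplification)
        print("%-15s ~%4.0f days to rated endurance" % (label, days))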

The only thing I can imagine that makes a difference is that journal
writing is slightly faster than writing data into the FS that is used
for the disk. But that should not be such a major extra cost that it
warrants all the other disadvantages.

--WjW

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
