Re: SSD Journal

Jan, I believe the block device (vs. filesystem) OSD layout is being addressed by NewStore/BlueStore:

http://tracker.ceph.com/projects/ceph/wiki/NewStore_(new_osd_backend)

--
Alex Gorbachev
Storcium

On Thu, Jan 28, 2016 at 4:32 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
You can't run a Ceph OSD without a journal. The journal is always there.
If you don't have a journal partition, then there's a "journal" file on the OSD filesystem that does the same thing. If the journal is on a partition, then this file is a symlink to it.
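
To see which of the two cases applies to a given OSD, a minimal Python sketch follows. It assumes the OSD data directory is at /var/lib/ceph/osd/ceph-0 (the usual default location, but an assumption here, as is the OSD id 0); the function name describe_journal is made up for illustration.

```python
import os

def describe_journal(osd_dir):
    """Report how the journal entry in an OSD data directory is laid out."""
    journal = os.path.join(osd_dir, "journal")
    # Check for a symlink first: os.path.isfile() follows symlinks.
    if os.path.islink(journal):
        # A symlink means the journal lives elsewhere, typically on a partition.
        return "symlink -> " + os.readlink(journal)
    if os.path.isfile(journal):
        # A regular file means the journal shares the OSD data filesystem.
        return "file on the OSD filesystem"
    return "no journal entry found"

if __name__ == "__main__":
    # Assumed default path; adjust the directory and OSD id for your host.
    print(describe_journal("/var/lib/ceph/osd/ceph-0"))
```
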

You will always be better off with a journal on a separate partition because of the way the writeback cache in Linux works (someone correct me if I'm wrong).
The journal needs to flush to disk quite often, and Linux is not always able to flush only the journal data. You can't defer metadata flushing forever, and doing fsync() also makes all the other dirty data flush as well. ext2/3/4 also flushes data to the filesystem periodically (every 5s, I think?), which will make the latency of the journal go through the roof momentarily.
(I'll leave researching how exactly XFS does it to those who care about that "filesystem'o'thing").
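
One rough way to observe the latency spikes described above is to time fsync() on small appended writes to a file on the filesystem in question while other data is being written to it. The Python sketch below is only an illustration under that assumption: it does not reproduce Ceph's actual journal I/O path, and the function name fsync_latencies and its parameters are hypothetical.

```python
import os
import time

def fsync_latencies(path, writes=50, size=4096):
    """Append `writes` records of `size` bytes to `path`, calling fsync()
    after each one, and return the time each fsync() took in seconds."""
    payload = b"\0" * size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # blocks until the kernel reports the data as stable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies
```

Comparing the maximum of the returned list with its median, once on an idle filesystem and once while a large buffered write is in progress on the same filesystem, gives a feel for how much unrelated dirty data can delay a journal-style flush.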

P.S. I feel very strongly that this whole concept is fundamentally broken. We already have a journal for the filesystem which is time-proven, well behaved and above all fast. Instead there's this reinvented wheel which supposedly does it better in userspace while not really avoiding the filesystem journal either. It would maybe make sense if the OSD were storing the data on a block device directly, avoiding the filesystem altogether. But it would still do the same bloody thing and (no disrespect) ext4 does this better than Ceph ever will.


On 28 Jan 2016, at 20:01, Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:

This is an interesting topic that I've been waiting for.

Right now we run the journal as a partition on the data disk. I've built drives without a separate journal partition, and the write performance seems okay, but random I/O performance is poor in comparison to what it should be.

Tyler Bishop
Chief Technical Officer
513-299-7108 x10


From: "Bill WONG" <wongahshuen@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, January 28, 2016 1:36:01 PM
Subject: [ceph-users] SSD Journal

Hi,
I have tested an SSD journal with SATA disks, and it works perfectly. Now I am testing a full-SSD Ceph cluster. With a full-SSD Ceph cluster, do I still need a separate SSD as the journal disk?

[Assume I do not have a PCIe SSD, which performs better than a normal SSD.]

Please give some ideas on a full-SSD Ceph cluster. Thank you!

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com