Re: FreeBSD port net/ceph-devel released

On 4-4-2017 21:05, Gregory Farnum wrote:
> [ Sorry for the empty email there. :o ]
> 
> On Tue, Apr 4, 2017 at 12:28 PM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>> On Sat, Apr 1, 2017 at 4:58 PM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>>> On 1-4-2017 21:59, Wido den Hollander wrote:
>>>>
>>>>> Op 31 maart 2017 om 19:15 schreef Willem Jan Withagen <wjw@xxxxxxxxxxx>:
>>>>>
>>>>>
>>>>> On 31-3-2017 17:32, Wido den Hollander wrote:
>>>>>> Hi Willem Jan,
>>>>>>
>>>>>>> Op 30 maart 2017 om 13:56 schreef Willem Jan Withagen
>>>>>>> <wjw@xxxxxxxxxxx>:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm pleased to announce that my efforts to port to FreeBSD have
>>>>>>> resulted in a ceph-devel port commit in the ports tree.
>>>>>>>
>>>>>>> https://www.freshports.org/net/ceph-devel/
>>>>>>>
>>>>>>
>>>>>> Awesome work! I don't touch FreeBSD that much, but I can imagine that
>>>>>> people want this.
>>>>>>
>>>>>> Out of curiosity, does this run on ZFS under FreeBSD? Or what
>>>>>> Filesystem would you use behind FileStore with this? Or does
>>>>>> BlueStore work?
>>>>>
>>>>> Since I'm a huge ZFS fan, that is what I run it on.
>>>>
>>>> Cool! The ZIL, ARC and L2ARC can actually make that very fast. Interesting!
>>>
>>> Right, the ZIL is magic, and roughly equivalent to the journal now
>>> used with OSDs, for exactly the same reason. The sad thing is that a
>>> write is now journaled three times: once by Ceph and twice by ZFS.
>>> Which means that the bandwidth used on the SSDs is double what it
>>> could be.
>>>
>>> Had some discussion about this, but disabling the Ceph journal is not
>>> just a matter of setting an option. I would like to test the
>>> performance of an OSD with just the ZFS journal, but I expect that the
>>> OSD journal is rather firmly integrated.
>>
>> Disabling the OSD journal will never be viable. The journal is also
>> necessary for transactions and batch updates which cannot be done
>> atomically in FileStore.
> 
> To expand on Patrick's statement: You shouldn't get confused by the
> presence of options to disable journaling. They exist but only work on
> btrfs-backed FileStores and are *not* performant. You could do the
> same on zfs, but in order to provide the guarantees of the RADOS
> protocol, when in that mode the OSD just holds replies on all
> operations until it knows they've been persisted to disk and
> snapshotted, then sends back a commit. You can probably imagine the
> horrible IO patterns and bursty application throughput that result.

When I talked about this with Sage at CERN, I got the same answer. So
this is at least consistent. ;-)

And I have to admit that I do not understand the intricate details of
this part of Ceph, so at the moment I'm looking at it from a more global
view.

What I guess needs to be done is to get rid of at least one of the SSD
writes. That is possible by mounting the journal disk as a separate vdev
(two SSDs in a mirror) and getting the maximum speed out of it.
The problem with this is that the number of SSDs quickly blows up, and
very likely a lot of space is wasted, because the journals need not be
very large.
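A rough sketch of what such a separate journal vdev could look like.
Device names, pool names, and sizes are purely illustrative, and this is
an untested sketch, not a recipe:

```shell
# Dedicated pool from two mirrored SSDs, used only for OSD journals
# (ada2/ada3 are placeholder device names).
zpool create osd-journals mirror /dev/ada2 /dev/ada3

# One small zvol per OSD journal; journals need not be large,
# so a few GB each is typically enough.
zfs create -V 10G osd-journals/journal-0

# Then point the FileStore journal at the zvol in ceph.conf:
# [osd.0]
#     osd journal = /dev/zvol/osd-journals/journal-0
```

This is where the SSD count blows up: every OSD wants its own slice of
the mirrored pair, while each journal uses only a fraction of the SSD.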

And yes, the other way would be to run BlueStore on a ZVOL, where the
underlying vdevs are carefully crafted. But first we need to get AIO
working, and I have not (yet) looked at that at all...
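Assuming BlueStore's block device can simply be pointed at a zvol (which
is exactly what the missing AIO support would have to make work), the
idea could be sketched like this; pool name, size, and OSD id are
hypothetical:

```shell
# Carve a zvol out of a pool whose vdev layout was chosen deliberately
# (e.g. SSD mirrors); "tank" and the size are placeholders.
zfs create -V 500G tank/osd-0-block

# BlueStore finds its block device via a symlink in the OSD data
# directory, so point that symlink at the zvol instead of a raw disk.
ln -s /dev/zvol/tank/osd-0-block /var/lib/ceph/osd/ceph-0/block

# ceph.conf:
# [osd]
#     osd objectstore = bluestore
```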

The first objective was to get a port of some sort out, which I did last
week. The second is to take Luminous and make a "stable" port, which is
less of a moving target.
Only then is AIO on the radar....

--WjW

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
