NAS on RBD

On Tue, Sep 9, 2014 at 12:33 PM, Christian Balzer <chibi at gol.com> wrote:
>
> Hello,
>
> On Tue, 9 Sep 2014 17:05:03 +1000 Blair Bethwaite wrote:
>
>> Hi folks,
>>
>> In the absence of a production-ready CephFS, I'm wondering what others
>> in the user community are doing for file-serving out of Ceph clusters
>> (if at all)?
>>
>> We're just about to build a pretty large cluster - 2PB for file-based
>> NAS and another 0.5PB for rgw. For the rgw component we plan to dip our
>> toes in and use an EC backing pool with a ~25TB (usable) 10K SAS + SSD
>> cache tier.
>>
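
[For reference, a minimal sketch of how an EC backing pool with a cache
tier can be wired up. The pool names, EC profile parameters, and size cap
below are hypothetical, not from the original post:

    # Hypothetical names and k/m values; size PGs for your own cluster.
    ceph osd erasure-code-profile set rgw-ec-profile k=8 m=3
    ceph osd pool create rgw-data 2048 2048 erasure rgw-ec-profile
    ceph osd pool create rgw-cache 1024 1024 replicated

    # Put the replicated pool in front of the EC pool as a writeback
    # cache tier and route client I/O through it.
    ceph osd tier add rgw-data rgw-cache
    ceph osd tier cache-mode rgw-cache writeback
    ceph osd tier set-overlay rgw-data rgw-cache

    # Cap the cache at roughly the usable fast-tier capacity (~25TB).
    ceph osd pool set rgw-cache target_max_bytes 25000000000000
]
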
>> For the file storage we're looking at mounting RBDs (out of a standard
>> 3-replica pool for now) on a collection of presentation nodes, which
>> will use ZFS to stripe together those RBD vdevs into a zpool which we
>> can then carve datasets out of for access from NFS & CIFS clients.
>> Those presentation servers will have some PCIe SSD in them for ZIL and
>> L2ARC devices, and clients will be split across them depending on what
>> ID domain they are coming from. Presentation server availability
>> issues will be handled by mounting the relevant zpool on a spare
>> server, so it won't be HA from a client perspective, but I can't see a
>> way to get this with an RBD backend.
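
[A minimal sketch of the zpool arrangement described above; the image,
pool, and device names are all hypothetical:

    # Create and map a few RBD images to serve as vdevs.
    for i in 0 1 2 3; do
        rbd create --size 10240000 nas/vdev-$i   # size is in MB
        rbd map nas/vdev-$i                      # appears as /dev/rbdN
    done

    # Stripe the mapped images into a zpool, with PCIe SSD partitions
    # as the ZIL (log) and L2ARC (cache) devices.
    zpool create tank /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3 \
        log /dev/nvme0n1p1 cache /dev/nvme0n1p2

    # Carve datasets out of the pool for NFS/CIFS clients.
    zfs create tank/projects
    zfs set sharenfs=on tank/projects

    # Manual failover: map the same images on the spare server and
    # import the pool there (use -f if it was not cleanly exported).
    zpool export tank    # on the failed node, if still reachable
    zpool import tank    # on the spare
]
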
>>
>> Wondering what the collective wisdom has to offer on such a setup...
>>
> I have nearly no experience with ZFS, but I'm wondering why you'd pool
> things at that level when Ceph is already supplying a redundant and
> resizable block device.
>
> I can understand wanting to use ZFS for its checksumming, which is
> sorely missing in Ceph.
>
> Using a CoW filesystem on top of RBD might not be a great idea either:
> since RBD images are sparsely allocated, performance is likely to be
> poor until all "blocks" have actually been allocated. Maybe somebody
> with experience in that area can pipe up.
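
[As a side note, the actual allocation of a sparse image can be estimated
with rbd diff, which only reports extents that have been written; the
pool/image name here is hypothetical:

    # Sum the written extents; untouched regions of a sparse image
    # do not appear in the output.
    rbd diff nas/vdev-0 | awk '{ sum += $2 } END { print sum/1024/1024 " MB allocated" }'
]
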
>
> Something that ties into the previous point: kernel-based RBD currently
> does not support TRIM, so even if you were to use something other than
> ZFS, you'd never be able to reclaim that space.

Initial discard support will be in the 3.18 kernel.  (We have it in
testing and, unless something critical comes up, 3.18 is our target.)
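
[Once a discard-capable kernel is in place, a quick sketch of exercising
it on a non-ZFS filesystem; the device and mount point are hypothetical:

    # Check that the block layer advertises discard for the image.
    cat /sys/block/rbd0/queue/discard_granularity   # non-zero = supported

    # Either mount with online discard...
    mount -o discard /dev/rbd0 /mnt/rbd

    # ...or trim in batches, which is usually cheaper.
    fstrim -v /mnt/rbd
]
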

Thanks,

                Ilya

