NAS on RBD

Hi Blair,

> On 09 Sep 2014, at 09:05, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
> 
> Hi folks,
> 
> In lieu of a prod ready Cephfs I'm wondering what others in the user
> community are doing for file-serving out of Ceph clusters (if at all)?
> 
> We're just about to build a pretty large cluster - 2PB for file-based
> NAS and another 0.5PB rgw. For the rgw component we plan to dip our
> toes in and use an EC backing pool with a ~25TB (usable) 10K SAS + SSD
> cache tier.
> 
> For the file storage we're looking at mounting RBDs (out of a standard
> 3-replica pool for now) on a collection of presentation nodes, which
> will use ZFS to stripe together those RBD vdevs into a zpool which we
> can then carve datasets out of for access from NFS & CIFS clients.
> Those presentation servers will have some PCIe SSD in them for ZIL and
> L2ARC devices, and clients will be split across them depending on what
> ID domain they are coming from. Presentation server availability
> issues will be handled by mounting the relevant zpool on a spare
> server, so it won't be HA from a client perspective, but I can't see a
> way to getting this with an RBD backend.
> 
> Wondering what the collective wisdom has to offer on such a setup?
> 

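Just to make sure I follow, the setup you're describing sounds roughly like the below (pool/image names, sizes and device paths are only placeholders):

  # create and map a handful of RBD images to use as vdevs (kernel client)
  for i in 0 1 2 3; do
      rbd create nas/vdev$i --size 1048576    # size is in MB, so 1 TB each here
      rbd map nas/vdev$i
  done

  # stripe them into a zpool, with PCIe SSD partitions as ZIL and L2ARC
  zpool create tank \
      /dev/rbd/nas/vdev0 /dev/rbd/nas/vdev1 /dev/rbd/nas/vdev2 /dev/rbd/nas/vdev3 \
      log /dev/nvme0n1p1 cache /dev/nvme0n1p2

  # carve out datasets for the NFS/CIFS clients
  zfs create -o sharenfs=on tank/projects
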
We do this for some small-scale NAS use cases, with ZFS running in a VM on RBD volumes. The performance is not great (especially since we throttle the IOPS of our RBD volumes). We also tried a few kRBD / ZFS servers with an SSD ZIL; the SSD solved every performance problem we ever had with ZFS on RBD.
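
For concreteness (purely illustrative, the domain/device names and numbers below are made up): the IOPS cap can be applied at the hypervisor, and the ZIL is just a log vdev on a local SSD.

  # cap a guest's RBD-backed disk at the hypervisor
  virsh blkdeviotune myvm vdb --total-iops-sec 200

  # on a kRBD server, add the SSD ZIL as a log vdev
  zpool add tank log /dev/sdb1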

I would say, though, that this setup is rather adventurous. ZoL is not rock solid; we've had a few lockups in testing, all of which have been fixed in the latest ZFS code in git (my colleague in CC could elaborate if you're interested). One thing I'm not comfortable with is the idea of ZFS checksumming the data in addition to Ceph. Sure, ZFS will tell us if there is a checksum error, but without any redundancy at the ZFS layer there will be no way to correct that error. Of course, the hope is that RADOS will ensure 100% data consistency, but what happens if it doesn't?...
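
That is, a scrub will flag the problem but ZFS won't be able to repair it on its own:

  zpool scrub tank
  zpool status -v tank    # lists CKSUM errors and affected files; with single-copy vdevs there is nothing to repair from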

Personally, I think you're very brave to consider running 2PB of ZoL on RBD. If I were you, I would seriously evaluate the CephFS option. It used to be on the roadmap for ICE 2.0 coming out this fall, though I noticed it's not there anymore (??!!!). Anyway, I would say that ZoL on kRBD is not necessarily a more stable solution than CephFS. Even Gluster striped on top of RBD would probably be more stable than ZoL on RBD.

Cheers, Dan



