NAS on RBD


 



We have been using the NFS/Pacemaker/RBD method for a while now; Sebastien explains it a bit better here: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
PS: Thanks Sebastien!

Our use case is VMware storage. As I mentioned, we've been running it for some time and we've had pretty mixed results.
Pros: when it works, it works really well!
Cons: when it doesn't, it can hurt. I've had a couple of instances where the XFS volumes needed an fsck, and that took about 3 hours on a 4 TB volume. (Lesson learnt: use smaller volumes.) A rough sketch of the moving parts is below.
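For anyone who hasn't read Sebastien's post, here's a minimal sketch of what the Pacemaker resources do on whichever node is active: map the image, mount the XFS filesystem, export it over NFS. The pool/image name, mount point and client network below are made-up placeholders for illustration, not our actual config.

```python
#!/usr/bin/env python3
"""Sketch of the steps the Pacemaker resources automate on the active
NFS head. Names are hypothetical placeholders, not our real setup."""
import subprocess

POOL, IMAGE = "rbd", "nfs0"       # hypothetical pool/image
MOUNTPOINT  = "/srv/nfs/nfs0"     # hypothetical mount point
CLIENTS     = "10.0.0.0/24"       # hypothetical NFS client network

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Map the RBD image with the kernel client.
run(["rbd", "map", f"{POOL}/{IMAGE}"])

# 2. Mount the XFS filesystem that lives on the image.
run(["mount", f"/dev/rbd/{POOL}/{IMAGE}", MOUNTPOINT])

# 3. Export it over NFS to the clients (VMware hosts in our case).
run(["exportfs", "-o", "rw,no_subtree_check", f"{CLIENTS}:{MOUNTPOINT}"])
```

On failover Pacemaker tears this down on the old node and repeats it (plus the floating IP) on the new one; the fsck pain above happens when that teardown isn't clean.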
 

The ZFS RAIDZ option could be interesting but expensive if using, say, 3 pools with 2x replicas, an RBD volume from each, and a RAIDZ on top of that. (I assume you would use 3 pools here so we don't end up with data in the same PG which may be corrupted.) A quick cost estimate is below.
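To put a number on "expensive", a quick back-of-the-envelope calculation, assuming RAIDZ1 over three equally sized RBD volumes, one per 2x-replicated pool:

```python
# Rough overhead estimate for RAIDZ1 over 3 RBD volumes, one per
# 2x-replicated pool (assumed layout; numbers are illustrative).
rbd_volumes   = 3      # one RBD image per pool
ceph_replicas = 2      # replica count in each pool
raidz_parity  = 1      # RAIDZ1 = one volume's worth of parity

usable_fraction_zfs = (rbd_volumes - raidz_parity) / rbd_volumes  # 2/3
raw_per_usable = ceph_replicas / usable_fraction_zfs              # 3.0

print(f"ZFS-usable fraction of RBD capacity: {usable_fraction_zfs:.2f}")
print(f"Raw disk consumed per usable TB:     {raw_per_usable:.1f}x")
# => 3x raw per usable TB, i.e. the same space cost as plain 3x
#    replication, plus an extra ZFS layer to manage.
```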


Currently we also use FreeNAS VMs which are backed via RBD w/ 3 replicas and ZFS striped volumes, with iSCSI/NFS out of these. While not really HA, it seems to mostly work, albeit FreeNAS iSCSI can get a bit cranky at times.

We are moving towards another KVM hypervisor such as Proxmox for these VMs which don't quite fit into our OpenStack environment, instead of having to use "RBD proxies".

Regards,
Quenten Grasso

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Dan Van Der Ster
Sent: Wednesday, 10 September 2014 12:54 AM
To: Michal Kozanecki
Cc: ceph-users at lists.ceph.com; Blair Bethwaite
Subject: Re: NAS on RBD


> On 09 Sep 2014, at 16:39, Michal Kozanecki <mkozanecki at evertz.com> wrote:
> On 9 September 2014 08:47, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
>> On 9 September 2014 20:12, Dan Van Der Ster <daniel.vanderster at cern.ch> wrote:
>>> One thing I'm not comfortable with is the idea of ZFS checking the data in addition to Ceph. Sure, ZFS will tell us if there is a checksum error, but without any redundancy at the ZFS layer there will be no way to correct that error. Of course, the hope is that RADOS will ensure 100% data consistency, but what happens if not?...
>> 
>> The ZFS checksumming would tell us if there has been any corruption, which as you've pointed out shouldn't happen anyway on top of Ceph.
> 
> Just want to quickly address this, someone correct me if I'm wrong, but IIRC even with a replica value of 3 or more, Ceph does not (currently) have any intelligence when it detects a corrupted/"incorrect" PG: it will always replace/repair the PG with whatever data is in the primary, meaning that if the primary copy is the one that's corrupted/bit-rotted/"incorrect", it will replace the good replicas with the bad.

According to the "scrub error on firefly" thread, repair "tends to choose the copy with the lowest osd number which is not obviously corrupted. Even with three replicas, it does not do any kind of voting at this time."
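To make the difference concrete, here's a toy sketch (illustrative only, not Ceph's actual repair code, with made-up checksums) of "take the lowest-numbered copy that isn't obviously corrupted" versus majority voting:

```python
# Toy illustration of the repair-source choice; not Ceph code.
from collections import Counter

replicas = {0: "BAD", 1: "GOOD", 2: "GOOD"}   # osd_id -> object content

# What repair reportedly does today: use the lowest-numbered copy that
# is not *obviously* corrupted (a silent bit-flip may not look corrupt).
chosen_current = replicas[min(replicas)]

# What voting would do: keep whatever the majority of replicas agree on.
chosen_voting = Counter(replicas.values()).most_common(1)[0][0]

print(chosen_current)  # BAD  -- the good replicas get overwritten
print(chosen_voting)   # GOOD -- the corrupted primary would be fixed
```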

Cheers, Dan




_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

