NFS over CEPH - best practice

leen@xxxxxxxxxxxxxxxxx (Leen Besselink) · Mon, 12 May 2014 18:50:30 +0200

On Mon, May 12, 2014 at 10:52:33AM +0100, Andrei Mikhailovsky wrote:
> Leen, 
> 
> Thanks for explaining things. It does make sense now. 
> 
> Unfortunately, it does look like this technology would not fulfill my requirements, as I need to be able to perform maintenance without shutting down VMs. 
> 

Sorry for being cautious. I've seen certain iSCSI-initiators act that way.

I do not know if that is representative of other iSCSI initiators.

So I don't know if that applies to VMware.

During failover reads/writes would be stalled of course.

When properly configured, failover of the target could be done in seconds.

> I will open another topic to discuss possible solutions. 
> 
> Thanks for all your help 
> 
> Andrei 
> ----- Original Message -----
> 
> From: "Leen Besselink" <leen at consolejunkie.net> 
> To: ceph-users at lists.ceph.com 
> Cc: "Andrei Mikhailovsky" <andrei at arhont.com> 
> Sent: Sunday, 11 May, 2014 11:41:08 PM 
> Subject: Re: NFS over CEPH - best practice 
> 
> On Sun, May 11, 2014 at 09:24:30PM +0100, Andrei Mikhailovsky wrote: 
> > Sorry if these questions sound stupid, but I was not able to find an answer by googling. 
> > 
> 
> As the Australians say: no worries, mate. 
> 
> It's fine. 
> 
> > 1. Does iSCSI protocol support having multiple target servers to serve the same disk/block device? 
> > 
> 
> No, I don't think so. What does work is active/standby failover. 
> 
> I suggest having some kind of clustering, because as far as I can see you never want 2 target servers active if they don't share state 
> (as far as I know there is no Linux iSCSI target server which can share state between 2 targets). 
> 
> When there is a failure, all targets may be offline for a brief moment before the second target comes online. The initiators should be able to handle short interruptions. 
> 
> > In case of ceph, the same rbd disk image. I was hoping to have multiple servers to mount the same rbd disk and serve it as an iscsi LUN. This LUN would be used as a vm image storage on vmware / xenserver. 
> > 
> 
> You'd have one server which handles a LUN; when it goes down, another should take over the target IP address and handle requests for that LUN. 
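To make the active/standby idea concrete, here is a toy model in Python. It is only a sketch: the node names and the TargetPair class are made up for illustration, and a real setup would use a cluster manager to move the IP address, not code like this. It shows the two properties described above: only one node serves the LUN at a time, and there is a short window with no active target during takeover.

```python
class TargetPair:
    """Toy active/standby pair: at most one node serves the LUN at a time."""

    def __init__(self, nodes=("target-a", "target-b")):  # hypothetical names
        self.healthy = {name: True for name in nodes}
        self.active = nodes[0]  # node currently holding the target IP address

    def fail(self, name):
        """Mark a node as failed; if it was active, nobody serves the LUN."""
        self.healthy[name] = False
        if self.active == name:
            # Window with no active target: initiators see stalled I/O.
            self.active = None

    def promote_standby(self):
        """Let the first healthy node take over, if no node is active."""
        if self.active is None:
            for name, ok in self.healthy.items():
                if ok:
                    self.active = name
                    break
        return self.active
```

Calling fail("target-a") leaves active as None until promote_standby() runs, which is the interruption the initiators have to ride out.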
> 
> > 2. Does iSCSI multipathing provide failover/HA capability only on the initiator side? The docs that I came across all mention multipathing on the client side, like using two different NICs. I did not find anything about having multiple NICs on the initiator connecting to multiple iSCSI target servers. 
> > 
> 
> Multipathing for iSCSI, as I see it, only does one thing: it can be used to create multiple network paths between the initiator and the target. They can be used for resilience (read: failover) or for load balancing when you need more bandwidth. 
> 
> The way I would do it is to have 2 switches and connect each initiator and each target to both switches. Also you would have 2 IP-subnets. 
> 
> So both the target and initiator would have 2 IP-addresses, one from each subnet. 
> 
> So for example: the target would have: 10.0.1.1 and 10.0.2.1 and the initiator: 10.0.1.11 and 10.0.2.11 
> 
> Then you run the IP-traffic for 10.0.1.x on switch 1 and the 10.0.2.x traffic on switch 2. 
> 
> Thus, you have created a resilient setup: the target has multiple connections to the network, the initiator has multiple connections to the network, and you can also handle a switch failure. 
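The two-subnet layout above can be sketched in a few lines of Python. The addresses are the ones from the example; the /24 prefix length and the names "switch1" and "switch2" are assumptions made for the illustration.

```python
import ipaddress

# Addresses from the example above; the /24 prefix length is an assumption.
SUBNETS = {
    "switch1": ipaddress.ip_network("10.0.1.0/24"),
    "switch2": ipaddress.ip_network("10.0.2.0/24"),
}
TARGET = ["10.0.1.1", "10.0.2.1"]
INITIATOR = ["10.0.1.11", "10.0.2.11"]


def paths(initiator_ips, target_ips, subnets):
    """Pair initiator and target addresses that live in the same subnet."""
    result = {}
    for name, net in subnets.items():
        src = [ip for ip in initiator_ips if ipaddress.ip_address(ip) in net]
        dst = [ip for ip in target_ips if ipaddress.ip_address(ip) in net]
        if src and dst:
            result[name] = (src[0], dst[0])
    return result


def surviving_paths(all_paths, failed_switch):
    """Paths that remain usable when one switch is down."""
    return {name: p for name, p in all_paths.items() if name != failed_switch}


ALL_PATHS = paths(INITIATOR, TARGET, SUBNETS)
```

With either switch removed, surviving_paths() still returns one initiator/target pair, which is the point of the design.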
> 
> > I was hoping to have resilient solution on the storage side so that I can perform upgrades and maintenance without needing to shutdown vms running on vmware/xenserver. Is this possible with iscsi? 
> > 
> 
> The failover set up is mostly to handle failures, not really great for maintenance because it does give a short interruption in service. Like 30 seconds or so of no writing to the LUN. 
> 
> That might not be a problem for you, I don't know, but it is at least something to be aware of. And also something you should test when you've built the setup. 
> 
> > Cheers 
> > 
> 
> Hope that helps. 
> 
> > Andrei 
> > ----- Original Message ----- 
> > 
> > From: "Leen Besselink" <leen at consolejunkie.net> 
> > To: ceph-users at lists.ceph.com 
> > Sent: Saturday, 10 May, 2014 8:31:02 AM 
> > Subject: Re: NFS over CEPH - best practice 
> > 
> > On Fri, May 09, 2014 at 12:37:57PM +0100, Andrei Mikhailovsky wrote: 
> > > Ideally I would like to have a setup with 2+ iscsi servers, so that I can perform maintenance if necessary without shutting down the vms running on the servers. I guess multipathing is what I need. 
> > > 
> > > Also I will need to have more than one xenserver/vmware host servers, so the iscsi LUNs will be mounted on several servers. 
> > > 
> > 
> > So you have multiple machines talking to the same LUN at the same time ? 
> > 
> > You'll have to coordinate how changes are written to the backing store. Normally you'd have the virtualization servers use some kind of protocol for that. 
> > 
> > When it's SCSI there are the older Reserve/Release commands and the newer SCSI-3 Persistent Reservation commands. 
> > 
> > (i)SCSI allows multiple changes to be in flight; without coordination things will go wrong. 
> > 
> > Below it was mentioned that you can disable the cache for rbd. If you have no coordination protocol, you'll need to do the same on the iSCSI side. 
> > 
> > I believe when you do that it will be slower, but it might work. 
> > 
> > > Would the suggested setup not work for my requirements? 
> > > 
> > 
> > It depends on whether VMware allows such a setup. 
> > 
> > Then there is another thing. How do the VMware machines coordinate which VM they should be running ? 
> > 
> > I don't know VMware, but usually if you have some kind of clustering setup you'll need to have a 'quorum'. 
> > 
> > A lot of times the quorum is handled by a quorum disk with the SCSI coordination protocols mentioned above. 
> > 
> > Another way to have a quorum is to have a majority voting system with an odd number of machines talking over the network. This is what Ceph monitor nodes do. 
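The majority rule is simple enough to write down. This is a minimal sketch in Python of the general idea only, not of how the Ceph monitors implement it; the function names are made up for illustration.

```python
def has_quorum(votes_present: int, cluster_size: int) -> bool:
    """True when strictly more than half of the cluster members are reachable."""
    return votes_present > cluster_size // 2


def tolerated_failures(cluster_size: int) -> int:
    """How many members can be lost while the rest still form a majority."""
    return (cluster_size - 1) // 2
```

This also shows why an odd number is preferred: with 2 machines split 1/1, or 4 machines split 2/2, neither side has a majority, and going from 3 to 4 machines does not increase the number of failures you can survive.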
> > 
> > An example of a clustering system that can be used without a quorum disk, with only 2 machines talking over the network, is Linux Pacemaker. When something bad happens, one machine will just turn off the power of the other machine to prevent things going wrong (this is called STONITH). 
> > 
> > > Andrei 
> > > ----- Original Message ----- 
> > > 
> > > From: "Leen Besselink" <leen at consolejunkie.net> 
> > > To: ceph-users at lists.ceph.com 
> > > Sent: Thursday, 8 May, 2014 9:35:21 PM 
> > > Subject: Re: NFS over CEPH - best practice 
> > > 
> > > On Thu, May 08, 2014 at 01:24:17AM +0200, Gilles Mocellin wrote: 
> > > > On 07/05/2014 15:23, Vlad Gorbunov wrote: 
> > > > >It's easy to install tgtd with ceph support. On Ubuntu 12.04, for example: 
> > > > > 
> > > > >Connect ceph-extras repo: 
> > > > >echo deb http://ceph.com/packages/ceph-extras/debian $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph-extras.list 
> > > > > 
> > > > >Install tgtd with rbd support: 
> > > > >sudo apt-get update 
> > > > >sudo apt-get install tgt 
> > > > > 
> > > > >It's important to disable the rbd cache on the tgtd host. Set in 
> > > > >/etc/ceph/ceph.conf: 
> > > > >[client] 
> > > > >rbd_cache = false 
> > > > [...] 
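If you want to verify that setting on a tgtd host, something like the following Python sketch could do it. It is only an illustration: the function name is made up, and it uses Python's generic configparser, which may not handle every ceph.conf (indented lines, for instance, are treated as continuation lines). As far as I know Ceph accepts the option name written with either spaces or underscores, so the sketch normalizes both.

```python
import configparser


def rbd_cache_disabled(conf_text: str) -> bool:
    """True only if the [client] section explicitly turns the rbd cache off."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("client"):
        return False
    for key, value in parser.items("client"):
        # "rbd cache" and "rbd_cache" are treated as the same option here.
        if key.replace(" ", "_") == "rbd_cache":
            return value.strip().lower() in ("false", "0", "no", "off")
    # Option not set: the sketch does not guess what the default is.
    return False
```

To use it, pass in the contents of the file, for example rbd_cache_disabled(open("/etc/ceph/ceph.conf").read()).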
> > > > 
> > > > Hello, 
> > > > 
> > > 
> > > Hi, 
> > > 
> > > > Without cache on the tgtd side, it should be possible to have 
> > > > failover and load balancing (active/active) multipathing. 
> > > > Have you tested multipath load balancing in this scenario ? 
> > > > 
> > > > If it's reliable, it opens a new way for me to do HA storage with iSCSI ! 
> > > > 
> > > 
> > > I have a question, what is your use case ? 
> > > 
> > > Do you need SCSI-3 persistent reservations so multiple machines can use the same LUN at the same time ? 
> > > 
> > > Because in that case I think tgtd won't help you. 
> > > 
> > > Have a good day, 
> > > Leen. 
> > > _______________________________________________ 
> > > ceph-users mailing list 
> > > ceph-users at lists.ceph.com 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > > 
> > 
> 
