Re: Quincy NFS ingress failover

If there isn't any documentation for this yet, can anyone tell me:

 * How do I inspect/change my NFS/haproxy/keepalived configuration?
 * What is it supposed to look like? Does someone have a working example? (My best guesses so far are sketched below.)
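
For reference, this is as far as I've gotten on my own. I'm not sure these are the intended tools, so please treat the following as a rough sketch rather than the right answer:

    # Dump the service specs the orchestrator is actually running
    ceph orch ls nfs --export
    ceph orch ls ingress --export

    # Summary of the NFS cluster, including the virtual IP (if any)
    ceph nfs cluster ls
    ceph nfs cluster info xcpnfs

    # Where the individual daemons (ganesha, haproxy, keepalived) landed
    ceph orch ps | grep -Ei 'nfs|haproxy|keepalived'

I'm also assuming that the supported way to change the haproxy/keepalived behaviour is to edit the ingress service spec and re-apply it with "ceph orch apply -i", rather than editing the generated config files by hand, but that is exactly the kind of thing I'd like confirmed.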

Thank you.

On 31/08/2023 9:36 am, Thorne Lawler wrote:
Sorry everyone,

Is there any more detailed documentation on the high availability NFS functionality in current Ceph?

This is a pretty serious sticking point.

Thank you.

On 30/08/2023 9:33 am, Thorne Lawler wrote:
Fellow cephalopods,

I'm trying to get quick, seamless NFS failover happening on my four-node Ceph cluster.

I followed the instructions here:
https://docs.ceph.com/en/latest/cephadm/services/nfs/#high-availability-nfs

but testing shows that failover doesn't happen. When I placed node 2 ("san2") in maintenance mode, the NFS service shut down:

Aug 24 14:19:03 san2 ceph-e2f1b934-ed43-11ec-80fa-04421a1a1d66-nfs-xcpnfs-1-0-san2-datsvq[1962479]: 24/08/2023 04:19:03 : epoch 64b8af5a : san2 : ganesha.nfsd-8[Admin] do_shutdown :MAIN :EVENT :Removing all exports.
Aug 24 14:19:13 san2 bash[3235994]: time="2023-08-24T14:19:13+10:00" level=warning msg="StopSignal SIGTERM failed to stop container ceph-e2f1b934-ed43-11ec-80fa-04421a1a1d66-nfs-xcpnfs-1-0-san2-datsvq in 10 seconds, resorting to SIGKILL"
Aug 24 14:19:13 san2 bash[3235994]: ceph-e2f1b934-ed43-11ec-80fa-04421a1a1d66-nfs-xcpnfs-1-0-san2-datsvq
Aug 24 14:19:13 san2 systemd[1]: ceph-e2f1b934-ed43-11ec-80fa-04421a1a1d66@nfs.xcpnfs.1.0.san2.datsvq.service: Main process exited, code=exited, status=137/n/a
Aug 24 14:19:14 san2 systemd[1]: ceph-e2f1b934-ed43-11ec-80fa-04421a1a1d66@nfs.xcpnfs.1.0.san2.datsvq.service: Failed with result 'exit-code'.
Aug 24 14:19:14 san2 systemd[1]: Stopped Ceph nfs.xcpnfs.1.0.san2.datsvq for e2f1b934-ed43-11ec-80fa-04421a1a1d66.

And that's it. The ingress IP didn't move.
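
In case it matters, this is how I was checking. I'm assuming these are reasonable ways to verify daemon placement and the virtual IP, but correct me if not:

    # Which hosts ended up with the ganesha, haproxy and keepalived daemons
    ceph orch ps | grep -Ei 'nfs|haproxy|keepalived'

    # On each node, check whether it currently holds the ingress virtual IP
    ip addr | grep <virtual-ip>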

Odder still, the cluster seems to have placed the ingress IP on node 1 (san1), yet it still appears to be using the NFS service on node 2.
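
The only way I've found to check what haproxy is actually pointing at is to read the generated config on the node running the ingress daemons. I'm guessing at the layout here (the fsid and daemon directory names below are placeholders), so if there's a proper command for this I'd love to know it:

    # On san1, as root: confirm which cephadm daemons are deployed there
    cephadm ls | grep -Ei 'haproxy|keepalived'

    # Locate the generated configs under the cephadm daemon directories
    find /var/lib/ceph/<fsid> -name haproxy.cfg -o -name keepalived.conf

The "backend" section of that haproxy.cfg should show which ganesha daemon(s) it forwards NFS traffic to, which might explain the san1/san2 mismatch.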

Do I need to tie the NFS service more tightly to the keepalived and haproxy services, or do I need to expand the ingress service to refer to multiple NFS services?
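
For what it's worth, my reading of the docs page above is that the NFS and ingress specs are meant to look roughly like the pair below. The names, ports and address are only illustrative (and I may well be misreading the docs), but this is what I believe "ceph orch ls --export" should be showing me:

    service_type: nfs
    service_id: xcpnfs
    placement:
      count: 2
    spec:
      port: 12049                    # ganesha on a non-default port

    ---
    service_type: ingress
    service_id: nfs.xcpnfs
    placement:
      count: 2
    spec:
      backend_service: nfs.xcpnfs    # the NFS service above
      frontend_port: 2049            # haproxy owns the standard NFS port
      monitor_port: 9000
      virtual_ip: 10.0.0.123/24      # example address only

    # applied with: ceph orch apply -i <spec-file>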

Thank you.

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170



