> -----Original Message-----
> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> Sent: 10 May 2016 18:29
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: CephFS + CTDB/Samba - MDS session timeout on lockfile
>
> On Tue, May 10, 2016 at 6:48 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Nick Fisk
> >> Sent: 10 May 2016 13:30
> >> To: 'Eric Eastman' <eric.eastman@xxxxxxxxxxxxxx>
> >> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re: CephFS + CTDB/Samba - MDS session timeout on lockfile
> >>
> >> > On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> > > Hi Eric,
> >> > >
> >> > >> I am trying to do some similar testing with SAMBA and CTDB with
> >> > >> the Ceph file system. Are you using the vfs_ceph SAMBA module
> >> > >> or are you kernel mounting the Ceph file system?
> >> > >
> >> > > I'm using the kernel client. I couldn't find any up to date
> >> > > information on if the vfs plugin supported all the necessary
> >> > > bits and pieces.
> >> > >
> >> > > How is your testing coming along? I would be very interested in
> >> > > any findings you may have come across.
> >> > >
> >> > > Nick
> >> >
> >> > I am also using CephFS kernel mounts, with 4 SAMBA gateways. When
> >> > from a SAMBA client, I write a large file (about 2GB) to a gateway
> >> > that is not the holder of the CTDB lock file, and then kill that
> >> > gateway server during the write, the IP failover works as expected,
> >> > and in most cases the file ends up being the correct size after the
> >> > new server finishes writing it, but the data is corrupt. The data
> >> > in the file, from the point of the failover, is all zeros.
> >> >
> >> > I thought the issue may be with the kernel mount, so I looked into
> >> > using the SAMBA vfs_ceph module, but I need SAMBA with AD support
> >> > and the current vfs_ceph module, even in the SAMBA git master
> >> > version, is lacking ACL support for CephFS, as the vfs_ceph.c
> >> > patches submitted to the SAMBA mail list are not yet available. See:
> >> > https://lists.samba.org/archive/samba-technical/2016-March/113063.html
> >> >
> >> > I tried using a FUSE mount of the CephFS, and it also fails setting
> >> > ACLs. See:
> >> > http://tracker.ceph.com/issues/15783.
> >> >
> >> > My current status is IP failover is working, but I am seeing data
> >> > corruption on writes to the share when using kernel mounts. I am
> >> > also seeing the issue you reported when I kill the system holding
> >> > the CTDB lock file. Are you verifying your data after each failover?
> >>
> >> I must admit you are slightly ahead of me. I was initially trying to
> >> just get hard/soft failover working correctly. But your response has
> >> prompted me to test out the scenario you mentioned. I'm seeing
> >> slightly different results, my copy seems to error out when I do a
> >> node failover. I'm copying an ISO from a 2008 server to the
> >> CTDB/Samba share and when I reboot the active node, the copy pauses
> >> for a couple of seconds and then comes up with the error box.
> >> Clicking try again several times doesn't let it resume. I need to do
> >> a bit more digging to try and work out why this is happening. The
> >> share itself does seem to be in a working state when trying to click
> >> the try again button, so there is probably some sort of
> >> state/session problem.
> >>
> >> Do you have multiple vip's configured on your cluster or just a
> >> single IP? I have just the one at the moment.
>
> I have 4 HA addresses setup, and I am using my AD to do the round-robin
> DNS.
> The moving of IP addresses on failure or when a CTDB controlled
> SAMBA system comes on line works great.

I've just added another VIP to the cluster so I will see if this changes
anything.

> >
> > Just to add to this, I have just been reading this article
> >
> > https://nnc3.com/mags/LM10/Magazine/Archive/2009/105/030-035_SambaHA/article.html
> >
> > And the following paragraph seems to indicate that what I am seeing is
> > the correct behaviour? I'm wondering if this is not happening in your
> > case and is why you are getting corruption?
> >
> > "It is important to understand that load balancing and client
> > distribution over the client nodes are connection oriented. If an IP
> > address is switched from one node to another, all the connections
> > actively using this IP address are dropped and the clients have to
> > reconnect.
> >
> > To avoid delays, CTDB uses a trick: When an IP is switched, the new
> > CTDB node "tickles" the client with an illegal TCP ACK packet (tickle
> > ACK) containing an invalid sequence number of 0 and an ACK number of
> > 0. The client responds with a valid ACK packet, allowing the new IP
> > address owner to close the connection with an RST packet, thus forcing
> > the client to reestablish the connection to the new node."
> >
>
> Nice article. I have been trying to figure out if data integrity is
> supported with CTDB on failover on any shared file system. From looking
> at various email posts on CTDB+GPFS, it looks like it may work, so I am
> going to continue to test it with various CephFS configurations. There
> is a new "witness protocol" in SMB3 to support failover, that is not yet
> supported in any released versions of SAMBA.
> I may have to wait for it to be implemented in SAMBA to get fully
> working failover.
> See:
>
> https://wiki.samba.org/index.php/Samba3/SMB2#Witness_Notification_Protocol
> https://sambaxp.org/archive_data/SambaXP2015-SLIDES/wed/track1/sambaxp2015-wed-track1-Guenther_Deschner-ImplementingTheWitnessProtocolInSamba.pdf

Yes I saw that as well, looks really good and would certainly make the
whole solution very smooth.

I tested the settings Ira posted to lower the MDS session timeout and can
confirm that I can now hard kill a CTDB node without the others getting
banned. I plan to do some more testing around this, but I would really
like to hear from Ira what his concerns around the settings were, i.e.:

1. Just untested, probably ok, but I'm not putting my name on it
2. Yeah I saw a big dragon fly out of nowhere and eat all my data

Have you done any testing with CephFS snapshots? I was having a go at
getting them working with "Previous Version" yesterday, which worked ok,
but the warning on the CephFS page is a bit off putting.

Nick

>
> Eric
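
On the question of verifying data after each failover: the symptom
described earlier in the thread (a file of the correct size whose contents
are zeros from the point of the failover) can be checked for with a small
script. The following is only a rough sketch, not something either of us
has run against the cluster. The function names and the 1 MiB chunk size
are made up for illustration, and it assumes the source file is still
available locally to compare against.

```python
import hashlib

CHUNK = 1024 * 1024  # read size in bytes; an arbitrary choice for this sketch


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def zero_tail_start(path):
    """Return the offset where a trailing run of zero bytes begins,
    or None if the file does not end in zero bytes."""
    start = None
    offset = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            stripped = block.rstrip(b"\x00")
            if len(stripped) < len(block):
                # Block ends in zeros. A new run starts here unless this
                # block is entirely zeros and continues an earlier run.
                if stripped or start is None:
                    start = offset + len(stripped)
            else:
                # Block ends in non-zero data, so any earlier run is over.
                start = None
            offset += len(block)
    return start


def verify_copy(source, copy):
    """Compare checksums; on mismatch, report where a zero tail begins."""
    if sha256_of(source) == sha256_of(copy):
        return True, None
    return False, zero_tail_start(copy)
```

Calling `verify_copy("/path/to/source.iso", "/mnt/share/copy.iso")` (both
paths are placeholders) returns `(True, None)` when the copy is intact.
When the checksums differ it returns `False` and the offset at which the
trailing zeros start, which should roughly line up with the moment of the
failover if the corruption matches what was reported above. A legitimate
file can also end in zero bytes, so the offset is only a hint.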