> -----Original Message-----
> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> Sent: 11 May 2016 16:02
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: CephFS + CTDB/Samba - MDS session timeout on lockfile
>
> On Wed, May 11, 2016 at 2:04 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> >> Sent: 10 May 2016 18:29
> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re: CephFS + CTDB/Samba - MDS session timeout on lockfile
> >>
> >> On Tue, May 10, 2016 at 6:48 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >
> >> >> -----Original Message-----
> >> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nick Fisk
> >> >> Sent: 10 May 2016 13:30
> >> >> To: 'Eric Eastman' <eric.eastman@xxxxxxxxxxxxxx>
> >> >> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> >> >> Subject: Re: CephFS + CTDB/Samba - MDS session timeout on lockfile
> >> >>
> >> >> > On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> > > Hi Eric,
> >> >> > >
> >> >> > >> I am trying to do some similar testing with SAMBA and CTDB with the Ceph file system. Are you using the vfs_ceph SAMBA module or are you kernel mounting the Ceph file system?
> >> >> > >
> >> >> > > I'm using the kernel client. I couldn't find any up-to-date information on whether the vfs plugin supported all the necessary bits and pieces.
> >> >> > >
> >> >> > > How is your testing coming along? I would be very interested in any findings you may have come across.
> >> >> > >
> >> >> > > Nick
> >> >> >
> >> >> > I am also using CephFS kernel mounts, with 4 SAMBA gateways. When, from a SAMBA client, I write a large file (about 2GB) to a gateway that is not the holder of the CTDB lock file, and then kill that gateway server during the write, the IP failover works as expected, and in most cases the file ends up being the correct size after the new server finishes writing it, but the data is corrupt. The data in the file, from the point of the failover, is all zeros.
> >> >> >
> >> >> > I thought the issue may be with the kernel mount, so I looked into using the SAMBA vfs_ceph module, but I need SAMBA with AD support, and the current vfs_ceph module, even in the SAMBA git master version, is lacking ACL support for CephFS, as the vfs_ceph.c patches submitted to the SAMBA mailing list are not yet available. See:
> >> >> > https://lists.samba.org/archive/samba-technical/2016-March/113063.html
> >> >> >
> >> >> > I tried using a FUSE mount of the CephFS, and it also fails setting ACLs. See:
> >> >> > http://tracker.ceph.com/issues/15783
> >> >> >
> >> >> > My current status is that IP failover is working, but I am seeing data corruption on writes to the share when using kernel mounts. I am also seeing the issue you reported when I kill the system holding the CTDB lock file. Are you verifying your data after each failover?
> >> >>
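[A side note for anyone following the thread: the rough shape of the two approaches being compared here, a kernel mount exported as a normal path versus the vfs_ceph module, is sketched below. The share name, paths, monitor names and the "samba" cephx user are all made up for illustration, and I haven't validated this exact config, so treat it as a starting point rather than a known-good setup.]

    # Option 1: kernel-mount CephFS on each gateway and export it as a normal path.
    # /etc/fstab entry (monitor names, mount point and client name are placeholders):
    mon1,mon2,mon3:/  /mnt/cephfs  ceph  name=samba,secretfile=/etc/ceph/samba.secret,noatime,_netdev  0 0

    # smb.conf share backed by the kernel mount:
    [cephfs-share]
        path = /mnt/cephfs/share
        read only = no

    # Option 2: the vfs_ceph module, which talks to CephFS directly with no local
    # mount (but, as noted above, it currently lacks CephFS ACL support):
    [cephfs-share]
        path = /share
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no
        read only = no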
> >> >> I must admit you are slightly ahead of me. I was initially trying to just get hard/soft failover working correctly. But your response has prompted me to test out the scenario you mentioned. I'm seeing slightly different results: my copy seems to error out when I do a node failover. I'm copying an ISO from a 2008 server to the CTDB/Samba share, and when I reboot the active node the copy pauses for a couple of seconds and then comes up with the error box. Clicking "try again" several times doesn't let it resume. I need to do a bit more digging to try and work out why this is happening. The share itself does seem to be in a working state when clicking the "try again" button, so there is probably some sort of state/session problem.
> >> >>
> >> >> Do you have multiple VIPs configured on your cluster or just a single IP? I have just the one at the moment.
> >>
> >> I have 4 HA addresses set up, and I am using my AD to do the round-robin DNS. The moving of IP addresses on failure, or when a CTDB-controlled SAMBA system comes online, works great.
> >
> > I've just added another VIP to the cluster so I will see if this changes anything.
> >
> >> > Just to add to this, I have just been reading this article:
> >> >
> >> > https://nnc3.com/mags/LM10/Magazine/Archive/2009/105/030-035_SambaHA/article.html
> >> >
> >> > And the following paragraph seems to indicate that what I am seeing is the correct behaviour. I'm wondering if this is not happening in your case and is why you are getting corruption?
> >> >
> >> > "It is important to understand that load balancing and client distribution over the client nodes are connection oriented. If an IP address is switched from one node to another, all the connections actively using this IP address are dropped and the clients have to reconnect.
> >> >
> >> > To avoid delays, CTDB uses a trick: When an IP is switched, the new CTDB node "tickles" the client with an illegal TCP ACK packet (tickle ACK) containing an invalid sequence number of 0 and an ACK number of 0. The client responds with a valid ACK packet, allowing the new IP address owner to close the connection with an RST packet, thus forcing the client to reestablish the connection to the new node."
> >>
> >> Nice article. I have been trying to figure out if data integrity is supported with CTDB on failover on any shared file system. From looking at various email posts on CTDB+GPFS, it looks like it may work, so I am going to continue to test it with various CephFS configurations. There is a new "witness protocol" in SMB3 to support failover that is not yet supported in any released version of SAMBA. I may have to wait for it to be implemented in SAMBA to get fully working failover. See:
> >>
> >> https://wiki.samba.org/index.php/Samba3/SMB2#Witness_Notification_Protocol
> >> https://sambaxp.org/archive_data/SambaXP2015-SLIDES/wed/track1/sambaxp2015-wed-track1-Guenther_Deschner-ImplementingTheWitnessProtocolInSamba.pdf
> >
> > Yes, I saw that as well; it looks really good and would certainly make the whole solution very smooth. I tested the settings Ira posted to lower the MDS session timeout and can confirm that I can now hard kill a CTDB node without the others getting banned. I plan to do some more testing around this, but I would really like to hear from Ira what his concerns around the settings were.
> >
> > I.e.
> > 1. Just untested, probably ok, but I'm not putting my name on it
> > 2. Yeah I saw a big dragon fly out of nowhere and eat all my data
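[To be clear, I don't want to put words in Ira's mouth, and I am not reproducing his exact numbers here. The snippet below is only a sketch of the kind of change involved: as I understand it, lowering the MDS session timeout/autoclose values means a hard-killed gateway's CephFS session, and the lock it holds on the CTDB recovery lock file, gets cleaned up quickly enough that the surviving nodes can complete recovery instead of banning themselves. The values shown are arbitrary test values, not a recommendation.]

    # ceph.conf on the MDS nodes -- illustrative values only (the stock defaults
    # are 60s for the session timeout and 300s for autoclose):
    [mds]
        mds session timeout = 30
        mds session autoclose = 60
    # The MDS has to pick the new values up, e.g. by restarting the daemon.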
> Thank you for the info on your testing with the lower MDS session timeouts. I will add those settings to my test system.
>
> > Have you done any testing with CephFS snapshots? I was having a go at getting them working with "Previous Versions" yesterday, which worked OK, but the warning on the CephFS page is a bit off-putting.
>
> I have been testing snapshots off the root directory since Hammer. Over the last year I have found a few bugs, turned in tracker issues, and they were quickly fixed. My default build scripts for my test clusters set up a cron job to do hourly snapshots, and I test the snapshots from time to time. All my SAMBA testing is being done with snapshots on. With Jewel, snapshots seem to be working well. I do not run snapshots on lower directories as I don't need that functionality right now, and there are more warnings from the Ceph engineers on using that additional functionality.

Thanks for the heads up. I will proceed with caution.

> Eric
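PS: for anyone else poking at the "Previous Versions" side of this, below is roughly the shape of what I have been experimenting with: a cron job that creates CephFS snapshots under the share's .snap directory using the naming format the shadow_copy2 module expects, plus the matching smb.conf bits. The paths and share name are placeholders and this is an untested sketch, not a config either of us has verified.

    # /etc/crontab entry on one gateway: take an hourly CephFS snapshot.
    # CephFS snapshots are created by making a directory under .snap;
    # the % characters must be escaped in a crontab.
    0 * * * *  root  mkdir /mnt/cephfs/share/.snap/@GMT-$(date -u +\%Y.\%m.\%d-\%H.\%M.\%S)

    # smb.conf additions on the share so the Windows "Previous Versions" tab can see them:
    [cephfs-share]
        path = /mnt/cephfs/share
        vfs objects = shadow_copy2
        shadow:snapdir = .snap
        shadow:format = @GMT-%Y.%m.%d-%H.%M.%S
        shadow:sort = desc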