Re: CephFS + CTDB/Samba - MDS session timeout on lockfile

On Tue, May 10, 2016 at 6:48 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Nick Fisk
>> Sent: 10 May 2016 13:30
>> To: 'Eric Eastman' <eric.eastman@xxxxxxxxxxxxxx>
>> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  CephFS + CTDB/Samba - MDS session timeout on
>> lockfile

>> > On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > > Hi Eric,
>> > >
>> > >>
>> > >> I am trying to do some similar testing with SAMBA and CTDB with the
>> > >> Ceph file system.  Are you using the vfs_ceph SAMBA module or are
>> > >> you kernel mounting the Ceph file system?
>> > >
>> > > I'm using the kernel client. I couldn't find any up-to-date
>> > > information on whether the vfs plugin supports all the necessary
>> > > bits and pieces.
>> > >
>> > > How is your testing coming along? I would be very interested in
>> > > any findings you may have come across.
>> > >
>> > > Nick
>> >
>> > I am also using CephFS kernel mounts, with 4 SAMBA gateways. When,
>> > from a SAMBA client, I write a large file (about 2GB) to a gateway
>> > that is not the holder of the CTDB lock file, and then kill that
>> > gateway server during the write, the IP failover works as expected,
>> > and in most cases the file ends up being the correct size after the
>> > new server finishes writing it, but the data is corrupt. The data in
>> > the file, from the point of the failover, is all zeros.
>> >
>> > I thought the issue may be with the kernel mount, so I looked into
>> > using the SAMBA vfs_ceph module, but I need SAMBA with AD support,
>> > and the current vfs_ceph module, even in the SAMBA git master
>> > version, is lacking ACL support for CephFS, as the vfs_ceph.c
>> > patches submitted to the SAMBA mailing list are not yet available. See:
>> > https://lists.samba.org/archive/samba-technical/2016-March/113063.html
>> >
>> > I tried using a FUSE mount of the CephFS, and it also fails setting
>> > ACLs. See:
>> > http://tracker.ceph.com/issues/15783
>> >
>> > My current status is that IP failover is working, but I am seeing data
>> > corruption on writes to the share when using kernel mounts. I am also
>> > seeing the issue you reported when I kill the system holding the CTDB
>> > lock file.  Are you verifying your data after each failover?
>>
>> I must admit you are slightly ahead of me. I was initially trying to
>> just get hard/soft failover working correctly, but your response has
>> prompted me to test out the scenario you mentioned. I'm seeing slightly
>> different results: my copy seems to error out when I do a node failover.
>> I'm copying an ISO from a 2008 server to the CTDB/Samba share, and when
>> I reboot the active node, the copy pauses for a couple of seconds and
>> then comes up with an error box. Clicking "Try Again" several times
>> doesn't let it resume. I need to do a bit more digging to work out why
>> this is happening. The share itself does seem to be in a working state
>> when I click the "Try Again" button, so there is probably some sort of
>> state/session problem.
>>
>> Do you have multiple VIPs configured on your cluster, or just a single
>> IP? I have just the one at the moment.

I have 4 HA addresses set up, and I am using my AD to do the
round-robin DNS. The moving of IP addresses on failure, or when a
CTDB-controlled SAMBA system comes online, works great.
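
For reference, the relevant pieces of that kind of setup look roughly
like this (a sketch only; the addresses, interface name, and paths are
placeholders, not values from my cluster):

    # /etc/ctdb/public_addresses -- one floating VIP per line; CTDB
    # moves these between whichever gateway nodes are healthy
    192.168.10.201/24 eth0
    192.168.10.202/24 eth0
    192.168.10.203/24 eth0
    192.168.10.204/24 eth0

    # /etc/sysconfig/ctdb (or /etc/default/ctdb, depending on distro)
    CTDB_RECOVERY_LOCK=/mnt/cephfs/ctdb/lockfile   # recovery lock file on the shared CephFS
    CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
    CTDB_MANAGES_SAMBA=yes

    # smb.conf [global] on every gateway
    clustering = yes

The AD DNS side is just round-robin A records for the share's hostname
pointing at all of the VIPs, so clients spread themselves across the
gateways.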

>
> Just to add to this, I have just been reading this article:
>
> https://nnc3.com/mags/LM10/Magazine/Archive/2009/105/030-035_SambaHA/article.html
>
> The following paragraph seems to indicate that what I am seeing is the
> correct behaviour. I'm wondering whether this is not happening in your
> case, and that is why you are getting corruption?
>
> "It is important to understand that load balancing and client distribution
> over the client nodes are connection oriented. If an IP address is switched
> from one node to another, all the connections actively using this IP address
> are dropped and the clients have to reconnect.
>
> To avoid delays, CTDB uses a trick: When an IP is switched, the new CTDB
> node "tickles" the client with an illegal TCP ACK packet (tickle ACK)
> containing an invalid sequence number of 0 and an ACK number of 0. The
> client responds with a valid ACK packet, allowing the new IP address owner
> to close the connection with an RST packet, thus forcing the client to
> reestablish the connection to the new node."
>

Nice article.  I have been trying to figure out whether data integrity
is maintained across a CTDB failover on any shared file system.  From
looking at various email posts on CTDB+GPFS, it looks like it may
work, so I am going to continue to test it with various CephFS
configurations.  There is a new "witness protocol" in SMB3 to support
failover, which is not yet supported in any released version of SAMBA.
I may have to wait for it to be implemented in SAMBA to get fully
working failover. See:

https://wiki.samba.org/index.php/Samba3/SMB2#Witness_Notification_Protocol
https://sambaxp.org/archive_data/SambaXP2015-SLIDES/wed/track1/sambaxp2015-wed-track1-Guenther_Deschner-ImplementingTheWitnessProtocolInSamba.pdf
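
For reference, a quick way to watch the address takeover and the tickle
mechanism described in the article Nick quoted is with the ctdb tool
(a sketch only; the address below is just an example, not one of my VIPs):

    # show which node currently hosts each public (floating) IP
    ctdb ip
    # list the TCP connections ("tickles") recorded for a VIP, i.e. the
    # client connections that will be tickle-ACKed after a takeover
    ctdb gettickles 192.168.10.201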

Eric
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


