Re: CephFS + CTDB/Samba - MDS session timeout on lockfile

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Nick Fisk
> Sent: 10 May 2016 13:30
> To: 'Eric Eastman' <eric.eastman@xxxxxxxxxxxxxx>
> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  CephFS + CTDB/Samba - MDS session timeout on
> lockfile
> 
> > -----Original Message-----
> > From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> > Sent: 09 May 2016 23:09
> > To: Nick Fisk <nick@xxxxxxxxxx>
> > Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> > Subject: Re:  CephFS + CTDB/Samba - MDS session timeout on
> > lockfile
> >
> > On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > > Hi Eric,
> > >
> > >>
> > >> I am trying to do some similar testing with SAMBA and CTDB with the
> > >> Ceph file system.  Are you using the vfs_ceph SAMBA module or are
> > >> you kernel mounting the Ceph file system?
> > >
> > > I'm using the kernel client. I couldn't find any up to date
> > > information on if
> > the vfs plugin supported all the necessary bits and pieces.
> > >
> > > How is your testing coming along? I would be very interested in any
> > findings you may have come across.
> > >
> > > Nick
> >
> > I am also using CephFS kernel mounts, with 4 SAMBA gateways. When,
> > from a SAMBA client, I write a large file (about 2GB) to a gateway
> > that is not the holder of the CTDB lock file, and then kill that
> > gateway server during the write, the IP failover works as expected,
> > and in most cases the file ends up being the correct size after the
> > new server finishes writing it, but the data is corrupt. The data in
> > the file, from the point of the failover, is all zeros.
> >
> > I thought the issue might be with the kernel mount, so I looked into
> > using the SAMBA vfs_ceph module, but I need SAMBA with AD support,
> > and the current vfs_ceph module, even in the SAMBA git master
> > version, is lacking ACL support for CephFS, as the vfs_ceph.c patches
> > submitted to the SAMBA mailing list are not yet available. See:
> > https://lists.samba.org/archive/samba-technical/2016-March/113063.html
> >
> > I tried using a FUSE mount of the CephFS, and it also fails setting
> > ACLs. See: http://tracker.ceph.com/issues/15783
> >
> > My current status is IP failover is working, but I am seeing data
> > corruption on writes to the share when using kernel mounts. I am also
> > seeing the issue you reported when I kill the system holding the CTDB
> > lock file.  Are you verifying your data after each failover?
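
For reference, a vfs_ceph share (the module discussed above) would be
declared roughly like this in smb.conf; the share name and path here are
placeholders, not taken from either setup in this thread:

```
[cephfs]
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ; ceph:user_id = samba   (cephx user, if not using the default)
    read only = no
```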
> 
> I must admit you are slightly ahead of me. I was initially just trying
> to get hard/soft failover working correctly, but your response has
> prompted me to test out the scenario you mentioned. I'm seeing slightly
> different results: my copy seems to error out when I do a node
> failover. I'm copying an ISO from a 2008 server to the CTDB/Samba
> share, and when I reboot the active node the copy pauses for a couple
> of seconds and then comes up with the error box. Clicking "Try Again"
> several times doesn't let it resume. I need to do a bit more digging to
> work out why this is happening. The share itself does seem to be in a
> working state when I click the "Try Again" button, so there is probably
> some sort of state/session problem.
> 
> Do you have multiple VIPs configured on your cluster, or just a single
> IP? I have just the one at the moment.
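
As context for the VIP question above: CTDB takes its floating addresses
from a public_addresses file, one VIP per line. A minimal sketch, with
made-up addresses and interface names:

```
# /etc/ctdb/public_addresses -- address/netmask interface
192.168.1.101/24 eth0
192.168.1.102/24 eth0
```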

Just to add to this, I have just been reading this article:

https://nnc3.com/mags/LM10/Magazine/Archive/2009/105/030-035_SambaHA/article.html

The following paragraph seems to indicate that what I am seeing is the
correct behaviour. I'm wondering if this is not happening in your case,
and whether that is why you are getting corruption:

"It is important to understand that load balancing and client distribution
over the client nodes are connection oriented. If an IP address is switched
from one node to another, all the connections actively using this IP address
are dropped and the clients have to reconnect.

To avoid delays, CTDB uses a trick: When an IP is switched, the new CTDB
node "tickles" the client with an illegal TCP ACK packet (tickle ACK)
containing an invalid sequence number of 0 and an ACK number of 0. The
client responds with a valid ACK packet, allowing the new IP address owner
to close the connection with an RST packet, thus forcing the client to
reestablish the connection to the new node."
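
The "tickle ACK" described above is just a bare TCP segment with
sequence number 0, ACK number 0, and only the ACK flag set. A minimal
Python sketch of that header (the ports are made up; a real tickle also
needs an IP header and a valid TCP checksum, which CTDB's raw-socket
code fills in):

```python
import struct

def build_tickle_ack(src_port, dst_port):
    """Pack a 20-byte TCP header in the shape of a CTDB-style tickle ACK:
    sequence number 0, ACK number 0, and only the ACK flag raised."""
    seq = 0                            # deliberately invalid sequence number
    ack_num = 0                        # ACK number also 0
    offset_flags = (5 << 12) | 0x010   # data offset = 5 words, ACK flag only
    window = 0
    checksum = 0                       # left 0 here; must be computed on the wire
    urgent = 0
    return struct.pack('!HHIIHHHH', src_port, dst_port, seq, ack_num,
                       offset_flags, window, checksum, urgent)

hdr = build_tickle_ack(445, 51515)
```

The client's valid ACK in response tells the new address owner enough to
send the RST that forces a clean reconnect.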

Nick

> 
> >
> > Eric
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



