Re: CephFS + CTDB/Samba - MDS session timeout on lockfile

Nick Fisk <nick@xxxxxxxxxx> · Tue, 10 May 2016 13:29:30 +0100

> -----Original Message-----
> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> Sent: 09 May 2016 23:09
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  CephFS + CTDB/Samba - MDS session timeout on
> lockfile
> 
> On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > Hi Eric,
> >
> >>
> >> I am trying to do some similar testing with SAMBA and CTDB with the
> >> Ceph file system.  Are you using the vfs_ceph SAMBA module or are you
> >> kernel mounting the Ceph file system?
> >
> > I'm using the kernel client. I couldn't find any up-to-date information
> > on whether the vfs plugin supported all the necessary bits and pieces.
> >
> > How is your testing coming along? I would be very interested in any
> > findings you may have come across.
> >
> > Nick
> 
> I am also using CephFS kernel mounts, with 4 SAMBA gateways. When I write
> a large file (about 2GB) from a SAMBA client to a gateway that is not the
> holder of the CTDB lock file, and then kill that gateway server during the
> write, the IP failover works as expected, and in most cases the file ends up
> being the correct size after the new server finishes writing it, but the data is
> corrupt. The data in the file, from the point of the failover, is all zeros.
> 
> I thought the issue might be with the kernel mount, so I looked into using the
> SAMBA vfs_ceph module, but I need SAMBA with AD support and the
> current vfs_ceph module, even in the SAMBA git master version, is lacking
> ACL support for CephFS, as the vfs_ceph.c patches submitted to the SAMBA
> mailing list are not yet available. See:
> https://lists.samba.org/archive/samba-technical/2016-March/113063.html
> 
> I tried using a FUSE mount of the CephFS, and it also fails setting ACLs.  See:
> http://tracker.ceph.com/issues/15783.
> 
> My current status is IP failover is working, but I am seeing data corruption on
> writes to the share when using kernel mounts. I am also seeing the issue you
> reported when I kill the system holding the CTDB lock file.  Are you verifying
> your data after each failover?

I must admit you are slightly ahead of me. I was initially just trying to get hard/soft failover working correctly, but your response has prompted me to test the scenario you mentioned. I'm seeing slightly different results: my copy errors out when I do a node failover. I'm copying an ISO from a 2008 server to the CTDB/Samba share, and when I reboot the active node, the copy pauses for a couple of seconds and then fails with an error dialog. Clicking "Try Again" several times doesn't let it resume. I need to do a bit more digging to work out why this is happening. The share itself does seem to be in a working state while I am clicking "Try Again", so there is probably some sort of state/session problem.
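
For comparing results after a failover, a check along the following lines might help. It is only a minimal sketch, assuming the source file and the copy on the CephFS-backed share are both readable from one machine; the function names are made up for illustration and neither of us has described using this exact script. It hashes both files, reports the first byte offset at which they differ, and reports where a trailing run of zero bytes begins, which should line up with the failover point if the copy shows the all-zeros symptom.

```python
import hashlib

CHUNK = 1024 * 1024  # read size: 1 MiB


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def first_mismatch(src_path, dst_path):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            a = src.read(CHUNK)
            b = dst.read(CHUNK)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Common prefix matches, so one file is shorter than the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference
            offset += len(a)


def zero_tail_start(path):
    """Return the offset just past the last non-zero byte.

    If this equals the file size, the file has no trailing zero run.
    """
    last_nonzero_end = 0
    offset = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            stripped = block.rstrip(b"\x00")
            if stripped:
                last_nonzero_end = offset + len(stripped)
            offset += len(block)
    return last_nonzero_end
```

As a usage example with hypothetical paths, first_mismatch("/data/test.iso", "/mnt/cephfs/share/test.iso") returning an offset close to the amount written before the gateway was killed, with zero_tail_start() on the copy returning the same offset, would match the corruption pattern described above.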

Do you have multiple VIPs configured on your cluster or just a single IP? I have just the one at the moment.

> 
> Eric

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com