Re: CephFS + CTDB/Samba - MDS session timeout on lockfile

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> Sent: 11 May 2016 16:02
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  CephFS + CTDB/Samba - MDS session timeout on
> lockfile
> 
> On Wed, May 11, 2016 at 2:04 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Eric Eastman [mailto:eric.eastman@xxxxxxxxxxxxxx]
> >> Sent: 10 May 2016 18:29
> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re:  CephFS + CTDB/Samba - MDS session timeout
> >> on lockfile
> >>
> >> On Tue, May 10, 2016 at 6:48 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >> >> Behalf Of Nick Fisk
> >> >> Sent: 10 May 2016 13:30
> >> >> To: 'Eric Eastman' <eric.eastman@xxxxxxxxxxxxxx>
> >> >> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> >> >> Subject: Re:  CephFS + CTDB/Samba - MDS session
> >> >> timeout on lockfile
> >>
> >> >> > On Mon, May 9, 2016 at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> > > Hi Eric,
> >> >> > >
> >> >> > >>
> >> >> > >> I am trying to do some similar testing with SAMBA and CTDB
> >> >> > >> with the Ceph file system.  Are you using the vfs_ceph SAMBA
> >> >> > >> module or are you kernel mounting the Ceph file system?
> >> >> > >
> >> >> > > I'm using the kernel client. I couldn't find any up to date
> >> >> > > information on if
> >> >> > the vfs plugin supported all the necessary bits and pieces.
> >> >> > >
> >> >> > > How is your testing coming along? I would be very interested
> >> >> > > in any
> >> >> > findings you may have come across.
> >> >> > >
> >> >> > > Nick
> >> >> >
> >> >> > I am also using CephFS kernel mounts, with 4 SAMBA gateways.
> >> >> > When
> >> >> from
> >> >> > a SAMBA client, I write a large file (about 2GB) to a gateway
> >> >> > that is not the holder of the CTDB lock file, and then kill that
> >> >> > gateway server during the write, the IP failover works as
> >> >> > expected, and in most cases the file ends up being the correct
> >> >> > size after the new server finishes writing it, but the data is
> >> >> > corrupt. The data in the
> >> > file, from
> >> >> the point of the failover, is all zeros.
> >> >> >
> >> >> > I thought the issue may be with the kernel mount, so I looked
> >> >> > into using  the SAMBA vfs_ceph module, but I need SAMBA with AD
> >> >> > support
> >> >> and
> >> >> > the current vfs_ceph module, even in the SAMBA git master
> >> >> > version, is lacking ACL support for CephFS, as the vfs_ceph.c
> >> >> > patches summited to the SAMBA mail list are not yet available. See:
> >> >> > https://lists.samba.org/archive/samba-technical/2016-March/11306
> >> >> > 3.h
> >> >> > tml
> >> >> >
> >> >> > I tried using a FUSE mount of the CephFS, and it also fails
> >> >> > setting
> >> > ACLs.  See:
> >> >> > http://tracker.ceph.com/issues/15783.
> >> >> >
> >> >> > My current status is IP failover is working, but I am seeing
> >> >> > data corruption on writes to the share when using kernel mounts.
> >> >> > I am also seeing the issue you reported when I kill the system
> >> >> > holding the CTDB lock file.  Are you verifying your data after each
> failover?
> >> >>
> >> >> I must admit you are slightly ahead of me. I was initially trying
> >> >> to just
> >> > get
> >> >> hard/soft failover working correctly. But your response has
> >> >> prompted me to test out the scenario you mentioned. I'm seeing
> >> >> slightly different
> >> > results, my
> >> >> copy seems to error out when I do a node failover. I'm copying an
> >> >> ISO from
> >> > a
> >> >> 2008 server to the CTDB/Samba share and when I reboot the active
> >> >> node, the copy pauses for a couple of seconds and then comes up
> >> >> with the error box. Clicking try again several times doesn't let
> >> >> it resume. I need to do
> >> > a bit
> >> >> more digging to try and work out why this is happening. The share
> >> >> itself
> >> > does
> >> >> seem to be in a working state when trying to click the try again
> >> >> button,
> >> > so
> >> >> there is probably some sort of state/session problem.
> >> >>
> >> >> Do you have multiple vip's configured on your cluster or just a single
> IP?
> >> > I
> >> >> have just the one at the moment.
> >>
> >> I have 4 HA addresses setup, and I am using my AD to do the
> >> round-robin DNS. The moving of IP addresses on failure or when a CTDB
> >> controlled SAMBA system comes on line works great.
> >
> > I've just added another VIP to the cluster so I will see if this changes
> anything.
> >
> >>
> >> >
> >> > Just to add to this, I have just been reading this article
> >> >
> >> > https://nnc3.com/mags/LM10/Magazine/Archive/2009/105/030-
> >> 035_SambaHA/a
> >> > rticle
> >> > .html
> >> >
> >> > And the following paragraph seems to indicate that what I am seeing
> >> > is the correct behaviour? I 'm wondering if this is not happening
> >> > in your case and is why you are getting corruption?
> >> >
> >> > "It is important to understand that load balancing and client
> >> > distribution over the client nodes are connection oriented. If an
> >> > IP address is switched from one node to another, all the
> >> > connections actively using this IP address are dropped and the
> >> > clients have to
> >> reconnect.
> >> >
> >> > To avoid delays, CTDB uses a trick: When an IP is switched, the new
> >> > CTDB node "tickles" the client with an illegal TCP ACK packet
> >> > (tickle
> >> > ACK) containing an invalid sequence number of 0 and an ACK number
> >> > of 0. The client responds with a valid ACK packet, allowing the new
> >> > IP address owner to close the connection with an RST packet, thus
> >> > forcing the client to reestablish the connection to the new node."
> >> >
> >>
> >> Nice article.  I have been trying to figure out if data integrity is
> >> supported with CTDB on failover on any shared file system.  From
> >> looking at various email posts on CTDB+GPFS, it looks like it may
> >> work, so I am going to continue to test it with various CephFS
> configurations.  There is a new "witness protocol"
> >> in SMB3 to support failover, that is not yet supported in any
> >> released versions of SAMBA.
> >> I may have to wait for it to be implemented in SAMBA to get fully
> >> working failover. See:
> >>
> >>
> https://wiki.samba.org/index.php/Samba3/SMB2#Witness_Notification_Pro
> >> tocol
> >> https://sambaxp.org/archive_data/SambaXP2015-
> >> SLIDES/wed/track1/sambaxp2015-wed-track1-Guenther_Deschner-
> >> ImplementingTheWitnessProtocolInSamba.pdf
> >
> > Yes I saw that as well, looks really good and would certainly make the
> whole solution very smooth. I tested the settings Ira posted to lower the
> MDS session timeout and can confirm that I can now hard kill a CTDB node
> without the others getting banned. I plan to do some more testing around
> this, but I would really like to hear from Ira what his concerns around the
> settings were.
> >
> > Ie.
> > 1. Just untested, probably ok, but I'm not putting my name on it 2.
> > Yeah I saw a big dragon fly out of nowhere and eat all my data
> 
> Thank you for the info on your testing with the lower MDS session timeouts.
> I will add those settings to my test system.
> 
> > Have you done any testing with CephFS snapshots? I was having a go at
> getting them working with "Previous Version" yesterday, which worked ok,
> but the warning on the CephFS page is a bit off putting.
> 
> I have been testing snapshots off the root directory since Hammer.
> Over the last year I have found a few bugs, turned in tracker issues, and they
> were quickly fixed. My default build scripts for my test clusters setup a
> cronjob to do hourly snapshots, and I test the snapshots from time to time.
> All my SAMBA testing is being done with snapshots on.  With Jewel,
> snapshots seem to be working well.  I do not run snapshots on lower
> directories as I don't need that functionality right now, and there are more
> warnings from the Ceph engineers on using that additional functionality.

Thanks for the heads up. I will proceed with caution


> 
> Eric

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux