On Fri, Mar 29, 2013 at 9:31 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
> Still trying with no success:
>
> Sage and Ronnie:
> I've tried the ping_pong tool, even with "locking=no" in my smb.conf
> (no differences)
>
> # ping_pong /mnt/ceph/samba-cluster/test 3
> I have about 180 locks/second

That is very slow.

> If I start the same command from the other node, the tool stops
> completely. 0 locks/second

Looks like fcntl() locking doesn't work that well.

The slow rate of fcntl() locking will impact samba. By default, for almost all file I/O, samba needs to do at least one fcntl(F_GETLK) in order to check whether some other, non-samba, process holds a lock on the file (there is a rough sketch of that check further down).

If you can only do 180 fcntl(F_*LK) calls per second across the cluster for a file (I assume this is a per-file limitation), that means you can only do about 180 I/Os per second to that file, which will make CIFS impossibly slow for any real use. And this was all from a single node, so there was no inter-node contention!

So here you probably want to use "locking = no" in samba. But beware: locking = no can have catastrophic effects on your data. Without "locking = no", though, samba would just become impossibly slow, probably uselessly slow.

Using "locking = no" in samba does mean that you no longer have any locking coherency across protocols. I.e. NFS clients and samba clients become disjoint, since they can no longer see each other's locks. If you only ever access the data via CIFS, locking = no should be safe. But IF you access the data via NFS or other NAS protocols, breaking lock coherency across protocols like this could lead to data loss, depending on the I/O patterns.

I would recommend only using locking = no if you can guarantee that you will never export the data via any means other than CIFS. If you can not guarantee that, you will have to research the use patterns very carefully to determine whether locking = no is safe or not.

For fcntl() locking it depends on the use case: is this a home server where you can accept very poor performance, or is this a server for a small workgroup? If the latter, and you run with locking = yes, you probably want your filesystem to allow >10,000 operations per second from a node with no contention, and >1,000 operations per node per second when there is contention across nodes. If it is a big server, you probably want >> instead of > for these numbers. At least.

But first you would need to get ping_pong working reliably, both running in a steady state and, later, running and recovering from continuous single-node reboots. It seems ping_pong is not working well for you at all at this stage, so that is likely a problem. As I said, very few cluster filesystems have fcntl() locking that is not completely broken.

For now, you could try "locking = no" in samba, with the caveats above, and you can disable the use of fcntl() for split brain prevention in CTDB by setting CTDB_RECOVERY_LOCK= in /etc/sysconfig/ctdb. This will disable the split brain detection in ctdb but allow you to recover quicker if your cluster fs does not handle fcntl() locking well.

(With 5 minute recovery you will have so much data loss, due to the way CIFS clients work and time out, that there is probably little point in running CIFS at all.)
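To make it concrete, here is roughly what that per-I/O check looks like (a minimal sketch for illustration, using your test file path; this is not samba's actual code):

/* Rough illustration of the F_GETLK probe a file server has to issue
 * before an I/O when POSIX locking is enabled.  Not samba source code,
 * just a sketch of why the cluster-wide fcntl() rate caps the I/O rate. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int region_locked_by_other_process(int fd, off_t start, off_t len)
{
    struct flock fl = { 0 };
    fl.l_type   = F_WRLCK;     /* ask "would a write lock conflict?" */
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;

    if (fcntl(fd, F_GETLK, &fl) == -1) {
        perror("fcntl(F_GETLK)");
        return -1;
    }
    /* The kernel sets l_type to F_UNLCK if nothing conflicts; otherwise
     * fl describes the conflicting lock held by some other process. */
    return fl.l_type != F_UNLCK;
}

int main(void)
{
    int fd = open("/mnt/ceph/samba-cluster/test", O_RDWR);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    printf("region locked elsewhere: %d\n",
           region_locked_by_other_process(fd, 0, 1));
    close(fd);
    return 0;
}

Almost every I/O samba serves ends up paying for at least one such fcntl() round trip, which is why the 180 locks/second you measured translates directly into an I/O ceiling.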
>
> Sage, when I start the CTDB service, the mds log says every second:
> 2013-03-29 16:49:34.442437 7f33fe6f3700 0 mds.0.server
> handle_client_file_setlock: start: 0, length: 0, client: 5475, pid:
> 14795, type: 4
>
> 2013-03-29 16:49:35.440856 7f33fe6f3700 0 mds.0.server
> handle_client_file_setlock: start: 0, length: 0, client: 5475, pid:
> 14799, type: 4
>
> Exactly as you see it: with a blank line in between
> When I start the ping_pong command I have these lines at the same rate
> reported by the script (180 lines/second):
>
> 2013-03-29 17:07:50.277003 7f33fe6f3700 0 mds.0.server
> handle_client_file_setlock: start: 2, length: 1, client: 5481, pid:
> 11011, type: 2
>
> 2013-03-29 17:07:50.281279 7f33fe6f3700 0 mds.0.server
> handle_client_file_setlock: start: 1, length: 1, client: 5481, pid:
> 11011, type: 4
>
> 2013-03-29 17:07:50.286643 7f33fe6f3700 0 mds.0.server
> handle_client_file_setlock: start: 0, length: 1, client: 5481, pid:
> 11011, type: 2
>
> Finally, I've tried to lower the ctdb's RecoverBanPeriod but the
> clients were unable to recover for 5 minutes (again!)
> So, I've found the mds logging this:
> 2013-03-29 16:55:23.354854 7f33fc4ed700 0 log [INF] : closing stale
> session client.5475 192.168.130.11:0/580042840 after 300.159862
>
> I hope to find a solution.
> I am at your disposal for further investigation
>
> --
> Marco Aroldi
>
> 2013/3/29 ronnie sahlberg <ronniesahlberg@xxxxxxxxx>:
>> The ctdb package comes with a tool "ping_pong" that is used to test
>> and exercise fcntl() locking.
>>
>> I think a good test is using this tool and then randomly powercycling
>> nodes in your fs cluster,
>> making sure that
>> 1, fcntl() locking is still coherent and correct
>> 2, it always recovers within 20 seconds for a single node power cycle
>>
>> That is probably a good test for CIFS serving.
>>
>> On Thu, Mar 28, 2013 at 6:22 PM, ronnie sahlberg
>> <ronniesahlberg@xxxxxxxxx> wrote:
>>> On Thu, Mar 28, 2013 at 6:09 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>> On Thu, 28 Mar 2013, ronnie sahlberg wrote:
>>>>> Disable the recovery lock file from ctdb completely.
>>>>> And disable fcntl locking from samba.
>>>>>
>>>>> To be blunt, unless your cluster filesystem is called GPFS,
>>>>> locking is probably completely broken and should be avoided.
>>>>
>>>> Ha!
>>>>
>>>>> On Thu, Mar 28, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
>>>>> > Thanks for the answer,
>>>>> >
>>>>> > I haven't yet looked at the samba.git clone, sorry. I will.
>>>>> >
>>>>> > Just a quick report on my test environment:
>>>>> > * cephfs mounted with kernel driver, re-exported from 2 samba nodes
>>>>> > * If "node B" goes down, everything works like a charm: "node A" does
>>>>> > IP takeover and brings up "node B"'s IP
>>>>> > * Instead, if "node A" goes down, "node B" can't take the rlock file
>>>>> > and gives this error:
>>>>> >
>>>>> > ctdb_recovery_lock: Failed to get recovery lock on
>>>>> > '/mnt/ceph/samba-cluster/rlock'
>>>>> > Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
>>>>> >
>>>>> > * So, for 5 minutes, neither "node A" nor "node B" is active. After
>>>>> > that, the cluster recovers correctly.
>>>>> > It seems that one of the 2 nodes "owns" and doesn't want to "release"
>>>>> > the rlock file
>>>>
>>>> Cephfs aims to give you coherent access between nodes.
>>>> The cost of that
>>>> is that if another client goes down and it holds some lease/lock, you have
>>>> to wait for it to time out. That is supposed to happen after 60 seconds;
>>>> it sounds like you've hit a bug here. The flock/fcntl locks aren't
>>>> super-well tested in the failure scenarios.
>>>>
>>>> Even assuming it were working, though, I'm not sure that you want to wait
>>>> the 60 seconds either for the CTDBs to take over for each other.
>>>
>>> You do not want to wait 60 seconds. That is approaching territory where
>>> CIFS clients will start causing file corruption and data loss due to
>>> them dropping writeback caches.
>>>
>>> You probably want to aim to guarantee that fcntl() locking
>>> starts working again after
>>> ~20 seconds or so, to have some headroom.
>>>
>>> Microsoft themselves state 25 seconds as the absolute deadline they
>>> require you to guarantee before they will qualify storage.
>>> That is, among other things, to accommodate and have some headroom for
>>> some really nasty data loss issues that will
>>> happen if storage can not recover quickly enough.
>>>
>>> CIFS is hard realtime. And you will pay dearly for missing the deadline.
>>>
>>> regards
>>> ronnie sahlberg
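For reference, what ping_pong exercises boils down to bouncing byte-range fcntl() locks on one shared file as fast as possible and reporting the rate. A stripped-down sketch of that kind of loop (an illustration only, not the actual ping_pong tool shipped with ctdb):

/* ping_pong-style fcntl() exerciser: repeatedly take and drop byte-range
 * locks on a shared file and print how many locks per second we achieve.
 * Run it against the same file from several nodes to see contention. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static void byte_range_lock(int fd, off_t off, short type)
{
    struct flock fl = { 0 };
    fl.l_type   = type;          /* F_WRLCK to take, F_UNLCK to drop */
    fl.l_whence = SEEK_SET;
    fl.l_start  = off;
    fl.l_len    = 1;
    if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* blocking lock request */
        perror("fcntl(F_SETLKW)");
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <num_bytes>\n", argv[0]);
        return 1;
    }
    int nbytes = atoi(argv[2]);
    if (nbytes < 2) {
        fprintf(stderr, "num_bytes must be at least 2\n");
        return 1;
    }
    int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    long count = 0;
    time_t last = time(NULL);
    for (off_t i = 0; ; i = (i + 1) % nbytes) {
        byte_range_lock(fd, i, F_WRLCK);   /* grab one byte...        */
        byte_range_lock(fd, i, F_UNLCK);   /* ...and release it again */
        count++;
        if (time(NULL) != last) {
            printf("%ld locks/sec\n", count);
            count = 0;
            last = time(NULL);
        }
    }
}

If a filesystem's fcntl() implementation is healthy, the rate this reports should stay high both with one instance running and with instances running on every node; if it collapses to zero as soon as a second node joins, as in your test, CIFS serving on top of it is not going to work.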