On Wed, 2013-01-30 at 13:34 +0000, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2013-01-30 at 12:31 +0100, Kristian Grønfeldt Sørensen wrote:
> > Hi,
> >
> > I'm setting up a two-node cluster sharing a single GFS2 filesystem
> > backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM
> > involved).
> >
> > I am experiencing more or less the same as the OP in this thread:
> > http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html
> >
>
> Well, I'm not so sure about that. We never found out what the issue was
> in that case, but in your case it seems that you are doing something
> which should work. Also, in the msg00136 case it seems that the lock
> request didn't work at all, whereas in your case it appears that it does
> work until a umount/mount of one node - at least if I've understood it
> correctly.

Correct. And I am able to bring the system into a working state by
unmounting the file system from all nodes at the same time, and
mounting it again.

> Which kernel and userspace are you using?

It's Debian testing. The kernel is from experimental
(3.7.1-1~experimental.2), since I had problems deleting files with the
gfs2 module included in the default Debian testing kernel (3.2.x).
cman + libdlm3 is v3.0.12
corosync is v1.4.2

Let me know if you need version numbers of other stuff.

> It would be a good plan to report this as a bug (or via support if you
> are a supported customer and are using RHEL) as it should work
> correctly,

OK, I will probably file a bug report then. It's at least encouraging
to hear that it should work :-)

/Kristian

> Steve.
> >
> > I have an activemq-5.6.0 instance on each server that tries to lock a
> > file on the GFS2-filesystem (using ).
> >
> > When I start the cluster, everything works as expected. The first
> > activemq instance that starts up acquires the lock, the lock is released
> > when the activemq exits, and the second instance takes the lock.
> >
> > The problem shows when I unmount and subsequently mount the GFS2
> > filesystem again on one of the nodes, or reboot one of the nodes (after
> > having started at least one activemq instance.)
> > Then I start seeing statements like this in the activemq log files:
> >
> > Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase
> >
> > strace -f while that message is logged gives the following:
> >
> > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> > [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133
> > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid 3549] fcntl(133, F_GETFD) = 0
> > [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0
> > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented)
> > [pid 3549] dup2(138, 133) = 133
> > [pid 3549] close(133)
> >
> > As you can see, the "Function not implemented" originates from the
> > F_SETLK fcntl that the JVM does.
> > The only way to recover from this state seems to be by unmounting the
> > GFS2-filesystem on both nodes, then mounting it again on both
> > nodes.
> >
> > I've tried to isolate this by using a simpler testcase than starting two
> > activemq instances. I ended up using the java sample from
> > http://www.javabeat.net/2007/10/locking-files-using-java/ .
> >
> > I haven't managed to get the system into a state where F_SETLK returns
> > "Function not implemented" by only using the above FileLockTest class (I
> > need activemq in order to trigger the situation), but when the system is
> > in that state, I can run FileLockTest, and it will print out the
> > following stack trace.
> >
> > Exception in thread "main" java.io.IOException: Function not implemented
> >     at sun.nio.ch.FileChannelImpl.lock0(Native Method)
> >     at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871)
> >     at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
> >     at FileLockTest.main(FileLockTest.java:15)
> >
> >
> > If I run this on the other server (where the GFS2 fs was not unmounted
> > and mounted again), it works correctly.
> >
> > Any ideas as to what happens, and why?
> >
> > BR
> > Kristian Sørensen
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster
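
For anyone who wants to check a mount for this state without starting
activemq, here is a minimal sketch of a lock probe. It is not the
javabeat sample and not activemq's own code; the class name
FileLockProbe, the default file name and the result strings are made up
for illustration. It asks for an exclusive lock on one byte at offset 0,
the same range as the F_SETLK call in the strace output, and reports
what happened. The assumption is that on an affected mount the
IOException branch would carry the "Function not implemented" text.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not taken from the thread.
class FileLockProbe {

    // Tries to take, then release, an exclusive lock on byte 0 of the file.
    static String probe(Path path) {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // position=0, size=1, shared=false: a one-byte write lock.
            FileLock lock = channel.tryLock(0L, 1L, false);
            if (lock == null) {
                // Another process already holds an overlapping lock.
                return "HELD_ELSEWHERE";
            }
            lock.release();
            return "ACQUIRED";
        } catch (OverlappingFileLockException e) {
            // This JVM already holds an overlapping lock on the file.
            return "HELD_IN_THIS_JVM";
        } catch (IOException e) {
            // Any other failure reported by the OS ends up here.
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The default file name is an arbitrary placeholder.
        Path path = Paths.get(args.length > 0 ? args[0] : "lock-probe.tmp");
        System.out.println(path + ": " + probe(path));
    }
}
```

Running it against a file on the GFS2 mount on each node in turn should
show whether only the remounted node fails. The probe returns a string
instead of throwing so it can be run in a loop while mounting and
unmounting.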