Hi,

On Wed, 2013-01-30 at 12:31 +0100, Kristian Grønfeldt Sørensen wrote:
> Hi,
>
> I'm setting up a two-node cluster sharing a single GFS2 filesystem
> backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM
> involved).
>
> I am experiencing more or less the same as the OP in this thread:
> http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html
>
Well, I'm not so sure about that. We never found out what the issue was
in that case, but in your case it seems that you are doing something
which should work. Also, in the msg00136 case it seems that the lock
request didn't work at all, whereas in your case it appears that it does
work until a umount/mount of one node - at least if I've understood it
correctly.

Which kernel and userspace are you using? It would be a good plan to
report this as a bug (or via support if you are a supported customer and
are using RHEL) as it should work correctly,

Steve.

> I have an activemq-5.6.0 instance on each server that tries to lock a
> file on the GFS2-filesystem (using ).
>
> When I start the cluster, everything works as expected. The first
> activemq instance that starts up acquires the lock, the lock is released
> when the activemq exits, and the second instance takes the lock.
>
> The problem shows when I unmount and subsequently mount the GFS2
> filesystem again on one of the nodes, or reboot one of the nodes (after
> having started at least one activemq instance.)
> Then I start seeing statements like this in the activemq log files:
>
> Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... waiting 10 seconds for the database to be unlocked.
> Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase
>
> strace -f while that message is logged gives the following:
>
> [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133
> [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid 3549] fcntl(133, F_GETFD) = 0
> [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0
> [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented)
> [pid 3549] dup2(138, 133) = 133
> [pid 3549] close(133)
>
> As you can see, the "Function not implemented" originates from the
> F_SETLK fcntl that the JVM does.
> The only way to recover from this state seems to be by unmounting the
> GFS2-filesystem on both nodes, then mounting it again on both
> nodes.
>
> I've tried to isolate this by using a simpler testcase than starting two
> activemq instances. I ended up using the java sample from
> http://www.javabeat.net/2007/10/locking-files-using-java/ .
>
> I haven't managed to get the system into a state where F_SETLK returns
> "Function not implemented" by only using the above FileLockTest class (I
> need activemq in order to trigger the situation), but when the system is
> in that state, I can run FileLockTest, and it will print out the
> following stacktrace.
>
> Exception in thread "main" java.io.IOException: Function not implemented
>     at sun.nio.ch.FileChannelImpl.lock0(Native Method)
>     at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871)
>     at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
>     at FileLockTest.main(FileLockTest.java:15)
>
>
> If I run this on the other server (where the GFS2 fs was not unmounted
> and mounted again), it works correctly.
>
> Any ideas as to what happens, and why?
>
> BR
> Kristian Sørensen
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster