As the subject line says, the two issues mentioned before are still present.
Fast First Access Bug
=====================
To reproduce, use a script that mounts a glusterfs cluster/replicate
share from the local node with only the local node being up, and then
immediately tries to bind mount a subdirectory from that share into
another directory, e.g.
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
The bind mount will reliably fail. I'm not sure whether the amount of
content in the directory being mounted makes any difference, but in case
it does, the path that root.vol points at should contain something
resembling a Linux root file system (i.e. not that many directories in
the root).
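For reference, something like the following is enough to give the backing
directory a root-like skeleton before exporting it (the directory list is
just illustrative; only the cluster/cdsl/2 path is actually needed by the
script above):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Illustrative only: populate the backend directory (the storage/posix
# "option directory" path from root.vol below) with a root-fs-like layout.
BACKEND=/mnt/tmproot/gluster/root/x86_64
mkdir -p "$BACKEND"/{bin,boot,dev,etc,home,lib,lib64,proc,root,sbin,sys,tmp,usr,var}
# The subdirectory that the failing bind mount refers to:
mkdir -p "$BACKEND"/cluster/cdsl/2
8<-----8<-----8<-----8<-----8<-----8<-----8<-----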
Here is the root.vol I'm using:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
volume root1
  type protocol/client
  option transport-type socket
  option address-family inet
  option remote-host 10.1.0.10
  option remote-subvolume root1
end-volume

volume root-store
  type storage/posix
  option directory /mnt/tmproot/gluster/root/x86_64
end-volume

volume root2
  type features/posix-locks
  subvolumes root-store
end-volume

volume server
  type protocol/server
  option transport-type socket
  option address-family inet
  subvolumes root2
  option auth.addr.root2.allow 127.0.0.1,10.*
end-volume

volume root
  type cluster/replicate
  subvolumes root1 root2
  option read-subvolume root2
end-volume
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
Note that the 10.1.0.10 node isn't up; only the local node is. I haven't
tested with the 2nd node up since I haven't built it yet.
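To capture what the client is doing when the bind mount fails, the same
mount can be done with logging enabled rather than discarded; something
along these lines (using DEBUG as the log level is an assumption on my
part, any verbose level should do):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Same mount as above, but with a real log file and a verbose log level
# so the failed bind mount leaves a trace to look at.
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/var/log/glusterfs-root.log,log-level=DEBUG \
  /etc/glusterfs/root.vol /mnt/newroot
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
echo "bind mount exit code: $?"
tail -n 50 /var/log/glusterfs-root.log
8<-----8<-----8<-----8<-----8<-----8<-----8<-----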
If I modify the mounting script to do something like this instead:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
# Note - added sleep and ls
sleep 2
ls -la /mnt/newroot > /dev/null
sleep 2
ls -laR /mnt/newroot/cluster > /dev/null
sleep 2
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
then it works.
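For what it's worth, the init script bodge I mention further down can be
made slightly less fragile by retrying the bind mount until it succeeds
instead of relying on fixed sleeps; a minimal sketch (the retry count is
arbitrary):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Retry the bind mount until the glusterfs mount has "settled",
# rather than guessing at sleep durations.
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
for i in $(seq 1 10); do
  # Touch the directory first, mirroring the ls workaround above.
  ls -la /mnt/newroot/cluster/cdsl/2 > /dev/null 2>&1
  if mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local; then
    break
  fi
  sleep 1
done
8<-----8<-----8<-----8<-----8<-----8<-----8<-----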
SQLite Affecting Bugs
=====================
There seems to be an issue that reliably (but very subtly) affects some
SQLite functionality. It is evident in the way the RPM database behaves
(converted to SQLite because, as far as I can tell, BDB needs writable
mmap(), which means it won't work on any FUSE-based fs): for example, it
simply won't find some packages even though they are installed. Here is
an example (a somewhat ironic one, you might say):
# ls -la /usr/lib64/libfuse.so.*
lrwxrwxrwx 1 root root 16 May 25 12:39 /usr/lib64/libfuse.so.2 -> libfuse.so.2.7.4
-rwxr-xr-x 1 root root 134256 Feb 19 21:40 /usr/lib64/libfuse.so.2.7.4
# rpm -q --whatprovides /usr/lib64/libfuse.so.2
fuse-libs-2.7.4glfs11-1
# rpm -Uvh glusterfs-client-2.0.2-1.el5.x86_64.rpm glusterfs-server-2.0.2-1.el5.x86_64.rpm glusterfs-client-2.0.2-1.el5.x86_64.rpm
warning: package glusterfs-client = 2.0.2-1.el5 was already added, skipping glusterfs-client < 2.0.2-1.el5
error: Failed dependencies:
        libfuse.so.2()(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.4)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.5)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.6)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
So libfuse is there, RPM knows that the fuse-libs-2.7.4glfs11-1 package
provides it, and yet when glusterfs-client tries to install, the
dependency isn't found. This _only_ happens when the RPM DB
(/var/lib/rpm) is on glusterfs. The same package sets on machines that
aren't rooted on glusterfs handle this package combination just fine.
rpm --rebuilddb doesn't alter the situation at all; the issue is still
present after the DB rebuild.
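One way to see the inconsistency directly is to compare what the package
claims to provide with what the dependency lookup returns, on the
glusterfs-rooted machine vs. a locally rooted one; a quick check along
these lines (the capability strings are copied from the error above):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# What fuse-libs says it provides...
rpm -q --provides fuse-libs | grep libfuse
# ...vs. what the dependency lookup finds for the capabilities
# listed in the failed-dependencies error above.
rpm -q --whatprovides 'libfuse.so.2()(64bit)'
rpm -q --whatprovides 'libfuse.so.2(FUSE_2.6)(64bit)'
8<-----8<-----8<-----8<-----8<-----8<-----8<-----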
If the above is deemed difficult to set up, there is another way to
easily recreate an SQLite-related issue. Mount /home via glusterfs, log
into X, and fire up Firefox 3.0.x (I'm using 3.0.10 on x86_64, but this
has been reproducible for a very long time with older versions, too).
Add a bookmark. It'll show up in the bookmarks menu. Now exit Firefox,
wait a few seconds for it to shut down, and fire it up again. Check the
bookmarks - the page you just added won't be there.
I only tested this (/home) with both nodes being up, I haven't tried it
with one node being down.
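If it helps narrow things down, the bookmark loss can also be checked at
the SQLite level directly, bypassing Firefox (the profile path glob is an
assumption; adjust to taste):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Run with Firefox shut down; places.sqlite holds the bookmarks.
DB=$(ls ~/.mozilla/firefox/*.default/places.sqlite 2>/dev/null | head -n 1)
sqlite3 "$DB" 'PRAGMA integrity_check;'
sqlite3 "$DB" 'SELECT count(*) FROM moz_bookmarks;'
8<-----8<-----8<-----8<-----8<-----8<-----8<-----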
Has anybody got any ideas on what could be causing this or any
workarounds? In the RPM DB case, the FS is mounted with the following
parameters (from ps, after startup):
/usr/sbin/glusterfs --log-level=NONE --log-file=/dev/null
--disable-direct-io-mode --volfile=/etc/glusterfs/root.vol /mnt/newroot
Home is mounted with the following:
/usr/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/home.vol
/home
If these are the same bug, then this implies that direct-io-mode has no
effect on it.
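To rule direct-io-mode in or out for /home, one could remount it both
ways and repeat the Firefox (or sqlite3) test under each; roughly:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Repeat the bookmark/sqlite test with direct I/O disabled and enabled
# to confirm whether the option makes any difference.
umount /home
/usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode \
  --volfile=/etc/glusterfs/home.vol /home
# ... run the test, then:
umount /home
/usr/sbin/glusterfs --log-level=NORMAL \
  --volfile=/etc/glusterfs/home.vol /home
# ... run the test again.
8<-----8<-----8<-----8<-----8<-----8<-----8<-----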
Has anybody got a clue where the root causes of these two may be, and
more importantly, when a fix might be available? The bind mount issue is
particularly annoying because it means that startup of glusterfs root
requires nasty init script bodges to work around timing/settling issues
(as mentioned above), the likes of which really shouldn't find their way
into a production environment.
If any further debug/reproduction info is needed, please do tell and
I'll do my best to provide it.
Best regards.
Gordan