As the subject line says, the two issues mentioned before are still present.
Fast First Access Bug
=====================
To reproduce, use a script that mounts a glusterfs cluster/replicate
share from the local node with only the local node being up, and then
immediately tries to bind mount a subdirectory from that share into
another directory, e.g.
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
The bind mount will reliably fail. I'm not sure whether the amount of
content in the directory being mounted makes any difference, but in case
it does, the path that root.vol points at should contain something
resembling a Linux root file system (i.e. not that many directories in
the root).
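For reference, something like the following is enough to give the backing
directory a root-like skeleton before exporting it (the directory list is
just illustrative; only the cluster/cdsl/2 path is actually needed by the
script above):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Illustrative only: populate the backend directory (the storage/posix
# "option directory" path from root.vol below) with a root-fs-like layout.
BACKEND=/mnt/tmproot/gluster/root/x86_64
mkdir -p "$BACKEND"/{bin,boot,dev,etc,home,lib,lib64,proc,root,sbin,sys,tmp,usr,var}
# The subdirectory that the failing bind mount refers to:
mkdir -p "$BACKEND"/cluster/cdsl/2
8<-----8<-----8<-----8<-----8<-----8<-----8<-----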
Here is the root.vol I'm using:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
volume root1
  type protocol/client
  option transport-type socket
  option address-family inet
  option remote-host 10.1.0.10
  option remote-subvolume root1
end-volume

volume root-store
  type storage/posix
  option directory /mnt/tmproot/gluster/root/x86_64
end-volume

volume root2
  type features/posix-locks
  subvolumes root-store
end-volume

volume server
  type protocol/server
  option transport-type socket
  option address-family inet
  subvolumes root2
  option auth.addr.root2.allow 127.0.0.1,10.*
end-volume

volume root
  type cluster/replicate
  subvolumes root1 root2
  option read-subvolume root2
end-volume
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
Note that the 10.1.0.10 node isn't up; only the local node is. I haven't
tested with the 2nd node up since I haven't built it yet.
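To capture what the client is doing when the bind mount fails, the same
mount can be done with logging enabled rather than discarded; something
along these lines (using DEBUG as the log level is an assumption on my
part, any verbose level should do):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Same mount as above, but with a real log file and a verbose log level
# so the failed bind mount leaves a trace to look at.
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/var/log/glusterfs-root.log,log-level=DEBUG \
  /etc/glusterfs/root.vol /mnt/newroot
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
echo "bind mount exit code: $?"
tail -n 50 /var/log/glusterfs-root.log
8<-----8<-----8<-----8<-----8<-----8<-----8<-----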
If I modify the mounting script to do something like this instead:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
# Note - added sleep and ls
sleep 2
ls -la /mnt/newroot > /dev/null
sleep 2
ls -laR /mnt/newroot/cluster > /dev/null
sleep 2
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
then it works.
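For what it's worth, the init script bodge I mention further down can be
made slightly less fragile by retrying the bind mount until it succeeds
instead of relying on fixed sleeps; a minimal sketch (the retry count is
arbitrary):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Retry the bind mount until the glusterfs mount has "settled",
# rather than guessing at sleep durations.
mount -t glusterfs \
  -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE \
  /etc/glusterfs/root.vol /mnt/newroot
for i in $(seq 1 10); do
  # Touch the directory first, mirroring the ls workaround above.
  ls -la /mnt/newroot/cluster/cdsl/2 > /dev/null 2>&1
  if mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local; then
    break
  fi
  sleep 1
done
8<-----8<-----8<-----8<-----8<-----8<-----8<-----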
SQLite Affecting Bugs
=====================
There seems to be an issue that reliably (but very subtly) affects some
SQLite functionality. It is evident in the way the RPM database behaves
(converted to SQLite because, as far as I can tell, BDB needs writable
mmap(), which means it won't work on any FUSE-based fs): for example, it
simply won't find some packages even though they are installed. Here is
an example (a somewhat ironic one, you might say):
# ls -la /usr/lib64/libfuse.so.*
lrwxrwxrwx 1 root root 16 May 25 12:39 /usr/lib64/libfuse.so.2 -> libfuse.so.2.7.4
-rwxr-xr-x 1 root root 134256 Feb 19 21:40 /usr/lib64/libfuse.so.2.7.4
# rpm -q --whatprovides /usr/lib64/libfuse.so.2
fuse-libs-2.7.4glfs11-1
# rpm -Uvh glusterfs-client-2.0.2-1.el5.x86_64.rpm glusterfs-server-2.0.2-1.el5.x86_64.rpm glusterfs-client-2.0.2-1.el5.x86_64.rpm
warning: package glusterfs-client = 2.0.2-1.el5 was already added, skipping glusterfs-client < 2.0.2-1.el5
error: Failed dependencies:
        libfuse.so.2()(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.4)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.5)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
        libfuse.so.2(FUSE_2.6)(64bit) is needed by glusterfs-client-2.0.2-1.el5.x86_64
So libfuse is there, RPM knows that the fuse-libs-2.7.4glfs11-1 package
provides it, and yet when glusterfs-client tries to install, the
dependency isn't found. This _only_ happens when the RPM DB
(/var/lib/rpm) is on glusterfs. The same package sets on machines that
aren't rooted on glusterfs handle this package combination just fine.
rpm --rebuilddb doesn't alter the situation at all; the issue is still
present after the DB rebuild.
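One way to see the inconsistency directly is to compare what the package
claims to provide with what the dependency lookup returns, on the
glusterfs-rooted machine vs. a locally rooted one; a quick check along
these lines (the capability strings are copied from the error above):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# What fuse-libs says it provides...
rpm -q --provides fuse-libs | grep libfuse
# ...vs. what the dependency lookup finds for the capabilities
# listed in the failed-dependencies error above.
rpm -q --whatprovides 'libfuse.so.2()(64bit)'
rpm -q --whatprovides 'libfuse.so.2(FUSE_2.6)(64bit)'
8<-----8<-----8<-----8<-----8<-----8<-----8<-----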
If the above is deemed difficult to set up, there is another way to
easily recreate an SQLite-related issue. Mount /home via glusterfs, log
into X, and fire up Firefox 3.0.x (I'm using 3.0.10 on x86_64, but this
has been reproducible for a very long time with older versions, too).
Add a bookmark. It'll show up in the bookmarks menu. Now exit Firefox,
wait a few seconds for it to shut down, and fire it up again. Check the
bookmarks - the page you just added won't be there.
I only tested this (/home) with both nodes being up, I haven't tried it
with one node being down.
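If it helps narrow things down, the bookmark loss can also be checked at
the SQLite level directly, bypassing Firefox (the profile path glob is an
assumption; adjust to taste):
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Run with Firefox shut down; places.sqlite holds the bookmarks.
DB=$(ls ~/.mozilla/firefox/*.default/places.sqlite 2>/dev/null | head -n 1)
sqlite3 "$DB" 'PRAGMA integrity_check;'
sqlite3 "$DB" 'SELECT count(*) FROM moz_bookmarks;'
8<-----8<-----8<-----8<-----8<-----8<-----8<-----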
Has anybody got any ideas on what could be causing this or any
workarounds? In the RPM DB case, the FS is mounted with the following
parameters (from ps, after startup):
/usr/sbin/glusterfs --log-level=NONE --log-file=/dev/null
--disable-direct-io-mode --volfile=/etc/glusterfs/root.vol /mnt/newroot
Home is mounted with the following:
/usr/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/home.vol
/home
If these are the same bug, then this implies that direct-io-mode has no
effect on it.
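To rule direct-io-mode in or out for /home, one could remount it both
ways and repeat the Firefox (or sqlite3) test under each; roughly:
8<-----8<-----8<-----8<-----8<-----8<-----8<-----
#!/bin/bash
# Repeat the bookmark/sqlite test with direct I/O disabled and enabled
# to confirm whether the option makes any difference.
umount /home
/usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode \
  --volfile=/etc/glusterfs/home.vol /home
# ... run the test, then:
umount /home
/usr/sbin/glusterfs --log-level=NORMAL \
  --volfile=/etc/glusterfs/home.vol /home
# ... run the test again.
8<-----8<-----8<-----8<-----8<-----8<-----8<-----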
Has anybody got a clue where the root causes of these two may be, and
more importantly, when a fix might be available? The bind mount issue is
particularly annoying because it means that startup of glusterfs root
requires nasty init script bodges to work around timing/settling issues
(as mentioned above), the likes of which really shouldn't find their way
into a production environment.
If any further debug/reproduction info is needed, please do tell and
I'll do my best to provide it.
Best regards.
Gordan