Re: NetBSD's read-subvol-entry.t spurious failures explained

On 03/06/2015 04:31 PM, Emmanuel Dreyfus wrote:
Hi

I tracked down the spurious failures of read-subvol-entry.t on NetBSD.

Here is what should happen: we have a volume with brick0 and brick1.
We disable self-heal, kill brick0, create a file in a directory,
restart brick0, and we list directory content to check we find the file.

The tested mechanism is that in brick1, trusted.afr.patchy-client-0
accuses brick0 of being outdated, hence AFR should rule out brick0
for listing directory content, and it should use brick1 which contains
the file we look for.
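
To check whether the accusation is actually present on brick1, the xattr value can be decoded by hand. The helper below is a minimal sketch, not part of the test suite: the function name afr_pending_counts is made up, and it assumes the usual AFR changelog layout of 12 bytes holding three big-endian 32-bit counters in the order data, metadata, entry. That layout is not stated in this thread, so treat it as an assumption.

```shell
#!/bin/sh
# Decode a hex-encoded trusted.afr.* value into its pending counters.
# ASSUMPTION: 12 bytes = three big-endian 32-bit counters, in the order
# data, metadata, entry. The function name is hypothetical.
afr_pending_counts() {
    hex=${1#0x}                          # drop a leading 0x if present
    if [ "${#hex}" -ne 24 ]; then        # 12 bytes are 24 hex digits
        echo "unexpected xattr length: $1" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    metadata=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # Shell arithmetic converts each hex field to decimal.
    echo "data=$((0x$data)) metadata=$((0x$metadata)) entry=$((0x$entry))"
}

afr_pending_counts 0x000000000000000000000001
# prints "data=0 metadata=0 entry=1"
```

On Linux the raw value can be read with getfattr -n trusted.afr.patchy-client-0 -e hex on the directory inside the brick1 backend; the equivalent tool on NetBSD may differ. Under the assumed layout, a non-zero entry counter on abc/def would be the accusation against brick0.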

On NetBSD I can see that AFR never gets trusted.afr.patchy-client-0
and always thinks brick0 is fine. AFR randomly picks brick0 or brick1
to list directory content, and when it picks brick0 the test fails.

After bringing brick0 up, and performing "ls abc/def", does afr_do_readdir() get called for "def"? If it does, then AFR will send lookup to both bricks via afr_inode_refresh(), and it will pick brick1 as the source. Like I suggested earlier, we could put a print in afr_readdir_wind() and see that it indeed goes to brick0 when the test fails.

The reason why trusted.afr.patchy-client-0 is not there is that the
node is cached in kernel FUSE from an earlier lookup. The TTL obtained
at that time tells the kernel this node is still valid, hence the
kernel does not send the new lookup to GlusterFS. Since GlusterFS uses
lookups to refresh the client's view of xattrs, it sticks with the older
value where brick0 was not yet outdated, and trusted.afr.patchy-client-0
is unset.
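
The caching hypothesis above can be illustrated with a toy model. This is a sketch only, not kernel or GlusterFS code: the variable names, the lookup function, and the 5-second TTL are all invented for illustration. It shows that a lookup arriving while the cached entry is still valid never reaches the filesystem, so anything refreshed only on lookup stays stale until the TTL expires.

```shell
#!/bin/sh
# Toy model of an entry cache with a TTL; not kernel or GlusterFS code.
# All names and values are hypothetical.
ttl=5            # seconds a cached entry stays valid
cached_at=-1     # simulated time of the last real lookup, -1 = nothing cached
fs_lookups=0     # number of lookups that actually reached the filesystem

# lookup NOW: forward to the filesystem only if the cached entry has expired.
lookup() {
    now=$1
    if [ "$cached_at" -ge 0 ] && [ $((now - cached_at)) -lt "$ttl" ]; then
        return 0                       # served from cache, filesystem not asked
    fi
    fs_lookups=$((fs_lookups + 1))     # xattrs would be refreshed at this point
    cached_at=$now
}

lookup 0    # first access: reaches the filesystem
lookup 2    # within the TTL: answered from cache, no refresh
lookup 7    # TTL expired: reaches the filesystem again
echo "filesystem lookups: $fs_lookups"
# prints "filesystem lookups: 2"
```

In this model the second call corresponds to the "ls" issued right after brick0 comes back: if it falls inside the TTL window, no lookup is sent and the stale view is kept.
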

If readdir comes on def, then it is AFR that initiates the lookup. So no fuse caching should be involved.


Questions:

1) Is NetBSD's behavior wrong here? It got a TTL for a node, so I understand
it should not send lookups to the filesystem until the TTL has expired.

2) How to fix it? If NetBSD's behavior is correct, then I guess the test
only succeeds on Linux by chance and we only need to fix the test.
The change below flushes the kernel cache before looking for the file:

--- a/tests/basic/afr/read-subvol-entry.t
+++ b/tests/basic/afr/read-subvol-entry.t
@@ -26,6 +26,7 @@ TEST kill_brick $V0 $H0 $B0/brick0
 TEST touch $M0/abc/def/ghi
 TEST $CLI volume start $V0 force
+( cd $M0 && umount $M0 )
 EXPECT_WITHIN $PROCESS_UP_TIMEOUT "ghi" echo `ls $M0/abc/def/`
 #Cleanup




_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



