----- Original Message -----
> From: "Anand Avati" <avati@xxxxxxxxxxx>
> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> Cc: "Edward Shishkin" <edward@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Wednesday, May 21, 2014 12:36:22 PM
> Subject: Re: spurios failures in tests/encryption/crypt.t
>
> On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
>
> >
> >
> > ----- Original Message -----
> > > From: "Anand Avati" <avati@xxxxxxxxxxx>
> > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > Cc: "Edward Shishkin" <edward@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > Sent: Wednesday, May 21, 2014 10:53:54 AM
> > > Subject: Re: spurios failures in tests/encryption/crypt.t
> > >
> > > There are a few suspicious things going on here..
> > >
> > > On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > >
> > > >
> > > > > > hi,
> > > > > >     crypt.t is failing regression builds once in a while and most of
> > > > > > the time it is because of the failures just after the remount in the
> > > > > > script.
> > > > > >
> > > > > > TEST rm -f $M0/testfile-symlink
> > > > > > TEST rm -f $M0/testfile-link
> > > > > >
> > > > > > Both of these are failing with ENOTCONN. I got a chance to look at
> > > > > > the logs. According to the brick logs, this is what I see:
> > > > > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > > > > > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > > > > > Transport endpoint is not connected
> > > >
> > >
> > > posix_open() happening on a symlink? This should NEVER happen. glusterfs
> > > itself should NEVER EVER be triggering symlink resolution on the server.
> > > In this case, for whatever reason an open() is attempted on a symlink,
> > > and it is getting followed back onto gluster's own mount point (test
> > > case is creating an absolute link).
> > >
> > > So first find out: who is triggering fop->open() on a symlink. Fix the
> > > caller. http://review.gluster.org/7824
> > >
> > > Next: add a check in posix_open() to fail with ELOOP or EINVAL if the
> > > inode is a symlink. http://review.gluster.org/7823
> >
> > I think I understood what you are saying. Open call for symlink on fuse
> > mount led to an open call again for the target on the same fuse mount.
>
>
> It's not that simple. The client VFS is intelligent enough to resolve
> symlinks and send open() only on non-symlinks. And the test case script was
> doing an obvious unlink() (TEST rm -f <filename>), so it was not initiated
> by an open() attempt in the first place. My guess is that some xlator
> (probably crypt?) is doing an open() on an inode and that is going through
> unchecked in posix. It is a bug in both the caller and posix, but the
> onus/responsibility is on posix to disallow open() on anything but regular
> files (even open() on character or block devices should not happen in
> posix).
>
> >
> > Which led to deadlock :). That is why we disallow opens on symlink in
> > gluster?
> >
>
> That's not just why open on symlink is disallowed in gluster, it is a more
> generic problem of following symlinks in general inside gluster. Symlink
> resolution must strictly happen only in the outermost VFS. Following
> symlinks inside the filesystem is not only an invalid operation, but can
> lead to all kinds of deadlocks, security holes (what if you opened a
> symlink which points to /etc/passwd, should it show the contents of the
> client machine's /etc/passwd or the server's? Now what if you wrote to the
> file through the symlink? etc. you get the idea..) and
> wrong/weird/dangerous behaviors. This is not just related to following
> symlinks, even open()ing special devices..
> e.g. if you create a char device file with major/minor number of an audio
> device and write PCM data into it, should it play music on the client
> machine or on the server machine? etc. The summary is, following symlinks
> or opening non-regular files is a VFS/client operation and is an invalid
> operation in a filesystem context.
>

Now only one question remains. How could it not hang every time?

Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
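
For illustration only: the check suggested in the quoted discussion (have posix_open() fail with ELOOP or EINVAL unless the inode is a regular file) could be sketched roughly as below. This is a standalone sketch using plain POSIX calls, not the GlusterFS code and not necessarily what the linked reviews do. The name open_regular_only() is hypothetical, and the choice of ELOOP for symlinks versus EINVAL for other non-regular files is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* for lstat() and O_NOFOLLOW */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>              /* close(), for callers of the helper */

/*
 * Hypothetical helper (NOT the GlusterFS posix_open() code): open `path`
 * only if it is a regular file. Returns a file descriptor, or -1 with
 * errno set to ELOOP for a symlink and EINVAL for any other non-regular
 * file (directory, device node, FIFO, socket).
 */
int open_regular_only(const char *path, int flags)
{
    struct stat st;

    assert(path != NULL);

    /* lstat() inspects the path itself and never follows a final symlink. */
    if (lstat(path, &st) != 0)
        return -1;              /* errno already set by lstat() */

    if (S_ISLNK(st.st_mode)) {
        errno = ELOOP;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        errno = EINVAL;
        return -1;
    }

    /*
     * O_NOFOLLOW covers the window in which the path could be replaced
     * by a symlink between lstat() and open(): open() then fails instead
     * of following the link.
     */
    return open(path, flags | O_NOFOLLOW);
}
```

With a check like this at the lowest layer, a stray open() on testfile-symlink would be rejected on the brick instead of being resolved back onto the mount point.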