Re: [PATCH] ceph-client: fix xattr bugs

Henry C Chang <henry_c_chang@xxxxxxxxxxxxxxxxxxx> · Mon, 3 May 2010 14:41:52 +0800

Please see my inline reply below.

-Henry

On Sat, May 1, 2010 at 2:21 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Fri, 30 Apr 2010, Henry C Chang wrote:
> > Hi Sage,
> >
> > Thanks for your reply.
> > I don't have the test program that reproduces the bug actually.
> > I detected it manually by entering xattr commands from the command prompt.
>
> Do you remember which setfattr/whatever commands?

I installed python-xattr package and it contains a script 'xattr' that
can be used from command prompt directly.

root@ceph-vm2:~# xattr --help
usage: xattr [-l] file [attr_name [attr_value]]
  -l: print long format (attr_name: attr_value) when listing xattrs
  With no optional arguments, lists the xattrs on file
  With attr_name only, lists the contents of attr_name on file
  With attr_value, set the contents of attr_name on file

One of my test sequence is:

$ xattr /mnt/ceph/folder user.quota 20000000
$ xattr /mnt/ceph/folder user.quota
20000000
$ umount /mnt/ceph
$ mount -t ceph 192.168.159.135:/ /mnt/ceph
$ xattr /mnt/ceph/folder user.quota
No such attribute.

Thereafter, the system will become abnormal at some point and crash eventually.

>
> > In fact, I am trying to add "folder quota support" to ceph right now.
> > My rough idea is as below:
> >
> > (1) Store the quota limit in the xattr of the folder;
> > (2) When the client requests a new max size for writing content to a file,
> > MDS authorizes the request according to the quota and the rstat of the
> > folder.
>
> One thing to keep in mind is that because the recursive rsize info is
> lazily propagated up the file tree, this won't work perfectly.  If you set
> a limit of 1GB on /foo and are writing data in /foo/bar/baz, it won't stop
> you right at 1GB.  Similarly, if you hit the limit, and delete some stuff,
> it will take time before the MDS notices and lets you start writing again.
>
Hmm... this would be a problem....
>From the perspective of a user, I would be happy if I can write more
than my quota.
However, I would get pissed off if I have deleted some stuff but still
cannot write anything and don't know how long I have to wait.

Is it possible to force MDS to propaget rsize info when files are deleted?
Or, can lazy propagation be bounded to a maximum interval (say 5 seconds)?

>
> If that is acceptable, then in principle this is doable, although
> difficult.  The first step would probably be keeping track of how much
> "max_size - size" slop is currently outstanding in a recursive fashion so
> that the mds can limit itself.  For a single mds that's pretty doable.
> When you consider that the hierarchy can be partitioned across multiple
> nodes, it becomes more difficult (which is partly why rsize is lazily
> propagated).  This would definitely take some planning...
>
> What is your use case?
>
I want to create "depots" inside ceph:
- Each depot has its own quota limit and can be resized as needed.
- Multiple users can read/write the same depot concurrently.

My original plan is to create a first-level folder for each depot
(e.g., /mnt/ceph/depot1, /mnt/ceph/depot2, ...) and set quota on it.
Do you have any suggestion on implementing such a use case?

>
> sage
>
>
>
> >
> > I detected the xattr bug when I was doing tests for (1). Everything goes
> > well so far after it's fixed.
> > I can write some test programs to test it more since I will write a program
> > (tool) to set/get folder quota.
> >
> > I would appreciate if you can give me some advices on implementing "folder
> > quota".
> >
> > Best Regards,
> > Henry
> >
> >
> > On Fri, Apr 30, 2010 at 12:39 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > > Hi Henry--
> > >
> > > On Thu, 29 Apr 2010, Henry C Chang wrote:
> > > > 1. fill_inode() incorrectly frees the xattr blob in the end of the
> > > >   function. It will cause segfault and then kernel will crash.
> > >
> > > I fixed this slightly differently, by clearing xattr_blob if/when it is
> > > used.  See
> > >
> > >
> > > http://ceph.newdream.net/git/?p=ceph-client.git;a=commitdiff;h=c89f9c6decdbe2427d4d510a949a2d87c5e340dc
> > >
> > > > 2. ceph_listxattr() should compare index_version and version by '>='.
> > >
> > >
> > > http://ceph.newdream.net/git/?p=ceph-client.git;a=commitdiff;h=da6eb075f800b64af9e199ffbe9171752647fbaa
> > >
> > > Both are pushed to the unstable branch.  Do you mind testing?  Do you have
> > > a simple test that reproduced the bug before?  If so we should add it to
> > > the qa suite (which currently does nothing with xattrs).
> > >
> > > Thanks!
> > > sage
> > >
> > >
> > > >
> > > > Signed-off-by: henry_c_chang <henry_c_chang@xxxxxxxxxxxxxxxxxxx>
> > > > ---
> > > >  inode.c |    2 +-
> > > >  xattr.c |    2 +-
> > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/inode.c b/inode.c
> > > > index 261f3e6..bfe48ee 100644
> > > > --- a/inode.c
> > > > +++ b/inode.c
> > > > @@ -742,7 +742,7 @@ no_change:
> > > >        err = 0;
> > > >
> > > >  out:
> > > > -       if (xattr_blob)
> > > > +       if (err && xattr_blob)
> > > >                ceph_buffer_put(xattr_blob);
> > > >        return err;
> > > >  }
> > > > diff --git a/xattr.c b/xattr.c
> > > > index 37d6ce6..8c4ef01 100644
> > > > --- a/xattr.c
> > > > +++ b/xattr.c
> > > > @@ -573,7 +573,7 @@ ssize_t ceph_listxattr(struct dentry *dentry, char
> > > > *names, size_t size)
> > > >             ci->i_xattrs.version, ci->i_xattrs.index_version);
> > > >
> > > >        if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1) &&
> > > > -           (ci->i_xattrs.index_version > ci->i_xattrs.version)) {
> > > > +           (ci->i_xattrs.index_version >= ci->i_xattrs.version)) {
> > > >                goto list_xattr;
> > > >        } else {
> > > >                spin_unlock(&inode->i_lock);
> > > > --
> > > > 1.6.3.3
> > > >
> > >
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html