On 12/09/2015 08:08 PM, Shyam wrote:
On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote:
On 12/09/2015 10:39 AM, Prashanth Pai wrote:
However, I’d be even more comfortable with an even simpler approach that
avoids the need to solve what the database folks (who have dealt with
complex transactions for years) would tell us is a really hard problem.
Instead of designing for every case we can imagine, let’s design for the
cases that we know would be useful for improving performance. Open plus
read/write plus close is an obvious one. Raghavendra mentions
create+inodelk as well.
From the object interface (Swift/S3) perspective, this is the fop order
and flow for object operations:
GET: open(), fstat(), fgetxattr()s, read()s, close()
Krutika implemented fstat+fgetxattr (http://review.gluster.org/10180). In
posix there is an implementation of GF_CONTENT_KEY, which quick-read uses
to read a file during lookup. I think this needs to be exposed for fds as
well; then you can do all this using fstat on an anon-fd.
HEAD: stat(), getxattr()s
Krutika already implemented this for sharding
(http://review.gluster.org/10158). You can do this using the stat fop.
I believe we need to fork this part of the conversation, i.e. the stat
+ xattr information clubbing.
My view is that a stat in gluster should return the POSIX stat plus
gluster's extended information. Put another way: when a file system
stats its inode, it should get all information regarding the inode, and
not just the POSIX fields. In other local file systems, the inode
structure has more fields than just what POSIX needs, so when the
inode is *read* the FS can populate all its internal inode information
and return to the application/syscall only the relevant fields it needs.
I believe gluster should do the same, so in the cases above, we should
actually extend our stat information (not elaborating how) to include
all information from the brick, i.e. the stat from POSIX and all the
extended attrs for the inode (file or dir). This can then be consumed
by any layer as needed.
Currently, each layer asks for what it needs beyond the stat
information as an xattr request in the xdata. This can continue, or go
away if the relevant FOPs return the whole inode information upward.
This also has useful outcomes in readdirp calls, where we get the
extended stat information for each entry.
You can use "list-xattr" in xdata request to get this.
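As a rough illustration of the extended stat idea discussed above, the
sketch below bundles the POSIX stat and every extended attribute of a
file into one result, using plain local Python calls. The names
ExtendedStat and extended_stat are hypothetical and not part of gluster;
this only shows the shape of the data that a layer could consume, not
how the brick or the wire protocol would carry it.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ExtendedStat:
    """Hypothetical container: POSIX stat plus all xattrs of one inode."""
    stat: os.stat_result
    xattrs: dict = field(default_factory=dict)


def extended_stat(path):
    """Return the POSIX stat and every readable xattr of `path` together."""
    st = os.stat(path)
    xattrs = {}
    # os.listxattr/os.getxattr exist only on Linux; elsewhere report none.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path):
                xattrs[name] = os.getxattr(path, name)
        except OSError:
            # Assumption for this sketch: stop collecting on the first
            # xattr error (for example, a file system without xattr
            # support) and return whatever was gathered so far.
            pass
    return ExtendedStat(stat=st, xattrs=xattrs)
```

A caller would then read es.stat.st_size or es.xattrs as needed, instead
of issuing a separate getxattr per attribute.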
Given the patches referred to, and older patches, this seems to have
been the direction sought (around 2013). Any reason why this is not
prevalent across the stack? Or am I mistaken?
No reason. We can revive it. There didn't seem to be any interest, so I
didn't follow up to get it in.
Pranith
PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
This I think should be a new compound fop. Nothing similar exists.
DELETE: getxattr(), unlink()
This can already be clubbed into unlink, because xdata already exists
on the wire.
Compounding some of these ops and exposing them as consumable libgfapi
APIs like glfs_get() and glfs_put(), similar to the librados compound
APIs [1], would greatly improve performance for object-based access.
[1]:
https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219
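To make the cost of the un-compounded flows concrete, here is a small
back-of-the-envelope sketch that counts fops per object operation using
the sequences listed earlier in this mail. The table and the function
round_trips are hypothetical helpers for illustration only. It assumes
each fop is one network round trip, and that an ideal compound fop would
carry the whole sequence in a single request; neither assumption is
measured here.

```python
# Fop sequences per object operation, as listed in this thread.
# The boolean marks steps written in the plural in the thread
# (fgetxattr()s, read()s, getxattr()s, write()s), which may repeat.
OBJECT_FOPS = {
    "GET": [("open", False), ("fstat", False), ("fgetxattr", True),
            ("read", True), ("close", False)],
    "HEAD": [("stat", False), ("getxattr", True)],
    "PUT": [("creat", False), ("write", True), ("setxattr", False),
            ("fsync", False), ("close", False), ("rename", False)],
    "DELETE": [("getxattr", False), ("unlink", False)],
}


def round_trips(op, repeats=1):
    """Count fops sent one at a time for `op`.

    Each repeatable step is assumed to be issued `repeats` times.
    """
    return sum(repeats if repeatable else 1
               for _, repeatable in OBJECT_FOPS[op])


for op in OBJECT_FOPS:
    # Assumption: a compound fop would need a single round trip.
    print(op, "fops sent individually:", round_trips(op), "compound: 1")
```

Even with one xattr and one read or write per object, GET and PUT need
five and six separate fops respectively under these assumptions, which
is the overhead glfs_get()/glfs_put() style calls would aim to remove.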
Thanks.
- Prashanth Pai
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel