RIO-Distribution: Status update

Hi,

Here is a detailed (and lengthy, considering it has been a while) status update on RIO. As the mail is long, you may be interested in only some of its subsections, so here is a list of sections and their numbers:

1) General information
2) What does the graph look like
3) What works
4) What does not work
5) Major changes to common code
6) Problems needing attention
7) Next steps of interest

1) General information:

Main GitHub issue tracking the feature: https://github.com/gluster/glusterfs/issues/243 (this was updated just before this mail, and has quite a few updates to the issue description itself)

Where are we developing the code:
  - We are using the experimental branch to develop code for RIO
  - List of current commits would be as in [1]

2) What does the graph look like:
- It is intended to look like this in the (near) future https://docs.google.com/document/d/1-1ibDCzHh0_U5KXz977MFtGxgNE8CADK5zmJm_f0_Dw/edit?usp=sharing

3) What works:
  - Using the python volume file generator, we can create a RIO based volume on a single node (https://github.com/gluster/glusterfs/blob/experimental/tests/experimental/riocreate.t)
    - This volume is a bare-bones FUSE->RIO-Client->protocol->RIO-Server->POSIX2 graph, so other xlators are not integrated or active at present

  - This volume can be FUSE mounted, and operations such as create, mkdir, stat, and xattr (get/set/remove) can be performed on the volume
    - Creating files and directories more than one directory level deep requires a couple of unmerged patches:
      - Add mkdir FOP: https://review.gluster.org/#/c/18270/
      - Add ability to handle remote inodes in lookup: https://review.gluster.org/#/c/18295/

4) What does not work:
  - Data operations are still under development, so reading or writing files will not work (nor will things like fallocate, discard, etc.)
  - Directory listing does not work
  - unlink, rename, and link, among a few other FOPs, do not work

5) Major changes to common code:
  - POSIX xlator has been *reorganized*, so that we can reuse all but entry ops from existing posix xlator code. This is still in experimental, but we intend to bring this into master in about 2 weeks, once we have a few data FOPs working, to ensure that this works and hence the reorganization is worth the effort.
    - Commits of interest:
      - Reorganize posix xlator to prepare for reuse with rio: https://review.gluster.org/#/c/17990/
      - Further reorganize posix xlator code for rio: https://review.gluster.org/#/c/17998/
      - Some further reorganization of posix xlator: https://review.gluster.org/#/c/18013/

  - Added 2 new FOPs, icreate and namelink
    - Sketchy details of these FOPs: icreate creates an inode, and namelink links an inode to a basename. So in essence, icreate is a create without a name, and namelink completes the linking of the inode to its basename under the required parent GFID.
    - Some details can be found at [5]
    - More details regarding these FOPs will appear around the time we attempt to push this to master.
    - Commits of interest:
      - add icreate/namelink fop: https://review.gluster.org/#/c/18085/
      - io-threads: add icreate/namelink fop: https://review.gluster.org/#/c/18086/
      - protocol: add icreate/namelink: https://review.gluster.org/#/c/18094/
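
To make the icreate/namelink split concrete, here is a minimal in-memory sketch in Python. It is not the GlusterFS implementation: the `ToyStore` class, its method signatures, and the dict-based layout are hypothetical and exist only for illustration. It also assumes that names and inodes live in the same store, which ignores distribution across subvolumes. Only the idea itself (icreate makes a nameless inode, namelink ties it to a basename under a parent GFID) comes from the description above.

```python
import uuid


class ToyStore:
    """Hypothetical in-memory stand-in for a brick; not GlusterFS code."""

    def __init__(self):
        self.inodes = {}    # gfid -> inode attributes
        self.dentries = {}  # (parent_gfid, basename) -> gfid

    def icreate(self, mode):
        """Create an inode that has no name yet and return its GFID."""
        gfid = str(uuid.uuid4())
        self.inodes[gfid] = {"mode": mode, "nlink": 0}
        return gfid

    def namelink(self, parent_gfid, basename, gfid):
        """Link an existing inode to a basename under the parent GFID."""
        if gfid not in self.inodes:
            raise FileNotFoundError(gfid)
        if (parent_gfid, basename) in self.dentries:
            raise FileExistsError(basename)
        self.dentries[(parent_gfid, basename)] = gfid
        self.inodes[gfid]["nlink"] += 1


store = ToyStore()
root = store.icreate(0o040755)          # toy root directory inode
gfid = store.icreate(0o100644)          # inode exists, but has no name
store.namelink(root, "file.txt", gfid)  # now reachable as <root>/file.txt
```

Until namelink runs, the new inode is reachable only by its GFID, which is the sense in which icreate is a create without a name.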

6) Problems needing attention:
  - Keeping time/size updated in the MDS (from the DS)
    Once we enable data operations, the time and size information on the MDS needs to be synced/fetched from the DS for any iatt related data returned. This problem is well written out by Venky here [2] and, as noted earlier, has similar solution requirements as the utime xlator work that Rafi is currently engaged on [3]. We intend to leverage that work with RIO as well.

  - Handling cases where basename and inode are on different MDS subvolumes (remote inodes)
    There is an interesting case in RIO, where the name and the inode of a filesystem object can be in 2 different MDS subvolumes. In such cases, we will get the GFID when looking up the name on the first MDS, and using the GFID we would look up the inode in the relevant MDS. This needs some thought, as currently this is plugged in as an op_ret = -1 and op_errno = EREMOTE, with changes to the client/server protocol layers to return iatt information on this class of errors. This changes the existing abstraction/assumption, in that a FOP now returns parameters (instead of NULLs) even on errors, and hence needs a better fix. Suggestions welcome; the code snippet that achieves this is in [4]
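
The two-step lookup described above can be sketched as follows. This is a toy model in Python, not the RIO code: `ToyMDS`, `client_lookup`, and the `pick_by_gfid` placement function are hypothetical names, and the `(op_ret, op_errno, iatt)` tuple only imitates the shape of a FOP callback. What it does show is an error return (EREMOTE) that still carries enough iatt information (the GFID) for the caller to continue.

```python
import errno


class ToyMDS:
    """Hypothetical metadata subvolume; names and inodes are plain dicts."""

    def __init__(self):
        self.dentries = {}  # (parent_gfid, basename) -> gfid
        self.inodes = {}    # gfid -> iatt-like dict

    def lookup_name(self, parent_gfid, basename):
        """Return (op_ret, op_errno, iatt), imitating a FOP callback."""
        gfid = self.dentries.get((parent_gfid, basename))
        if gfid is None:
            return -1, errno.ENOENT, None
        if gfid not in self.inodes:
            # The name is here but the inode lives elsewhere: fail with
            # EREMOTE, yet still hand back the GFID for the next hop.
            return -1, errno.EREMOTE, {"gfid": gfid}
        return 0, 0, self.inodes[gfid]

    def lookup_inode(self, gfid):
        iatt = self.inodes.get(gfid)
        if iatt is None:
            return -1, errno.ENOENT, None
        return 0, 0, iatt


def client_lookup(subvols, pick_by_gfid, name_mds, parent_gfid, basename):
    """Resolve a name, following one EREMOTE hop if needed."""
    op_ret, op_errno, iatt = name_mds.lookup_name(parent_gfid, basename)
    if op_ret == -1 and op_errno == errno.EREMOTE:
        # pick_by_gfid is an assumed placement function (GFID -> index).
        inode_mds = subvols[pick_by_gfid(iatt["gfid"])]
        return inode_mds.lookup_inode(iatt["gfid"])
    return op_ret, op_errno, iatt


mds0, mds1 = ToyMDS(), ToyMDS()
mds0.dentries[("root", "a.txt")] = "gfid-1"            # name on MDS 0
mds1.inodes["gfid-1"] = {"gfid": "gfid-1", "size": 0}  # inode on MDS 1
result = client_lookup([mds0, mds1], lambda gfid: 1, mds0, "root", "a.txt")
```

The awkward part the mail points at is visible in `lookup_name`: the EREMOTE branch returns -1 together with a non-empty third value, which is exactly the "parameters on error" behaviour that needs a cleaner design.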

  - Handling notify for the client and the server (given the way the graph is now)
    When is the RIO-client or RIO-server ready to serve requests? IOW, how to handle notify? Currently this is hacked into the code, and will not survive any mishaps, but we need a better understanding of the problem and related events, and finally the solution to make this happen correctly.
    - Code that does this:
      - Server is ready only when its POSIX xlator is ready (in RIO, bricks connect to all other bricks, so an UP event from other bricks does not mean we are ready): https://github.com/gluster/glusterfs/blob/experimental/xlators/experimental/rio/rio-server/src/rio-server-main.c#L41
      - Client is ready when all children are ready (do not judge me by this hacky code! ;-p): https://github.com/gluster/glusterfs/blob/experimental/xlators/experimental/rio/rio-client/src/rio-client-main.c#L85
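
As a rough illustration of the two readiness rules just described, here is a small Python model. It is an assumption-laden sketch, not a transcription of the linked C code: the `Event` enum, `ToyRioServer`, and `ToyRioClient` are hypothetical names, and children are identified by plain strings.

```python
from enum import Enum


class Event(Enum):
    CHILD_UP = 1
    CHILD_DOWN = 2


class ToyRioServer:
    """Ready only while the local POSIX child is up (hypothetical model)."""

    def __init__(self, posix_child):
        self.posix_child = posix_child
        self.ready = False

    def notify(self, event, child):
        # UP/DOWN events from other bricks do not change readiness.
        if child == self.posix_child:
            self.ready = (event == Event.CHILD_UP)
        return self.ready


class ToyRioClient:
    """Ready only when every child has reported CHILD_UP."""

    def __init__(self, children):
        self.up = {child: False for child in children}

    def notify(self, event, child):
        self.up[child] = (event == Event.CHILD_UP)
        return all(self.up.values())


server = ToyRioServer("posix")
server.notify(Event.CHILD_UP, "remote-brick-1")  # still not ready
server.notify(Event.CHILD_UP, "posix")           # now ready

client = ToyRioClient(["mds-0", "ds-0"])
client.notify(Event.CHILD_UP, "mds-0")  # one child still down
client.notify(Event.CHILD_UP, "ds-0")   # all children up
```

The sketch deliberately leaves out what the mail flags as unsolved: partial availability, reconnects, and which events should be propagated to parents.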

7) Next steps of interest:
  - Handling dirty inodes
    Inodes that have had data operations, and hence have stale time/size information on the MDS
  - Adding the dentry backpointers to the inode
    Just like what is added today for POSIX using the xxhash named xattrs
  - Handling inheritance of parent bits
    How and when SUID/SGID and ACLs are handled when creating subdirectories, as we are not leveraging the hierarchy of the local FS

Shyam, Kotresh, Susant

[1] RIO experimental commits: https://github.com/gluster/glusterfs/issues/243#issuecomment-331476032
[2] Times and size maintenance in RIO: https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Size_On_MDS
[3] POSIX changes for utime xlator: https://review.gluster.org/#/c/17224/4
[4] Returning iatt even on errors:
  - https://review.gluster.org/#/c/18295/3/xlators/protocol/server/src/server-rpc-fops.c
  - https://review.gluster.org/#/c/18295/3/xlators/protocol/client/src/client-rpc-fops.c
[5] Notes on icreate/namelink: https://review.gluster.org/#/c/13395/3/design/DHT2/DHT2_Icreate_Namelink_Notes.md
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel


