Thanks, Shishir. Please make sure to also include your proposed changes
on the wiki, either here:
http://www.gluster.org/community/documentation/index.php/Features

or here:
http://www.gluster.org/community/documentation/index.php/Planning34#Contributed_Feature_Ideas

If we're pretty confident that this will go into the 3.4 planning cycle,
then I prefer the latter.

-JM

----- Original Message -----
> Hi All,
>
> This is a proposed enhancement to DHT directory rename operations to
> make them recoverable in case of crashes.
>
> Please feel free to review/comment on the design. There are also 2 open
> issues which need to be tackled (see the recovery logic below).
>
> We propose to add 2 new on-disk xattrs: SRC (<key to be
> decided>:destination path) and DST (<key to be decided>:src
> path/gfid).
>
> Consider these scenarios:
>
> case1. Only the source directory exists.
> case2. Both source and destination directories exist.
>
> The tasks for rename would be as follows:
>
> 1. Set SRC key on all source directories
> 2. If step 1 fails, remove xattrs, and fail rename
> 3. If case2, set xattrs on destination directories
> 4. If failure in case2, ignore
> 5. Rename directories (opendir on dst, readdir (ENOTEMPTY error),
>    rename dst_hashed subvol first, and then rest)
> 6. If step 5 fails with any error other than ENOTCONN, fail rename,
>    and remove xattrs
> 7. If failure is because of ENOTCONN, proceed with rename and return
>    a success.
>
> Recovery steps (once the brick comes up):
>
> 1. On lookup/readdir (NFS requirement?) query for these SRC and DST
>    keys.
> 2. If SRC key is found, validate:
>    a. If mtime is within 5 seconds of the lookup request, then do
>       not heal, as a rename might be in progress (Can we make this
>       more foolproof?)
>    b. If dst does not exist, proceed
>    c. If dst exists, check its key and see if they match. If they
>       mismatch, do not rename, as it might lead to a gfid mismatch.
>
> 3. Proceed with the checks and rename of directories (similar to
>    step 5 of the rename tasks above).
> 4. If successful, remove xattrs, return success.
> 5. If it fails, what needs to be done? (Other renames might have
>    succeeded; this one might fail due to ENOTEMPTY, even due to a
>    race.)
>
>
> As for a subvol being down, we can't guarantee recovery in the
> scenarios of a brick going down after stage 1 (setxattr).
>
> Brick going down before start of subvolume: We do not allow rename to
> progress anywhere.
>
> If a brick goes down after setxattr, and it has files, or files are
> created after it is back up (possible race), then we can't recover.
>
>
> With regards,
> Shishir
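
For discussion, the validation in recovery step 2 of the quoted proposal could be sketched roughly as below. This is only an illustrative Python sketch, not GlusterFS code: DirState, Action, recovery_action and HEAL_GRACE_SECONDS are hypothetical names, directories are modelled as in-memory records rather than real bricks, and it assumes that "match" in step 2c means the destination's DST value equals the source's path or gfid. A destination with no DST key is treated as a mismatch, which is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Step 2a of the proposal: do not heal if mtime is within 5 seconds.
HEAL_GRACE_SECONDS = 5.0


class Action(Enum):
    SKIP_NO_MARKER = "skip: no SRC key found"
    SKIP_RECENT = "skip: rename may still be in progress"
    SKIP_MISMATCH = "skip: DST key does not match (gfid mismatch risk)"
    RENAME = "proceed with checks and rename"


@dataclass
class DirState:
    """Hypothetical in-memory view of one directory on a brick."""
    path: str
    gfid: str
    mtime: float                     # seconds since the epoch
    src_xattr: Optional[str] = None  # SRC value: destination path
    dst_xattr: Optional[str] = None  # DST value: source path or gfid


def recovery_action(src: DirState, dst: Optional[DirState], now: float) -> Action:
    """Decide what a lookup/readdir should do for one source directory."""
    # Step 2: only directories carrying the SRC key are candidates.
    if src.src_xattr is None:
        return Action.SKIP_NO_MARKER
    # Step 2a: a very recent mtime suggests the rename is still running.
    if now - src.mtime < HEAL_GRACE_SECONDS:
        return Action.SKIP_RECENT
    # Step 2b: no destination directory, so the rename can proceed.
    if dst is None:
        return Action.RENAME
    # Step 2c (assumed meaning of "match"): DST must point back at src.
    if dst.dst_xattr in (src.path, src.gfid):
        return Action.RENAME
    return Action.SKIP_MISMATCH


if __name__ == "__main__":
    src = DirState("/a", "gfid-1", mtime=100.0, src_xattr="/b")
    dst = DirState("/b", "gfid-2", mtime=100.0, dst_xattr="gfid-1")
    print(recovery_action(src, dst, now=200.0).value)
```

The fixed 5 second window is taken directly from step 2a and inherits the open question raised there: a slow rename could still be in progress after the window expires.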