On 4 June 2018 at 15:31, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> I think that in order to make the metadata_diff efficient, we still need
>> to rely on the rstats. For example, if you modify the file
>> /a/b/c/d/e/f/g, then g's ctime will change, but you'll still have to
>> traverse the entire hierarchy in order to discover that. The
>> rctime-informed search will let us efficiently find those changes.
>>
>> ...and if we have the rctimes, then the only real difference is whether we
>> do a full readdir for the directory and look at each file's ctime and
>> rctime, or whether that filtering is done on the MDS side. It's probably
>> a bit faster with the MDS's help, but it needs a special tool, while
>> simply modifying rsync to use rctime would work almost as well.
>>
>>> For the import-diff part, I think we can go this way: first, apply
>>> diffs of files in the target subtree, and then run "setattr" on all the
>>> files and directories that have been modified between the two snapshots
>>> to make their metadata exactly the same as their counterparts on the
>>> source filesystem.
>>
>> rsync does this with the appropriate options.
>>
>> It seems like the weak link in all of this is making sure the rctimes are
>> coherent/correct. We could have a fs-wide sync-like operation that
>> flushes all of the rstats down to the root or something.
>>
>> Am I missing something?
>
> Hi, Sage. I think I get your point. I guess the reason that rstats are
> updated lazily is that updating all the parents along the branch that
> contains the modified file is too expensive, since it means every
> write under the subtree root would lead to an update of the root
> inode. Is this right? If so, I think maybe we can overcome this
> problem in this way: say we are calculating a snapshot diff of the
> directory DIR_X between snapshots A and B.
> We don't have to know the exact most recent rctime of DIR_X; all we
> need to know is whether there are files/dirs in the subtree of DIR_X
> that were modified after snapshot A. So, maybe we can do this: say
> there is a file DIR_X/a/b/c/d. If we make the first modification to
> this file create an old inode for every parent along the branch, then
> when we do metadata_diff for DIR_X, we would see that there is an old
> inode of DIR_X for snapshot A, and we know we should go into DIR_X,
> then its subdir "a", then subdir "b" of "a", and so on. Because only
> the first modification would lead to the creation of old inodes along
> the branch, its overhead should be tolerable.
>
> I don't know whether I am making myself clear or whether I'm
> considering this in the right way.
> If I'm right, I can get down to implementing a prototype for this.
>
> Thank you:-)
>

Sorry, it should be: when we find that there exists an old inode whose
snapshot range includes snapshot B and doesn't include snapshot A
(A < B), we can determine that we should go into DIR_X.
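To make the pruning rule concrete, here is a minimal sketch of the traversal
in Python. It is only a toy in-memory model: Node, old_inodes,
changed_between and metadata_diff are hypothetical names for illustration,
not MDS structures or APIs, and each old inode is assumed to cover an
inclusive range of snapshot ids (first, last).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """Toy stand-in for an inode; not a real MDS structure."""
    name: str
    # Assumption: each old inode covers an inclusive snapshot-id range.
    old_inodes: List[Tuple[int, int]] = field(default_factory=list)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, child: "Node") -> "Node":
        self.children[child.name] = child
        return child


def changed_between(node: Node, snap_a: int, snap_b: int) -> bool:
    """True if some old inode's range includes snap_b but not snap_a."""
    return any(
        first <= snap_b <= last and not (first <= snap_a <= last)
        for first, last in node.old_inodes
    )


def metadata_diff(node: Node, snap_a: int, snap_b: int,
                  prefix: str = "") -> Iterator[str]:
    """Yield paths to visit, skipping subtrees with no qualifying old inode."""
    if not changed_between(node, snap_a, snap_b):
        return  # prune: nothing below here needs to be examined
    path = f"{prefix}/{node.name}" if prefix else node.name
    yield path
    for name in sorted(node.children):
        yield from metadata_diff(node.children[name], snap_a, snap_b, path)


def build_example() -> Node:
    """DIR_X/a/b/c/d was modified after snapshot 1; 'untouched' was not."""
    root = Node("DIR_X", old_inodes=[(2, 2)])
    cur = root
    for name in ("a", "b", "c", "d"):
        cur = cur.add(Node(name, old_inodes=[(2, 2)]))
    root.add(Node("untouched"))
    return root


if __name__ == "__main__":
    for p in metadata_diff(build_example(), snap_a=1, snap_b=2):
        print(p)
```

With snapshot A = 1 and B = 2, the walk descends only along
DIR_X/a/b/c/d and never enters "untouched", which is the saving the scheme
is after. Whether real old inodes can be queried this way is exactly what a
prototype would have to confirm.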