On 4 June 2018 at 17:31, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> On 4 June 2018 at 15:31, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>> I think that in order to make the metadata_diff efficient, we still need
>>> to rely on the rstats. For example, if you modify the file
>>> /a/b/c/d/e/f/g, then g's ctime will change, but you'll still have to
>>> traverse the entire hierarchy in order to discover that. The
>>> rctime-informed search will let us efficiently find those changes.
>>>
>>> ...and if we have the rctimes, then the only real difference is whether we
>>> do a full readdir for the directory and look at each file's ctime and
>>> rctime, or whether that filtering is done on the MDS side. It's probably
>>> a bit faster with the MDS's help, but it needs a special tool, while
>>> simply modifying rsync to use rctime would work almost as well.
>>>
>>>> For the import-diff part, I think we can go this way: first, apply
>>>> diffs of files in the target subtree, and then do "setattr" on all the
>>>> files and directories that have been modified between the two snapshots
>>>> to make their metadata exactly the same as their counterparts on the
>>>> source filesystem.
>>>
>>> rsync does this with the appropriate options.
>>>
>>> It seems like the weak link in all of this is making sure the rctimes are
>>> coherent/correct. We could have a fs-wide sync-like operation that
>>> flushes all of the rstats down to the root or something.
>>>
>>> Am I missing something?
>>
>> Hi, Sage. I think I get your point. I guess the reason that rstats are
>> updated lazily is that updating all the parents along the branch that
>> contains the modified file is too expensive, since it means every
>> write under the subtree root would lead to an update of the root
>> inode. Is this right? If so, I think maybe we can overcome this
>> problem in this way: say we are calculating a snapshot diff of the
>> directory DIR_X between snapshot A and B.
>> We don't have to know the exact most recent rctime of DIR_X; all we
>> need to know is whether there are files/dirs in the subtree of DIR_X
>> that were modified after snapshot A. So maybe we can do this: say
>> there is a file DIR_X/a/b/c/d. If we make the first modification to
>> this file create an old inode for every parent along the branch, then
>> when we do metadata_diff for DIR_X, we would see that there is an old
>> inode of DIR_X for snapshot A, and we know we should go into DIR_X,
>> its subdir "a", subdir "b" of its subdir "a", and so on. Because only
>> the first modification would lead to the creation of old inodes along
>> the branch, its overhead should be tolerable.
>>
>> I don't know whether I am making myself clear or whether I'm
>> considering this in the right way.
>> If I'm right, I can get down to implementing a prototype for this.
>>
>> Thank you:-)
>>
>
> Sorry, it should be: it is when we find an old inode whose snapshot
> section includes snapshot B and doesn't include snapshot A (A < B)
> that we can decide to go into DIR_X.

Actually, I think, to make this process more efficient, instead of
creating old inodes, we can add a new field identifying the snapshot
sections in which no modification happens, and add a new separate lock
for this field. Is this right?
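
To make the idea above concrete, here is a rough in-memory sketch in
Python. It is only a toy model and not MDS code: the Node class, the
dirty_epochs field and both function names are made up for
illustration, snapshots are assumed to be plain increasing integers,
and renames and locking are ignored. Each node records the snapshot
ids that were the most recent snapshot at the time something at or
below it was modified, which loosely plays the role of the "old inode
along the branch" or the proposed new field. Only the first
modification after a snapshot walks up the branch, and the diff walk
skips every subtree without a matching mark.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Hypothetical per-inode field: ids of the snapshots that were the most
    # recent one when something at or below this node was modified.
    dirty_epochs: Set[int] = field(default_factory=set)

    def child(self, name: str) -> "Node":
        # Return the named child, creating it on first use.
        if name not in self.children:
            self.children[name] = Node(name, parent=self)
        return self.children[name]


def mark_modified(node: Node, last_snap: int) -> int:
    """Mark node and its ancestors as modified since snapshot last_snap.

    The walk stops at the first ancestor that already carries the mark, so
    only the first modification after a snapshot touches the whole branch.
    Returns the number of nodes that were updated.
    """
    updated = 0
    cur: Optional[Node] = node
    while cur is not None and last_snap not in cur.dirty_epochs:
        cur.dirty_epochs.add(last_snap)
        updated += 1
        cur = cur.parent
    return updated


def changed_between(node: Node, snap_a: int, snap_b: int,
                    path: str = "") -> Iterator[str]:
    """Yield paths modified after snap_a was taken and before snap_b."""
    here = node.name if not path else f"{path}/{node.name}"
    if not any(snap_a <= e < snap_b for e in node.dirty_epochs):
        return  # nothing changed at or below this node: prune the subtree
    yield here
    for name in sorted(node.children):
        yield from changed_between(node.children[name], snap_a, snap_b, here)


if __name__ == "__main__":
    root = Node("DIR_X")
    d = root.child("a").child("b").child("c").child("d")
    root.child("x").child("y")      # untouched sibling branch
    mark_modified(d, last_snap=1)   # first write after snapshot 1
    mark_modified(d, last_snap=1)   # later writes stop at d itself
    for p in changed_between(root, 1, 2):
        print(p)
```

In this model the diff between snapshot 1 and snapshot 2 only visits
the DIR_X/a/b/c/d branch and never descends into DIR_X/x, which is the
pruning we want from the rctime-style search without updating the root
on every write.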