Hi Darrick, On Fri, Nov 11, 2016 at 1:11 AM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote: > [snip] > The VFS reflink interface (FICLONERANGE) and dedupe interface (FIDEDUPERANGE) > allows callers to specify the range of bytes on which to operate, which means > that we can share arbitrary parts of two files. For example, let's say you > have one regular extent in a file's block map: > > RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR (regular extent) > > Now ask reflink to share the middle of that extent with some other file. > ocfs2's extent mapping record can store a flag indicating that the > extent could be shared, which means that the one record now splits into > three: > > RRRRRRRRRRRssssssRRRRRRRRRRRRR (regular, shared, regular) > > This scenario happens if you run duperemove against an ocfs2 filesystem > to deduplicate file data, since dedupe wants the filesystem to be able > to share arbitrary blocks of files. > > You're correct that cp --reflink only ever deals with entire extents > because it calls FICLONERANGE with both file offsets zero and the length > set to the length of the source file. The important thing to remember > is that we are not limited to sharing entire files, even if the > userspace utilities don't take advantage of it. Thanks for your time! I thought it's for an existed ocfs2 issue until reaching patch[6/6]. > > (FWIW the duperemove program does take advantage of it.) > >> 2) what do you mean by "*full* extent btree"? > > (Slight correction to that -- I should have said "full extent tree split".) > > Record insertion operations on a tree structure all follow the same > basic strategy -- search from the root towards the records in the leaf > blocks until we find the place where the record would be, memmove() all > the records following that spot up by one index, copy the record data > into the newly vacated slot, and (if necessary) walk back up the tree to > update the interior node pointers. > > If the desired leaf is already full, however, there is a problem -- we > have to split the leaf into two half-full leaf blocks before we can > insert the record. We must also add a pointer to the new leaf into the > next level up in the tree. If that interior node is also full, we split > the interior node prior to adding the new leaf block pointer. We must > then add a pointer to the new interior node into the next level up in > the tree, and so on until we reach the root. > > In short, if we want to add a record to a tree whose blocks are > completely full, we end up splitting blocks all the way up the tree. > Oh yes, thanks for pointing me to the basic tree operation - record insertion;-) >> >> 3) in what situation, will this happen? - "we need an extent split but >> not a refcount split". >> Could you please explain more by example? > > An extreme example would be a program like this: > > - write to block zero > - for i in 1 to 524288, > - reflink block zero to block $i > > When this program terminates, the refcount tree will contain a single > refcount record ($phys_blk, len=1, refcount=524288). The extent map for > this file, however, will have 524,288 extent records: > > ($phys_blk, len=1, offset=0, flags=shared) > ($phys_blk, len=1, offset=1, flags=shared) > ... > ($phys_blk, len=1, offset=524288, flags=shared) > > There are 524288 records. A 4k leaf can fit 252 records, so there will > be 2081 leaf blocks. A 4k node also can fit 252 records, so there will > be 9 node blocks pointing to leaves, and one root block to point to the > first level nodes. Clearly, the extent tree has split many times across > all the reflink operations. However, the refcount tree never splits > because there's only one record. Excellent explanation! Now, I know that virtual blocks of a file can share the same physical blocks. Thanks a lot! Darwin > > --D > >> >> Thanks, >> Darwin >> >> > >> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> >> > --- >> > fs/ocfs2/refcounttree.c | 3 +++ >> > 1 file changed, 3 insertions(+) >> > >> > >> > diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c >> > index 59be8f4..d92b6c6 100644 >> > --- a/fs/ocfs2/refcounttree.c >> > +++ b/fs/ocfs2/refcounttree.c >> > @@ -3698,6 +3698,9 @@ int ocfs2_add_refcount_flag(struct inode *inode, >> > struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> > struct ocfs2_alloc_context *meta_ac = NULL; >> > >> > + /* We need to be able to handle at least an extent tree split. */ >> > + ref_blocks = ocfs2_extend_meta_needed(data_et->et_root_el); >> > + >> > ret = ocfs2_calc_refcount_meta_credits(inode->i_sb, >> > ref_ci, ref_root_bh, >> > p_cluster, num_clusters, >> > >> > >> > _______________________________________________ >> > Ocfs2-devel mailing list >> > Ocfs2-devel@xxxxxxxxxxxxxx >> > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >> >> >> >> -- >> Thanks, >> Darwin -- Thanks, Darwin -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html