Hi Dave. 2014-02-11 8:32 GMT+09:00, Dave Chinner <david@xxxxxxxxxxxxx>: > On Sun, Feb 02, 2014 at 02:44:11PM +0900, Namjae Jeon wrote: >> From: Namjae Jeon <namjae.jeon@xxxxxxxxxxx> >> >> Add support FALLOC_FL_COLLAPSE_RANGE for fallocate. >> >> Signed-off-by: Namjae Jeon <namjae.jeon@xxxxxxxxxxx> >> Signed-off-by: Ashish Sangwan <a.sangwan@xxxxxxxxxxx> > > A more detailed description would be nice for the change logs. > ..... Okay, I will update it on next version. > >> + while (nexts++ < num_exts && >> + *current_ext < XFS_IFORK_NEXTENTS(ip, whichfork)) { >> + >> + gotp = xfs_iext_get_ext(ifp, *current_ext); >> + xfs_bmbt_get_all(gotp, &got); >> + startoff = got.br_startoff - offset_shift_fsb; >> + >> + /* >> + * Before shifting extent into hole, make sure that the hole >> + * is large enough to accomodate the shift. >> + */ >> + if (*current_ext) { >> + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, >> + *current_ext - 1), &left); >> + >> + if (startoff < left.br_startoff + left.br_blockcount) >> + error = XFS_ERROR(EINVAL); >> + >> + } else if (startoff > xfs_bmbt_get_startoff(gotp)) { >> + /* Hole is at the start but not large enough */ >> + error = XFS_ERROR(EINVAL); >> + } > > This second branch seems wrong to me: > > startoff = got.br_startoff - offset_shift_fsb; > and > got.br_startoff = xfs_bmbt_get_startoff(gotp)). > > I'm not 100% sure what you are trying to check in this case - > perhaps some basic ascii art to describe the two cases is in order > here: > > left hole got > +-------+hhhhhhhhhhhhhhh+---------+ > LS LE GS GE > HS HE > > The first is checking that GS - offset_shift_fsb is greater than LE. > i.e the shift doesn't overrun the hole betwenn LE and GS. > > left hole got > +-------+hhhhhhhhhhhhhhh+---------+ > LS LE GS GE > HS HE > +-------+hhhhhhh+---------+ > LS LE GS' GE' > HS HE' > > The second I can't visualise from the code or comment.... When we shift first extent, *current_ext will be 0. So we need to check that offset_shift_fsb ( Number of blocks to be shifted ) should be less than starting offset of the first extent. So, code will be changed more clearly like this. + else if (offset_shift_fsb > got.br_startoff) { + /* Hole is at the start but not large enough */ + error = XFS_ERROR(EINVAL); + } And will update comment more clearly. > > >> + >> + if (error) >> + goto del_cursor; >> + >> + if (cur) { >> + error = xfs_bmbt_lookup_eq(cur, >> + got.br_startoff, >> + got.br_startblock, >> + got.br_blockcount, >> + &i); > > Whitespace comment - a more compact form is the typical XFS > convention if it will fit in 80 columns: Okay. I will fix it. > > error = xfs_bmbt_lookup_eq(cur, got.br_startoff, > got.br_startblock, > got.br_blockcount, &i); > >> + if (error) >> + goto del_cursor; >> + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); >> + } >> + >> + /* Check if we can merge 2 adjacent extents */ >> + if (*current_ext && >> + left.br_startoff + left.br_blockcount == startoff && >> + left.br_startblock + left.br_blockcount == >> + got.br_startblock && >> + left.br_state == got.br_state && >> + left.br_blockcount + got.br_blockcount <= MAXEXTLEN) { >> + blockcount = left.br_blockcount + >> + xfs_bmbt_get_blockcount(gotp); > > got.br_blockcount? Right. will fix it. > >> + xfs_iext_remove(ip, *current_ext, 1, 0); >> + if (cur) { >> + error = xfs_btree_delete(cur, &i); >> + if (error) >> + goto del_cursor; >> + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); >> + } >> + XFS_IFORK_NEXT_SET(ip, whichfork, >> + XFS_IFORK_NEXTENTS(ip, whichfork) - 1); >> + gotp = xfs_iext_get_ext(ifp, --*current_ext); >> + xfs_bmbt_get_all(gotp, &got); >> + >> + /* Make cursor point to the extent we will update */ >> + if (cur) { >> + error = xfs_bmbt_lookup_eq(cur, >> + got.br_startoff, >> + got.br_startblock, >> + got.br_blockcount, >> + &i); > > whitespace. Okay. > >> + if (error) >> + goto del_cursor; >> + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); >> + } >> + >> + xfs_bmbt_set_blockcount(gotp, blockcount); >> + got.br_blockcount = blockcount; >> + goto bmbt_update; >> + } >> + >> + /* We have to update the startoff */ >> + xfs_bmbt_set_startoff(gotp, startoff); >> + got.br_startoff = startoff; >> + >> +bmbt_update: > > Use an } else {} for this, and the goto can be removed. Okay, I will change. > > .... >> /* >> + * xfs_collapse_file_space() >> + * This routine frees disk space and shift extent for the given >> file. >> + * >> + * RETURNS: >> + * 0 on success >> + * errno on error >> + * >> + */ >> +int >> +xfs_collapse_file_space( >> + struct xfs_inode *ip, >> + xfs_off_t offset, >> + xfs_off_t len) >> +{ >> + int done = 0; >> + struct xfs_mount *mp = ip->i_mount; >> + struct xfs_trans *tp; >> + int error; >> + xfs_extnum_t current_ext = 0; >> + struct xfs_bmap_free free_list; >> + xfs_fsblock_t first_block; >> + int committed; >> + xfs_fileoff_t start_fsb; >> + xfs_fileoff_t shift_fsb; >> + >> + ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); >> + >> + trace_xfs_collapse_file_space(ip); >> + >> + start_fsb = XFS_B_TO_FSB(mp, offset + len); >> + shift_fsb = XFS_B_TO_FSB(mp, len); >> + >> + /* >> + * The first thing we do is to free data blocks in the specified range >> + * by calling xfs_free_file_space(). It would also sync dirty data >> + * and invalidate page cache over the region on which collapse range >> + * is working. >> + */ > > This can probably go in the function header as part of describing > the overall algorithm the code is using. Okay. > >> + error = xfs_free_file_space(ip, offset, len); >> + if (error) >> + return error; >> + >> + while (!error && !done) { >> + tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT); >> + tp->t_flags |= XFS_TRANS_RESERVE; >> + /* >> + * We would need to reserve permanent block for transaction. >> + * This will come into picture when after shifting extent into >> + * hole we found that adjacent extents can be merged which >> + * may lead to freeing of a block during record update. >> + */ >> + error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, >> + XFS_DIOSTRAT_SPACE_RES(mp, 0), 0); >> + if (error) { >> + ASSERT(error == ENOSPC || XFS_FORCED_SHUTDOWN(mp)); >> + xfs_trans_cancel(tp, 0); >> + break; >> + } >> + >> + xfs_ilock(ip, XFS_ILOCK_EXCL); >> + error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot, >> + ip->i_gdquot, ip->i_pdquot, >> + XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, >> + XFS_QMOPT_RES_REGBLKS); >> + if (error) >> + goto out; >> + >> + xfs_trans_ijoin(tp, ip, 0); >> + >> + xfs_bmap_init(&free_list, &first_block); >> + >> + /* >> + * We are using the write transaction in which max 2 bmbt >> + * updates are allowed >> + */ > > Right, but you've only reserved blocks for a single BMBT split > through XFS_DIOSTRAT_SPACE_RES(). In cases of allocation, adjacent > offset allocation is guaranteed to only require one split at most > and hence it's safe from that perspective. However, I can see how a > shift can require a split on the first extent move, and a merge on > the second. e.g: > > > left middle right > before maxrecs minrecs + 1 minrecs > first shift maxrecs + 1 minrecs minrecs > split > maxrecs / 2 minrecs minrecs > second shift > maxrecs/2 + 1 minrecs - 1 minrecs > merge merge > minrecs*2 - 1 (freed) > > So the question is whether the transaction reservations (both log > space and block allocation requirements) are covered. Yes, As you said, we could require two splits for extent move and merge in some cases. So, I will change number of shift extents with decraring define you pointed like this. #define XFS_BMAP_MAX_SHIFT_EXTENTS 1 > >> + error = xfs_bmap_shift_extents(tp, ip, &done, start_fsb, >> + shift_fsb, ¤t_ext, >> + &first_block, &free_list, 2); > > And that should really have a #define associated with it. ie.: > > #define XFS_BMAP_MAX_SHIFT_EXTENTS 2 > > And document the constraints that lead to that number with the > definition. > > Overall, all I'm really looking for here is sufficient comments to > document the constraints the code is operating under. Functionally > the code looks correct (apart from the branch above I can't work > out). Mostly I just want to make sure that in a couple of > years time I don't have to work it all out from first principles > again. ;) Okay, I will add sufficient comments to maintain easily this change. Thanks for your review! > > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html