On Mon, Aug 27, 2012 at 03:51:49PM -0400, Brian Foster wrote: > xfs_inodes_free_eofblocks() implements scanning functionality for > EOFBLOCKS inodes. It scans the radix tree and frees post-EOF blocks > for inodes that meet particular criteria. The scan can be filtered > by a particular quota type/id and minimum file size. The scan can > also be invoked in trylock mode or wait (force) mode. > > The xfs_free_eofblocks() helper is invoked to clear post-EOF space. > It is slightly modified to support an output parameter that > indicates whether space was freed and helps decide whether the > EOFBLOCKS tag should be cleared in trylock scans. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/xfs_sync.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_sync.h | 3 + > fs/xfs/xfs_vnodeops.c | 17 +++-- > fs/xfs/xfs_vnodeops.h | 2 + > 4 files changed, 184 insertions(+), 6 deletions(-) > > diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c > index 5e14741..27c3c46 100644 > --- a/fs/xfs/xfs_sync.c > +++ b/fs/xfs/xfs_sync.c > @@ -971,6 +971,174 @@ xfs_reclaim_inodes_count( > return reclaimable; > } > > +/* > + * Handle an EOFBLOCKS tagged inode. If this is a forced scan, we wait on the > + * iolock ourselves rather than rely on the trylock in xfs_free_eofblocks(). > + * > + * We rely on the output parameter from xfs_free_eofblocks() to determine > + * whether we should clear the tag because in the trylock case, it could have > + * skipped the inode due to lock contention. > + */ > +STATIC int > +xfs_inode_free_eofblocks( > + struct xfs_inode *ip, > + int flags) > +{ > + int ret = 0; > + bool freed = false; > + bool wait_iolock = (flags & EOFBLOCKS_WAIT) ? true : false; > + > + if (wait_iolock) > + xfs_ilock(ip, XFS_IOLOCK_EXCL); Why do we need the IO lock here? xfs_free_eofblocks() does all the necessary locking.... > + > + if ((S_ISREG(ip->i_d.di_mode) && > + (VFS_I(ip)->i_size > 0 || > + (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) && > + (ip->i_df.if_flags & XFS_IFEXTENTS)) && > + (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) { This check is now repeated in 3 places - xfs_inactive, xfs_release and now here. I think it needs a helper. > + /* !wait_iolock == need_iolock in xfs_free_eofblocks() */ > + ret = xfs_free_eofblocks(ip->i_mount, ip, !wait_iolock, &freed); > + if (freed) > + xfs_inode_clear_eofblocks_tag(ip); If you move xfs_inode_clear_eofblocks_tag() inside xfs_free_eofblocks(), there's no need for this extra return value. > + } else { > + /* inode could be preallocated or append-only */ > + xfs_inode_clear_eofblocks_tag(ip); This should be a rare event - it's probably worth adding a pair of trace events here for the two cases so we can see if there is ever a significant number of inodes being scanned for prealloc that can't be cleared... (e.g 'perf top -e xfs:xfs_i*' to count all the inode events) > + } > + > + if (wait_iolock) > + xfs_iunlock(ip, XFS_IOLOCK_EXCL); > + > + return ret; > +} > + > +/* > + * Determine whether an inode matches a particular qouta id. > + */ > +STATIC int > +xfs_inode_match_quota_id( > + struct xfs_inode *ip, > + int qtype, > + uint32_t id) > +{ > + switch (qtype) { > + case XFS_DQ_USER: > + return ip->i_d.di_uid == id; > + case XFS_DQ_GROUP: > + return ip->i_d.di_gid == id; > + default: > + return xfs_get_projid(ip) == id; > + } > + > + return 0; > +} There's nothing really quota specific about this scan. I'd leave this functionality to a separate patch once all the core infrastructure is in place. > + > +/* > + * This is mostly copied from xfs_reclaim_inodes_ag(). > + * > + * TODO: > + * - Could we enhance ag_iterator to support a tag and use it instead of this? Yes. This code is too tricky to duplicate for every use case, and this doesn't have special case requirements like the reclaim code. i.e. the xfs_inode_free_eofblocks() becomes the execute function (and the quota checks move inside that eventually). Passing a tag of "-1" would indicate a non-tag lookup, otherwise use a tag based lookup. Given the extra fields that this version uses, passing a void *args is probably necessary so that a structure can be passed to the execute function along with the flags.... I'd suggest this conversion should be done in a patch prior to introducing this scanner. FWIW, this is going to conflict with my "get rid of xfs-sync.c patch series, so we'll need to work out who rebases what at some point. > + */ > +int > +xfs_inodes_free_eofblocks( > + struct xfs_mount *mp, > + int qtype, > + uint32_t id, > + uint64_t min_file_size, > + int flags) > +{ ..... > + for (i = 0; i < nr_found; i++) { > + if (!batch[i]) > + continue; > + > + /* default projid represents a full scan */ I don't think thats a good idea. From a normal users perspective, the background trimming will occur irrespective of the quota groups the inode is part of. Background trimming defines the default behaviour, because that's what 99.99% of users will see active, not quota/application specific events driven through ioctls. IOWs, selecting inodes by quota type/id for pruning is a secondary function of the execute implementation, not a primary concern of the infrastructure. > + if ((!(qtype == XFS_DQ_PROJ && > + id == XFS_PROJID_DEFAULT) && > + !xfs_inode_match_quota_id(batch[i], qtype, > + id)) || > + (min_file_size && XFS_ISIZE(batch[i]) < > + min_file_size) > + ) { > + IRELE(batch[i]); > + continue; > + } Moving this check to the execute function will get rid of the indent mess.... > + > + error = xfs_inode_free_eofblocks(batch[i], flags); > + IRELE(batch[i]); > + if (error) > + last_error = error; > + } > + > + cond_resched(); > + > + } while (nr_found && !done); > + > + xfs_perag_put(pag); > + } > + > + return XFS_ERROR(last_error); > +} > + > STATIC void > __xfs_inode_set_eofblocks_tag( > struct xfs_perag *pag, > diff --git a/fs/xfs/xfs_sync.h b/fs/xfs/xfs_sync.h > index 4486491..78aca41 100644 > --- a/fs/xfs/xfs_sync.h > +++ b/fs/xfs/xfs_sync.h > @@ -43,8 +43,11 @@ void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip); > void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag, > struct xfs_inode *ip); > > +#define EOFBLOCKS_WAIT 0x0001 I'd just reuse SYNC_WAIT and SYNC_TRYLOCK which are already defined and used by the sync and reclaim iterators. > + > void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip); > void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); > +int xfs_inodes_free_eofblocks(struct xfs_mount *, int, uint32_t, uint64_t, int); > > int xfs_sync_inode_grab(struct xfs_inode *ip); > int xfs_inode_ag_iterator(struct xfs_mount *mp, > diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c > index 658ee2e..53460f3 100644 > --- a/fs/xfs/xfs_vnodeops.c > +++ b/fs/xfs/xfs_vnodeops.c > @@ -150,11 +150,12 @@ xfs_readlink( > * when the link count isn't zero and by xfs_dm_punch_hole() when > * punching a hole to EOF. > */ > -STATIC int > +int > xfs_free_eofblocks( > xfs_mount_t *mp, > xfs_inode_t *ip, > - bool need_iolock) > + bool need_iolock, > + bool *blocks_freed) I don't really see a point to adding this. Either we removed all the EOF blocks or we didn't, and that means we should just clear the tags directly in this function if it is appropriate. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs