On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote: > > Signed-off-by: Jie Liu <jeff.liu@xxxxxxxxxx> Christoph has commented on the code-related aspects of the patch, so I won't repeat that. I'll comment on structural/design issues instead. Firstly, the patch splits the functionality arbitrarily over three different files, and I don't think that is necessary. There really is no reason at all for xfs_bmap.c to know anything aout SEEK_HOLE/SEEK_DATA semantics - that file is for extent manipulation and search functions. SEEK semantics should be entirely encoded into the function that deals with the seeking. Secondly, xfs_find_desired_extent() and xfs_seek_hole_data() should be one function, and named something like xfs_file_seek_extent(). Finally, don't think using low level extent search functions like xfs_bmap_search_extents() is the right level to implement this functionality (see my comments about SEEK_HOLE/SEEK_DATA semantics in xfs-bmap.c above), especially as we already have functions for looking up holes and extents at given offsets. That is, to find the next hole at or after after a given offset, we already have xfs_bmap_first_unused(). Indeed, this function already has the exact semantics that SEEK_HOLE requires. Simply put: case SEEK_HOLE: fsb = XFS_B_TO_FSBT(mp, start_offset); error = xfs_bmap_first_unused(NULL, ip, 1, &fsb, XFS_DATA_FORK); if (error) return -error; if (fsb <= XFS_B_TO_FSBT(mp, start_offset)) return start_offset; return XFS_FSB_TO_B(mp, fsb); As to the data extent search, I'd prefer you to use xfs_bmapi_read() rather than xfs_bmap_search_extents() directly. I'd prefer that we return unwritten extents as holes rather than data from the initial implementation, and using the low level xfs_bmap_search_extents() makes that quite complex. However, we already have a function for handling that complexity: xfs_bmapi_read(). That is, xfs_bmapi_read() needs to be passed an array of two maps, one for the current offset, and one for the next extent type. This makes one call sufficient for most transitions. Done in a simple loop it will handle all conditions of hole->unwritten->hole.... until it finds a data extent of EOF. start_fsbno = XFS_B_TO_FSBT(mp, start_offset); while (1) { struct xfs_bmbt_irec maps[2]; int nmaps = 2; count_fsb = XFS_B_TO_FSB(mp, XFS_MAXIOFFSET(mp)); error = xfs_bmapi_read(ip, fsbno, count_fsb, &maps, &nmaps, XFS_BMAPI_ENTIRE); if (error) return -error; if (!nmaps) { /* no extents at given offset, must be beyond EOF */ return -ENXIO; } switch (map[0].br_startblock) { case DELAYSTARTBLOCK: /* landed in an in-memory data extent */ return map[0].br_startoff; default: /* landed in an allocated extent */ if (map[0].br_state == XFS_EXT_NORM) { /* a real data extent */ return map[0].br_startoff; } /* Fall through to hole handling for unwritten extents */ case HOLESTARTBLOCK: /* * landed in a hole. If the next extent is a data * extent, then return the start of it, otherwise * we need to move the start offset and map more * blocks. */ if (map[1].br_startblock == DELAYSTARTBLOCK || ((map[1].br_startblock != HOLESTARTBLOCK && map[1].br_state == XFS_EXT_NORM))) return map[1].br_startoff; start_fsbno = map[1].br_startoff + map[1].br_blockcount; break; } if (XFS_FSB_TO_B(mp, start_fsbno) > ip->i_size) { /* Beyond EOF now */ return -ENXIO; } } This can pretty much all be done in fs/xfs/xfs_file.c::xfs_file_seek_extent() because all the functions used by the above code are exported from xfs_bmap.c for external use - that solves the scattering problem and uses interfaces that we already know work in the intended manner.... ;) BTW: > +xfs_file_llseek( > + struct file *file, > + loff_t offset, > + int origin) > +{ > + struct inode *inode = file->f_mapping->host; > + int ret; > + > + if (origin != SEEK_DATA && origin != SEEK_HOLE) > + return generic_file_llseek(file, offset, origin); > + > + mutex_lock(&inode->i_mutex); > + switch (origin) { We don't need the i_mutex to be held here. We only need to hold the ilock in shared mode for this operation to protect against extent list modifications (like unwritten extent conversion and truncation). > +int > +xfs_find_desired_extent( > + struct inode *inode, > + loff_t *start, > + u32 type) > +{ > + xfs_inode_t *ip = XFS_I(inode); > + xfs_mount_t *mp = ip->i_mount; > + struct xfs_ifork *ifp; > + int lock; > + int error; > + > + if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS && > + ip->i_d.di_format != XFS_DINODE_FMT_BTREE && > + ip->i_d.di_format != XFS_DINODE_FMT_LOCAL) > + return XFS_ERROR(EINVAL); > + > + xfs_ilock(ip, XFS_IOLOCK_SHARED); > + > + /* > + * Flush the delay alloc blocks. Even after flushing the inode, > + * there can still be delalloc blocks on the inode beyond EOF > + * due to speculative preallocation. These are not removed until > + * the release function is called or the inode is inactivated. > + * Hence we cannot assert here that ip->i_delayed_blks == 0. > + */ > + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) { > + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF); > + if (error) > + goto out_unlock_iolock; > + } i.e. this IOLOCK and flush is completely unnecessary because we'll find delayed allocation extents in the extent lookup and can handle them just like real allocated extents.... > + lock = xfs_ilock_map_shared(ip); i.e. this is the only lock we need to take. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs