Thanks for the feedback. I will have an updated patch shortly. There is a slight problem in that f_ffree is an unsigned 64 so that it does not see something going negative but that is easy to work around. I'll send out the tested patch in a bit. Thanks, --- Stuart Brodsky 2750 Blue Water Road Eagan, MN. 55121 651-683-7910 sbrodsky@xxxxxxx On Jul 19, 2010, at 6:11 PM, Dave Chinner wrote: > Hi Stuart, > > A couple of minor list-etiquette comments first: > > - please don't top post when replying as it makes it > really hard to understand what you are replying to > because there is no context to your reply. > > - don't remove the mailing list from the CC list while > discussing a patch as there are plenty more people than > just the person who sent the email that might want to put > their 2c worth in... > > I've rearranged your reply and re-added the mailing list to the CC > list.... > > On Mon, Jul 19, 2010 at 07:11:55AM -0500, Stuart Brodsky wrote: >> On Jul 18, 2010, at 7:14 PM, Dave Chinner wrote: >>> On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote: >>>> This patch fixes the stat -f icount field going negative. The use of >>>> lazy counts means that the super block field sb_icount is not updated >>>> correctly and will allow over allocation of inodes above maxicount. > ..... >>>> Index: a/fs/xfs/xfs_trans.c >>>> =================================================================== >>>> --- a/fs/xfs/xfs_trans.c.orig 2010-07-16 10:12:02.000000000 -0500 >>>> +++ b/fs/xfs/xfs_trans.c 2010-07-16 10:22:47.690249548 -0500 >>>> @@ -805,6 +805,7 @@ >>>> switch (field) { >>>> case XFS_TRANS_SB_ICOUNT: >>>> tp->t_icount_delta += delta; >>>> + atomic64_add(delta,&tp->t_mountp->m_inodesinuse); >>>> if (xfs_sb_version_haslazysbcount(&mp->m_sb)) >>>> flags &= ~XFS_TRANS_SB_DIRTY; >>>> break; >>> >>> This is still racy. It may fix your specific reproducer but I can't see how >>> changing the in use inode count from late in xfs_trans_commit() to early in >>> xfs_trans_commit() solves the problem: if we look at it >>> like this: > .... >>> Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a >>> small amount - it's just an arbitrary limit that we use to prevent all the >>> filesytem space from being taken up by inodes. I think the problem is just a >>> reporting problem here in xfs_fs_statfs(): >>> >>> 1245 fakeinos = statp->f_bfree << sbp->sb_inopblog; >>> 1246 statp->f_files = >>> 1247 MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER); >>> 1248 if (mp->m_maxicount) >>> 1249 statp->f_files = min_t(typeof(statp->f_files), >>> 1250 statp->f_files, >>> 1251 mp->m_maxicount); >>> 1252 statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree); >>> >>> Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0), >>> and we have just hit the above race condition to allocate more inodes than >>> mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives: >>> >>> statp->f_files = min(sbp->sb_icount, mp->m_maxicount); >>> statp->f_files = mp->m_maxicount; >>> >>> And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore: >>> >>> statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0) >>> >>> Which gives statp->f_ffree < 0 and hence negative free inodes. I think >>> all that is required to "fix" this is to clamp statp->f_ffree to zero. >>> i.e.: >>> >>> if (statp->f_ffree < 0) >>> statp->f_ffree = 0; >>> >>> Make sense? >> >> I agree that if we are willing to accept the fact that the actual >> number of inodes allocated can be over maxicount then your fix is >> correct in just clamping the f_ffree to 0. > > Given that it already exceeds maxicount (and has for years ;) and > nothing breaks except for the stats, I don't see any problem > here. It seems much more complex to do anythign accurate and AFAICT > it is not necessary to be absolutely accurate. > > Anyway, we have to allow the allocated inode count to exceed > maxicount as you can use growfs to reduce the allowed inode > percentage in the filesystem. Hence if we are near maxicount > allocated inodes and we reduce the max inode percentage then the > allocated inode count will exceed maxicount. In that case, we really > do need to report zero free inodes, not some negative number.... > >> I just don't like >> using a counter that has the potential to be out of date any where >> in the code. > > I'm not sure I follow you - nothing gets out of date AFAICT. > >> In this case as you suggest being over maxicount is >> not the end of the world. We will report ENOSPC a bit later but >> we still will detect it. > > We still report ENOSPC at exactly the same time as we do now. > Nothing there changes, only the information going to userspace is > sanitised. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs