Hi Stuart, A couple of minor list-etiquette comments first: - please don't top post when replying as it makes it really hard to understand what you are replying to because there is no context to your reply. - don't remove the mailing list from the CC list while discussing a patch as there are plenty more people than just the person who sent the email that might want to put their 2c worth in... I've rearranged your reply and re-added the mailing list to the CC list.... On Mon, Jul 19, 2010 at 07:11:55AM -0500, Stuart Brodsky wrote: > On Jul 18, 2010, at 7:14 PM, Dave Chinner wrote: > > On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote: > >> This patch fixes the stat -f icount field going negative. The use of > >> lazy counts means that the super block field sb_icount is not updated > >> correctly and will allow over allocation of inodes above maxicount. ..... > >> Index: a/fs/xfs/xfs_trans.c > >> =================================================================== > >> --- a/fs/xfs/xfs_trans.c.orig 2010-07-16 10:12:02.000000000 -0500 > >> +++ b/fs/xfs/xfs_trans.c 2010-07-16 10:22:47.690249548 -0500 > >> @@ -805,6 +805,7 @@ > >> switch (field) { > >> case XFS_TRANS_SB_ICOUNT: > >> tp->t_icount_delta += delta; > >> + atomic64_add(delta,&tp->t_mountp->m_inodesinuse); > >> if (xfs_sb_version_haslazysbcount(&mp->m_sb)) > >> flags &= ~XFS_TRANS_SB_DIRTY; > >> break; > > > > This is still racy. It may fix your specific reproducer but I can't see how > > changing the in use inode count from late in xfs_trans_commit() to early in > > xfs_trans_commit() solves the problem: if we look at it > > like this: .... > > Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a > > small amount - it's just an arbitrary limit that we use to prevent all the > > filesytem space from being taken up by inodes. I think the problem is just a > > reporting problem here in xfs_fs_statfs(): > > > > 1245 fakeinos = statp->f_bfree << sbp->sb_inopblog; > > 1246 statp->f_files = > > 1247 MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER); > > 1248 if (mp->m_maxicount) > > 1249 statp->f_files = min_t(typeof(statp->f_files), > > 1250 statp->f_files, > > 1251 mp->m_maxicount); > > 1252 statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree); > > > > Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0), > > and we have just hit the above race condition to allocate more inodes than > > mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives: > > > > statp->f_files = min(sbp->sb_icount, mp->m_maxicount); > > statp->f_files = mp->m_maxicount; > > > > And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore: > > > > statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0) > > > > Which gives statp->f_ffree < 0 and hence negative free inodes. I think > > all that is required to "fix" this is to clamp statp->f_ffree to zero. > > i.e.: > > > > if (statp->f_ffree < 0) > > statp->f_ffree = 0; > > > > Make sense? > > I agree that if we are willing to accept the fact that the actual > number of inodes allocated can be over maxicount then your fix is > correct in just clamping the f_ffree to 0. Given that it already exceeds maxicount (and has for years ;) and nothing breaks except for the stats, I don't see any problem here. It seems much more complex to do anythign accurate and AFAICT it is not necessary to be absolutely accurate. Anyway, we have to allow the allocated inode count to exceed maxicount as you can use growfs to reduce the allowed inode percentage in the filesystem. Hence if we are near maxicount allocated inodes and we reduce the max inode percentage then the allocated inode count will exceed maxicount. In that case, we really do need to report zero free inodes, not some negative number.... > I just don't like > using a counter that has the potential to be out of date any where > in the code. I'm not sure I follow you - nothing gets out of date AFAICT. > In this case as you suggest being over maxicount is > not the end of the world. We will report ENOSPC a bit later but > we still will detect it. We still report ENOSPC at exactly the same time as we do now. Nothing there changes, only the information going to userspace is sanitised. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs