On Fri, Feb 08, 2019 at 01:47:30PM +1100, Dave Chinner wrote: > On Thu, Feb 07, 2019 at 10:52:43AM -0500, Brian Foster wrote: > > On Thu, Feb 07, 2019 at 04:39:41PM +1100, Dave Chinner wrote: > > > On Wed, Feb 06, 2019 at 09:21:14PM -0800, Darrick J. Wong wrote: > > > > On Thu, Feb 07, 2019 at 04:08:10PM +1100, Dave Chinner wrote: > > > > > Hi folks, > > > > > > > > > > I've just finished analysing an IO trace from a application > > > > > generating an extreme filesystem fragmentation problem that started > > > > > with extent size hints and ended with spurious ENOSPC reports due to > > > > > massively fragmented files and free space. While the ENOSPC issue > > > > > looks to have previously been solved, I still wanted to understand > > > > > how the application had so comprehensively defeated extent size > > > > > hints as a method of avoiding file fragmentation. > .... > > > FWIW, I think the scope of the problem is quite widespread - > > > anything that does open/something/close repeatedly on a file that is > > > being written to with O_DSYNC or O_DIRECT appending writes will kill > > > the post-eof extent size hint allocated space. That's why I suspect > > > we need to think about not trimming by default and trying to > > > enumerating only the cases that need to trim eof blocks. > > > > > > > To further this point.. I think the eofblocks scanning stuff came long > > after the speculative preallocation code and associated release time > > post-eof truncate. > > Yes, I cribed a bit of the history of the xfs_release() behaviour > on #xfs yesterday afternoon: > > <djwong> dchinner: feel free to ignore this until tomorrow if you want, but /me wonders why we'd want to free the eofblocks at close time at all, instead of waiting for inactivation/enospc/background reaper to do it? > <dchinner> historic. People doing operations then complaining du didn't match ls > <dchinner> stuff like that > <dchinner> There used to be a open file cache in XFS - we'd know exactly when the last reference went away and trim it then > <dchinner> but that went away when NFS and the dcache got smarter about file handle conversion > <dchinner> (i.e. that's how we used to make nfs not suck) > <dchinner> that's when we started doing work in ->release > <dchinner> it was close enough to "last close" for most workloads it made no difference. > <dchinner> Except for concurrent NFS writes into the same directory > <dchinner> and now there's another pathological application that triggers problems > <dchinner> The NFS exception was prior to having thebackground reaper > <dchinner> as these things goes the background reaper is relatively recent functionality > <dchinner> so perhaps we should just leave it to "inode cache expiry or background reaping" and not do it on close at al > Thanks. > > I think the background scanning was initially an > > enhancement to deal with things like the dirty release optimization > > leaving these blocks around longer and being able to free up this > > accumulated space when we're at -ENOSPC conditions. > > Yes, amongst other things like slow writes keeping the file open > forever..... > > > Now that we have the > > scanning mechanism in place (and a 5 minute default background scan, > > which really isn't all that long), it might be reasonable to just drop > > the release time truncate completely and only trim post-eof blocks via > > the bg scan or reclaim paths. > > Yeah, that's kinda the question I'm asking here. What's the likely > impact of not trimming EOF blocks at least on close apart from > people complaining about df/ls not matching du? > Ok. ISTM it's just a continuation of the same "might confuse some users" scenario that pops up occasionally. It also seems that kind of thing has died down as either most people don't really know or care about the transient state or are just more familiar with it at this point. IME, complex applications that depend on block ownership stats (userspace filesystems for example) already have to account for speculative preallocation with XFS, so tweaking the semantics of the optimization shouldn't really have much of an impact that I can tell so long as the broader/long-term behavior doesn't change[1]. I suppose there are all kinds of other applications that are technically affected by dropping the release time trim (simple file copies, archive extraction, etc.), but it's not clear to me that matters so long as we have effective bg and -ENOSPC scans. The only thing I can think of so far is whether we should consider changes to the bg scan heuristics to accommodate scenarios currently covered by the release time trim. For example, the release time scan doesn't consider whether the file is dirty or not while the bg scan always skips "active" files. Brian [1] Which isn't the case for a change to not bg trim files with extent size hints. > I don't really care about that anymore because, well, reflink/dedupe > completely break any remaining assumption that du reported space > consumption is related to the file size (if sparse files wasn't > enough of a hint arlready).... > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx