On Fri, Jun 09, 2017 at 04:35:26AM +0100, Al Viro wrote:
> On Thu, Jun 08, 2017 at 05:11:39PM -0700, Richard Narron wrote:
>
> > Test results don't look pretty on FreeBSD.  (I will also test OpenBSD and
> > NetBSD.)
>
> OK, here's the cumulative diff so far - easy-to-backport parts only; that'll
> be split into 6 commits (plus whatever else gets added).  It really needs
> beating...

FWIW, so far it seems to survive xfstests generic/*, modulo simulated power
loss - I'm running it without -o sync and we don't have UFS2 journalling
support, so that's to be expected...  Tons of tests don't run due to lack of
various (mis)features, so it's not _that_ much, and there's nothing that
would try to deliberately hit UFS-specific interesting cases.

xattrs and acls can be supported reasonably easily, so can quota.  O_DIRECT
is a real bitch for fragment reallocation handling - no idea how painful
that would be.  UFS2 journal support is probably a lot more massive work
than I'm willing to go into.

Another bug I see there is recovery after a failing copy from userland in
write() on an append-only file.  We have allocated blocks already, so we
might need to truncate the damn things.  However, ufs_truncate_blocks() will
see IS_APPEND(inode) and bail out, leaving garbage at the end of the file.
Not that hard to fix - these checks are simply not needed in the
ufs_write_failed() case.

I'm not happy with the way tail unpacking is done - we *probably* manage to
avoid deadlocks, but the proof is a whole lot more subtle than I'd like,
assuming it is correct in the first place.

And we have a nasty trap caused by the way balloc works: when doing
reallocation on a failed attempt to extend the tail in place we do have
logic that tries to put the new copy into an empty block if the filesystem
is not too fragmented, but the *first* allocation has nothing of that sort
going on.
So if you have a block with 7 fragments in it in each cylinder group (just
create a bunch of 28Kb files in different directories), any attempt to write
more than 4Kb into a new file will *always* go like this:
	* for the first page, allocate a 4Kb fragment.  That has a good chance
	  of going into that almost full block - all
	* for the next page, notice that we need to expand that tail and
	  can't do that in place.  Now the anti-fragmentation heuristic kicks
	  in and we pick two fragments in an empty block.  And copy the one
	  we'd just written into the new place.
	* the next 6 pages go into extending the tail we'd got.
However, on the next page the whole thing repeats.

FreeBSD avoids that mess by doing bigger allocations - in the same scenario
it would've gone in 32Kb steps rather than 4Kb ones.  Looks like we need a
different ->write_iter() there; the generic one is bloody painful in that
respect...
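
To make the pattern above concrete, here is a toy model of it.  This is
only a sketch and not the kernel code: simulate(), its parameters and the
"almost full block" bookkeeping are all made up for illustration.  It
assumes 4Kb fragments, 32Kb blocks, and the worst case described above (a
request that fits into the single free fragment of an almost full block
always lands there).  It counts how often the tail has to be moved.

```python
FRAGS_PER_BLOCK = 8   # 32Kb block / 4Kb fragment, as in the scenario above
FREE_IN_FULLISH = 1   # assumed: every "almost full" block has 7 of 8 used


def simulate(total_frags, step):
    """Write total_frags fragments, asking for `step` fragments at a time.

    Hypothetical toy model; returns (relocations, fragments_copied).
    """
    relocations = 0
    copied = 0
    tail = 0            # fragments in the current partial block
    in_fullish = False  # True while the tail sits in an almost full block
    written = 0
    while written < total_frags:
        want = min(step, total_frags - written, FRAGS_PER_BLOCK - tail)
        if tail == 0:
            # First allocation of a tail: no anti-fragmentation logic, so
            # a small enough request goes into an almost full block.
            in_fullish = want <= FREE_IN_FULLISH
        elif in_fullish:
            # No room to extend in place: move the tail to an empty block
            # and copy what has been written so far.
            relocations += 1
            copied += tail
            in_fullish = False
        # Otherwise the tail is in an otherwise empty block and simply
        # grows in place.
        tail += want
        written += want
        if tail == FRAGS_PER_BLOCK:
            tail = 0    # block complete, the next write starts a new tail
            in_fullish = False
    return relocations, copied


if __name__ == "__main__":
    print(simulate(16, 1))  # 64Kb written in 4Kb steps
    print(simulate(16, 8))  # 64Kb written in 32Kb steps
```

In this model 4Kb steps cost one relocation (and one copied fragment) per
32Kb written, while 32Kb steps never relocate anything, since the first
request is already too big for the almost full block.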