On 4/27/14, 6:15 PM, Dave Chinner wrote:
> On Sun, Apr 27, 2014 at 04:56:07PM -0500, Eric Sandeen wrote:
>> On 4/27/14, 4:20 PM, Dave Chinner wrote:
>>> On Fri, Apr 25, 2014 at 02:42:21PM -0500, Eric Sandeen wrote:
>>>> Add a heuristic to flush data to a file which looks like it's
>>>> going through a tmpfile/rename dance, but not fsynced.
>>>>
>>>> I had a report of a system with many 0-length files after
>>>> package updates; as it turns out, the user had basically
>>>> done 'yum update' and punched the power button when it was
>>>> done.
>>>
>>> So yum didn't run sync() on completion of the update? That seems
>>> rather dangerous to me - IMO system updates need to be guaranteed to
>>> be stable by the update mechanisms, not to leave the system state to
>>> chance if power fails or the system crashes immediately after an
>>> update...
>>>
>>>> Granted, the admin should not do this. Granted, the package
>>>> manager should ensure persistence of files it updated.
>>>
>>> Yes, yes it should. Problem solved without needing to touch XFS.
>>
>> Right, I first suggested it 5 years or so ago for RPM. But hey, who
>> knows, someday maybe.
>
> grrrrr.
>
>> So no need to touch XFS, just every godawful userspace app out there...
>>
>> Somebody should bring up the topic to a wider audience; I'm sure they'll
>> all get fixed in short order. Wait, or did we try that already? :)
>
> I'm not talking about any random application. Package managers are
> *CRITICAL SYSTEM INFRASTRUCTURE*. They should be architected to
> handle failures gracefully; following *basic data integrity rules*
> is a non-negotiable requirement for a system upgrade procedure.
> Leaving the system in an indeterminate and potentially inoperable
> state after a successful upgrade completion is reported is a
> completely unacceptable outcome for any system management operation.
> Critical infrastructure needs to Do Things Right, not require other
> people to hack around its failings and hope that they might be able
> to save the system when shit goes wrong. There is no excuse for
> critical infrastructure developers failing to acknowledge and
> address the data integrity requirements of their infrastructure.

Yeah, I know - choir, preaching, etc.

>>>> Ext4, however, added a heuristic like this for just this case;
>>>> someone who writes file.tmp, then renames over file, but
>>>> never issues an fsync.
>>>
>>> You mean like rsync does all the time for every file it copies?
>>
>> Yeah, I guess rsync doesn't fsync either. ;)
>
> That's because rsync doesn't need to sync until it completes all of
> the data writes. A failed
> rsync can simply be re-run after the system comes back up and
> nothing is lost. That's a very different situation to a package
> manager replacing binaries that the system may need to boot, yes?

Yeah, my point is that rsync overwrites existing files and _never_
syncs. Not per-file, not at the end, not with any available option,
AFAICT. Different situation, yes, but arguably just as bad under the
wrong circumstances.

>>>> Now, this does smack of O_PONIES, but I would hope that it's
>>>> fairly benign. If someone already synced the tmpfile, it's
>>>> a no-op.
>>>
>>> I'd suggest it will greatly impact rsync speed and have impact on
>>> the resultant filesystem layout as it guarantees interleaving of
>>> metadata and data on disk....
>>
>> Ok, well, based on the responses thus far, sounds like a non-starter.
>>
>> I'm not wedded to it, just thought I'd float the idea.
>>
>> OTOH, it is an interesting juxtaposition to say the open O_TRUNC case
>> is worth catching, but the tempfile overwrite case is not.
> We went through this years ago - the O_TRUNC case is dealing with
> direct overwrite of data which we can reliably detect, usually only
> occurs one file at a time, has no major performance impact and data
> loss is almost entirely mitigated by the flush-on-close behaviour.
> It's a pretty reliable mitigation mechanism.

[citation needed] for some of that, but *shrug*

> Rename often involves many files (so much larger writeback delay on
> async flush), it has cases we can't catch (e.g. rename of a
> directory containing unsynced data files) and has much more
> unpredictable behaviour (e.g. rename of files being actively written
> to). There's nothing worse than having unpredictable/non-repeatable
> data loss scenarios - if we can't handle all rename cases with the
> same guarantees, then we shouldn't provide any data integrity
> guarantees at all.

Ok, so it's a NAK. I'm over it already.

-Eric

> Cheers,
>
> Dave.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs