On Sun, Jun 9, 2013 at 7:02 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:

As I've posted previously, despite my best efforts and advice to customers, I still have to deal with the results of unclean shutdowns. And that is specifically what I am concerned about. If I've given the impression that I don't trust XFS or ext4 in normal operation, it was unintentional. I have the greatest confidence in them. I have particularly recent experience with unclean shutdowns here in OKC. One can say that I and the operating system are not responsible for the unwise (or not so unwise) things that other people might do which result in unclean shutdowns. But ultimately, it is my responsibility to do my level best, despite everything, to see that data is not lost. It's my charge, and it's what I'll do come hell, high water, or tornadoes. And I do have a pragmatic solution to the problem which has worked well for 3 years. But I'm open to other options.

> I don't recommend nodelalloc just because I don't know that it's thoroughly
> tested.

I can help a bit there, at least regarding this particular Cobol workload, since it's a category that I've been running for about 25 years. The SysV filesystem of AT&T Unix '386 & 3B2, Xenix's filesystem, SCO Unix 4.x's Acer Fast Filesystem, and ext2 all performed similarly: occasionally, file rebuilds were necessary after a crash. SCO OpenServer 5's HTFS did better, IIRC.

I have about 12 years of experience with ext3, and I cannot recall a time that I ever had data inconsistency problems. (Probably a more accurate way to put it than "data loss".) It's possible that I might have had one or two minor issues; 12 years is a long time, and I might have forgotten. But it was a period of remarkable stability. This is why, when people say "Oh, but it can happen under ext3, too!", it doesn't impress me particularly. Of course it "could". But I have 12 years of experience by which to gauge the relative likelihood.

Now, with ext4 at its defaults, serious data problems after an unclean shutdown were an "every time" thing, until I realized what was going on. And I can tell you that in 3 years of using nodelalloc on those data volumes, it's been smooth sailing. No unexpected problems.

For the reasons you note, I do try to keep things at the defaults as much as possible. That is generally the safest and best-tested way to go, and it's one reason I don't go all the way and use data=journal. I remember one report, some years ago, of ext3 being found to have a specific data loss issue... but only for people mounting it data=journal.

But regarding nodelalloc not providing perfect protection: "perfection" is the enemy of "good". I'm a pragmatist, and nodelalloc works very well, while still providing acceptable performance, with no deleterious side effects, at least in my experience and on this category of workload. I would feel comfortable recommending it to others in similar situations, with the caveat that YMMV.

> You probably need to define what _you_ mean by resiliency.

I need for the metadata to be in a consistent state, and for the data to be in a consistent state. I do not insist upon that state being the last state written to memory by the application. Only that the resulting on-disk state reflect a valid state that the in-memory image had seen at some time, even for applications written in environments which have no concept of fsync or fdatasync, or where the program (e.g. virt-manager or cupsd) doesn't do proper fsyncs. i.e. I need ext3 data=ordered behavior.
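For concreteness, here is roughly what the nodelalloc setup looks like in practice. The device name and mount point below are placeholders for illustration, not my actual layout:

    # /etc/fstab -- ext4 data volume with delayed allocation disabled
    /dev/md0   /srv/data   ext4   defaults,nodelalloc   0   2

or, to experiment without editing fstab (assuming your kernel accepts toggling it on a remount):

    mount -o remount,nodelalloc /srv/data

As with everything else here: test it against your own workload first. YMMV.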
And I'm not at all embarrassed to say that I need (not want) a pony. And speaking pragmatically, I can vouch for the fact that my pony has always done a very good job.

> Anything else you want in terms of data persistence (data from my careless
> applications will be safe no matter what) is just wishful thinking.

Unfortunately, I don't have the luxury of blaming the application.

> ext3 gave you about 5 seconds thanks to default jbd behavior and
> data=ordered behavior. ext4 & xfs are more on the order of
> 30s.

There's more to it than that, though, isn't there? Ext3 (and presumably ext4 without delayed allocation) flushes the relevant data immediately before the metadata write. It has more to do with data and metadata being written at the same time (with the data going out just *before* the metadata) than with the frequency at which it happens. Am I correct about that?

> But this all boils down to:
>
> Did you (or your app) fsync your data?

No. Because Cobol doesn't support it. And few people, apparently not even Red Hat, bother to use the little-known os.fsync() call under Python, so far as I've been able to tell. I still haven't checked on Perl and Ruby.

> (It'd probably be better to take this up on the filesystem lists,
> since we've gotten awfully off-topic for linux-raid.

I agree that this is off-topic. It started as a relevant question (from me) about odd RAID10 performance I was seeing. Someone decided to use it as an opportunity to sell me on XFS, and things went south from there. (Although I have found it to be interesting.)

I wasn't going to post further here. I'd even unsubscribed from the list. But I couldn't resist when you and Ric posted back. I know that you both know what you're talking about and give honest answers, even if your world of pristine data centers and mine of makeshift "server closets" may result in differing views.

I have a pretty good idea of how things would go were I to post on linux-fsdevel. I saw how that all worked out back in 2009, and I'd as soon not go there. I think I got all the answers I was looking for here, anyway. I know I asked a couple of questions of you in this post. But we can keep it short, and then cut it short, after.

Thanks for your time and your thoughts.

-Steve Bergman
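P.S. Since os.fsync() came up: for anyone curious, this is roughly what it looks like when a Python program actually bothers to use it. Purely illustrative; the filename and data are made up and this isn't from any real application:

    import os

    def write_record_file(path, data):
        # Write the data, then make sure it has actually reached the disk
        # before we consider the update "done": flush Python's buffer to
        # the kernel, then os.fsync() the file descriptor to the device.
        with open(path, "w") as f:
            f.write(data)
            f.flush()               # push Python/stdio buffers to the kernel
            os.fsync(f.fileno())    # push the kernel page cache to stable storage

    write_record_file("accounts.rec", "A-100|Smith|1234.56\n")

Of course, the whole point of my griping is that the environments I have to live with never get the chance to do even that much.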