Matthias Andree <ma+ext3@dt.e-technik.uni-dortmund.de> wrote:
> > > Hoho. Seems the kernel doesn't like full write queues too much.

ext3 is making it write much _more_ data. Due to the 5-second commit,
the application's redirtying and an ext3 gremlin.

>
> > Now, looking at the enormous amount of system time which the commit=120 run
> > took, I assume that the application is doing a _ton_ of overwriting.
> > Redirtying the same pages again and again and again. So poor old ext3 keeps
> > rewriting them again and again.
>
> The profile says c. 99% overwrites vs 1% writes to new pages.

Ow.

> However,
> in my experiments and AFAIR in Greg's, the system times were quite
> reasonable. I'm going with the default commit interval (5 s if I read my
> logs right). Killing my test program after a minute:
>
> real 0m57.872s
> user 0m1.750s
> sys 0m4.920s

It's the ratio between system and user which shows that it's doing
a lot of overwrite.

> This is an AMD Duron 700 MHz with PC-133 mem, but I don't recall if I run it
> as PC-133 CL3 or PC-100 CL2.
>
> > You'll hit similar problems with ext2 - on a slower computer, or on a larger
> > database, or on a system with the kupdate interval decreased from the 30
> > second default.
>
> decreased or increased?

Decreased. If you decrease the ext2 or reiserfs writeout expiry time
to the same as ext3, you may see similar problems.

See, your test takes 25 seconds, which just squeezes inside the default
writeback timeout. If it happened to take 40 seconds, you may hit this
problem on other filesystems. Or if the amount of dirty data exceeds
40% of physical memory.

What is happening is that once writeback kicks in, that slows the
userspace application down because in certain circumstances, userspace
has to wait on writeout before it can get access to a buffer. And
slowing down userspace in this way causes an exponential increase
in runtime, because the longer userspace takes to run, the more
commit intervals that run will span.
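
To put rough numbers on that feedback, here is a toy model in C. It is
only a sketch under stated assumptions and is not how ext3 accounts for
anything: the function name, its parameters and the fixed per-commit
stall are all hypothetical. It assumes the application needs `work`
seconds when it never blocks, that a commit starts every `interval`
seconds, and that each commit the run spans costs the application
`stall` seconds of waiting. It then iterates until the runtime stops
growing.

```c
/*
 * Toy model of the regenerative slowdown.  Not kernel code; every
 * name and parameter here is a hypothetical illustration.
 *
 *   work     - seconds the application needs if it never blocks
 *   interval - seconds between commits (or writeback passes)
 *   stall    - seconds the application waits per commit it spans
 *   limit    - give up once the modelled runtime exceeds this
 *
 * Returns the modelled wall-clock runtime in seconds, or -1.0 if the
 * runtime is still growing when it passes `limit`.
 */
double modelled_runtime(double work, double interval, double stall,
			double limit)
{
	double runtime = work;

	for (;;) {
		/* Commits that fall inside the current runtime estimate. */
		long commits = (long)(runtime / interval);
		/* Each of them adds one stall on top of the real work. */
		double next = work + (double)commits * stall;

		if (next > limit)
			return -1.0;	/* stalls outrun the interval */
		if (next <= runtime)
			return next;	/* no new commits spanned: settled */
		runtime = next;
	}
}
```

In this model a 25-second job with a 30-second interval spans no
commits and stays at 25 seconds. With a 5-second interval and a
2-second stall it settles at 39 seconds, with a 4-second stall at 109
seconds, and once the stall is as long as the interval it never
settles at all.
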
It feeds on itself.

Now, generally the kernel will attempt to prevent serialising
userspace behind background writeout. But there's one spot
in do_get_write_access():

	if (jh->b_jlist == BJ_Shadow) {

where a random mark_inode_dirty() call will serialise behind
the ongoing transaction commit. This, and the 5-second commit, is
the crux of the problem.

If another filesystem (or the VFS) happens to run a random
lock_buffer(), the same could happen there. It _shouldn't_, but
it might. Testing those filesystems with a larger dataset, or
a slower computer, or with the kupdate intervals wound down
would tell that.

> So what's special about the combination of "ext3fs and IDE"?

Nothing. Possibly you got lucky on SCSI, and the serialisation
against an under-commit buffer did not happen. Or the SCSI disk
has a larger writeback cache. Don't know. It will happen on
SCSI as well.

> The interesting thing in my test is vmstat 1 -- with SCSI, I get some
> hundred blocks trickled out every once in a while. With IDE, I get a
> constant write rate of some hundred blocks per second. (IDE lacks the
> big write at the end because I abort it prematurely and the fsync() is
> missed therefore).

That's because for some unknown reason, IDE triggered the regenerative
slowdown and SCSI didn't. Try varying a few things and you'll see
SCSI do the same thing.

> Seriously, as long as ext3 + IDE is a problem and ext2 + IDE isn't (with
> 2.4 at least), reiserfs + IDE isn't, ext3 + SCSI isn't, there's no
> compelling reason to change the application code.

Try larger datasets. ext2 _may_ be OK; it's pretty good about avoiding
serialisation behind I/O.

> Is there anything that might get in the way? Write barrier code?

The locked shadow buffer.

> Is the ext3fs jbd code shared with other file system types I could test?

No.

> I hope my vmstat data is useful. I can compile and test a specific
> kernel version if needed.

Probably ext3 needs to be changed to take a copy of the buffer
in there rather than waiting on the commit.

But you're only writing 24 megs of data! Delay the writeout.

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users