Re: ext4 finally doing the right thing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Greg Stark wrote:

That doesn't sound right. The kernel having 10% of memory dirty doesn't mean there's a queue you have to jump at all. You don't get into any queue until the kernel initiates write-out which will be based on the usage counters -- basically a lru. fsync and cousins like sync_file_range and posix_fadvise(DONT_NEED) in initiate write-out right away.


Most safe ways ext3 knows how to initiate a write-out on something that must go (because it's gotten an fsync on data there) requires flushing every outstanding write to that filesystem along with it. So as soon as a single WAL write shows up, bam! The whole cache is emptied (or at least everything associated with that filesystem), and the caller who asked for that little write is stuck waiting for everything to clear before their fsync returns success.

This particular issue absolutely killed Firefox when they switched to using SQLite not too long ago; high-level discussion at http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ and confirmation/discussion of the issue on lkml at https://kerneltrap.org/mailarchive/linux-fsdevel/2008/5/26/1941354 . Note the comment from the first article saying "those delays can be 30 seconds or more". On multiple occasions, I've measured systems with dozens of disks in a high-performance RAID1+0 with battery-backed controller that could grind to a halt for 10, 20, or more seconds in this situation, when running pgbench on a big database. As was the case on the latest one I saw, if you've got 32GB of RAM and have let 3.2GB of random I/O from background writer/checkpoint writes back up because Linux has been lazy about getting to them, that takes a while to clear no matter how good the underlying hardware.

Write barriers were supposed to improve all this when added to ext3, but they just never seemed to work right for many people. After reading that lkml thread, among others, I know I was left not trusting anything beyond the simplest path through this area of the filesystem. Slow is better than corrupted.

So the good news I was relaying is that it looks like this finally work on ext4, giving it the behavior you described and expected, but that's not actually been there until now. I was hoping someone with more free time than me might be interested to go investigate further if I pointed the advance out. I'm stuck with too many production systems to play with new kernels at the moment, but am quite curious.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux