ext4 finally doing the right thing

A few months ago the worst of the bugs in the ext4 fsync code started getting cleared up, with the one fixed in http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f3481e9a80c240f169b36ea886e2325b9aeb745 being particularly painful. That fix made it into the 2.6.32 kernel released last month. Some interesting benchmark news today suggests a version of ext4 that might actually work for databases is showing up in early packaged distributions:

http://www.phoronix.com/scan.php?page=article&item=ubuntu_lucid_alpha2&num=3

That comes along with the massive performance drop you get from a working fsync. See http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=2 for background on this topic from when the issue was discovered:

"[This change] is required for safe behavior with volatile write caches on drives. You could mount with -o nobarrier and [the performance drop] would go away, but a sequence like write->fsync->lose power->reboot may well find your file without the data that you synced, if the drive had write caches enabled. If you know you have no write cache, or that it is safely battery backed, then you can mount with -o nobarrier, and not incur this penalty."

The pgbench TPS figure Phoronix has been reporting has always been a fictitious one resulting from unsafe write caching. Now that 2.6.32 has been released with ext4 defaulting to proper behavior on fsync, that's going to make for a very interesting change. On one side, we might finally be able to use regular drives with their caches turned on safely, taking advantage of the cache for other writes while doing the right thing with the database writes. On the other, anyone who believed the fictitious numbers before is going to be in for a rude surprise and think there's a massive regression here. There's some potential for this to show PostgreSQL in a bad light, when people discover they can really only get ~100 commits/second out of cheap hard drives and assume the database is to blame. Interesting times.
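
For anyone who wants to check their own hardware, here is a rough sketch of how the ~100 commits/second figure can be sanity-checked. It rests on an assumption that is not spelled out above: that each synchronous commit from a single client has to wait about one platter rotation, so a 7200 RPM drive tops out near 120 per second. The function names and the 8 kB payload are made up for illustration, and this is not how pgbench measures anything. A measured rate far above the rotation bound on a plain disk would suggest a write cache is absorbing the fsync calls.

```python
import os
import tempfile
import time


def rotation_limited_commits_per_sec(rpm):
    """Rough upper bound on single-client commits/second, assuming
    each commit waits for one full platter rotation (an assumption)."""
    return rpm / 60.0


def measure_fsync_rate(directory, seconds=2.0, payload=b"x" * 8192):
    """Rewrite one small block and fsync it in a loop for about
    `seconds`; return the number of completed fsyncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, payload)
            os.fsync(fd)                  # ask the OS to push it to stable storage
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    bound = rotation_limited_commits_per_sec(7200)
    print("7200 RPM rotation bound: %.0f commits/sec" % bound)
    print("measured here: %.0f fsyncs/sec" % measure_fsync_rate("."))
```

Run it from a directory on the filesystem you care about, once with the default mount options and once with -o nobarrier, and compare the two measured rates against the rotation bound.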

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.co


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
