> Andrew Morton wrote:
> >
> > Does this patch help?
>
> It won't, I suspect. You've done an O_SYNC write. ext3
> needs to write your data out to disk before returning
> from the pwrite() call. We do that by running a commit
> and waiting for it to complete.
>
> In ordered mode, commit will writeback and wait upon
> your newly-dirtied data. That's what you asked it to do.
>
> Other filesystems will do it by directly writing the data
> and waiting on it. We've lost some concurrency because
> the journal is busy, but in practice I suspect it won't
> make much difference.
>
> Are you sure that you actually have a problem? Does your
> application run significantly more quickly on ext2?

I think so. Here's what I've tested so far, using a test program that
simulates the load (attached; see the P.S. below):

1) Red Hat 7.2, kernel 2.4.17-rc2-aa2, ext3 on an ATA133 disk.
   This reports about 70 blks/sec.

2) Red Hat 6.2, kernel 2.4.17-rc2-aa2, ext2 on a SCSI U160 disk.
   This reports about 420 blks/sec.

3) Red Hat 7.2 (hardware identical to #2), kernel 2.4.19-pre7-aa2, ext3.
   This reports about 40 blks/sec.

Both ext3 systems are in the 40-70 range, even though they differ in
kernel version and hardware. The ext2 system is 6-10x faster: 6x the
ext3 system on the same kernel (#1), and about 10x the ext3 system on
the same hardware (#3).

Also, kjournald has been eating a lot of CPU time lately. It had
accumulated 7 minutes of CPU time over a month, then another 3 minutes
in the single day since I noticed this was happening. That is with the
real application, not the test proggy.

> (I now need to know your exact kernel version - there
> have been various goofups on the sync paths which were
> fixed relatively recently).
>
> I suspect that ext3 is doing an unnecessary commit
> on the fsync() case, and in the O_SYNC case, for your
> application. If the mtime fix is in place then we
> can try to drop all the ordered-mode data buffers
> from the transaction (which will succeed) and then
> look to see if there's anything to be committed
> (there will not be). hmm.

I will try out both your patch (which you think won't work) and various
combinations of ext3 (ordered and writeback modes) and ext2. My target
kernel version is 2.4.19-pre7-aa2. I'll try out vanilla pre7 too if I
have time.

One interesting and unexpected result: running inside a 1 GB loopback
filesystem is 4 times faster than running on the real filesystem. That
is, ext3-ordered looped on top of ext3-ordered is much faster than plain
ext3-ordered! This was on kernel 2.4.17-rc2-aa2, which is a bit old, so
it could be meaningless...

David

P.S. I created a benchmark of this phenomenon called blktest.c, which is
attached. It's a bit rough (you need to recompile to change the block
size, etc.). It takes a single argument, the number of concurrent
writers. Each writer repeatedly writes an 8 KB block to a random
block-aligned offset within the first 512 MB of the file, using
pwrite(). The test file (blktest.tmp) is opened with O_SYNC and must be
created beforehand, e.g. with dd. Stop the run with Ctrl-C; it then
prints each writer's count and the total blks/sec. The code is stupid in
many places. Excuse it.
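For reference, a minimal sketch of the two synchronous-write paths Andrew mentions, the O_SYNC case and the fsync() case, assuming a file that already exists and the same 8 KB block size blktest uses. The function names write_block_osync() and write_block_fsync() are made up for this illustration and are not part of blktest.c; error handling is reduced to returning -1.

```c
/* Sketch only: two ways to make one block write synchronous.
   Function names are hypothetical, not taken from blktest.c. */
#define _XOPEN_SOURCE 500   /* for pwrite() */

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLKSIZE 8192

/* O_SYNC case: every pwrite() on this descriptor returns only once
   the data has been written out to the device. */
int write_block_osync(const char *path, int block)
{
    char buf[BLKSIZE];
    ssize_t n;
    int fd = open(path, O_RDWR | O_SYNC);

    if (fd < 0)
        return -1;
    memset(buf, 'a', sizeof(buf));
    n = pwrite(fd, buf, BLKSIZE, (off_t)block * BLKSIZE);
    close(fd);
    return n == BLKSIZE ? 0 : -1;
}

/* fsync() case: an ordinary buffered pwrite(), followed by an explicit
   fsync() that waits for the file's data and metadata to reach disk. */
int write_block_fsync(const char *path, int block)
{
    char buf[BLKSIZE];
    ssize_t n;
    int rc;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    memset(buf, 'b', sizeof(buf));
    n = pwrite(fd, buf, BLKSIZE, (off_t)block * BLKSIZE);
    rc = (n == BLKSIZE && fsync(fd) == 0) ? 0 : -1;
    close(fd);
    return rc;
}
```

Either function returns 0 only if the whole block was written (and, for the second one, successfully flushed).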
-- 
/==============================\
| David Mansfield              |
| david@cobite.com             |
\==============================/

Attachment: blktest.c

/* for pwrite */
#define _XOPEN_SOURCE 500

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>

#define BLKSIZE 8192
#define FILESIZE (512*1024*1024)

void die(const char * reason)
{
    fprintf(stderr, "dying: %s\n", reason);
    exit(1);
}

void sig(int which)
{
    printf("received signal %d\n", which);
}

/* writer: overwrite random BLKSIZE-aligned blocks of fd forever,
   counting completed writes in *score (shared with the parent) */
void do_child(int fd, int child, int * score)
{
    char buff[BLKSIZE];
    struct timeval tv;

    /* set random seed in each process */
    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);

    memset(buff, child, BLKSIZE);

    while (1)
    {
        int block = rand() % (FILESIZE/BLKSIZE);
        pwrite(fd, buff, BLKSIZE, block * BLKSIZE);
        (*score)++;
    }
}

int main(int argc, char * argv[])
{
    int i, j, nr_procs, fd;
    pid_t * pid;
    struct sigaction sa;
    int score_fd;
    int * score;
    struct timeval start_tv, end_tv;
    int total_score = 0;
    double secs;

    if (argc < 2)
        die("usage");

    if ((nr_procs = atoi(argv[1])) <= 0)
        die("usage");

    /* the test file needs to be created beforehand */
    if ((fd = open("blktest.tmp", O_RDWR|O_SYNC)) < 0)
        die("please create a test file using:\n\ndd if=/dev/zero of=blktest.tmp bs=1k count=xxx");

    /* shared memory to keep the 'scoreboard' */
    if ((score_fd = open("/dev/zero", O_RDWR)) < 0)
        die("/dev/zero");

    if ((score = (int*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, score_fd, 0)) == MAP_FAILED)
        die("mmap");

    if (!(pid = (pid_t*)calloc(nr_procs, sizeof(pid_t))))
        die("calloc");

    printf("forking writers.\n");

    gettimeofday(&start_tv, NULL);

    for (i = 0; i < nr_procs; i++)
    {
        if ((pid[i] = fork()) < 0)
        {
            for (j = 0; j < i; j++)
                kill(pid[j], SIGKILL);
            goto cleanup;
        }
        else if (pid[i] == 0)
        {
            do_child(fd, i, score + i);
        }

        printf("forked process %d\n", pid[i]);
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    printf("children started, waiting for signal\n");
    pause();

    /* stop the writers even if only the parent was signalled */
    for (j = 0; j < nr_procs; j++)
        kill(pid[j], SIGTERM);

 cleanup:
    while (i)
    {
        pid_t dead = wait(NULL);
        printf("pid %d has exited\n", dead);
        i--;
    }

    gettimeofday(&end_tv, NULL);

    for (i = 0; i < nr_procs; i++)
    {
        printf("score for %d: %d\n", i, score[i]);
        total_score += score[i];
    }

    /* elapsed time = end - start, borrowing a second if tv_usec goes negative */
    end_tv.tv_sec -= start_tv.tv_sec;
    end_tv.tv_usec -= start_tv.tv_usec;

    if (end_tv.tv_usec < 0)
        end_tv.tv_sec--, end_tv.tv_usec += 1000000;

    secs = (double)end_tv.tv_sec + (double)end_tv.tv_usec / 1000000.0;

    printf("total score: %d blocks in %.2f seconds %f blks/sec\n", total_score, secs, (double)total_score/secs);

    exit(0);
}
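The blks/sec figures quoted in this thread are the total number of completed pwrite() calls divided by elapsed wall-clock time. A minimal sketch of that calculation from two gettimeofday() samples, with the microsecond borrow handled explicitly; elapsed_secs() and blocks_per_sec() are hypothetical helper names, not functions from blktest.c.

```c
/* Sketch only: elapsed wall-clock seconds and throughput from two
   struct timeval samples. Helper names are hypothetical. */
#include <sys/time.h>

/* Seconds from start to end; borrow one second when the microsecond
   difference is negative. */
double elapsed_secs(struct timeval start, struct timeval end)
{
    long sec = (long)(end.tv_sec - start.tv_sec);
    long usec = (long)(end.tv_usec - start.tv_usec);

    if (usec < 0) {
        sec--;
        usec += 1000000;
    }
    return (double)sec + (double)usec / 1000000.0;
}

/* Blocks per second over the interval; 0.0 if no time has elapsed. */
double blocks_per_sec(long blocks, struct timeval start, struct timeval end)
{
    double secs = elapsed_secs(start, end);

    return secs > 0.0 ? (double)blocks / secs : 0.0;
}
```

Taking the start sample before forking the writers and the end sample after they have all been reaped, as blktest does, gives the aggregate rate across all writers rather than a per-writer one.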