writing processes are blocking in log_wait_common with data=ordered

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

---556791216-1798619434-1020209496=:11130
Content-Type: TEXT/PLAIN; charset=US-ASCII

> Andrew Morton wrote:
> > 
> > Does this patch help?
> 
> It won't, I suspect.  You've done an O_SYNC write.  ext3
> needs to write your data out to disk before returning
> from the pwrite() call.  We do that by running a commit
> and waiting for it to complete.
> 
> In ordered mode, commit will writeback and wait upon
> your newly-dirtied data.  That's what you asked it to do.
> 
> Other filesystems will do it by directly writing the data
> and waiting on it.  We've lost some concurrency because
> the journal is busy, but in practice I suspect it won't
> make much difference.
> 
> Are you sure that you actually have a problem?  Does your
> application run significantly more quickly on ext2?

I think so.  Here's what I've tested so far using a test program
(attached, see P.S. below) that simulates the load.  I have:

1) Red Hat 7.2, kernel 2.4.17-rc2-aa2, with ext3 on an ATA133 disk.
This reports about 70 blks/sec.

2) Red Hat 6.2, kernel 2.4.17-rc2-aa2, with ext2 on a SCSI U160 disk.
This reports about 420 blks/sec.

3) Red Hat 7.2 (identical hardware to #2), kernel 2.4.19-pre7-aa2, with
ext3.  This reports about 40 blks/sec.

Both ext3 systems are in the 40-70 blks/sec range, though they differ in
kernel version and hardware.  The ext2 system is roughly 6x to 10x faster,
whether the comparison holds the kernel (#1 vs. #2) or the hardware
(#3 vs. #2) constant.
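
For anyone who wants to check the ratios, here is a minimal sketch of the
arithmetic.  The function names are made up for illustration; the three
throughput numbers are the ones reported above.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two throughput figures in blks/sec (hypothetical helper). */
double speedup(double fast, double slow)
{
    assert(slow > 0.0);     /* a zero baseline has no meaningful ratio */
    return fast / slow;
}

/* Print the ext2-vs-ext3 ratios for the three systems listed above. */
void print_speedups(void)
{
    const double ext2_scsi = 420.0;  /* system 2 */
    const double ext3_ata  = 70.0;   /* system 1, same kernel as system 2 */
    const double ext3_scsi = 40.0;   /* system 3, same hardware as system 2 */

    printf("same kernel:   %.1fx\n", speedup(ext2_scsi, ext3_ata));
    printf("same hardware: %.1fx\n", speedup(ext2_scsi, ext3_scsi));
}
```

Calling print_speedups() from main() prints one ratio per comparison.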

Also, kjournald has been eating a ton of CPU time lately.  It had used 7
minutes of CPU in a month, and then another 3 minutes in the one day since
I noticed this was happening.  This is with the real application, not the
test proggy.

> (I now need to know your exact kernel version - there
> have been various goofups on the sync paths which were
> fixed relatively recently).
> 
> I suspect that ext3 is doing an unnecessary commit
> on the fsync() case, and in the O_SYNC case, for your
> application.  If the mtime fix is in place then we
> can try to drop all the ordered-mode data buffers
> from the transaction (which will succeed) and then
> look to see if there's anything to be committed
> (there will not be).  hmm.

I will try out both your patch, which you think won't work, and various
combinations of ext3 (ordered and writeback) and ext2.  My target kernel
version is 2.4.19-pre7-aa2.  I'll try out vanilla pre7 if I have time too.
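
To keep the two sync paths mentioned above apart while testing, something
like the following sketch could be used.  The helper names are hypothetical
and this is not taken from blktest.c; it only assumes the standard open(),
pwrite() and fsync() calls.  One path writes through a descriptor opened
with O_SYNC, the other does a plain pwrite() followed by fsync().

```c
/* for pwrite and pread */
#define _XOPEN_SOURCE 500

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open an existing file for synchronous writes. */
int open_osync(const char *path)
{
    return open(path, O_RDWR | O_SYNC);
}

/*
 * O_SYNC path: fd is expected to come from open_osync(), so pwrite()
 * itself does not return until the data has been written out.
 * Returns 0 on success, -1 on error or short write.
 */
int write_osync(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL);
    return pwrite(fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

/*
 * fsync path: fd is expected to be opened WITHOUT O_SYNC; the write is
 * buffered and fsync() then forces it out.
 * Returns 0 on success, -1 on error or short write.
 */
int write_then_fsync(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL);
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return fsync(fd);
}
```

Timing each helper separately on ext2, ext3-ordered and ext3-writeback
should show whether the slowdown is specific to one of the two paths.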

One interesting and unexpected result is that running inside a
loopback-mounted filesystem 1 GB in size increases performance 4-fold
compared with running directly on the real filesystem!  That is,
ext3-ordered looped on top of ext3-ordered is much faster than plain
ext3-ordered!  This is on kernel 2.4.17-rc2-aa2, which is a bit old, so it
could be meaningless....

David

P.S.  I created a benchmark of this phenomenon called blktest.c.  It's a
bit rough (you need to recompile to change the block size etc.).  It's
attached.  It takes a single argument, which is the number of concurrent
writers.  Each writer repeatedly writes an 8 KB block to a random location
in the file using pwrite().  The code is stupid in many places.  Excuse it.
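
In case the attachment doesn't survive, the per-writer loop looks roughly
like the sketch below.  This is a simplified, bounded variant and not the
attached code itself: the helper name, the count and seed parameters and
the fill byte are made up for illustration.

```c
/* for pwrite */
#define _XOPEN_SOURCE 500

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define BLKSIZE 8192

/*
 * Hypothetical helper: write `count` blocks of BLKSIZE bytes to random
 * block-aligned offsets within the first `file_size` bytes of `fd`.
 * Returns the number of complete writes.  If fd was opened with O_SYNC,
 * each pwrite() is a synchronous write.
 */
long write_random_blocks(int fd, off_t file_size, long count, unsigned seed)
{
    char buff[BLKSIZE];
    long nblocks = (long)(file_size / BLKSIZE);
    long done = 0;

    assert(count >= 0);
    if (nblocks <= 0)
        return 0;           /* file too small to hold a single block */

    srand(seed);
    memset(buff, 0xAB, sizeof(buff));

    while (done < count) {
        off_t block = rand() % nblocks;

        if (pwrite(fd, buff, BLKSIZE, block * BLKSIZE) != BLKSIZE)
            break;          /* stop on error or short write */
        done++;
    }
    return done;
}
```

Dividing the returned count by the elapsed time gives the blks/sec figure
for a single writer.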

-- 
/==============================\
| David Mansfield              |
| david@cobite.com             |
\==============================/

---556791216-1798619434-1020209496=:11130
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="blktest.c"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.44.0204301931360.11130@admin>
Content-Description: 
Content-Disposition: attachment; filename="blktest.c"

LyogZm9yIHB3cml0ZSAqLw0KI2RlZmluZSBfWE9QRU5fU09VUkNFIDUwMA0K
DQojaW5jbHVkZSA8c3RkaW8uaD4NCiNpbmNsdWRlIDxzdGRsaWIuaD4NCiNp
bmNsdWRlIDx0aW1lLmg+DQojaW5jbHVkZSA8c3RyaW5nLmg+DQojaW5jbHVk
ZSA8c3lzL3R5cGVzLmg+DQojaW5jbHVkZSA8c3lzL3N0YXQuaD4NCiNpbmNs
dWRlIDxzeXMvd2FpdC5oPg0KI2luY2x1ZGUgPHN5cy90aW1lLmg+DQojaW5j
bHVkZSA8ZmNudGwuaD4NCiNpbmNsdWRlIDx1bmlzdGQuaD4NCiNpbmNsdWRl
IDxzaWduYWwuaD4NCiNpbmNsdWRlIDxzeXMvbW1hbi5oPg0KDQojZGVmaW5l
IEJMS1NJWkUgODE5Mg0KI2RlZmluZSBGSUxFU0laRSAoNTEyKjEwMjQqMTAy
NCkgDQoNCnZvaWQgZGllKGNvbnN0IGNoYXIgKiByZWFzb24pDQp7DQogICAg
ZnByaW50ZihzdGRlcnIsICJkeWluZzogJXNcbiIsIHJlYXNvbik7DQogICAg
ZXhpdCgxKTsNCn0NCg0Kdm9pZCBzaWcoaW50IHdoaWNoKQ0Kew0KICAgIHBy
aW50ZigicmVjZWl2ZWQgc2lnbmFsICVkXG4iLCB3aGljaCk7DQp9DQoNCnZv
aWQgZG9fY2hpbGQoaW50IGZkLCBpbnQgY2hpbGQsIGludCAqIHNjb3JlKQ0K
ew0KICAgIGNoYXIgYnVmZltCTEtTSVpFXTsNCiAgICBzdHJ1Y3QgdGltZXZh
bCB0djsNCg0KICAgIC8qIHNldCByYW5kb20gc2VlZCBpbiBlYWNoIHByb2Nl
c3MgKi8NCiAgICBnZXR0aW1lb2ZkYXkoJnR2LCBOVUxMKTsNCiAgICBzcmFu
ZCh0di50dl91c2VjKTsNCg0KICAgIG1lbXNldChidWZmLCBjaGlsZCwgQkxL
U0laRSk7DQoNCiAgICB3aGlsZSAoMSkNCiAgICB7DQoJaW50IGJsb2NrID0g
cmFuZCgpICUgKEZJTEVTSVpFL0JMS1NJWkUpOw0KCXB3cml0ZShmZCwgYnVm
ZiwgQkxLU0laRSwgYmxvY2sgKiBCTEtTSVpFKTsNCgkoKnNjb3JlKSsrOw0K
ICAgIH0NCn0NCg0KaW50IG1haW4oaW50IGFyZ2MsIGNoYXIgKiBhcmd2W10p
DQp7DQogICAgaW50IGksIG5yX3Byb2NzLCBmZDsNCiAgICBwaWRfdCAqIHBp
ZDsNCiAgICBzdHJ1Y3Qgc2lnYWN0aW9uIHNhOw0KICAgIGludCBzY29yZV9m
ZDsNCiAgICBpbnQgKiBzY29yZTsNCiAgICBzdHJ1Y3QgdGltZXZhbCBzdGFy
dF90diwgZW5kX3R2Ow0KICAgIGludCB0b3RhbF9zY29yZSA9IDA7DQogICAg
ZG91YmxlIHNlY3M7DQoNCiAgICBpZiAoYXJnYyA8IDIpDQoJZGllKCJ1c2Fn
ZSIpOw0KDQogICAgaWYgKChucl9wcm9jcyA9IGF0b2koYXJndlsxXSkpIDw9
IDApDQoJZGllKCJ1c2FnZSIpOw0KDQogICAgLyogdGhlIHRlc3QgZmlsZSBu
ZWVkcyB0byBiZSBjcmVhdGVkIGJlZm9yZWhhbmQgKi8NCiAgICBpZiAoKGZk
ID0gb3BlbigiYmxrdGVzdC50bXAiLCBPX1JEV1J8T19TWU5DKSkgPCAwKQ0K
CWRpZSgicGxlYXNlIGNyZWF0ZSBhIHRlc3QgZmlsZSB1c2luZzpcblxuZGQg
aWY9L2Rldi96ZXJvIG9mPWJsa3Rlc3QudG1wIGJzPTFrIGNvdW50PXh4eCIp
Ow0KDQogICAgLyogc2hhcmVkIG1lbW9yeSB0byBrZWVwIHRoZSAnc2NvcmVi
b2FyZCcgKi8NCiAgICBpZiAoKHNjb3JlX2ZkID0gb3BlbigiL2Rldi96ZXJv
IiwgT19SRFdSKSkgPCAwKQ0KCWRpZSgiL2Rldi96ZXJvIik7DQoNCiAgICBp
ZiAoKHNjb3JlID0gKGludCopbW1hcCgwLCA0MDk2LCBQUk9UX1JFQUR8UFJP
VF9XUklURSwgTUFQX1NIQVJFRCwgc2NvcmVfZmQsIDApKSA9PSBNQVBfRkFJ
TEVEKQ0KCWRpZSgibW1hcCIpOw0KDQogICAgaWYgKCEocGlkID0gKHBpZF90
KiljYWxsb2MobnJfcHJvY3MsIHNpemVvZihwaWRfdCkpKSkNCglkaWUoImNh
bGxvYyIpOw0KDQogICAgcHJpbnRmKCJmb3JraW5nIHdyaXRlcnMuXG4iKTsN
Cg0KICAgIGdldHRpbWVvZmRheSgmc3RhcnRfdHYsIE5VTEwpOw0KDQogICAg
Zm9yIChpID0gMDsgaSA8IG5yX3Byb2NzOyBpKyspDQogICAgew0KCWlmICgo
cGlkW2ldID0gZm9yaygpKSA8IDApDQoJew0KCSAgICBpbnQgajsNCgkgICAg
Zm9yIChqID0gMDsgaiA8IGk7IGorKykNCgkJa2lsbChwaWRbal0sIFNJR0tJ
TEwpOw0KCSAgICBnb3RvIGNsZWFudXA7DQoJfQ0KCWVsc2UgaWYgKHBpZFtp
XSA9PSAwKQ0KCXsNCgkgICAgZG9fY2hpbGQoZmQsIGksIHNjb3JlICsgaSk7
DQoJfQ0KCQ0KCXByaW50ZigiZm9ya2VkIHByb2Nlc3MgJWRcbiIsIHBpZFtp
XSk7DQogICAgfQ0KDQogICAgbWVtc2V0KCZzYSwgMCwgc2l6ZW9mKHNhKSk7
DQogICAgc2Euc2FfaGFuZGxlciA9IHNpZzsNCiAgICBzaWdhY3Rpb24oU0lH
SU5ULCAmc2EsIE5VTEwpOw0KICAgIHNpZ2FjdGlvbihTSUdURVJNLCAmc2Es
IE5VTEwpOw0KDQogICAgcHJpbnRmKCJjaGlsZHJlbiBzdGFydGVkLCB3YWl0
aW5nIGZvciBzaWduYWxcbiIpOw0KICAgIHBhdXNlKCk7DQoNCiBjbGVhbnVw
Og0KICAgIHdoaWxlIChpKQ0KICAgIHsNCglwaWRfdCBkZWFkID0gd2FpdChO
VUxMKTsNCglwcmludGYoInBpZCAlZCBoYXMgZXhpdGVkXG4iLCBkZWFkKTsN
CglpLS07DQogICAgfQ0KICAgIA0KICAgIGdldHRpbWVvZmRheSgmZW5kX3R2
LCBOVUxMKTsNCg0KICAgIGZvciAoaSA9IDA7IGkgPCBucl9wcm9jczsgaSsr
KQ0KICAgIHsNCglwcmludGYoInNjb3JlIGZvciAlZDogJWRcbiIsIGksIHNj
b3JlW2ldKTsNCgl0b3RhbF9zY29yZSArPSBzY29yZVtpXTsNCiAgICB9DQoN
CiAgICBlbmRfdHYudHZfc2VjIC09IHN0YXJ0X3R2LnR2X3NlYzsNCiAgICBl
bmRfdHYudHZfdXNlYyAtPSBlbmRfdHYudHZfdXNlYzsNCiAgICANCiAgICBp
ZiAoZW5kX3R2LnR2X3VzZWMgPCAwKQ0KCWVuZF90di50dl9zZWMtLSwgZW5k
X3R2LnR2X3VzZWMgKz0gMTAwMDAwMDsNCg0KICAgIHNlY3MgPSAoZG91Ymxl
KWVuZF90di50dl9zZWMgKyAoZG91YmxlKWVuZF90di50dl91c2VjIC8gMTAw
MDAwMC4wOw0KICAgIA0KICAgIHByaW50ZigidG90YWwgc2NvcmU6ICVkIGJs
b2NrcyBpbiAlLjJmIHNlY29uZHMgJWYgYmxrcy9zZWNcbiIsIHRvdGFsX3Nj
b3JlLCBzZWNzLCAoZG91YmxlKXRvdGFsX3Njb3JlL3NlY3MpOw0KDQogICAg
ZXhpdCgwKTsNCn0NCg==
---556791216-1798619434-1020209496=:11130--




