On 20 Nov 07, at 17:56, David Lang wrote:
Assuming the journal is on a distinct device and the distinct device can take the load. It isn't on ZFS, although work is in progress.

One of the many benefits of the sadly underrated Solaris Disksuite product was the metatrans devices, which at least permitted metadata updates to go to a distinct device. When the UFS logging code went into core Solaris (the ON integration), that facility was dropped, sadly. My Pillar NFS server does data logging to distinct disk groups, but mostly, like such boxes tend to do, relies on 12GB of RAM and a battery.

A sequential write is only of benefit if the head is in the right place, the platter is at the right rotational position, and the write is well matched to the spindle's transfer rate: if the spindle is doing large sequential writes while also servicing reads and writes elsewhere, or can't keep up with writing tracks flat out, the problems increase.
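(Presumably the ZFS work in progress is the separate intent-log device support. Once that is available, attaching a dedicated log device should look roughly like the following; this is a sketch only, with an invented pool name and device names:

    # add a dedicated log device to an existing pool
    zpool add pool1 log c4t0d0

    # or, better, a mirrored pair of log devices
    zpool add pool1 log mirror c4t0d0 c4t1d0

That would let the synchronous-write stream go to a spindle, or NVRAM device, that isn't also servicing the main pool.)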
I've split the meta-data out into separate partitions (a sketch of the imapd.conf settings is at the end of this message). The meta-data is stored in ZFS filesystems in a pool which is a four-disk RAID 0+1 group on SAS drives; the message data is coming out of the lowest QoS on my Pillar.

A ten-second fsstat sample of VM operations shows that, by request (this measures filesystem activity, not the implied disk activity), it's the meta partitions taking the pounding:

      map addmap delmap getpag putpag pagio
        0      0      0     45      0     0  /var/imap
       11     11     11     17      0     0  /var/imap/meta-partition-1
      290    290    290    463      5     0  /var/imap/meta-partition-2
      139    139    139    183      3     0  /var/imap/meta-partition-3
       66     66     66    106     10     0  /var/imap/meta-partition-7
      347    347    342    454     16     0  /var/imap/meta-partition-8
       57     57     57     65      5     0  /var/imap/meta-partition-9
        4      4      8      4      0     0  /var/imap/partition-1
       11     11     22     14      0     0  /var/imap/partition-2
        1      1      2      1      0     0  /var/imap/partition-3
        6      6     12     49     10     0  /var/imap/partition-7
       15     15     28    457      0     0  /var/imap/partition-8
        1      1      2      2      0     0  /var/imap/partition-9

Similarly, by non-VM operation:

     new  name  name  attr  attr lookup rddir  read  read write write
     file remov chng   get   set    ops   ops   ops bytes   ops bytes
        0     0     0 2.26K     0  6.15K     0     0     0    45 1.22K  /var/imap
        0     0     0   356     0    707     0     0     0     6 3.03K  /var/imap/meta-partition-1
        3     0     3   596     0    902     0     6  135K    90  305K  /var/imap/meta-partition-2
        0     0     0   621     0  1.08K     0     0     0     3 1.51K  /var/imap/meta-partition-3
        3     0     3 1.04K     0  1.70K     0     6  149K    36  650K  /var/imap/meta-partition-7
        0     0     0 2.28K     0  4.24K     0     0     0     7 1.87K  /var/imap/meta-partition-8
        0     0     0    18     0     32     0     0     0     2   176  /var/imap/meta-partition-9
        2     2     2    22     0     30     0     1 2.37K     2 7.13K  /var/imap/partition-1
        3     4    12    84     0    157     0     1   677     3 7.51K  /var/imap/partition-2
        1     1     1 1.27K     0  2.16K     0     0     0     1 3.75K  /var/imap/partition-3
        2     2     4    35     0     56     0     1 3.97K    36  279K  /var/imap/partition-7
        1     2     1   256     0    514     0     0     0     1 3.75K  /var/imap/partition-8
        0     0     0     0     0      0     0     0     0     0     0  /var/imap/partition-9

And looking at the real IO load, ten seconds of zpool iostat (for the meta-data and /var/imap):

                      capacity     operations    bandwidth
    pool            used  avail   read  write   read  write
    ------------   -----  -----  -----  -----  -----  -----
    pool1          51.6G  26.4G      0    142  54.3K  1001K
      mirror       25.8G  13.2G      0     68  38.4K   471K
        c0t0d0s4       -      -      0     36  44.7K   471K
        c0t1d0s4       -      -      0     36      0   471K
      mirror       25.8G  13.2G      0     73  15.9K   530K
        c0t2d0s4       -      -      0     40  28.4K   531K
        c0t3d0s4       -      -      0     39  6.39K   531K
    ------------   -----  -----  -----  -----  -----  -----

is very different to ten seconds of sar for the NFS:

    09:46:34   device  %busy  avque  r+w/s  blks/s  avwait  avserv
    [...]
               nfs73       1    0.0      3     173     0.0     4.2
               nfs86       3    0.1     12     673     0.0     6.5
               nfs87       0    0.0      0       0     0.0     0.0
               nfs89       0    0.0      0       0     0.0     0.0
               nfs96       0    0.0      0       0     0.0     1.8
               nfs101      1    0.0      1      25     0.0     8.0
               nfs102      0    0.0      0       4     0.0     9.4

The machine has a _lot_ of memory (32GB), so it's likely that mail which is delivered and then read within ten minutes never gets read back from the message store: the NFS load is almost entirely write as seen from the server.

ian
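PS: for anyone wanting to reproduce the meta-data split, the relevant imapd.conf settings look roughly like the following. This is a sketch only: the partition names and paths are invented, and the exact list of files that can be moved to a metapartition should be checked against the imapd.conf man page for your Cyrus version.

    # spool (message) partitions, on the Pillar-backed filesystems
    partition-p1: /var/imap/partition-1
    partition-p2: /var/imap/partition-2

    # matching meta partitions, on the ZFS pool
    metapartition-p1: /var/imap/meta-partition-1
    metapartition-p2: /var/imap/meta-partition-2

    # which per-mailbox files live on the metapartition
    metapartition_files: header index cache expunge squat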