[ ... ]

> I thought I would do a real measurement to have some numbers.
> On my raid-1 ext3, extracting a kernel archive:
>
> benjamin@metis ~/software $ time tar xfj /usr/portage/distfiles/linux-2.6.38.tar.bz2
>
> real    0m21.769s
> user    0m13.905s
> sys     0m1.751s

That's a "real measurement" of *something*, and it does give "some
numbers", but to me the numbers are not that interesting, as it is far
from clear what they are about.

So I happen to have an otherwise totally unused, fastish, contemporary
500GB disk and a laptop for a measurement of something that might be
better defined, a bit simplemindedly, but taking care about a few
details (see also the appended setup details), so that the numbers are
about as good as possible (YMMV).

First with 'ext3':

% mount -t ext3 -o relatime /dev/sdb /mnt/sdb
% df -BM /mnt/sdb
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sdb         469455M  687M   444922M   1% /mnt/sdb
% df -i /mnt/sdb
Filesystem       Inodes IUsed    IFree IUse% Mounted on
/dev/sdb       30531584 38100 30493484    1% /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    12m49.610s
user    0m0.990s
sys     0m8.610s

That's about 570KB/s and 50 files/s, in more or less optimal
conditions. Not so good for 'ext3', which indeed is well known for
appalling small-file/metadata write performance, but the order of
magnitude of the results is the plausible one.

XFS with 'delaylog' does worse, but then it has a different tradeoff
envelope:

% mount -t xfs -o relatime,delaylog /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).
real    24m4.282s
user    0m1.260s
sys     0m14.030s

I also tried with JFS, and it is faster at 1MB/s and 90 files/s, which
is pretty good (I suspect that JFS may be cheating slightly on the
semantics, but I know about its on-disk structure, and twice as fast
as 'ext3' is plausible):

% mount -t jfs -o relatime /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    6m56.508s
user    0m1.000s
sys     0m7.130s

Consolation notes :-)
=====================

Naturally the real (and arguably rather more meaningful than others)
measurements above will baffle those described here: [ ... ] many
people (some with decades of "experience") who just don't understand
IOPS and metadata and commits and caching, and who think "performance"
is whatever number they can get with their clever "benchmarks".

So as a consolation prize to them let's rerun with entirely different
semantics, but still taking a bit of care:

% mount -t ext3 -o relatime /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    0m27.414s
user    0m0.270s
sys     0m2.430s

Oh gosh, it looks like much better "performance"! 'ext3' really rises
and shines with contiguous large IOs! :-)

And similarly for XFS:

% mount -t xfs -o relatime,delaylog /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    0m33.849s
user    0m0.310s
sys     0m2.960s

And JFS is quite similar too:

% mount -t jfs -o relatime /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar -no-fsync; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).
real    0m35.191s
user    0m0.380s
sys     0m2.920s

Journaling notes
================

So there. I apologize to the readers who "understand IOPS and metadata
and commits and caching" (and who may have read the man page for
'star'), who will be bored with the beginner-level nature of the
points made above.

But I am actually a bit surprised and disappointed with the "really"
numbers above, because I would have expected something more like a 2-3
minute duration, or 2-4 files per IOP, but I guess such are the
horrors of seeking crazily between journal and metadata and data
space. So let's try without a journal, with 'ext2':

% mount -t ext2 -o relatime /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    8m12.196s
user    0m1.120s
sys     0m6.030s

Sure it is better; that's 50% faster than 'ext3'. Let's also try, as a
special case, 'ext4' (yes, 'ext4' with its many improvements) without
a journal:

% mkfs.ext4 -O ^has_journal /dev/sdb
mke2fs 1.41.11 (14-Mar-2010)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) y
[ ... ]
% mount -t ext4 -o relatime /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).

real    0m31.119s
user    0m0.870s
sys     0m6.190s

Well, I don't believe that. That looks like a feature or bug in 'ext4'
where without a journal it won't honor commits. The same appears to be
the case for JFS, but then the manual explicitly says that
'nointegrity' is aptly named, and so it is believable that switching
off journaling is not its only effect:

% mount -t jfs -o relatime,nointegrity /dev/sdb /mnt/sdb
% time sh -c 'cd /mnt/sdb; star -x -b 2048 -f /tmp/linux-2.6.38.tar; cd /; umount /mnt/sdb'
star: 420 blocks + 81920 bytes (total of 440483840 bytes = 430160.00k).
real    0m35.820s
user    0m0.610s
sys     0m5.740s

Setup details
=============

ULTS10 64b, 2.6.35 kernel, 4GiB RAM, i3-M370 CPU. Quiet except for
the measurements. Every 'tar' extraction is preceded by a re-'mkfs'.
Note the details below (e.g. the archive is uncompressed and stored in
an in-memory 'tmpfs'; the disk is a fairly fast 500GB drive on eSATA).

----------------------------------------------------------------

% dd bs=1M if=/tmp/linux-2.6.38.tar of=/dev/null
420+1 records in
420+1 records out
440483840 bytes (440 MB) copied, 0.159935 s, 2.8 GB/s

----------------------------------------------------------------

% hdparm -t /dev/sdb

/dev/sdb:
 Timing buffered disk reads: 388 MB in 3.01 seconds = 128.98 MB/sec

----------------------------------------------------------------

% lsscsi | grep sdb
[4:0:0:0]    disk    ATA      ST3500418AS      CC44  /dev/sdb

----------------------------------------------------------------

% mkfs.ext3 /dev/sdb
mke2fs 1.41.11 (14-Mar-2010)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
30531584 inodes, 122096646 blocks
6104832 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872,
        71663616, 78675968, 102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
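For what it's worth, the roughly 570KB/s and 50 files/s quoted for the
'ext3' run can be rechecked from figures in the transcripts alone (the
'star' byte total, the 'df -i' IUsed count as an approximate file
count, and the 'real' time); a trivial sketch:

```python
# Back-of-the-envelope check of the 'ext3' rates quoted above,
# using only figures taken from the transcripts.
bytes_total = 440483840        # from the 'star' summary line
files_total = 38100            # IUsed from 'df -i' (approximate file count)
elapsed = 12 * 60 + 49.610     # 'real' time of the ext3 run, in seconds

print(f"{bytes_total / elapsed / 1000:.0f} kB/s")   # prints: 572 kB/s
print(f"{files_total / elapsed:.0f} files/s")       # prints: 50 files/s
```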
----------------------------------------------------------------

% mkfs.xfs -f /dev/sdb
meta-data=/dev/sdb               isize=256    agcount=4, agsize=30524162 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=122096646, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=59617, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

----------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
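As a final aside, the fsync-per-file effect discussed above is easy to
reproduce in isolation. 'star' fsyncs each extracted file by default
and '-no-fsync' disables that, which is what the two modes of this toy
sketch imitate (the file count and 4KiB payload are arbitrary, and it
should be run on the filesystem under test, not on 'tmpfs', for the
numbers to mean anything):

```python
import os
import tempfile
import time

def extract_like(n_files, do_fsync, payload=b"x" * 4096):
    """Create n_files small files, optionally fsync()ing each one
    (as 'star' does by default); return elapsed seconds."""
    with tempfile.TemporaryDirectory() as d:
        t0 = time.monotonic()
        for i in range(n_files):
            with open(os.path.join(d, f"f{i}"), "wb") as f:
                f.write(payload)
                if do_fsync:
                    os.fsync(f.fileno())
        return time.monotonic() - t0

if __name__ == "__main__":
    n = 500
    print(f"fsync each file: {n / extract_like(n, True):.0f} files/s")
    print(f"no fsync:        {n / extract_like(n, False):.0f} files/s")
```

On a single rotating disk the fsynced rate should land in the vicinity
of the disk's IOPS, i.e. a few tens to low hundreds of files/s, while
the no-fsync rate is bounded by memory speed until writeback kicks in.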