Andreas,

1. The files I created for the benchmark are 0 bytes in size. I created them with this script:

   for i in `seq 1 50`;do for j in `seq 1 10000`;do touch `/usr/bin/keygen | head -c 8`;done;done

2. Using a magic inode will not create compatibility issues: an fsck that does not understand magic inodes can simply ignore and remove them. This can only happen when fsck is run, and the filesystem code can rebuild a magic inode if it cannot be found (the rebuild takes some time at mount, since the inode table must be read).

Best regards,
Coly

P.S. Here are the results of my benchmark. I created 500000 zero-byte files in a directory named "sub" and recorded times for:

1. copying sub to a dir named "ordered1" on another hard disk;
2. copying dir "ordered1" to "ordered2" on another hard disk;
3. rebooting the system and repeating step 2 (with target "ordered3");
4. removing ordered3, ordered2, ordered1;
5. removing sub.

From the benchmark, I found that hash-ordered inode allocation gains little performance when data=journal or data=ordered is used.
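The creation loop above relies on `/usr/bin/keygen` for random names, which is not a standard tool. A minimal, portable sketch of the same idea (names, paths, and the file count here are assumptions, not the benchmark's exact values) could draw random names from /dev/urandom instead:

```shell
#!/bin/sh
# Sketch of the zero-byte file creation step; COUNT and the name source
# are assumptions (the original script used /usr/bin/keygen).
COUNT=${COUNT:-100}          # the real benchmark created 500000 files
dir=$(mktemp -d)/sub         # stand-in for the benchmark's "sub" directory
mkdir -p "$dir"
i=0
while [ "$i" -lt "$COUNT" ]; do
    # 8 random bytes -> 16 hex characters, used as the file name
    name=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
    : > "$dir/$name"         # create a zero-byte file, like touch
    i=$((i + 1))
done
echo "created $COUNT files in $dir"
```

With 16 hex characters (64 bits) per name, collisions are negligible even at the benchmark's 500000-file scale.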
Created 500000 new files in a dir called "sub" with this script:

for i in `seq 1 50`;do for j in `seq 1 10000`;do touch `/usr/bin/keygen | head -c 8`;done;done

==== data=writeback ====
copy sub to another dir named "ordered1":
  real 7m17.616s   user 0m1.456s   sys 0m27.586s
copy dir "ordered1" to "ordered2":
  real 0m45.231s   user 0m1.340s   sys 0m21.233s
reboot
copy dir "ordered2" to "ordered3":
  real 1m8.764s    user 0m1.568s   sys 0m26.050s
remove ordered3 by rm -rf ordered3:
  real 0m9.200s    user 0m0.168s   sys 0m8.893s
remove ordered2 by rm -rf ordered2:
  real 0m12.225s   user 0m0.128s   sys 0m8.857s
remove ordered1 by rm -rf ordered1:
  real 0m37.493s   user 0m0.076s   sys 0m11.089s
remove original dir "sub":
  real 9m49.902s   user 0m0.220s   sys 0m14.377s

==== data=journal ====
copy sub to another dir named "ordered1":
  real 6m54.151s   user 0m1.696s   sys 0m22.705s
copy dir "ordered1" to "ordered2":
  real 7m7.696s    user 0m1.416s   sys 0m23.541s
reboot
copy dir "ordered2" to "ordered3":
  real 10m46.649s  user 0m1.792s   sys 0m28.778s
remove ordered1 by rm -rf ordered1:
  real 12m54.271s  user 0m0.192s   sys 0m15.353s
remove ordered2 by rm -rf ordered2:
  real 13m37.035s  user 0m0.260s   sys 0m15.009s
remove ordered3 by rm -rf ordered3:
  real 7m43.703s   user 0m0.216s   sys 0m12.117s
remove sub by rm -rf sub:
  real 10m41.150s  user 0m0.188s   sys 0m13.781s

==== data=ordered ====
copy sub to another dir named "ordered1":
  real 7m57.016s   user 0m1.632s   sys 0m25.558s
copy dir "ordered1" to "ordered2":
  real 7m46.037s   user 0m1.604s   sys 0m24.902s
reboot
copy dir "ordered2" to "ordered3":
  real 8m21.952s   user 0m1.720s   sys 0m28.290s
remove ordered1 by rm -rf ordered1:
  real 10m12.652s  user 0m0.272s   sys 0m15.049s
remove ordered2 by rm -rf ordered2:
  real 9m21.770s   user 0m0.220s   sys 0m15.025s
remove ordered3 by rm -rf ordered3:
  real 6m32.278s   user 0m0.176s   sys 0m12.093s
remove sub by rm -rf sub:
  real 10m17.966s  user 0m0.236s   sys 0m14.453s

On Tue, 2007-03-20 at 03:51 -0600, Andreas Dilger wrote:
> On Mar 20, 2007 17:22 +0800, coly wrote:
> > 1, I did
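The timed steps above can be sketched as a script. The paths and the tiny stand-in tree here are assumptions: in the real run the source and target dirs lived on different hard disks, the filesystem was remounted with data=writeback, data=journal, or data=ordered before each pass, and a reboot happened between the second and third copy.

```shell
#!/bin/bash
# Sketch of the benchmark's copy/remove sequence (paths are assumptions;
# the real run used separate disks and 500000 files).
base=$(mktemp -d)
mkdir -p "$base/sub"
for i in 1 2 3 4 5; do : > "$base/sub/f$i"; done  # tiny stand-in tree

time cp -r "$base/sub" "$base/ordered1"           # 1: copy the original dir
time cp -r "$base/ordered1" "$base/ordered2"      # 2: copy the first copy
# (the real benchmark rebooted here, then copied ordered2 -> ordered3)
time rm -rf "$base/ordered2"                      # 4: remove the copies...
time rm -rf "$base/ordered1"
time rm -rf "$base/sub"                           # 5: ...and the original
```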
a benchmark on copying and removing a large number of files. The method
> > is the one you described to me before (create many files in a dir, copy the
> > dir, then remove the new and original dirs).
> >
> > * With data=journal and data=ordered, not much performance improvement will
> > be gained from inode reservation, because every inode modification is
> > submitted to the journal at once; there is no chance to merge multiple
> > modifications of one inode table block into a single journal submission.
>
> That shouldn't be true. Whether the operation is data=journal or data=writeback,
> the filesystem metadata (i.e. inode table, directory) will always be in the
> journal. Unless every operation is sync'd, it should still be possible
> to merge many filesystem operations into a single journal transaction (so
> that they can share the changes to the same blocks).
>
> Now, whether the implementation matches the theory is a different question.
> It would be interesting to figure out why your test results are not showing
> the same performance between data=ordered and data=writeback. How large
> are the files being unlinked? Maybe if they are large, the truncate time is
> long enough that the journal transaction is being committed? Maybe with
> data=journal there is so much going into the journal that it also forces a
> commit because the journal is full?
>
> > 2, In order to manage the reserved inode tables for each directory,
> > especially when the number of files in a directory exceeds the current
> > reservation limit, a list is needed to manage the reserved inode
> > tables. I want to use some on-disk inodes as list pointers; I think only
> > in this way can we avoid changing the ext4 on-disk metadata format.
> >
> > For the inodes used as list pointers, I can assign MAGIC numbers
> > that identify them as distinct from normal inodes. But fsck and mkfs would
> > have to be modified to understand these MAGIC numbers.
> > With the help of these pointers (inodes with a special MAGIC number), inode
> > reservation can be implemented more easily.
>
> If you are making a magic inode, and it needs e2fsck and mke2fs support,
> then this by nature is a change to the filesystem format (though possibly
> one that allows an easy upgrade from existing filesystems). If we need
> to change the on-disk format, then there are a number of other changes we
> could make, including an "inode in directory" format, which would avoid
> this problem entirely because readdir and inode order are always the same.
>
> I would suggest emailing the linux-ext4 list with details of your findings
> (performance, tests that have been run) so that everyone can read and
> comment on them.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
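Andreas's point that "readdir and inode order are always the same" under an inode-in-directory format can be made concrete from user space. The sketch below (not part of the thread; directory and names are assumptions) compares the order readdir reports names in with the order of their inode numbers. On an htree-indexed ext3/ext4 directory, readdir returns hash order, which generally differs from inode allocation order; that mismatch is what inode reservation tries to paper over.

```shell
#!/bin/bash
# Compare readdir order with inode-number order for a directory.
# Note: this demo dir may land on tmpfs, where the two orders can happen
# to coincide; the mismatch shows up on htree-indexed ext3/ext4 dirs.
dir=$(mktemp -d)
for name in alpha bravo charlie delta; do : > "$dir/$name"; done

echo "readdir order:"
ls -U "$dir"                                  # -U: unsorted, i.e. readdir order
echo "inode-number order:"
ls -Ui "$dir" | sort -n | awk '{print $2}'    # sort entries by inode number
```

A copy or rm -rf that walks the directory in readdir order touches inode table blocks in the second order, which on a hashed directory means effectively random inode-table access.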