On Tue, Aug 26, 2014 at 06:04:52PM +0800, Zhang Qiang wrote:
> Thanks Dave/Greg for your analysis and suggestions.
>
> I can summarize what I should do next:
>
> - back up my data using xfsdump
> - rebuild the filesystem using mkfs with option agcount=32 for the 2T disk
> - mount the filesystem with options inode64,nobarrier

Ok up to here.

> - apply the patches that add a free inode list to the on-disk structure

No, don't do that. You're almost certain to get it wrong and corrupt
your filesystems and lose data.

> As we have ~100 servers that need backing up, that will take a lot
> of effort. Do you have any other suggestions?

Just remount them with inode64. Nothing else. Over time, as you add
and remove files, the inodes will redistribute across all 4 AGs.
(There's a sketch of the commands at the end of this mail.)

> What I am testing (ongoing):
> - created a new 2T partition filesystem
> - try to create small files to fill the whole space, then remove
>   some of them randomly
> - check the performance of touch/cp on files
> - apply the patches and verify them
>
> I have got more data from the server:
>
> 1) flush all caches (echo 3 > /proc/sys/vm/drop_caches) and umount
>    the filesystem
> 2) mount the filesystem and test with the touch command
>    * the first touch of a new file takes ~23s
>    * the second touch takes ~0.1s

So it's cache population that is your issue. You didn't say that the
first time around, which means the diagnosis was wrong. Again, it's
having to search a btree with 220 million inodes in it to find the
first free inode, and that btree has to be pulled in from disk and
searched. Once it's cached, each subsequent allocation will be much
faster because the majority of the tree being searched will already
be in cache...

> I have compared the memory used, and it seems that xfs loads the
> inode bmap blocks on the first touch, which takes much time. Is
> that the reason the first touch operation takes so long?

No. Reading the AGI btree to find the first free inode to allocate is
what is taking the time. If you spread the inodes out over 4 AGs
(using inode64), the overhead of that first read will go down
proportionally. Indeed, that is one of the reasons for using more
than 4 AGs for filesystems like this.

Still, I can't help but wonder why you are using a filesystem to
store hundreds of millions of tiny files, when a database is far
better suited to storing and indexing this type and quantity of
data....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
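
For example, assuming one of these filesystems is /dev/sdb1 mounted
at /data (hypothetical names - substitute your own device and mount
point), the remount would look something like this. Note that on
older kernels inode64 may not take effect via "-o remount", so do a
full unmount/mount cycle:

    # unmount, then mount again with inode64 so new inodes can be
    # allocated in all AGs rather than only the first ones
    umount /data
    mount -o inode64,nobarrier /dev/sdb1 /data

    # verify the options took effect
    mount | grep /data

    # make it persistent across reboots in /etc/fstab:
    # /dev/sdb1  /data  xfs  inode64,nobarrier  0 0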
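
And if you do rebuild the test filesystem as you outlined above, the
sequence would look something like this (again with the hypothetical
/dev/sdb1 and /data, and a dump destination on some other filesystem
with enough space):

    # level-0 dump of the mounted filesystem
    xfsdump -l 0 -L datafs -M dumpmedia -f /backup/data.xfsdump /data

    # rebuild with 32 AGs, then mount with inode64 from the start
    umount /data
    mkfs.xfs -f -d agcount=32 /dev/sdb1
    mount -o inode64,nobarrier /dev/sdb1 /data

    # restore the dump into the new filesystem
    xfsrestore -f /backup/data.xfsdump /data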