On Oct 27, 2008 10:30 +0100, Alex Bligh wrote:
> --On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri@xxxxxx> wrote:
>
>> However, my delete script malfunctioned, and at one point it had
>> 2x100 GB of files to delete; it thus ran 'rm file' one after another
>> for those 400 files, about 500 MB each.  What then resulted was that
>> the real-time data processing became too slow and buffers overflowed.
>
> Are all the files in the same directory?  Even with HTREE there seem
> to be cases where this is surprisingly slow.  Look into using nested
> directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes
> of the file name).
>
> Or, if you don't mind losing data in a power-off and the job suits,
> unlink the file name as soon as your processor has opened it.  Then
> it will be deleted on close.

No, the problem is more likely the ext3 indirect block pointer updates
for large files.  These also put a lot of blocks into the journal, and
if the journal is full this can block all other operations.

If you run with ext4 extents the unlink time is much shorter, though
you should test ext4 yourself before putting it into production.

Doing "unlink; sleep 1" will keep the traffic to the journal lower, as
would deleting fewer files more often, so that you don't delete 200GB
of data at one time if you have real-time requirements.  If you are not
creating files faster than 1/s, the unlinks should be able to keep up.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
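
For reference, a minimal C sketch of the nested-directory layout Alex
suggests above; the djb2 hash and the four one-byte path levels are
arbitrary choices for illustration, not something either mail specifies:

/* Spread files across aa/bb/cc/dd/<name>, where the path components
 * come from a hash of the file name, so no single directory grows
 * too large.  The djb2 hash and four levels are arbitrary here. */
#include <stdio.h>

static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Build "aa/bb/cc/dd/<name>" into buf from successive hash bytes. */
static void hashed_path(const char *name, char *buf, size_t len)
{
    unsigned long h = djb2(name);
    snprintf(buf, len, "%02lx/%02lx/%02lx/%02lx/%s",
             (h >> 24) & 0xff, (h >> 16) & 0xff,
             (h >> 8) & 0xff, h & 0xff, name);
}

int main(void)
{
    char path[4096];
    hashed_path("capture-20081027.pcap", path, sizeof(path));
    printf("%s\n", path);   /* prints something like xx/xx/xx/xx/<name> */
    return 0;
}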
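
A similarly small sketch of the unlink-after-open trick he mentions;
process_and_discard() is a hypothetical helper, and as he warns, the
data is gone if the machine goes down before the descriptor is closed:

/* Once the processing side has the file open, remove its name so the
 * blocks are freed automatically when the descriptor is closed. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int process_and_discard(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (unlink(path) < 0)   /* name gone, inode lives until close() */
        perror("unlink");

    /* ... read and process the data through fd here ... */

    close(fd);              /* last reference dropped, space reclaimed */
    return 0;
}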
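
And a rough C equivalent of the "unlink; sleep 1" pacing Andreas
recommends; a shell loop over the files would do the same job, this
just makes the rate limiting explicit:

/* Unlink one file, then sleep a second, so the metadata updates from
 * each large unlink can be committed instead of flooding the journal
 * in one burst.  Taking the file list from argv is just for
 * illustration. */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (unlink(argv[i]) < 0)
            perror(argv[i]);
        sleep(1);           /* pace deletions: roughly one per second */
    }
    return 0;
}

One unlink per second only keeps up if new files are not created faster
than that, which matches the assumption in the reply above.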