Frankly, I am quite lost in the sea of arguments :-) ... but let me try sharing my points:

>>
>> I can see the inode locking for the entire duration of a file reorg
>> would be unacceptable at some later point in the development / release
>> cycle.
>>
> That's exactly we are intending. We are planning to get down to block
> level in our later milestones. I have mentioned that earlier as well.
>

Yes, locking can be done either at the block or the inode level. But what I would like to suggest is Oracle's mechanism: NO LOCKS at all!!!

To copy something from A to B, you either freeze all changes to A and then copy it, or you just go straight ahead and copy it, AND then UNDO any unwanted changes made while the copy was in progress. The information needed for the latter can be obtained from the journal (errr... this does not apply to ext2, which has no journal), where all changes are grouped into transactions. So any data change belonging to a transaction that has not been closed (committed) is considered incomplete and will therefore be undone. Another criterion is TIME (explained below).

But then again, journalling comes in two types: the journal either holds all the latest data changes, or it does not and holds just an indication of WHERE changes were made (only the metadata). These are called data journalling and writeback, respectively (http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html). So with data journalling, it is possible to reconstruct the data changes.

Now compare, in your imagination, (lock + copy) x 1000000 with (copy) x 1000000 plus undoing the changes recorded in the journal. In a scenario where there are very few changes, the Oracle mechanism wins in terms of performance, because undoing a handful of changed blocks costs far less than taking and releasing a lock a million times. The key point is that there is NO LOCKING OVERHEAD at all. That is how "online backup" works in Oracle.

The other criterion for slicing the journal is time. Given a chosen point in time, all transactions committed before it are applied (flushed out, i.e. done), and any data changes made after it are undone.
Oracle can undo changes this way because ALL data changes are recorded in its journal (FULL journalling, which is always the case in Oracle). In the ext3 case, this is possible if the filesystem uses data journalling (data=journal, not writeback). Note, though, that the ext3 default is data=ordered, which journals only metadata, so full data journalling normally has to be enabled explicitly (the modes are described in the ext3 FAQ).

The other key point is point-in-time consistency: everything as of one particular point in time is always consistent. But if you lock one file, copy it, unlock it, and then lock the next file, different files end up copied at different times, and one file may be much further FORWARD in time (more recently updated) than another. I.e., data inconsistency. That is why Oracle's backup procedures never, ever lock at the per-file level, which is what your inode locking does. All files are always backed up as of the same point in time.

Hopefully I have explained myself clearly enough?

--
Regards,
Peter Teoh

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ