I thought about the 'ugly' for the last hour, but for every problem I could think of there is a solution. It could be very complex, though. If you erase a subdirectory while one of the bricks is down and then try to open a file inside it, you would have to check the versioning of two directories before deciding either to duplicate or to erase the file. It would have to be a recursive algorithm, which could be very costly.

This is not much of a problem. If the directory versioning check (fixing sub-entries) is done in the lookup() of every directory, we will not face such issues. Before a lookup() (and the open() that follows it) is done on an entry, a lookup() will _surely_ be done on its parent directory. So by the time you reach a file to, say, open it, lookup()s on all its ancestor directories will already have been done, top down. Fixing each directory only down to the level of its immediate children will therefore work well.

The creation of files doesn't open() the dir, so you would have to add another call to change the version every time something changes inside a dir (files, modes, permissions, ...).

Yes, it is wrong to assume opendir() would have happened while altering sub-entries. Ideally all checks should be done in lookup(). But lookup() is an operation done VERY often, and we wanted to keep it lean for performance reasons (doing self-heal involves holding locks, performing the check, optionally performing the fix, and unlocking). Doing that much in open() is reasonable (number of open calls : number of lookup calls ~ 1:10 or worse). Moving the sync-up to lookup() has to be done carefully to make sure performance is not hit, and we will definitely do this in the next release.

regards,
--
Anand V. Avati
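
For readers following the thread, here is a rough sketch of the scheme discussed above: every change inside a directory bumps that directory's version, and each lookup() repairs only the immediate children of the directory it visits. This is a toy model in Python, not GlusterFS code. The names Dir, create, remove, heal_children and lookup are all hypothetical, a brick that is "down" is simply left out of the list passed to create/remove, and split-brain (both bricks changing while the other is down) is not handled.

```python
# Toy model of per-directory versioning; all names are hypothetical.

class Dir:
    """One directory on one brick: a version counter and its immediate children."""
    def __init__(self):
        self.version = 0
        self.entries = {}  # name -> Dir (subdirectory) or bytes (file content)

def resolve(root, path):
    node = root
    for name in path:
        node = node.entries[name]
    return node

def create(up_bricks, path, name, content=None):
    """Create a file (content given) or a subdirectory on the bricks that are up."""
    for root in up_bricks:
        parent = resolve(root, path)
        parent.entries[name] = Dir() if content is None else content
        parent.version += 1  # the extra "change the version" call

def remove(up_bricks, path, name):
    for root in up_bricks:
        parent = resolve(root, path)
        del parent.entries[name]
        parent.version += 1

def heal_children(dirs):
    """Sync stale copies of ONE directory from the highest-versioned copy.
    Only immediate children are touched; there is no recursion."""
    source = max(dirs, key=lambda d: d.version)
    for d in dirs:
        if d.version == source.version:
            continue
        for name in list(d.entries):
            if name not in source.entries:
                del d.entries[name]          # erased while this brick was down
        for name, child in source.entries.items():
            if name not in d.entries:
                # A missing subdirectory is created empty at version 0, so the
                # lookup() that later descends into it will heal it in turn.
                d.entries[name] = Dir() if isinstance(child, Dir) else child
        d.version = source.version

def lookup(bricks, path):
    """Walk top-down, healing each directory on the way. Returns None if missing."""
    nodes = list(bricks)
    heal_children(nodes)
    for name in path:
        if name not in nodes[0].entries:
            return None
        nodes = [n.entries[name] for n in nodes]
        if isinstance(nodes[0], Dir):
            heal_children(nodes)
    return nodes

b1, b2 = Dir(), Dir()
both = [b1, b2]
create(both, [], "a")
create(both, ["a"], "b")
create(both, ["a", "b"], "f", b"data")
remove([b1], ["a"], "b")               # b2 is down: subdirectory erased
create([b1], ["a"], "c")               # b2 is down: new subdirectory
create([b1], ["a", "c"], "g", b"new")  # b2 is down: new file inside it

stale = lookup(both, ["a", "b", "f"])  # heals "a", then finds "b" is gone
fresh = lookup(both, ["a", "c", "g"])  # heals "a", then "c", then reaches "g"
```

In the erased-subdirectory case the stale file is never reached: the lookup() on "a" removes "b" from the lagging brick before anything below it is examined, which is why healing one level at a time is enough in this model.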