On Tue, 21 May 2013 09:10:18 -0400 (EDT)
Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:

> [...]
> Solution:
> Since we want to prevent two parallel self-heals, we let them compete in a
> separate "domain". Let's call the domain in which the locks were taken in
> the previous approach the "data-domain".
>
> In the new approach, when a self-heal is triggered it acquires a full lock
> in the new domain "self-heal-domain". After this it performs the
> data-self-heal using the locks in "data-domain" in the following manner:
> Acquire a full file lock, get the xattrs on the file, decide source/sinks,
> then unlock the full file lock.
> Acquire a lock with range 0 - 128k, sync the data from source to sinks in
> range 0 - 128k, unlock the 0 - 128k lock.
> Acquire a lock with range 128k+1 - 256k, sync the data from source to sinks
> in range 128k+1 - 256k, unlock the 128k+1 - 256k lock.
> .....
> Do this until the end of the file is reached.
> Acquire the full file lock, decrement the pending counts, then unlock the
> full file lock.
> Unlock the full file lock in "self-heal-domain".
>
> Scenario-1 won't happen because truncate gets a chance to acquire its full
> file lock after any 128k range sync completes.
> Scenario-2 won't happen because extra self-heals launched on the same file
> will be blocked in "self-heal-domain", so the data-path's locks are not
> affected by this.
>
> Let me know if you see any problems with, or suggestions for, this
> approach.
>
> Pranith.

I am not quite sure I understood the issue in full detail, but are you
saying that you "split up" the currently self-healing file into 128K chunks,
with locking and unlocking (over the network) for each chunk? It sounds as
if the locking could take more (CPU) time than the self-healing of the data
itself. I mean, this could be a 10G link where a complete file could be
healed in almost no time, even if the file is quite big. Sure, WAN is
different, but I would really like to have at least an option to drop the
partial locking completely and lock the full file instead.
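For reference, the chunked locking loop in the quoted proposal could be sketched roughly as below. This is not GlusterFS code; the acquire/release/sync callables are hypothetical placeholders for the real network lock and sync calls in the "data-domain":

```python
# Sketch of the chunked self-heal loop from the proposal above (illustrative
# only; the lock helpers are stand-ins, not the real AFR/inodelk API).

CHUNK = 128 * 1024  # 128k range locked per sync step, as in the proposal

def self_heal(file_size, acquire_range, release_range, sync_range):
    """Heal a file chunk by chunk, holding only one range lock at a time."""
    offset = 0
    healed = []
    while offset < file_size:
        end = min(offset + CHUNK, file_size)
        acquire_range(offset, end)   # lock only this 128k window
        sync_range(offset, end)      # copy source -> sinks for this window
        release_range(offset, end)   # between chunks, other operations
                                     # (e.g. truncate) can win the full lock
        healed.append((offset, end))
        offset = end
    return healed
```

The point of the scheme is visible in the loop: every `release_range` is a window in which a blocked full-file lock (such as truncate's) can be granted, which is exactly what avoids scenario-1.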
--
Regards,
Stephan