Hi Xavi,

writev takes an inodelk on the whole file, so write throughput is poor.
If we instead lock only the written range with inodelk(offset, len), the
IDA_KEY_SIZE xattr can become inconsistent across bricks, because the
writev calls are no longer ordered.

So how about using only IDA_KEY_VERSION and each brick's ia_size to detect
damaged data? That is: drop IDA_KEY_SIZE, have lookup lock the whole file,
and have readv lock only (offset, len).

I think this would give good performance while keeping the data consistent.

Thanks.
-terrs

> Hi,
>
> the current implementation of the ec xlator uses inodelk/entrylk before
> each operation to guarantee exclusive access to the inode. This
> implementation blocks any other request to the same inode/entry until the
> previous operation has completed and unlocked it.
>
> This adds a lot of latency to each operation, even if there are no conflicts
> with other clients. To improve this I was thinking to implement something
> similar to eager-locking and piggy-backing.
>
> The following is a schematic description of the idea:
>
> * Each operation will build a list of things to be locked (this could be 1
>   inode or up to 2 entries).
> * For each lock in the list:
>    * If the lock is already acquired by another operation, it will add itself
>      to a list of waiting operations associated to the operation that
>      currently holds the lock.
>    * If the lock is not acquired, it will initiate the normal inodelk/entrylk
>      calls.
>    * The locks will be acquired in a special order to guarantee that there
>      couldn't be deadlocks.
> * When the operation that is currently holding the lock terminates, it will
>   test if there are waiting operations on it before unlocking. If so, it will
>   resume execution of the next operation without unlocking.
> * In the same way, xattr updating after operation will be delayed if another
>   request was waiting to modify the same inode.
>
> The case with 2 locks must be analyzed deeper to guarantee that intermediate
> states combined with other operations don't generate deadlocks.
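
To check that I read the hand-off in the quoted list correctly, here is a
minimal sketch in Python. It is only a toy model, not ec xlator code: the
`EagerLock` class, its `acquire`/`release` methods and the `log` list are
names invented for illustration, and the real inodelk/unlock calls are
replaced by log entries. It assumes a single lock and FIFO order for the
waiting operations, neither of which the proposal specifies.

```python
from collections import deque


class EagerLock:
    """Toy model of one inode lock shared by queued operations (hypothetical)."""

    def __init__(self):
        self.owner = None       # operation that currently holds the lock
        self.waiting = deque()  # operations piggy-backed on the current owner
        self.log = []           # simulated inodelk/unlock calls, in order

    def acquire(self, op):
        """Return True if op holds the lock now, False if it was queued."""
        if self.owner is None:
            # Lock not held: issue the normal (simulated) inodelk call.
            self.log.append(("inodelk", op))
            self.owner = op
            return True
        # Lock already held: wait on the operation that holds it.
        self.waiting.append(op)
        return False

    def release(self, op):
        """Finish op; return the operation resumed without unlocking, or None."""
        if op != self.owner:
            raise ValueError("only the current owner can release the lock")
        if self.waiting:
            # Hand the lock to the next waiter, skipping unlock + inodelk.
            self.owner = self.waiting.popleft()
            return self.owner
        # Nobody is waiting: really unlock.
        self.log.append(("unlock", op))
        self.owner = None
        return None


lock = EagerLock()
lock.acquire("write-1")   # takes the lock
lock.acquire("write-2")   # queued behind write-1
lock.release("write-1")   # write-2 resumes, no unlock issued
lock.release("write-2")   # queue is empty, so the lock is released
print(lock.log)
```

With two writes queued on the same inode, the model issues one inodelk and
one unlock in total instead of one pair per write, which is where I
understand the latency saving would come from.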
>
> To avoid stalls of other clients I'm thinking to use GLUSTERFS_OPEN_FD_COUNT
> to see if the same file is open by other clients. In this case, the operation
> will unlock the inode even if there are other operations waiting. Once the
> unlock is finished, the waiting operation will restart the inodelk/entrylk
> procedure.
>
> Do you think this is a good approximation?
>
> Any thoughts/ideas/feedback will be welcome.
>
> Xavi
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel