rm -rf issues in Geo-replication


 



Problem:
--------
Each Geo-rep worker processes the Changelogs available in its own
brick; when a worker sees an RMDIR, it tries to remove that directory
recursively. Since the rmdir is recorded in all the bricks, rm -rf
ends up being executed in parallel.

Due to DHT's open issues with parallel rm -rf, some of the directories
will not get deleted in the Slave Volume (stale directory layout). If a
directory with the same name is then created in the Master, Geo-rep
ends up in an inconsistent state, since the GFID of the new directory
differs from that of the directory still existing in the Slave.


Solution - Fix in DHT:
---------------------
Hold a lock during rmdir, so that parallel rmdirs get blocked and no
stale layouts are left behind.


Solution - Fix in Geo-rep:
--------------------------
Until DHT fixes this issue, we can work around it in Geo-rep. Since a
Meta Volume is available with each Cluster, Geo-rep can hold a lock
keyed on the GFID of the directory to be deleted.

For example,

when rmdir:
    while True:
        try:
            # fcntl lock in Meta Volume: $METAVOL/.rmdirlocks/<GFID>
            get_lock(GFID)
            recursive_delete()
            release_and_del_lock_file()
            break
        except (EACCES, EAGAIN):
            continue

One worker will succeed and all other workers will get ENOENT/ESTALE,
which can be safely ignored.
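
The loop above could be fleshed out with fcntl locks roughly as
follows. This is only a sketch of the idea, not the actual Geo-rep
code: the lock-directory path, the function name and the error
handling are placeholders, and corner cases (e.g. a loser re-opening
the lock file just as the winner unlinks it) are glossed over since a
second delete of an already-removed directory is harmless here.

```python
import errno
import fcntl
import os
import shutil

# Hypothetical meta-volume mount point; the real path depends on setup.
LOCK_DIR = "/var/run/gluster/metavol/.rmdirlocks"

def rmdir_with_lock(gfid, path, lock_dir=LOCK_DIR):
    """Serialize the recursive delete across workers via an fcntl
    lock on a per-GFID file in the Meta Volume (sketch only)."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, gfid)
    while True:
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
        try:
            # Non-blocking exclusive lock: the winner proceeds,
            # losers get EAGAIN/EACCES and retry.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            os.close(fd)
            if e.errno in (errno.EACCES, errno.EAGAIN):
                continue
            raise
        try:
            try:
                shutil.rmtree(path)
            except OSError as e:
                # Another worker already deleted it: safe to ignore.
                if e.errno not in (errno.ENOENT, errno.ESTALE):
                    raise
            os.unlink(lock_path)
        finally:
            os.close(fd)  # closing the fd releases the flock
        break
```

A losing worker that retries after the winner finishes simply
recreates the lock file, acquires it, finds the directory already
gone (ENOENT), and cleans up after itself.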


Let us know your thoughts.

--
regards
Aravinda
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
