Geo-replication: Resolving GFID Conflict

GFID conflicts are a common problem that Geo-replication faces today.
The reasons for GFID conflicts are:

1. The ignore_deletes option is set in Geo-replication. If files/dirs
   are created, deleted and created again, the Slave Volume still has
   the file with the old GFID because ignore_deletes is set. When
   Geo-rep tries to sync the new file, it fails with a GFID conflict.
2. Files are copied to the Slave Volume from the Master Volume using
   external tools other than Geo-replication. The GFID will differ for
   the same file in the Master and Slave Volumes.
3. Unlink of a file fails, and the same file is created again.
4. Rename failures. The file with the old name still exists in the
   Slave Volume, and a file with that old name is created again in the
   Master Volume.
5. A pre-existing Slave Volume holds the same data as the Master, but
   the data was synced via external tools.
6. Cluster failover and failback. When the Slave Volume becomes Master,
   I/O on the Slave Volume can change the GFIDs of files.
7. Files are edited in the Slave Volume (for example: open in the vi
   editor, edit, save and close).
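
Reason 1 can be illustrated with a toy model. This is only a sketch: the
two volumes are plain dicts mapping a path to a GFID, and replay() and
new_gfid() are hypothetical helpers written for this example, not
Geo-rep code.

```python
import uuid


def new_gfid():
    """Return a fresh random GFID (GFIDs are UUIDs)."""
    return str(uuid.uuid4())


def replay(ops, ignore_deletes):
    """Replay Master operations onto toy Master/Slave namespaces.

    Each namespace is a dict of path -> GFID.
    Returns (master, slave, conflicts).
    """
    master, slave, conflicts = {}, {}, []
    for op, path in ops:
        if op == "CREATE":
            gfid = new_gfid()
            master[path] = gfid
            if path in slave and slave[path] != gfid:
                # Entry already exists on the Slave with another GFID.
                conflicts.append(path)
            else:
                slave[path] = gfid
        elif op == "UNLINK":
            master.pop(path, None)
            if not ignore_deletes:
                slave.pop(path, None)
    return master, slave, conflicts


ops = [("CREATE", "f1"), ("UNLINK", "f1"), ("CREATE", "f1")]
_, _, conflicts_on = replay(ops, ignore_deletes=True)
_, _, conflicts_off = replay(ops, ignore_deletes=False)
print(conflicts_on)   # ['f1']
print(conflicts_off)  # []
```

With ignore_deletes set, the stale entry survives on the Slave, so the
second create collides with it.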

If we add intelligence to Geo-rep to auto-resolve GFID conflicts,
then:

1. Rsync/Tarssh will not fail and skip with error 23.
2. The Master Volume and Slave Volume will be in sync.


How to fix
==========
gfid-heal
---------
To solve this problem, we need to add gfid-heal capabilities to
Geo-replication.


When entry creation on the Slave Volume fails with a GFID conflict:

0. Entry creation on the Slave Volume gets EEXIST, and the GFID on the
   Slave's disk is not the same as the GFID from the Changelog.
1. Check whether PGFID/basename exists in the Master Volume.
2. If it does not exist, ignore.
3. If it exists, compare the GFID from the Changelog with the GFID on
   disk in the Master Volume. If both GFIDs are the same, send a GFID
   heal request to the Slave.
4. If the GFID on disk is not the same as the Changelog GFID, ignore.
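
The steps above can be sketched as a small decision function. This is a
minimal sketch, not Geo-rep code: Action and decide_gfid_heal() are
hypothetical names, and it assumes that "disk GFID" in step 0 means the
GFID on the Slave while in steps 3 and 4 it means the GFID currently
found at PGFID/basename on the Master.

```python
from enum import Enum


class Action(Enum):
    NO_CONFLICT = "no_conflict"
    IGNORE = "ignore"
    HEAL = "heal"


def decide_gfid_heal(changelog_gfid, slave_gfid, master_gfid):
    """Decide what to do after entry creation on the Slave got EEXIST.

    changelog_gfid: GFID recorded in the Changelog for the entry.
    slave_gfid:     GFID of the existing entry on the Slave.
    master_gfid:    GFID at PGFID/basename on the Master, or None if
                    the entry no longer exists there (assumption).
    """
    if slave_gfid == changelog_gfid:
        # Step 0 precondition not met: same GFID, so no conflict.
        return Action.NO_CONFLICT
    if master_gfid is None:
        # Step 2: entry is gone from the Master.
        return Action.IGNORE
    if master_gfid == changelog_gfid:
        # Step 3: Changelog still describes the live Master entry.
        return Action.HEAL
    # Step 4: Master entry has changed since this Changelog record.
    return Action.IGNORE


print(decide_gfid_heal("g2", "g1", "g2"))  # Action.HEAL
```

Keeping the decision separate from the actual heal request makes it easy
to test the rules without a running volume.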


Archive it
----------
Vijay suggested archiving the conflicting file instead of healing the
GFID.

0. Entry creation on the Slave Volume gets EEXIST, and the GFID on the
   Slave's disk is not the same as the GFID from the Changelog.
1. Check whether PGFID/basename exists in the Master Volume.
2. If it does not exist, ignore.
3. If it exists, compare the GFID from the Changelog with the GFID on
   disk in the Master Volume. If both GFIDs are the same, rename the
   conflicting file/directory into the .gfid_conflicts directory in
   the mount, adding a timestamp to the name of the moved file.
4. If the GFID on disk is not the same as the Changelog GFID, ignore.
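
The rename in step 3 could look roughly like the sketch below. It is
only an illustration: archive_conflict() is a hypothetical helper, and
the timestamp format and the way the relative path is flattened into a
single file name are assumptions made for this example.

```python
import os
import time

CONFLICT_DIR = ".gfid_conflicts"


def archive_conflict(mount_root, rel_path, now=None):
    """Move mount_root/rel_path into mount_root/.gfid_conflicts/.

    A UTC timestamp is appended to the archived name. `now` is seconds
    since the epoch; None means the current time.
    """
    rel = rel_path.strip("/")
    ts = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    archive_dir = os.path.join(mount_root, CONFLICT_DIR)
    os.makedirs(archive_dir, exist_ok=True)
    # Flatten the path so entries from different parent directories
    # do not collide inside the archive directory (assumed layout).
    flat = rel.replace("/", "__")
    dest = os.path.join(archive_dir, "%s.%s" % (flat, ts))
    # rename works for both files and directories within one mount.
    os.rename(os.path.join(mount_root, rel), dest)
    return dest
```

For example, archive_conflict("/mnt/slave", "d1/f1") would move
/mnt/slave/d1/f1 to /mnt/slave/.gfid_conflicts/d1__f1.<timestamp>,
where /mnt/slave is a made-up mount point.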



The second approach looks cleaner: old files are archived rather than
overwritten as in the first approach. The admin can periodically look
in the .gfid_conflicts directory and clean up the files/dirs.


Challenges
----------
1. Race between AFR self-heal and RENAME of directory. (BZ 1240333)


Let me know your thoughts. Thanks.

--
regards
Aravinda



