On 09/07/2011 05:06 PM, Noriko Hosoi wrote:
> Rich Megginson wrote:
>> The problem comes from the method we use to check if the changelog does
>> not match the database in replica_check_for_data_reload(). The RUV in
>> the database contains obsolete elements from replicas that are no longer
>> in use. replica_check_for_data_reload() uses ruv_covers_ruv() to see if
>> all of the max csns in the database ruv are in the changelog maxruv, and
>> vice versa. It fails because the database ruv contains these obsolete
>> elements not found in the changelog maxruv.
>>
>> My question is - why do we care? Isn't it sufficient to check that the
>> replicageneration in the changelog is the same as the replicageneration
>> in the database ruv? The replicageneration is supposed to be the unique
>> identifier of the "starting point" of the replicated data. If the data
>> is reloaded (e.g. from an ldif not created with db2ldif -r), a new
>> replicageneration will be created, and the data will mismatch.
> That's right. And the problem is the database RUV never be updated once
> the data is reloaded from such an ldif file?
If the data is reloaded from a "plain" ldif file, a new RUV and new
replicageneration will be created.
> Then, the server recreates
> the changelog every time the server is restarted?
If the data is reloaded from a "plain" ldif file, the server will see
that the changelog does not match, and will erase the changelog. The
reason why this bug is causing the server to recreate the changelog
every time it is restarted is because of the extra ruv elements that do
not match any of the ruv elements in the changelog max ruv.
> You mentioned "remove
> them" in the proposed warning. Is it the only way to adjust the
> database RUV?
As far as I can tell, the only way to adjust the database RUV is to
1) dump data using db2ldif -r
2) manually edit the file to remove the obsolete RUV elements
3) reload the data using ldif2db

Note that, due to

/export1/share/ds/ds.git(master)>git show e9fa8249|more
commit e9fa82493548d84ac7bd2fa1f857db0023ac800d
Author: Nathan Kinder <nkinder@xxxxxxxxxx>
Date:   Tue Jan 18 08:29:50 2011 -0800

    Bug 543633 - replication problems if supplier is killed under update load

ldapmodify to fix the ruv entry will deadlock the server. See
https://bugzilla.redhat.com/show_bug.cgi?id=590826 for details. We
should definitely fix the deadlock too.
>> Or, alternately, leave the check for all of the ruv elements in, but
>> just warn if the database contains ruv elements not in the cl maxruv
>> e.g. something like
>> "WARNING: The database RUV contains these elements not present in the
>> changelog max ruv:
>> ....
>> These elements may be obsolete, in which case you should remove them.
>> If they are not obsolete, you should check those servers to make sure
>> replication is occurring."
> If the database RUV is not used at all, I think there is no benefit to
> maintain it... Warning would rather confuse users, wouldn't it?
We need to have some way to clean up obsolete ruv elements. I remember
this issue coming up on the 389-users list some time ago, but I did not
know that it could lead to data loss. I think the warning would be
acceptable as long as we had clear procedures for removing the obsolete
ruv elements and checking the status of the other replicas.
> --noriko
> --
> 389-devel mailing list
> 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-devel
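
To make the "warn instead of fail" idea from the thread concrete, here is a rough Python sketch of the logic. It is only an illustration, not the C code in replica_check_for_data_reload(): the Ruv class, its fields, and the function name are made up for this example. It also assumes that CSNs can be treated as opaque strings and that requiring equality for replicas present in both RUVs is a fair simplification of the two-way ruv_covers_ruv() check.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Ruv:
    """Toy stand-in for a replica update vector (hypothetical, not the server struct)."""
    generation: str                                           # replicageneration
    max_csns: Dict[int, str] = field(default_factory=dict)   # replica id -> max CSN


def check_for_data_reload(db_ruv: Ruv, cl_ruv: Ruv) -> Tuple[bool, List[str]]:
    """Return (changelog_is_usable, warnings).

    The changelog is rejected when the replicageneration differs, or when a
    replica present in both RUVs has a different max CSN. Elements found only
    in the database RUV produce a warning instead of a failure.
    """
    # A different generation means the data was reloaded from a "plain" ldif.
    if db_ruv.generation != cl_ruv.generation:
        return False, []

    warnings: List[str] = []
    for rid, csn in sorted(db_ruv.max_csns.items()):
        if rid not in cl_ruv.max_csns:
            # Possibly obsolete element: warn, but keep the changelog.
            warnings.append(
                f"database RUV element for replica {rid} (max CSN {csn}) "
                "is not present in the changelog max RUV"
            )
        elif cl_ruv.max_csns[rid] != csn:
            return False, warnings

    # Elements only in the changelog still count as a mismatch here.
    if any(rid not in db_ruv.max_csns for rid in cl_ruv.max_csns):
        return False, warnings
    return True, warnings


# Example: replica 7 was decommissioned long ago but is still in the database RUV.
db = Ruv("4e67a1b2000000010000", {1: "4e67c0de000000010000", 7: "4d00beef000000070000"})
cl = Ruv("4e67a1b2000000010000", {1: "4e67c0de000000010000"})
usable, notes = check_for_data_reload(db, cl)
for note in notes:
    print("WARNING:", note)
```

With a check shaped like this, the stale element for replica 7 would be reported on every startup, but the changelog would not be erased because of it.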