On 09/07/2011 04:29 PM, Rich Megginson wrote:
> On 09/07/2011 05:06 PM, Noriko Hosoi wrote:
>> Rich Megginson wrote:
>>> The problem comes from the method we use to check if the changelog does
>>> not match the database in replica_check_for_data_reload(). The RUV in
>>> the database contains obsolete elements from replicas that are no longer
>>> in use. replica_check_for_data_reload() uses ruv_covers_ruv() to see if
>>> all of the max csns in the database ruv are in the changelog maxruv, and
>>> vice versa. It fails because the database ruv contains these obsolete
>>> elements not found in the changelog maxruv.
>>>
>>> My question is - why do we care? Isn't it sufficient to check that the
>>> replicageneration in the changelog is the same as the replicageneration
>>> in the database ruv? The replicageneration is supposed to be the unique
>>> identifier of the "starting point" of the replicated data. If the data
>>> is reloaded (e.g. from an ldif not created with db2ldif -r), a new
>>> replicageneration will be created, and the data will mismatch.
>> That's right. And the problem is that the database RUV is never updated
>> once the data is reloaded from such an ldif file?
> If the data is reloaded from a "plain" ldif file, a new RUV and new
> replicageneration will be created.
>
>> Then, the server recreates
>> the changelog every time the server is restarted?
> If the data is reloaded from a "plain" ldif file, the server will see
> that the changelog does not match, and will erase the changelog. The
> reason why this bug is causing the server to recreate the changelog
> every time it is restarted is because of the extra ruv elements that do
> not match any of the ruv elements in the changelog max ruv.
>> You mentioned "remove
>> them" in the proposed warning. Is it the only way to adjust the
>> database RUV?
> As far as I can tell, the only way to adjust the database RUV is to
> 1) dump data using db2ldif -r
> 2) manually edit the file to remove the obsolete RUV elements
> 3) reload the data using ldif2db
>
> Note that, due to
> /export1/share/ds/ds.git(master)>git show e9fa8249|more
> commit e9fa82493548d84ac7bd2fa1f857db0023ac800d
> Author: Nathan Kinder<nkinder@xxxxxxxxxx>
> Date: Tue Jan 18 08:29:50 2011 -0800
>
> Bug 543633 - replication problems if supplier is killed under
> update load
>
> ldapmodify to fix the ruv entry will deadlock the server. See
> https://bugzilla.redhat.com/show_bug.cgi?id=590826 for details.
>
> We should definitely fix the deadlock too.

Agreed.

>>> Or, alternately, leave the check for all of the ruv elements in, but
>>> just warn if the database contains ruv elements not in the cl maxruv
>>> e.g. something like
>>> "WARNING: The database RUV contains these elements not present in the
>>> changelog max ruv:
>>> ....
>>> These elements may be obsolete, in which case you should remove them.
>>> If they are not obsolete, you should check those servers to make sure
>>> replication is occurring."
>> If the database RUV is not used at all, I think there is no benefit to
>> maintain it... Warning would rather confuse users, wouldn't it?
> We need to have some way to clean up obsolete ruv elements. I remember
> this issue coming up on the 389-users list some time ago, but I did not
> know that it could lead to data loss.
>
> I think the warning would be acceptable as long as we had clear
> procedures for removing the obsolete ruv elements and checking the
> status of the other replicas.

I think that a warning is fine too, though a ruv cleanup method is needed
as you mention.
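
To make the two options concrete, here is a rough Python model of the
checks discussed above. It is only a sketch, not the server's C code:
Ruv, covers(), strict_match() and relaxed_check() are made-up names, CSNs
are modeled as plain integers, and the real ruv_covers_ruv() may differ
in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Ruv:
    """Toy stand-in for a replica update vector (hypothetical structure)."""
    generation: str                               # replicageneration
    max_csns: dict = field(default_factory=dict)  # replica id -> max CSN (int here)


def covers(a, b):
    """True if every element of b is present in a with a max CSN at least as new."""
    return all(rid in a.max_csns and a.max_csns[rid] >= csn
               for rid, csn in b.max_csns.items())


def strict_match(db, cl):
    """Models the current behaviour: each RUV must cover the other."""
    return covers(cl, db) and covers(db, cl)


def relaxed_check(db, cl):
    """Models the proposal: compare generations only, and report extras.

    Returns (changelog_is_usable, replica ids found only in the database RUV).
    The second value is what the proposed WARNING would list.
    """
    extras = sorted(rid for rid in db.max_csns if rid not in cl.max_csns)
    return db.generation == cl.generation, extras


# Replica 7 is an obsolete element that lives only in the database RUV.
db = Ruv("gen-1", {1: 105, 2: 98, 7: 40})
cl = Ruv("gen-1", {1: 105, 2: 98})

print(strict_match(db, cl))    # False -> changelog would be recreated
print(relaxed_check(db, cl))   # (True, [7]) -> keep changelog, warn about 7
```

In this model a reload from a "plain" ldif shows up as a different
generation string, so relaxed_check() still reports the changelog as
unusable in that case, while an obsolete element alone only produces the
list for the warning.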
>> --noriko
>> --
>> 389-devel mailing list
>> 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
>> https://admin.fedoraproject.org/mailman/listinfo/389-devel