[389-users] Multimaster replication out of sync

On 12/12/2009 12:06 AM, Rich Megginson wrote:
> Mitja Mihelic wrote:
>>
>>
>> On 12/07/2009 05:18 PM, Rich Megginson wrote:
>>> Mitja Mihelic wrote:
>>>> Hi!
>>>>
>>>> We have two instances of the DS in a multimaster replication setup.
>>>> We had to restore the database of one of the servers from backup.
>>>> While the second master was down, the first was receiving updates.
>>>> After we fired up the restored master it started receiving updates as
>>>> soon as a change occurred on the first master (i.e. after 15 minutes)
>>>> After the sync finished, we noticed they weren't identical.
>>>> Clicking "Send updates now" from the replication agreement does not 
>>>> help.
>>>>
>>>> Is there a way to get them synced up again ? Other than reinitializing
>>>> the second/restored master ?
>>> How long was the server down?  How old was the backup it was 
>>> restored from?
>> The server was not down long, but the backup was about 10 hours old.
>> This was a backup at filesystem level made by ufsdump. It was not a 
>> "regular" DS backup.
>> When we restored the database file from the dump the server booted OK.
>>
>> Then we ran a small test:
>> - made another ufsdump of the second master
>> - shut down the server
>> - let the primary master update for an hour
>> - restored the second master's database from the dump
>> - started the second master
>> - let them do their replication magic
>> - isolated both servers (i.e. no updates)
>> - compared the LDIF dumps
>> Again, they were not the same.
>>
>> We probably should have used the built in backup functionality, right ?
> Yes, although I'm not sure what would be causing the problems you see.
>
> In general, when the database state changes, you have to reinitialize 
> replication.
We tried the built-in backup:
/usr/lib/dirsrv/serverReplica/db2bak /var/lib/dirsrv/serverReplica/bak/`date +%Y_%m_%d_%H_%M_%S`

We then ran the same test procedure as described above.

There are still entries on the primary server that do not get replayed 
on the secondary.
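
For reference, the LDIF comparison used in the test can be narrowed down to a list of entries that exist on the primary but not on the secondary. The following is only a minimal sketch, assuming both servers were exported to LDIF text; the function names are made up for illustration, and lowercasing the DN is a crude normalization rather than proper DN matching.

```python
import base64


def read_dns(ldif_text):
    """Return the set of (crudely normalized) DNs found in an LDIF export."""
    # Unfold continuation lines: in LDIF, a line that starts with a single
    # space continues the previous line.
    unfolded = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    dns = set()
    for line in unfolded:
        lowered = line.lower()
        if lowered.startswith("dn::"):
            # "dn::" marks a base64-encoded DN.
            value = base64.b64decode(line[4:].strip()).decode("utf-8")
        elif lowered.startswith("dn:"):
            value = line[3:].strip()
        else:
            continue
        # Assumption: lowercasing is good enough for comparing two exports
        # of the same data; it is not real DN normalization.
        dns.add(value.lower())
    return dns


def missing_entries(primary_ldif, secondary_ldif):
    """Return the sorted DNs present in the primary export only."""
    return sorted(read_dns(primary_ldif) - read_dns(secondary_ldif))
```

Reading the two export files and passing their contents to missing_entries() would give the DNs to look at first. Entries that exist on both sides but differ in attribute values would not show up in this list.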

The primary master (SERVER1) logs the following error, repeated every 5 
minutes, whenever a record that is missing on the secondary gets updated 
on the primary:
[16/Dec/2009:10:26:02 +0100] NSMMReplicationPlugin - agmt="cn=MM to SERVER2" (SERVER2:389): Consumer failed to replay change (uniqueid 25ab6e01-1dd211b2-bdbbda0a-92130000, CSN 4b28a7ac0000000b0000): No such object. Skipping.
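
To see which entries are affected, the uniqueid and CSN values can be pulled out of the error log. This is a rough sketch that assumes each message sits on a single line in the actual log file (it is wrapped above only by the mail client) and that the wording matches the message quoted here; the pattern and function name are illustrative, not part of the server.

```python
import re

# Pattern modeled on the single message quoted above; other server versions
# may word the message differently.
FAILED_REPLAY = re.compile(
    r'agmt="(?P<agmt>[^"]+)".*?Consumer failed to replay change '
    r'\(uniqueid (?P<uniqueid>[0-9a-fA-F-]+), CSN (?P<csn>[0-9a-fA-F]+)\): '
    r'(?P<reason>[^.]+)\.'
)


def failed_replays(log_text):
    """Return one dict per 'failed to replay' line found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_REPLAY.search(line)
        if match:
            results.append(match.groupdict())
    return results
```

The uniqueid values collected this way presumably correspond to the nsUniqueId operational attribute of the entries on the primary, so a search with a filter such as (nsuniqueid=...) should identify which entries the secondary is missing.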

My reasoning would be: if the entry does not exist on the consumer, 
create it. But I guess that is not how the mechanism works.
I'm still scratching my head about this one...

Regards,
Mitja



