Re: replication from 1.2.8.3 to 1.2.10.4

On 07/13/2012 08:02 AM, Robert Viduya wrote:
I've enabled core dumps, but now I can't seem to get it to crash. I'm still getting the changelog messages in the error logs whenever I restart, though. In addition, the hub server keeps running out of disk space. I tracked it down to the access log filling up with MOD messages from replication. It looks like changes are coming down from our 1.2.8 servers and being applied over and over again. As an example, one of our entries was modified three times today, and on all our other machines I see the following in the access log file:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"

But on the problematic hub server, I see:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
...

I truncated the output for brevity, but there are over 250 MODs to that one object.  It's as if the server isn't able to do the replication bookkeeping and is accepting the same changes over and over again.  Eventually the disk fills up.

Do you see error messages from the supplier suggesting that it is attempting to send the operation but failing and retrying?

No, there's nothing in the error logs on the supplier side.
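
For anyone wanting to quantify the repetition rather than eyeball egrep output, here is a minimal Python sketch that tallies MOD lines per DN. It assumes the access-log line layout shown in the excerpts above. The function name `count_mods` and the regular expression are illustrative, not part of 389 Directory Server.

```python
import re
import sys
from collections import Counter

# Assumed layout, taken from the excerpts in this thread:
# [timestamp] conn=N op=N MOD dn="..."
MOD_RE = re.compile(r'^\[[^\]]+\] conn=\d+ op=\d+ MOD dn="([^"]*)"')


def count_mods(lines):
    """Return a Counter mapping each DN to the number of MOD lines seen."""
    counts = Counter()
    for line in lines:
        match = MOD_RE.match(line)
        if match:
            # Simplification: fold case so the same DN logged with
            # different capitalisation is counted once.
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_mods.py /path/to/access
    with open(sys.argv[1], errors="replace") as handle:
        for dn, total in count_mods(handle).most_common(20):
            print(total, dn)
```

Running the same script against the access log on a healthy consumer and on the hub should make the difference in MOD counts per entry obvious.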

Do all of these operations have the same CSN?  The CSN is logged with the RESULT line for the operation.  Also, what is the err= value for the MOD operations?  err=0?  Some other code?

Here's some sample output, again limited for brevity.  Most of the RESULT lines don't have a CSN, just the first few.  All the err= codes are 0.  I've grepped out just the sample DN from my previous mail, again for brevity.  There are many more DNs being reported:

[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2000000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff200f000000330000
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff2027000000330000
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 MOD dn="gtdirguid=64898416edc9887656a2f933ae48a113,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=170 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b5000300330000
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 MOD dn="gtdirguid=e824607afc4eb02a105b633bcbf9e7c1,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=172 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000100330000
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[12/Jul/2012:16:03:44 -0400] conn=3 op=172 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 MOD dn="gtdirguid=427dd677597bb6143e227143e771b811,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:40:34 -0400] conn=3 op=173 RESULT err=0 tag=103 nentries=0 etime=0 csn=4fff25b6000200330000
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[12/Jul/2012:16:03:47 -0400] conn=3 op=173 RESULT err=0 tag=120 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 RESULT err=0 tag=103 nentries=0 etime=0
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 RESULT err=0 tag=103 nentries=0 etime=0
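
To check the CSN question across a whole log rather than a hand-picked sample, something like the following Python sketch could pair each MOD line with the RESULT line that follows it on the same conn/op and record whether a csn= field was logged. It assumes the line layout shown above. The function name `pair_mods_with_results` is illustrative only. Because conn/op numbers repeat in this log (for example `conn=3 op=170` appears with two different timestamps), each RESULT is matched to the most recent unmatched MOD with the same pair.

```python
import re

# Assumed layout, taken from the excerpts in this thread.
LINE_RE = re.compile(r'^\[([^\]]+)\] conn=(\d+) op=(\d+) (.*)$')
MOD_RE = re.compile(r'^MOD dn="([^"]*)"')
ERR_RE = re.compile(r'\berr=(\d+)')
CSN_RE = re.compile(r'\bcsn=([0-9a-fA-F]+)')


def pair_mods_with_results(lines):
    """Return one dict per MOD/RESULT pair: timestamp, dn, err, csn (or None)."""
    pending = {}  # (conn, op) -> (timestamp, dn) of a MOD awaiting its RESULT
    pairs = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, conn, op, rest = match.groups()
        key = (conn, op)
        mod = MOD_RE.match(rest)
        if mod:
            pending[key] = (timestamp, mod.group(1))
        elif rest.startswith("RESULT") and key in pending:
            mod_time, dn = pending.pop(key)
            err = ERR_RE.search(rest)
            csn = CSN_RE.search(rest)
            pairs.append({
                "timestamp": mod_time,
                "dn": dn,
                "err": int(err.group(1)) if err else None,
                "csn": csn.group(1) if csn else None,
            })
    return pairs
```

Counting the pairs where `csn` is `None` against those where it is set gives a quick measure of how many MODs were logged without a CSN.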

The upgrade to 1.2.10.12 seems to have fixed the issue, however.  I'm not seeing these repeated entries anymore, nor am I seeing changelog error messages when I restart the server.  I know you're all working on 1.2.11, but are there any major problems with 1.2.10.12 that are keeping it from being pushed to stable?

The only thing 1.2.10.12 needs is testers to give it positive karma ("Works For Me") in https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-6265/389-ds-base-1.2.10.12-1.el5 or whatever your platform is.

If you don't have a FAS account or don't want to do this, do I have your permission to add your name and email to the update as a user for whom the update is working?

1.2.10.4 definitely isn't working for us.
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users



