On 07/16/2013 04:49 PM, Rich Megginson wrote:
On 07/16/2013 01:23 AM, Mitja Mihelič wrote:
On 07/15/2013 05:28 PM, Rich Megginson wrote:
On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
On 07/12/2013 05:55 PM, Rich Megginson wrote:
On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
On 07/09/2013 03:34 PM, Rich Megginson wrote:
On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
Hi!
We are having problems with some of our 389-DS
instances. They crash after receiving an update from
the provider.
After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391
Yes, it looks like it might be it. When CONSUMER_ONE crashed
for the first time, the last thing replicated was a password
change.
Do you perhaps know where I could get a 389DS version for
CentOS 6 that has the patch? The ticket says it was pushed to
1.2.11, but it would seem that our 1.2.11.15-14 is still
unpatched, and the repositories do not have any newer
versions.
Is that the 389-ds-base that is included with CentOS6?
Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the official
CentOS 6 updates repository.
389-ds-base-debuginfo is from http://debuginfo.centos.org/6/
The rest are from epel.
Looking at the stack trace you sent earlier - there is only 1
thread? You ran
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1
? If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .
I ran
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171 ns-slapd` > stacktrace.`date +%s`.txt 2>&1
The "-o 49171" is to exclude the pid of the config server instance,
so only the problematic pid was looked at.
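When several ns-slapd processes run on one host, an alternative to excluding a pid with "pidof -o" is to read the pid of the one instance you care about and attach gdb only to that. A minimal sketch, assuming the instance writes its pid to /var/run/dirsrv/slapd-INSTANCE.pid (the path and the instance name "consumer-one" are assumptions, adjust them for your installation):

```shell
# Read the pid of one ns-slapd instance from its pid file.
# Assumption: the pid file is /var/run/dirsrv/slapd-INSTANCE.pid.
slapd_pid() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$pid"
}

# Attach gdb to exactly one pid and dump full backtraces of all threads.
# Same gdb options as the command quoted in this thread.
dump_stacktrace() {
    target_pid="$1"
    out="stacktrace.$(date +%s).txt"
    gdb -ex 'set confirm off' -ex 'set pagination off' \
        -ex 'thread apply all bt full' -ex 'quit' \
        /usr/sbin/ns-slapd "$target_pid" > "$out" 2>&1
}

# Usage (hypothetical instance name):
#   pid=$(slapd_pid /var/run/dirsrv/slapd-consumer-one.pid) && dump_stacktrace "$pid"
```

Passing a single, validated pid guarantees that gdb attaches to exactly one process, which rules out one possible source of a confusing trace.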
If you get any more information regarding this crash it would be
very much appreciated.
It may be best if I remove all 389DS related data from both of the
consumer servers and start fresh. If they crash again I will send
the relevant stack traces.
The crash happened twice after about a
week of running without problems. The crashes
happened on two consumer servers but not at the same
time.
The servers are running CentOS 6x with the following
389DS packages installed:
389-ds-console-doc-1.2.6-1.el6.noarch
389-console-1.1.7-1.el6.noarch
389-adminutil-1.1.15-1.el6.x86_64
389-dsgw-1.1.10-1.el6.x86_64
389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64
389-admin-1.1.29-1.el6.x86_64
389-ds-console-1.2.6-1.el6.noarch
389-admin-console-doc-1.1.8-1.el6.noarch
389-ds-1.2.2-1.el6.noarch
389-ds-base-1.2.11.15-14.el6_4.x86_64
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
389-admin-console-1.1.8-1.el6.noarch
We are in the process of replacing the CentOS 5x
based consumer+provider setup with a CentOS 6x based
one. For the time being, the CentOS 6 machines are
acting as consumers for the old server. They run for
a while and then the replicated instances crash
though not at the same time.
One of the servers did not want to start after the
crash,
Can you provide the error messages from the errors
log?
I have attached error logs from the provider
(2013-06-27-provider_error) and the consumer
(2013-06-27-server_two_error) in question.
so I have run db2index on its database.
It's been running for four days and it has still not
finished.
Try exporting using db2ldif, then importing using
ldif2db.
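The export and re-import suggested here could be scripted roughly as follows. This is only a sketch: the instance name "consumer-two", the backend name "userRoot", the script directory /usr/lib64/dirsrv/slapd-INSTANCE and the LDIF path are all placeholders and assumptions, not values taken from this thread. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Offline export and re-import of one backend.
# All names below are placeholders; override them in the environment.
INSTANCE="${INSTANCE:-consumer-two}"
BACKEND="${BACKEND:-userRoot}"
SCRIPT_DIR="${SCRIPT_DIR:-/usr/lib64/dirsrv/slapd-$INSTANCE}"
LDIF="${LDIF:-/tmp/$BACKEND-export.ldif}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

export_and_reimport() {
    # Stop the instance first so db2ldif works on a quiescent database.
    run service dirsrv stop "$INSTANCE" &&
    run "$SCRIPT_DIR/db2ldif" -n "$BACKEND" -a "$LDIF" &&
    # Refuse to import an empty export file.
    run test -s "$LDIF" &&
    run "$SCRIPT_DIR/ldif2db" -n "$BACKEND" -i "$LDIF" &&
    run service dirsrv start "$INSTANCE"
}

# Usage: DRY_RUN=1 export_and_reimport   # review the commands first
```

The "test -s" step stops the chain if the export produced a zero-length file, so a failed export is never imported over the existing database.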
The export process hangs. After an hour strace still
shows:
futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL
The error log for this is attached as
2013-07-10-server_two-ldif_import_hangs.
Are you using db2ldif or db2ldif.pl? If you are using
db2ldif, is the server running? If not, please try first
shutting down the server and use db2ldif.
If db2ldif still hangs, then please follow the
instructions at http://port389.org/wiki/FAQ#Debugging_Hangs
to get a stack trace of the hung process.
I was using db2ldif with the server shut down. I tried it
again and it hung. The LDIF file was created but its size
was zero. The produced stack trace is attached as
server_two-db2ldif_hang-stacktrace.1373877200.txt.
All I get from db2index now are these
outputs:
[09/Jul/2013:13:29:11 +0200] - reindex db: Processed
65095 entries (pass 1104) -- average rate
53686277.5/sec, recent rate 0.0/sec, hit ratio 0%
How many entries do you have in your database?
The number hovers around 65400. It changes by perhaps 2
user del/add operations a month and 20 attribute changes
per week, if that.
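The db2index progress line quoted above already shows the stall: 65095 entries processed out of roughly 65400, with a recent rate of 0.0/sec. A small sketch for pulling those two numbers out of such a line, assuming the message is a single line in the errors log in the format quoted in this thread (the function names are made up for illustration):

```shell
# Extract the "recent rate" value from a reindex progress line.
reindex_recent_rate() {
    printf '%s\n' "$1" | sed -n 's/.*recent rate \([0-9.]*\)\/sec.*/\1/p'
}

# Extract the number of entries processed so far.
reindex_processed() {
    printf '%s\n' "$1" | sed -n 's/.*Processed \([0-9]*\) entries.*/\1/p'
}

# Usage with the line quoted in this thread:
#   line='[09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%'
#   reindex_recent_rate "$line"
#   reindex_processed "$line"
```

If the processed count stays the same and the recent rate stays at 0.0 across many consecutive lines, the reindex is most likely stuck rather than slow.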
The other instance did start up, but the replication
process did not work anymore. I disabled the
replication to this host and set it up again. I
chose "Initialize consumer now" and the consumer
crashed every time.
Can you provide a stack trace of the core when the server
crashes? This may be different from the stack trace
below.
The last provided stack trace was produced at the last
server crash. I will provide another stack trace when
CONSUMER_ONE crashes again. Currently it refuses to
crash at initialization time and keeps running.
I have enabled full error logging and
could find nothing.
I have read a few threads (not all, I admit) on this
list and
http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
and tried to troubleshoot.
The crash produced the attached core dump, and I
could use your help with understanding it, as well
as any help with the crash itself. If more info is needed I
will gladly provide it.
Regards, Mitja