On 5/4/2014 2:18 PM, Graham Leggett wrote:
Nothing in the above seems to indicate an error that I can see, but we now see this two seconds later:
[04/May/2014:23:03:38 +0200] - ERROR bulk import abandoned
[04/May/2014:23:03:38 +0200] - import userRoot: Aborting all Import threads...
[04/May/2014:23:03:43 +0200] - import userRoot: Import threads aborted.
[04/May/2014:23:03:43 +0200] - import userRoot: Closing files...
[04/May/2014:23:03:43 +0200] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
This indicates some sort of deep badness.
It appears that despite the initial sync having failed, we ignore the above error and pretend all is OK. I suspect this is why we're getting the weird messages below.
Yes, the prime error seems to be the database file error above. Once you
have that, all bets are off.
So... hmm... "no such file" (ENOENT) is very, very odd. Is there anything
peculiar about the filesystem this server is using? Anything funky with
permissions (although you'd expect an EACCES or EPERM in that case)?
The file (uid.db4 et al) would have been opened previously (or should
have been). It is perplexing that the error shows up on the fsync().
How does a file exist one second, then not the next? I'm guessing that
the error code has been mangled, or means something different than
might be deduced from the text.
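To illustrate why ENOENT at flush time is surprising, here is a minimal
Python sketch. It assumes a POSIX/Linux local filesystem and uses a
throwaway temp file (not a real index file); the function name is made up
for this example. It only shows general kernel behaviour and says nothing
about how libdb performs its flush internally: removing a file's name does
not make fsync() on an already-open descriptor fail, whereas re-opening
by path does produce ENOENT.

```python
import errno
import os
import tempfile


def probe_unlinked_file():
    """Return (fsync_ok, reopen_errno) for a file removed while still open."""
    # Throwaway file; the .db4 suffix is cosmetic, this is not a real index.
    fd, path = tempfile.mkstemp(suffix=".db4")
    try:
        os.write(fd, b"dummy page")
        os.unlink(path)  # the name disappears, the descriptor stays valid
        try:
            os.fsync(fd)  # flush through the already-open descriptor
            fsync_ok = True
        except OSError:
            fsync_ok = False
        try:
            os.close(os.open(path, os.O_RDWR))  # path-based re-open
            reopen_errno = None
        except OSError as exc:
            reopen_errno = exc.errno
    finally:
        os.close(fd)
    return fsync_ok, reopen_errno


if __name__ == "__main__":
    ok, err = probe_unlinked_file()
    print("fsync on open descriptor succeeded:", ok)
    print("re-open by path failed with:", errno.errorcode.get(err))
```

If that holds on the affected machine too, an ENOENT at flush time would
point at something opening the file by name at that moment, or at an
error code that does not mean what the text suggests.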
It might be worth running the "strace" command with -p <pid of ns-slapd>,
starting it prior to the replica init operation, to see what kernel
calls the process is making. Also try turning the logging up "to 11"
(not actually 11, but Spinal Tap style; I think it is 65535 to get
all logging output).
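Once a trace has been captured (for example saved to a file with
strace's -o option), the output can be large. Below is a rough Python
sketch for picking out failed calls that mention the database files. The
sample trace lines, the path in them, and the helper name are invented
for illustration; real strace output will differ in detail.

```python
import re

# Matches a failed syscall line in strace output, for example:
#   open("/some/path", O_RDWR) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(
    r'(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<err>E[A-Z0-9]+)'
)

# Made-up sample lines, only here to exercise the filter.
SAMPLE_TRACE = [
    '4242 23:03:43.000001 open("/tmp/example/userRoot/uid.db4", O_RDWR)'
    ' = -1 ENOENT (No such file or directory)',
    '4242 23:03:43.000002 fsync(17) = 0',
    '4242 23:03:43.000003 stat("/tmp/example/userRoot/cn.db4",'
    ' {st_mode=S_IFREG|0600, st_size=8192, ...}) = 0',
]


def failed_calls(lines, err="ENOENT", needle=".db4"):
    """Return (syscall, args) for calls that failed with `err` on `needle`."""
    hits = []
    for line in lines:
        m = FAILED_CALL.search(line)
        if m and m.group("err") == err and needle in m.group("args"):
            hits.append((m.group("call"), m.group("args")))
    return hits


if __name__ == "__main__":
    for call, args in failed_calls(SAMPLE_TRACE):
        print(call, args)
```

Knowing which call returns the ENOENT, and on which path, should show
whether the error really comes from the fsync() or from somewhere else.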
You could also try an "import" of some LDIF data into that same server.
It will exercise the same code as far as opening and writing to the
database files. It would be interesting to see whether that throws the
same ENOENT error, or not.