Re: cosd multi-second stalls cause "wrongly marked me down"

Hi Jim,

We have seen this problem before. The usual suspect is the OOM
killer (grep for "out of memory" in syslog). Unfortunately, the OOM
killer sends SIGKILL, which can't be caught, so the daemon gets no
chance to log a stack trace or dump core.
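
A minimal sketch of that syslog check, in Python rather than grep. The
function name find_oom_lines and the extra "oom-killer" pattern are
illustrative assumptions; the kernel's exact wording varies between
versions, so treat the patterns as a starting point.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer runs.
# Matched case-insensitively; exact wording differs across kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_lines(lines, process_name=None):
    """Return log lines that look like OOM-killer activity.

    If process_name is given, keep only lines that also mention it.
    """
    hits = []
    for line in lines:
        if OOM_PATTERN.search(line):
            if process_name is None or process_name in line:
                hits.append(line.rstrip("\n"))
    return hits
```

To use it, pass an open log file, e.g. find_oom_lines(open(path),
process_name="cosd"). Which file to open (/var/log/messages,
/var/log/syslog, ...) depends on the distribution.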

Core files can also fail to appear because of a bad ulimit -c
setting or a bad setting for core_pattern and friends.
Another problem I hit a lot is that the partition I'm writing core
files to fills up.
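
Here is a rough sketch that gathers those three things for a running
daemon. It assumes Linux with /proc mounted; the function names and the
core_dir argument are made up for illustration. It reads the limit from
/proc/<pid>/limits because ulimit -c in your shell does not necessarily
match what the daemon was started with.

```python
import shutil


def core_limit_from_limits_text(text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text.

    Values are returned as strings, e.g. ('0', 'unlimited').
    Returns None if the line is not present.
    """
    label = "Max core file size"
    for line in text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def core_dump_report(pid, core_dir):
    """Collect the settings that commonly prevent core files from appearing."""
    report = {}
    # Core size limit as seen by the daemon itself, not by the current shell.
    with open(f"/proc/{pid}/limits") as f:
        report["core_limit"] = core_limit_from_limits_text(f.read())
    # Where (or to which pipe) the kernel writes core files.
    with open("/proc/sys/kernel/core_pattern") as f:
        report["core_pattern"] = f.read().strip()
    # Free space, in bytes, on the partition that should receive the core file.
    report["free_bytes"] = shutil.disk_usage(core_dir).free
    return report
```

A soft limit of '0' means no core file will be written at all, and a
cosd core can be large, so compare free_bytes against the daemon's
memory footprint.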

If none of that explains it, it's possible that something is calling
exit() somewhere. You can attach gdb to the process and put a
breakpoint on exit() to see if this is going on. There's a lot of
"your foo is not bar enough, I hate your config, exit(1)" type code
that gets executed while the daemon is starting up. It sounds like you
should be past that point, though.
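
A sketch of the gdb step, written as a small helper that builds the
command line. The helper name gdb_exit_trap_argv is hypothetical; the
gdb options used (-p, -batch, -ex) are standard. Extending the
functions tuple, for example with "_exit", is my own suggestion and not
something known to be needed here.

```python
def gdb_exit_trap_argv(pid, functions=("exit",)):
    """Build a gdb command line that attaches to pid, breaks on the given
    functions, resumes the process, and prints a backtrace when it stops."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for fn in functions:
        argv += ["-ex", f"break {fn}"]
    # 'continue' blocks until the process hits a breakpoint or terminates;
    # 'bt' then shows who called exit(), if that is what happened.
    argv += ["-ex", "continue", "-ex", "bt"]
    return argv
```

Run the result with subprocess.run(gdb_exit_trap_argv(pid)) as a user
allowed to ptrace the daemon. If the process is killed by a signal
instead, gdb should report that rather than stopping at the breakpoint,
which is useful information too.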

Colin


On Wed, Mar 2, 2011 at 2:57 PM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
>
> On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
>>
>> On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
>> > On Wed, 2 Mar 2011, Jim Schutt wrote:
>> > >
>> > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
>> > > > > I'll see if I see the same signature with master,
>> > > > > and post logs.
>> > > >
>> > > > Thanks!  Keep us posted.
>> > >
>> > > Hmmm, I'm not having much luck with master (commit
>> > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
>> > > lots of dead OSDs during startup.
>> >
>> > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
>>
>> I'll try it out, thanks!
>
> I don't get any more core files with master commit 67355779ecc.
> Now my cosds just die - no stack trace in the log, no core
> file, nothing in syslog or dmesg ...
>
> I'm not sure how to track down what's happening here...
>
> -- Jim
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

