Re: Trouble with paxos service for large PG count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/02/2013 09:36 PM, Jim Schutt wrote:
On 04/02/2013 01:16 PM, Jim Schutt wrote:
On 04/02/2013 12:28 PM, Joao Luis wrote:
Right.  I'll push a patch to bump that sort of output to 30 when I get home.

Thanks - but FWIW, I don't think it's the root cause of my
issue -- more below....

OK, I see now that you're talking about the output from list_proposals().
Indeed, when I cause list_proposals() in propose_queued() to only
be called at paxos debug 30, the stall in finish_queued_proposal()
mostly goes away.  Now, with 55392 PGs I'm seeing this:

2013-04-02 14:21:19.401820 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-02 14:21:19.535731 7ffff328d700  5 mon.cs31@0(leader).paxos(paxos active c 1..33) queue_proposal bl 19870402 bytes; ctx = 0x11ecff0
2013-04-02 14:21:19.535917 7ffff328d700  5 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) propose_queued 34 19870402 bytes
2013-04-02 14:21:19.535925 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) begin for 34 19870402 bytes
2013-04-02 14:21:19.754326 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos updating c 1..33) commit 34
2013-04-02 14:21:19.975747 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finishing proposal
2013-04-02 14:21:19.975799 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finish it (proposal = 0x790c0c0)
2013-04-02 14:21:19.975802 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal proposal took 0.439887 to finish
2013-04-02 14:21:19.975913 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) _active


If you're willing, try reducing the paxos debug level to 0 and let us know
if those delays in the proposal go away.

When I started debugging why I couldn't start new filesystems, several
days ago, I wasn't using any debugging.  It took me until last Friday
to track the issue into paxos, which is when I turned on its debugging.


With that list_proposals() debug output issue out of the way, I'm
still having trouble starting up a new filesystem -- still working
to try to isolate the source of the issue....

Thanks -- Jim



Good to know!

With regard to your troubles, what appears to go wrong?

  -Joao
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux