Re: Trouble with paxos service for large PG count

On 04/02/2013 02:41 PM, Joao Eduardo Luis wrote:
Hi Jim,

One thing to keep in mind is that, with the monitor rework, we now share
the Paxos instance across all the Paxos services.  That may slow things
down a bit, given that paxos proposals for different services are now
queued and have to wait their turn.  But what's happening to you appears
to be something completely different -- see below.

On 04/01/2013 10:14 PM, Jim Schutt wrote:
[snip]

For this last configuration, after collecting the above
I waited a bit, started all the OSDs, waited a bit longer,
then collected this:

2013-04-01 14:54:37.364686 7ffff328d700 10
mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:54:37.433641 7ffff328d700  5
mon.cs31@0(leader).paxos(paxos active c 1..27) queue_proposal bl
10629660 bytes; ctx = 0x11ece50
2013-04-01 14:54:37.433750 7ffff328d700  5
mon.cs31@0(leader).paxos(paxos preparing update c 1..27)
propose_queued 28 10629660 bytes
2013-04-01 14:54:37.433755 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos preparing update c 1..27)
propose_queued list_proposals 1 in queue:
2013-04-01 14:55:38.684532 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos preparing update c 1..27) begin for 28
10629660 bytes
2013-04-01 14:55:38.814528 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos updating c 1..27) commit 28
2013-04-01 14:55:38.937087 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal
finishing proposal
2013-04-01 14:55:38.937120 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal
finish it (proposal = 0x1c6e3c0)
2013-04-01 14:55:38.937124 7ffff328d700 10
mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal
proposal took 61.503375 to finish
2013-04-01 14:55:38.937168 7ffff328d700 10
mon.cs31@0(leader).paxosservice(pgmap) _active
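
Breaking those timestamps down: roughly 61.25 s of the 61.503375 s total
elapsed between the propose_queued lines (14:54:37.43) and the begin line
(14:55:38.68); begin-to-commit took about 0.13 s and
commit-to-finish_queued_proposal about 0.12 s.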

It looks like finish_queued_proposal processing time is scaling
quadratically with the proposal length for pgmaps.

Ah! The reason for such a delay just became obvious, and it's due to
a quite dumb mistake.  Basically, during Paxos::begin() we're running
the whole transaction through the JSON formatter, and then outputting it
at log level 30 -- we should be checking the log level first to avoid
spending valuable time on that, especially when transactions are huge.
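
For illustration, the pattern being described is roughly the following
(a minimal self-contained sketch; the names and the way the debug level
is obtained are made up, not the actual monitor code):

// Minimal sketch of the "check the log level before formatting" fix
// described above.  All names here are illustrative, not the actual
// monitor code.
#include <iostream>
#include <sstream>
#include <string>

static int g_debug_level = 10;   // pretend this comes from the config

// Stands in for dumping a whole multi-MB transaction through a JSON
// formatter -- the expensive part we want to avoid.
static std::string expensive_dump()
{
  std::ostringstream oss;
  for (int i = 0; i < 1000000; i++)
    oss << "{\"op\":" << i << "}\n";
  return oss.str();
}

static void log_transaction()
{
  // The fix: test the level first, so the dump is only built when
  // someone actually asked for level-30 output.
  if (g_debug_level >= 30)
    std::cout << expensive_dump();
}

int main()
{
  log_transaction();   // with g_debug_level = 10 this costs nothing
  return 0;
}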

Well, this might not be right after all.


Besides that, while looking into another bug, I noticed that there's a
slight problem with the monitor's logic and, at a given point, each
transaction we create ends up containing not only an incremental but
also a full version, which is bound to slow things down and exacerbate
what I just described in the previous paragraph.

But this would certainly explain the 10MB transaction on 'Paxos::begin'.
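
To make the size issue concrete, here is a rough sketch (key names and
sizes are hypothetical, not the monitor's actual on-disk schema) of why
stashing a full copy next to every incremental makes each proposal about
as large as the full map:

// Rough sketch only -- key names and sizes are hypothetical.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

struct Transaction {
  std::map<std::string, std::string> puts;
  void put(const std::string& key, const std::string& val) {
    puts[key] = val;
  }
  std::size_t size() const {
    std::size_t s = 0;
    for (const auto& kv : puts)
      s += kv.first.size() + kv.second.size();
    return s;
  }
};

int main()
{
  std::string incremental(50 * 1024, 'i');       // ~50KB of actual changes
  std::string full_map(10 * 1024 * 1024, 'f');   // ~10MB full pgmap

  Transaction t;
  t.put("pgmap_inc_29", incremental);            // what we meant to propose
  t.put("pgmap_full_29", full_map);              // what also ends up in there

  // Every proposal now weighs in around the size of the full map,
  // no matter how small the incremental change actually was.
  std::cout << "transaction size: " << t.size() << " bytes\n";
  return 0;
}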

I'm now actively looking into this and created ticket #4620 on the tracker [1].

  -Joao

[1] - http://tracker.ceph.com/issues/4620





FWIW, I don't believe I saw any issues of this sort for
versions 0.58 and earlier.

Please let me know if there is any other information I can
provide that would help fix this.

Thanks -- Jim
