Hi again,
I have now had time to do further research about this issue. I have been
able to produce a script (available on request) that reproduces the
problem, even in tables as small as 100 items and using a single thread,
and as a result may have located an area that may cause the problem: A
conditional index.
As mentioned in the attachment to my previous email the table which is
having the problem look like this:
Table
"public.probedata2_probequeue"
Column | Type |
Modifiers
----------------+----------------------+-------------------------------------
id | bigint | not null default
nextval('probedata2_probequeue_id_seq'::regclass)
part_of_run_id | integer | not null
server_id | integer | not null
state | character varying(1) |
Indexes:
"probedata2_probequeue_pkey" PRIMARY KEY, btree (id)
"run_server" UNIQUE CONSTRAINT, btree (part_of_run_id, server_id)
"probedata2_probequeue_finished" btree (id) WHERE state::text =
'F'::text
"probedata2_probequeue_run_idle" btree (part_of_run_id) WHERE
state::text = 'I'::text
"probedata2_probequeue_started" btree (part_of_run_id) WHERE
state::text = 'S'::text
Foreign-key constraints:
"probedata2_probequeue_part_of_run_id_fkey" FOREIGN KEY
(part_of_run_id) REFERENCES probedata2_proberun(id) DEFERRABLE INITIALLY
DEFERRED
"probedata2_probequeue_server_id_fkey" FOREIGN KEY (server_id)
REFERENCES probedata2_server(id) DEFERRABLE INITIALLY DEFERRED
In my test database I have been running tests without the
"probedata2_probequeue_run_idle"-equivalent and the other conditional
indexes.
Without the "run_idle"-index the queries (fetch idle candidates, mark as
started, fetch the records) complete in less than a second (<0.9
seconds),
as expected.
*With* the index, the time to complete that operation increases by a
factor ranging from 10-30 times for small sequences and a single thread,
to 1000 times for large sequences (23 million, 4-10 threads), taking up
to 20
minutes to complete an update of 6000 rows for each thread, running the
process at 100% CPU the whole time.
The purpose of the index is to provide quick access to the idle items
for
a given job, both records and counts. Normally, there will currently be
just a single active job in the table, and at the time the problem is
occurring all entries for the job will be in the index.
As I mentioned last time, the problematic command is be the UPDATE
command
UPDATE probedata2_probequeue SET "state"=E'S'
WHERE "state" = E'I' AND id IN ( .....)
RETURNING id
and I have confirmed that with the built-in Django debug query
information
in my test script.
On Sat, 08 Feb 2014 15:57:10 +0100, Yngve N. Pettersen
<yngve@xxxxxxxxxxxxx> wrote:
Hi again,
Sorry about the delay, but an unexpected 3 week trip combined with not
being
able to fix the system's router whose configuration had become
corrupted
before I left, meant that I could not perform any testing until this
week,
after the router had been fixed and reinstalled.
I just did a test using expanded logging of queries and duration, and
actually saw this happen with just 4 (four) processes, not the 10 I
have
seen before.
The relevant parts of the log, as well as an EXPLAIN and table info
dump
are attached in a zipfile; the large parts consisting of 6000
to-be-updated
IDs in the UPDATE commands have been removed for clarity (the sequences
were not overlapping, with a numerical distance of at least 80000 from
the other updates, and each sequence was within a range of 7500 IDs)
Background: Immediately before this operation the queue had been set
up with the
command
INSERT INTO probedata2_probequeue (part_of_run_id, server_id,
state)
SELECT '334' AS part_of_run_id, server_id, E'I' AS state FROM
probedata2_preparedqueueitem
WHERE part_of_queue_id = '2'
followed by a COMMIT, a process taking ~15 minutes to copy a 23 million
entry table into the queue. After this the total number of entries in
the
target table is 70 million, distributed across 3 runs.
Some SELECT count and ANALYZE operations have probably also been
performed
after the initialization operation, before processes
Shortly afterward (within a couple of minutes), four processes
initiated the SELECT and UPDATE
sequence I outlined earlier, each operation was performed between 1
and 10
seconds after one of the others, 20 seconds from start to last command
started.
SELECT id FROM probedata2_probequeue
WHERE state = E'I' AND part_of_run_id = 334
LIMIT 6000 OFFSET 85103
UPDATE probedata2_probequeue SET "state"=E'S'
WHERE "state" = E'I' AND id IN ( .....)
RETURNING id
The SELECT operations took 21-122ms to complete, while the UPDATEs took
1093000-1120000 ms (~18 minutes, to complete). During this time the
processes were running at 100% CPU. With more processes involved
earlier I
recall seeing even longer execution times for the UPDATEs, before I
killed
the processes (this is the first time I have seen this kind of
situation
be resolved without killing the processes).
For reference, once these 4 commands had completed (the system is set
up
to wait until it sees task completed messages from the first processes
that were started, before starting new ones), the next sequence of
these
commands took 122ms and 107ms, respectively, and the second took 50ms
and
108ms.
Any suggestions for where to investigate further?
I am considering setting up a small example to see if I can reproduce,
but
have not had time to do so yet.
On Sat, 04 Jan 2014 20:06:19 +0100, Yngve N. Pettersen
<yngve@xxxxxxxxxxxxx> wrote:
On Sat, 04 Jan 2014 19:40:31 +0100, Andrew Sullivan
<ajs@xxxxxxxxxxxxxxx> wrote:
On Sat, Jan 04, 2014 at 07:07:08PM +0100, Yngve N. Pettersen wrote:
I tried that before, but ran into some issues, IIRC a similar
looping
problem as this where queries never ended. I split it up in an
attempt to
solve that problem.
Pulling the data out into the application and sending it back in
won't
improve things. Exactly the same number of rows need to be visited,
but the way you have it now you have to marshall all the data and
ship
it to the application too. So it's automatically slower. Indeed,
making it slower might have masked your problem.
Could be
In the select/update case there is no sorting in the query; there
is an
offset/limit clause though, number of records retrieved are
currently
restricted to <10000 per query (out of 20 million in the active
subset).
SELECT id from queue where state = E'I' and job_id = <integer>
offset
<random 200..150000> limit <1-6000>
This could be part of problem. Are the different threads working on
different job_ids, or is that the same job_id? If you don't SORT
that
Same job_id, at least in the current system.
query before the OFFSET, then the rows will come back in whatever
order the system likes.
I suspect that a sort operation on (currently) 20+ million rows for
every query for just 6000 (previous version was 1500 entries) would
cause quite a bit more slowness than breaking up the query in two
operations, or the risk of collisions would, because each process
would have to load all that information (even if it is cached).
However, in the UPDATE case, the looping processes are all UPDATE
queries,
no SELECTs involved.
But you said it's all in the same transaction scope. The lock is a
the transaction scope.
But the statement it locks/loops on are only UPDATE statements, also
in the processes that are waiting.
Anyway, what I'd do is try to cause the condition and post the
pg_locks information. When I've done this in the past, usually the
best thing to do is also to have query logs on for everything (don't
forget to log the pid!) so you can see what the other transaction
you're waiting on touched already. You can usually find the
inversion
that way. Once you see it, it's always obvious what you've done, in
my experience (and completely mystifying before that, unfortunately).
Will take a look in a few days, probably midweek.