On Wed, Jan 17, 2018 at 9:34 PM, Jacek Kołodziej <kolodziejj@xxxxxxxxx> wrote:
Hi Tom,

On Wed, Jan 17, 2018 at 7:56 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jacek Kołodziej <kolodziejj@xxxxxxxxx> writes:
> > Here's what's happening to me: the "A" query occasionally (in my case: tens
> > of times per day) returns an ID _higher_ than any ID present in the second
> > query's result (the other conditions I haven't specified do _not_ filter
> > out any more rows than "id <= max ID" does) - as if some entries were
> > visible to the first query but not to the second one. This is an
> > inconsistency that is very problematic for me.
>
> That sounds problematic to me too, but how certain are you that the "other
> conditions you haven't specified" aren't suppressing the last row? That'd
> certainly be the least surprising explanation. If it isn't that, though,
> this surely seems like a bug.
Yes, I'm fairly sure of that. When I execute that same "B" query again some
time afterwards, it returns all of the expected rows - including the ones that
were "included" in the original "A" query but "missing" from "B" the first
time around.

> Can you determine whether the row(s) missing in the second query are
> freshly committed? Or have they been there awhile?
It depends on what counts as "fresh"; usually it's on the order of
milliseconds or seconds.

> > Where am I wrong? What am I missing? What information may I provide to
> > help with investigating this?
> Probably the best thing to spend time on would be to try to extract a
> publishable test case. It would be really hard to get to the bottom
> of an issue like this without having a reproducer. It's okay if it
> takes awhile to reproduce the fault ...
I'd certainly love to have a working repro. I won't be able to do it for the
next few days, but I'll work on it right after the weekend.

> Also, before spending a whole lot of time on this: are you on 9.6.6?
> If not, update, just in case this is an already-fixed issue. The
> symptoms don't sound familiar, but I don't want to waste a lot of
> time only to find out it's some manifestation of a known bug.
>
> regards, tom lane
I'm using 9.6.5; I'm not administrating it, so updating might take some time,
but once it's done I'll report back on whether that fixed the situation. In
the meantime, when trying to reproduce the problem locally, I'll use both
9.6.5 and 9.6.6 to see whether it makes any difference.

Thank you very much for the suggestions.

--
Hello again,
unsurprisingly, the fault was on my side - a colleague of mine spotted it. The source of the problem was overlooking how sequences work: they hand out monotonically increasing numbers at query time (in this case: IDs for the primary key), but the transactions consuming those numbers may commit in a different order. With T1, T2 and T3 as transactions - T1 and T2 appending to the events table, T3 reading from it with the "A" and "B" queries - the race looks like this (a concrete sketch follows the list):
- T1 inserts an event (it gets ID 6) but does not commit yet
- T2 inserts an event (it gets ID 7) and commits
- T3 runs query "A" (which returns 7 as the "max event ID" at that moment) and then selects events with query "B" (which returns the event with ID 7, but not the one with ID 6, because it has not been committed yet)
- T1 commits, but that event (ID 6) has already been skipped and will never be picked up again
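For concreteness, here is a rough sketch of that interleaving; the table and column names are only illustrative (not our real schema), and "A"/"B" stand for the two queries described earlier in the thread:

    -- session 1 (T1)
    BEGIN;
    INSERT INTO events (payload) VALUES ('first');    -- nextval() assigns id = 6
    -- ... T1 stays open, no COMMIT yet ...

    -- session 2 (T2)
    BEGIN;
    INSERT INTO events (payload) VALUES ('second');   -- nextval() assigns id = 7
    COMMIT;

    -- session 3 (T3, the reader)
    SELECT max(id) FROM events;            -- query "A": returns 7
    SELECT * FROM events WHERE id <= 7;    -- query "B": returns the row with id = 7,
                                           -- but not id = 6 (T1 is still uncommitted)

    -- session 1 (T1)
    COMMIT;    -- only now does id = 6 become visible, but the reader has already
               -- advanced past it and will never fetch it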
REPEATABLE READ could not help us in such a situation: both queries already see a perfectly consistent snapshot. The problem is our assumption that IDs become visible in increasing order, and no isolation level on the reader's side can guarantee that.
We ended up - at least for now - forcing serialization of inserting the events, so that their IDs always match the order in which they are inserted _and committed_ into the database. While this may be suboptimal, it seems like a good-enough solution for our use case for the time being, and we'll work on a more scalable solution in the future.
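For illustration only - this is not necessarily the exact mechanism we settled on - one way to serialize the inserts in PostgreSQL is to take a transaction-scoped advisory lock before every insert, so that the ID is assigned and committed while holding the same lock:

    -- every writer inserts like this; 42 is an arbitrary, illustrative lock key
    BEGIN;
    SELECT pg_advisory_xact_lock(42);                 -- waits until earlier writers commit
    INSERT INTO events (payload) VALUES ('payload');  -- id assigned while holding the lock
    COMMIT;                                           -- lock released at commit, so IDs are
                                                      -- assigned and committed in the same order

The obvious downside is that all writers to the events table queue up behind a single lock, which is exactly the scalability cost mentioned above.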
Thank you for looking into this and for the inspiration for further investigation.