On Thu, Jul 2, 2020 at 2:02 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Anders Steinlein <anders@xxxxxx> writes:
> We have a materialized view from which a customer reported some
> confusing/invalid results, leading us to inspect the query and not finding
> anything wrong. Running the query defining the matview manually, or
> creating a new (identical) materialized view returns the correct result.
> Obviously, we've done REFRESH MATERIALIZED VIEW just before doing the
> comparison, and all runs are in the same schema.
I suspect the query underlying the matviews is less deterministic than
you think it is. I did not study that query in any detail, but just
from a quick eyeball: the array_agg() calls with no attempt to enforce a
particular aggregation order are concerning, and so is grouping by
a citext column (where you'll get some case-folding of a common value,
but who knows which).
Thanks for the tip, but I'm having a hard time thinking that's the case, seeing as I'm unable to trigger the wrong result no matter how hard I try with a new definition/manual query. I've introduced random ordering to the first CTE-clause (where the initial citext values comes from, and casing thus could differ in some order) which doesn't change the result.
When the citext type is used throughout the query, shouldn't the grouping result be deterministic? The citext values are first "rolled up" with array_agg() and later unnested and finally grouped. Shouldn't the end result be the same, regardless of what particular case-folded version of the value it chooses to group on?
I've simplified the query for this particular customer case that, again, always returns the correct result no matter how often I try:
mm_prod=> SELECT sid, count(*) FROM (
WITH tagged_contacts AS (
SELECT lid, email, cl.skip_preexisting_campaigns AS skip_subscribed,
ct.skip_preexisting_campaigns AS skip_tags, ladded, tagname, created
FROM contacts_lists cl
LEFT JOIN contacts_tags ct USING (email)
WHERE lstatus = 'a'
ORDER BY random()
),
tagged_segments AS (
SELECT s.lid, cid, sid, sp.type, sp.mid, matchdelay, tagname, event,
count(*) OVER (PARTITION BY sid) AS requirements,
activated_at
FROM segments s
LEFT JOIN campaigns USING (cid)
INNER JOIN segments_predicates sp USING (sid)
WHERE
s.archived_at IS NULL
AND (cid IS NULL OR activated_at IS NOT NULL)
),
segments_contacts AS (
SELECT lid, sid, requirements,
CASE
WHEN type = 'has_tag' THEN (
SELECT array_agg(DISTINCT email::citext)
FROM tagged_contacts
WHERE
lid = s.lid
AND tagname = s.tagname
AND (matchdelay IS NULL OR created + matchdelay < now())
AND (
cid IS NULL
OR (
created >= activated_at
AND NOT COALESCE(skip_tags, false)
)
)
)
END AS emails
FROM tagged_segments s
),
unnested AS (
SELECT lid, sid, requirements, unnest(emails) AS email
FROM segments_contacts
)
SELECT lid, sid, email
FROM unnested
GROUP BY lid, sid, email, requirements
HAVING count(email) = requirements
) x
WHERE sid = 42259
GROUP BY sid;
SELECT lid, email, cl.skip_preexisting_campaigns AS skip_subscribed,
ct.skip_preexisting_campaigns AS skip_tags, ladded, tagname, created
FROM contacts_lists cl
LEFT JOIN contacts_tags ct USING (email)
WHERE lstatus = 'a'
ORDER BY random()
),
tagged_segments AS (
SELECT s.lid, cid, sid, sp.type, sp.mid, matchdelay, tagname, event,
count(*) OVER (PARTITION BY sid) AS requirements,
activated_at
FROM segments s
LEFT JOIN campaigns USING (cid)
INNER JOIN segments_predicates sp USING (sid)
WHERE
s.archived_at IS NULL
AND (cid IS NULL OR activated_at IS NOT NULL)
),
segments_contacts AS (
SELECT lid, sid, requirements,
CASE
WHEN type = 'has_tag' THEN (
SELECT array_agg(DISTINCT email::citext)
FROM tagged_contacts
WHERE
lid = s.lid
AND tagname = s.tagname
AND (matchdelay IS NULL OR created + matchdelay < now())
AND (
cid IS NULL
OR (
created >= activated_at
AND NOT COALESCE(skip_tags, false)
)
)
)
END AS emails
FROM tagged_segments s
),
unnested AS (
SELECT lid, sid, requirements, unnest(emails) AS email
FROM segments_contacts
)
SELECT lid, sid, email
FROM unnested
GROUP BY lid, sid, email, requirements
HAVING count(email) = requirements
) x
WHERE sid = 42259
GROUP BY sid;
sid | count
-------+-------
42259 | 98
(1 row)
-------+-------
42259 | 98
(1 row)
This is stale data for this customer, so no data changes are occurring to change the results. I can REFRESH MATERIALIZED VIEW as many times I was on the original segments_with_contacts matview, and I never see different results. If it were not deterministic, shouldn't I expect to see different results one in at least 100 times tried?
Thanks again for any insight to try and figure this out. Again, I could just re-create the matview we use in production and it would likely work (since I'm unable to get wrong results with a newly created case), but I would rather try to find out the root cause here first.
Best,
-- a.