Re: Custom opclass for column statistics?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote:
On 06/07/2019 15:38, Tomas Vondra wrote:
On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
Hi,

I've been wondering whether it is possible somehow to have the standard
column statistics to respect a certain operator class?

The reason why I am asking for this is that I have a UUID column with a
unique index at it using a custom operator class which implies a
different sort order than for the default UUID operator class.

This results into planner mistakes when determining whether to use the
index for row selection or not. Too often it falls back into sequential
scan due to this.


Can you share an example demonstrating the issue?


regards


Yes, I have an opclass as follows:

CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
   USING btree AS
       OPERATOR        1       <*,
       OPERATOR        1       <~ (uuid, timestamp with time zone),
       OPERATOR        2       <=*,
       OPERATOR        2       <=~ (uuid, timestamp with time zone),
       OPERATOR        3       =,
       OPERATOR        3       =~ (uuid, timestamp with time zone),
       OPERATOR        4       >=*,
       OPERATOR        4       >=~ (uuid, timestamp with time zone),
       OPERATOR        5       >*,
       OPERATOR        5       >~ (uuid, timestamp with time zone),
       FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
       FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
with time zone),
       FUNCTION        2       uuid_timestamp_sortsupport(internal)
;

...and e.g. operator "<*" is defined as:

CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
RETURNS bool
AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
LANGUAGE C
IMMUTABLE
LEAKPROOF
STRICT
PARALLEL SAFE;

COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';

CREATE OPERATOR <* (
   LEFTARG = uuid,
   RIGHTARG = uuid,
   PROCEDURE = uuid_timestamp_lt,
   COMMUTATOR = '>*',
   NEGATOR = '>=*',
   RESTRICT = scalarltsel,
   JOIN = scalarltjoinsel
);


The function "uuid_timestamp_lt" is basically defined as follows:
1. if not version 1 UUID fallback to standard uuid compare
2. extract timestamp values and compare
3. if equal timestamps fallback to standard uuid compare

...so that a chronological order is established.


The test table is created as follows:

CREATE TABLE uuid_v1_ext (id uuid);
CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);


The values for "histogram_bounds" of the test table look like this (due
to the default sort order for standard type UUID):

00003789-97bf-11e9-b6bb-e03f49f7f733
008b88f8-6deb-11e9-901a-e03f4947f477
010a8b22-586a-11e9-8258-e03f49ce78f3
...
6f682e68-978d-11e9-901a-e03f4947f477
6ff412ee-926f-11e9-901a-e03f4947f477
7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
70ffaeca-4645-11e9-adf9-e03f494677fb
...
fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
ff779ce8-9e52-11e9-8258-e03f49ce78f3
ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf

...and I think that's where the planner gets the decision for a query
such as:

DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';

...which then get's executed as sequential scan instead of an index scan.

I was also thinking about changing the selectivity function used by the
custom operator, but I didn't find any hints how to implement that
without duplicating a lot of internal code.


Not sure, I'm not very familiar with this code, so I'd have to play with
it and try things. But that's hard when I don't have any code. Would it
be possible to share a small self-contained test case?

I wonder what does uuid_timestamp_cmp do? I suppose it first compares by
a timestamp extracted from the UUID, right?

It'd be interesting to see

(a) statistics for the column from pg_stats, both for the table and
index (which should have been built using the custom opclass, I think).

(b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the
default one (that can't use the timestamp condition, but it should be
possible to generate smallers/largest uuid for a timestamp).

BTW which PostgreSQL version is this?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux