Re: Custom opclass for column statistics?

Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> · Sat, 6 Jul 2019 17:35:33 +0200

On 06/07/2019 15:38, Tomas Vondra wrote:
> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>> Hi,
>>
>> I've been wondering whether it is possible somehow to have the standard
>> column statistics to respect a certain operator class?
>>
>> The reason why I am asking for this is that I have a UUID column with a
>> unique index at it using a custom operator class which implies a
>> different sort order than for the default UUID operator class.
>>
>> This results into planner mistakes when determining whether to use the
>> index for row selection or not. Too often it falls back into sequential
>> scan due to this.
>>
> 
> Can you share an example demonstrating the issue?
> 
> 
> regards
> 

Yes, I have an opclass as follows:

CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
    USING btree AS
        OPERATOR        1       <*,
        OPERATOR        1       <~ (uuid, timestamp with time zone),
        OPERATOR        2       <=*,
        OPERATOR        2       <=~ (uuid, timestamp with time zone),
        OPERATOR        3       =,
        OPERATOR        3       =~ (uuid, timestamp with time zone),
        OPERATOR        4       >=*,
        OPERATOR        4       >=~ (uuid, timestamp with time zone),
        OPERATOR        5       >*,
        OPERATOR        5       >~ (uuid, timestamp with time zone),
        FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
        FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
with time zone),
        FUNCTION        2       uuid_timestamp_sortsupport(internal)
;

...and e.g. operator "<*" is defined as:

CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
RETURNS bool
AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
LANGUAGE C
IMMUTABLE
LEAKPROOF
STRICT
PARALLEL SAFE;

COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';

CREATE OPERATOR <* (
    LEFTARG = uuid,
    RIGHTARG = uuid,
    PROCEDURE = uuid_timestamp_lt,
    COMMUTATOR = '>*',
    NEGATOR = '>=*',
    RESTRICT = scalarltsel,
    JOIN = scalarltjoinsel
);

The function "uuid_timestamp_lt" is basically defined as follows:
1. if not version 1 UUID fallback to standard uuid compare
2. extract timestamp values and compare
3. if equal timestamps fallback to standard uuid compare

...so that a chronological order is established.

The test table is created as follows:

CREATE TABLE uuid_v1_ext (id uuid);
CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);

The values for "histogram_bounds" of the test table look like this (due
to the default sort order for standard type UUID):

00003789-97bf-11e9-b6bb-e03f49f7f733
008b88f8-6deb-11e9-901a-e03f4947f477
010a8b22-586a-11e9-8258-e03f49ce78f3
...
6f682e68-978d-11e9-901a-e03f4947f477
6ff412ee-926f-11e9-901a-e03f4947f477
7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
70ffaeca-4645-11e9-adf9-e03f494677fb
...
fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
ff779ce8-9e52-11e9-8258-e03f49ce78f3
ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf

...and I think that's where the planner gets the decision for a query
such as:

DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';

...which then get's executed as sequential scan instead of an index scan.

I was also thinking about changing the selectivity function used by the
custom operator, but I didn't find any hints how to implement that
without duplicating a lot of internal code.

Cheers,

	Ancoron