Search Postgresql Archives

Re: Efficient rows filter for array inclusion with gin index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Replying to myself after more investigation.

On 28/02/2024 12:05, Shanti-Dominique wrote:



2)
    SELECT  *
    FROM    items i1
            JOIN item_paths p1 ON i1.ref_id = p1.ref_id
            JOIN items i2 ON ARRAY[i2.ref_id] <@ p1.item_path
    WHERE   ...

2) uses the operator <@ which is supported by the gin index, the test for inclusion is fast and the query does not run a sequential scan over the whole "item_paths" table. However, because of the ARRAY[i2.ref_id] construct, it performs a sequential scan on i2.
I was under the assumption that the ARRAY[] construct prevented postgresql from efficiently using the index on the other side of the operator, but I think I was mistaken. On a database full of data, I tried getting around this but did not see any improvement of performance.

First I tried to add an index on the single element array:

CREATE FUNCTION uuidarr(ref_id uuid) RETURNS uuid[]
  LANGUAGE SQL
  IMMUTABLE
  RETURNS NULL ON NULL INPUT
  RETURN ARRAY[ref_id];

CREATE INDEX items_ref_id_arr2_idx ON items USING gin (uuidarr(ref_id));

EXPLAIN
SELECT  i2.*
FROM    items i1
        JOIN item_paths p1 ON i1.ref_id = p1.ref_id
        JOIN items i2 ON uuidarr(i2.ref_id) <@ p1.item_path
WHERE   i1.name = 'a';


The performance was even worse. Then I tried with a generated column:

CREATE TABLE items (
    ref_id uuid DEFAULT public.gen_random_uuid() NOT NULL,
    ref_id_array uuid[] GENERATED ALWAYS AS (uuidarr(ref_id)) STORED,
    name character varying,
    parent_ref_id uuid
);

CREATE INDEX items_ref_id_array_idx ON items USING gin (ref_id_array);

EXPLAIN
SELECT  i2.*
FROM    items i1
        JOIN item_paths p1 ON i1.ref_id = p1.ref_id
        JOIN items i2 ON i2.ref_id_array <@ p1.item_path
WHERE   i1.name = 'a';

The performance was very similar to the query with ARRAY[...]

It seems there is no good solution for the general case, apart from changing the structure of my dataset and removing the use of arrays entirely.

I think I'll update my codebase and use <@ where it makes sense and = ANY in other places, but it'll be difficult to know for sure without running the query which one will be better.






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux