Gist indexing performance with cidr types

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

I'm trying to get a query to run fast enough for interactive use. I've gotten
some speed-up, but still not there. It is for a tool called IRRExplorer
(http://irrexplorer.nlnog.net/) that correlates IP routes between Internet
Route Registries and real-world routing information. We landed on PostgreSQL
largely due to indexing of the cidr type with gist indexing.

* Preliminaries:

The query is about getting data from this table:

irrexplorer=> \d routes
      Table "public.routes"
Column | Type | Modifiers -----------+---------+-----------
 route     | cidr    | not null
 asn       | bigint  | not null
 source_id | integer | not null
Indexes:
    "unique_entry" UNIQUE CONSTRAINT, btree (route, asn, source_id)
    "route_gist" gist (route inet_ops)
Check constraints:
    "positive_asn" CHECK (asn > 0)
Foreign-key constraints:
    "routes_source_id_fkey" FOREIGN KEY (source_id) REFERENCES sources(id)

Complete DDL: https://github.com/job/irrexplorer/blob/master/data/schema.sql

Data set: 125 MB on disk, 2.3 million rows.

Running stock PostgreSQL 9.4.4 on Ubuntu 14.10 (hosted on VMWare on OS X)

Have done VACUUM, ANALYZE, and checked memory settings, and tried to increase
work_mem, but with no success. The issue seems cpu bound (100% cpu load during
query).


* Query

When a user inputs an AS number a simple match on the asn column will return
the stuff relevant. However, the interesting thing to display is
conflicting/rogue routes. This means matching routes with the && operator to
find all covered/covering routes. This would look something like this:

SELECT rv.route, rv.asn, rv.source
FROM routes_view rv
LEFT OUTER JOIN routes_view r ON (rv.route && r.route)
WHERE rv.route && r.route AND r.asn = %s

While this is fairly fast if the initial set of routes is relatively small (<100) it runs with a second or so, but if the number of routes matching the asn is large (>1000), it takes quite a while (+30 seconds). Explain analyze link:

http://explain.depesz.com/s/dHqo

I am not terribly good at reading the output, but it seem most of the time is
actually spend on the bitmap scan for the gist index. It there another type of
indexing that would behave better here?

Since there often identical routes in the initial set of routes (from the AS number matching), I tried reducing the initial set of matching routes:
ORDER BY rv.route;

SELECT rv.route, rv.asn, rv.source
FROM
    (SELECT DISTINCT route FROM routes_view WHERE asn = %s) r
LEFT OUTER JOIN routes_view rv ON (r.route && rv.route)

This often cuts the set of initial matching routes by 25-50%, and cuts a
similar amount of time from the query time. Explain analyze link:

http://explain.depesz.com/s/kf13

I tried further reducing the routes to the minimal set:

WITH distinct_routes AS (SELECT DISTINCT route FROM routes_view WHERE asn = %s)
SELECT route FROM distinct_routes
EXCEPT
SELECT r1.route
FROM distinct_routes r1
    INNER JOIN distinct_routes r2 ON (r1.route << r2.route);

But typically only yields 10-20% reduction of the inital route set, and adds query complexity (putting the above in a CTE/WITH seems to make the query significantly slower for some reason).

The main issue seem to be with the gist bitmap index. Is there a better way to
approach this style of query?


    Best regards, Henrik



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux