On Thu, Feb 14, 2013 at 9:35 AM, Nicolas Charles <nicolas.charles@xxxxxxxxxxxxx> wrote: > Hello, > > I've been struggling to understand what's happening on my databases/query > for several days, and I'm turning to higher mind for a logical answer. > > I'm dealing with a fairly large database, containing logs informations, that > I crunch to get data out of it, with several indexes on them that I hoped > were logical > > \d ruddersysevents > Table « public.ruddersysevents » > Colonne | Type | > Modificateurs > --------------------+--------------------------+-------------------------------------------------- > id | integer | non NULL Par défaut, > nextval('serial'::regclass) > executiondate | timestamp with time zone | non NULL > nodeid | text | non NULL > directiveid | text | non NULL > ruleid | text | non NULL > serial | integer | non NULL > component | text | non NULL > keyvalue | text | > executiontimestamp | timestamp with time zone | non NULL > eventtype | character varying(64) | > policy | text | > msg | text | > Index : > "ruddersysevents_pkey" PRIMARY KEY, btree (id) > "component_idx" btree (component) > "configurationruleid_idx" btree (ruleid) > "executiontimestamp_idx" btree (executiontimestamp) > "keyvalue_idx" btree (keyvalue) > "nodeid_idx" btree (nodeid) > Contraintes de vérification : > "ruddersysevents_component_check" CHECK (component <> ''::text) > "ruddersysevents_configurationruleid_check" CHECK (ruleid <> ''::text) > "ruddersysevents_nodeid_check" CHECK (nodeid <> ''::text) > "ruddersysevents_policyinstanceid_check" CHECK (directiveid <> ''::text) > > > It contains 11018592 entries, with the followinf patterns : > 108492 distinct executiontimestamp > 14 distinct nodeid > 59 distinct directiveid > 26 distinct ruleid > 35 distinct serial > > Related table/index size are > relation | size > ----------------------------------------+--------- > public.ruddersysevents | 3190 MB > public.nodeid_idx | 614 MB > public.configurationruleid_idx | 592 MB > public.ruddersysevents_pkey | 236 MB > public.executiontimestamp_idx | 236 MB > > > I'm crunching the data by looking for each nodeid/ruleid/directiveid/serial > with an executiontimestamp in an interval: > > explain analyze select executiondate, nodeid, ruleid, directiveid, serial, > component, keyValue, executionTimeStamp, eventtype, policy, msg from > RudderSysEvents where 1=1 and nodeId = > '31264061-5ecb-4891-9aa4-83824178f43d' and ruleId = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and > executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', > 'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', > 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc; > Sort (cost=293125.41..293135.03 rows=3848 width=252) (actual > time=28628.922..28647.952 rows=62403 loops=1) > Sort Key: executiontimestamp > Sort Method: external merge Disk: 17480kB > -> Bitmap Heap Scan on ruddersysevents (cost=74359.66..292896.27 > rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1) > Recheck Cond: ((nodeid = > '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)) > Filter: ((serial = 10) AND (executiontimestamp >= > to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD > HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 > 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text))) > -> BitmapAnd (cost=74359.66..74359.66 rows=90079 width=0) (actual > time=1228.610..1228.610 rows=0 loops=1) > -> Bitmap Index Scan on nodeid_idx (cost=0.00..25795.17 > rows=716237 width=0) (actual time=421.365..421.365 rows=690503 loops=1) > Index Cond: (nodeid = > '31264061-5ecb-4891-9aa4-83824178f43d'::text) > -> Bitmap Index Scan on configurationruleid_idx > (cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490 > rows=1381391 loops=1) > Index Cond: (ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) > Total runtime: 28657.352 ms > > > > I'm surprised that the executiontimestamp index is not used, since it seems > to be where most of the query time is spent. > > For all my tests, I removed all the incoming logs, so that this table has > only selects and no writes > > I'm using Postgres 8.4, on a quite smallish VM, with some process runnings, > with the following non default configuration > shared_buffers = 112MB > work_mem = 8MB > maintenance_work_mem = 48MB > max_stack_depth = 3MB > wal_buffers = 1MB > effective_cache_size = 128MB > checkpoint_segments = 6 > > Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the > performances (less than 10%). I would have expected it to improve, since the > indexes would all fit in RAM > Every explain analyze made after modification of configuration and indexes > where done after a complete batch of crunch of logs by our apps, to be sure > the stats were corrects > > I tested further with the indexes, and reached these results : > 1/ dropping the unique index "configurationruleid_idx" btree (ruleid), > "executiontimestamp_idx" btree (executiontimestamp), "keyvalue_idx" btree > (keyvalue), "nodeid_idx" btree (nodeid) to replace them by a unique index > did lower the perfs : > > create index composite_idx on ruddersysevents (executiontimestamp, ruleid, > serial, nodeid); > > Index Scan using composite_idx on ruddersysevents (cost=0.01..497350.22 > rows=3729 width=252) (actual time=7.989..83717.901 rows=62403 loops=1) > Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22 > 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND > (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, > 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid > = '31264061-5ecb-4891-9aa4-83824178f43d'::text)) > > > 2/ Removing nodeid from the index did lower again the perf > > create index composite2_idx on ruddersysevents (executiontimestamp, ruleid, > serial); > > Index Scan using composite2_idx on ruddersysevents (cost=0.01..449948.98 > rows=3729 width=252) (actual time=23.507..84888.349 rows=62403 loops=1) > Index Cond: ((executiontimestamp >= to_timestamp('2012-11-22 > 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND > (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, > 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10)) > Filter: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text) > > 3/ Removing executiontimestamp from the composite index makes the query > performs better at the begining of its uses (around 17 secondes), but over > time it degrades (I'm logging query longer than 20 secondes, and there are > very rare in the first half of the batch, and getting more and more common > at the end) to what is below > > create index composite3_idx on ruddersysevents (ruleid, serial, nodeid); > Sort (cost=24683.44..24693.07 rows=3849 width=252) (actual > time=60454.558..60474.013 rows=62403 loops=1) > Sort Key: executiontimestamp > Sort Method: external merge Disk: 17480kB > -> Bitmap Heap Scan on ruddersysevents (cost=450.12..24454.23 rows=3849 > width=252) (actual time=146.065..60249.143 rows=62403 loops=1) > Recheck Cond: ((ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid > = '31264061-5ecb-4891-9aa4-83824178f43d'::text)) > Filter: ((executiontimestamp >= to_timestamp('2012-11-22 > 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND > (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, > 'YYYY-MM-DD HH24:MI:SS.MS'::text))) > -> Bitmap Index Scan on composite3_idx (cost=0.00..449.15 > rows=6635 width=0) (actual time=129.102..129.102 rows=62403 loops=1) > Index Cond: ((ruleid = > '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (nodeid > = '31264061-5ecb-4891-9aa4-83824178f43d'::text)) > Total runtime: 60484.022 ms > > > And with all these tests, the difference of the whole database processing > with each index time is within 10% error margin (2h10 to 2h25), even if most > of the time is spent on the SQL query. > > So my question is : > "Why *not* indexing the column which is not used makes the query slower over > time, while not slowing the application?" > > If you have some clues, I'm really interested. This query can be optimized to zero (assuming data in cache and not too many dead rows). create one index on nodeId, ruleId, serial, executiontimestamp columns and do: SELECT executiondate, nodeid, ruleid, directiveid, serial, component, keyValue, executionTimeStamp, eventtype, policy, msg FROM RudderSysEvents WHERE (nodeId, ruleId, serial, executiontimestamp) >= ( '31264061-5ecb-4891-9aa4-83824178f43d', '61713ff1-aa6f-4c86-b3cb-7012bee707dd', 10, to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD HH24:MI:SS.MS')) AND (nodeId, ruleId, serial, executiontimestamp) <= ( '31264061-5ecb-4891-9aa4-83824178f43d', '61713ff1-aa6f-4c86-b3cb-7012bee707dd', 10, to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS')) ORDER BY nodeId, ruleId, serial, executiontimestamp merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance