Apology for sending you emails directly but I do see you guys responding on email related to performance so thought of copying you folks.
Folks,
I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
"citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using
lower
to get case-insensitive matching."Here is what I have done
drop table test;
drop table testci;
CREATE TABLE test (
id INTEGER PRIMARY KEY,
name character varying(254)
);
CREATE TABLE testci (
id INTEGER PRIMARY KEY,
name citext
);
INSERT INTO test(id, name)
SELECT generate_series(1000001,2000000), (md5(random()::text));
INSERT INTO testci(id, name)
SELECT generate_series(1,1000000), (md5(random()::text));
Now, I have done sequential search
explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:
Node Type: "Seq Scan"
Parallel Aware: false
Relation Name: "test"
Alias: "test"
Startup Cost: 0.00
Total Cost: 23334.00
Plan Rows: 5000
Plan Width: 37
Actual Startup Time: 0.016
Actual Total Time: 680.199
Actual Rows: 1
Actual Loops: 1
Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Rows Removed by Filter: 999999
Planning Time: 0.045
Triggers:
Execution Time: 680.213
explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan:
Node Type: "Seq Scan"
Parallel Aware: false
Relation Name: "testci"
Alias: "testci"
Startup Cost: 0.00
Total Cost: 20834.00
Plan Rows: 1
Plan Width: 37
Actual Startup Time: 0.017
Actual Total Time: 1184.485
Actual Rows: 1
Actual Loops: 1
Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
Rows Removed by Filter: 999999
Planning Time: 0.029
Triggers:
Execution Time: 1184.496
You can see sequential searches with lower working twice as fast as citext.
Now I added index on citext and equivalent functional index (lower) on text.
CREATE INDEX textlowerindex ON test (lower(name));
create index textindex on test(name);
Index creation took longer with citext v/s creating lower functional index.
Now here comes execution with indexes
explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:
Node Type: "Bitmap Heap Scan"
Parallel Aware: false
Relation Name: "test"
Alias: "test"
Startup Cost: 187.18
Total Cost: 7809.06
Plan Rows: 5000
Plan Width: 37
Actual Startup Time: 0.020
Actual Total Time: 0.020
Actual Rows: 1
Actual Loops: 1
Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Rows Removed by Index Recheck: 0
Exact Heap Blocks: 1
Lossy Heap Blocks: 0
Plans:
- Node Type: "Bitmap Index Scan"
Parent Relationship: "Outer"
Parallel Aware: false
Index Name: "textlowerindex"
Startup Cost: 0.00
Total Cost: 185.93
Plan Rows: 5000
Plan Width: 0
Actual Startup Time: 0.016
Actual Total Time: 0.016
Actual Rows: 1
Actual Loops: 1
Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Planning Time: 0.051
Triggers:
Execution Time: 0.035
explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan:
Node Type: "Index Scan"
Parallel Aware: false
Scan Direction: "Forward"
Index Name: "citextindex"
Relation Name: "testci"
Alias: "testci"
Startup Cost: 0.42
Total Cost: 8.44
Plan Rows: 1
Plan Width: 37
Actual Startup Time: 0.049
Actual Total Time: 0.050
Actual Rows: 1
Actual Loops: 1
Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
Rows Removed by Index Recheck: 0
Planning Time: 0.051
Triggers:
Execution Time: 0.064