Hey everyone --
I am debugging an issue with our Postgres machine running on EC2. We are seeing slowness when retrieving about 14k rows from a larger table of ~140M rows. Initially I thought it was an indexing problem (a VACUUM FULL reduced the index size from 12 GB to 8 GB), but the slowness persisted.
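For reference, sizes like these can be checked along the following lines (the big table's name below is just a stand-in):

    -- check table and index sizes (table name assumed for illustration)
    SELECT pg_size_pretty(pg_relation_size('components')) AS table_size,
           pg_size_pretty(pg_indexes_size('components'))  AS index_size;

    -- rebuild the table and its indexes; note this takes an
    -- ACCESS EXCLUSIVE lock for the duration
    VACUUM FULL components;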
I created another table containing only the subset of data we are interested in; a plain SELECT * on that table takes 21 ms on EC2, as opposed to 2 ms on my MBP. I then measured how query time scales with the number of rows scanned in this 14,032-row table. The schema and results are below:
Table "public.patrick_component"
Column | Type | Modifiers
------------------+---------+-----------
id | integer |
case_id | integer |
type_id | integer |
offset | integer |
length | integer |
internal_id | integer |
parent_id | integer |
right_sibling_id | integer |
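The timings below were gathered with simple LIMIT queries in psql, along these lines (\timing shows the elapsed time; EXPLAIN (ANALYZE, BUFFERS) is 9.0+ and also shows buffer hits):

    \timing
    SELECT * FROM patrick_component LIMIT 1000;

    -- to see the plan, server-side time, and buffer hits:
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM patrick_component LIMIT 1000;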
 # Rows         MBP         EC2
      1    0.035 ms    0.076 ms
     10    0.017 ms    0.048 ms
    100    0.033 ms    0.316 ms
   1000    0.279 ms    3.166 ms
  10000    2.477 ms   31.006 ms
 100000    4.375 ms   42.634 ms   # there are fewer than 100k rows in the table; for some reason this LIMIT is slower than no LIMIT at all
Based on this, I have concluded that it's not an issue with the index. To me this looks disk-caching related; however, the entire table is only 832 kB, which should easily fit into memory (I also ran the queries multiple times, and in reverse order, and got the same results).
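One way to confirm the table really is resident in shared_buffers is the pg_buffercache extension, with something like:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- count this table's 8 kB pages currently sitting in shared_buffers
    SELECT count(*)                        AS cached_pages,
           pg_size_pretty(count(*) * 8192) AS cached_size
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    WHERE c.relname = 'patrick_component';

For an 832 kB table that is being read repeatedly, I'd expect essentially all of its pages to show up as cached.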
The machine has 30 GB of memory for a 45 GB database, and it is dedicated to Postgres.
Here are the relevant performance tweaks I have made:
shared_buffers = 8448MB
work_mem = 100MB
maintenance_work_mem = 1024MB
wal_buffers = 8MB
effective_cache_size = 22303MB
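These can be double-checked against the values actually in effect:

    -- confirm the running server picked up the settings
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                   'wal_buffers', 'effective_cache_size');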
I have been struggling to make this type of query fast because it is very common (essentially fetching all of the metadata for a document, and we have a lot of metadata and a lot of documents). Any help is appreciated!
Thanks,
Patrick