The actual thing that would be good to see is the query plan (EXPLAIN). It is commonly regarded as an issue to select '*': in many cases only a subset of the columns is actually needed, but I don't know your exact case. If only a limited number of columns is needed from the table, it might help to create an index that carries all of those columns, either directly in the index key or included with the index. This is called a covering index, and it can remove the need to read the table itself, which is visible in the plan as the row source 'Index Only Scan'. Whether that is possible can only be assessed by looking at the EXPLAIN output; see the sketches below.

A covering index needs the visibility map to be up to date for the table's blocks, otherwise a table visit must still be done to get the latest tuple state. The visibility map is kept up to date by vacuuming.

When your query is as efficient as it can be, two things are left, and both are caches. Blocks in the database buffer cache that are not frequently accessed will age out in favour of blocks that are accessed more recently. On the operating system the same mechanism takes place: PostgreSQL reads data buffered, which means the operating system caches the IOs for the database blocks too. So if you query data that is stored in blocks that were not recently used, these will be present neither in the database cache nor in the operating system cache, and will therefore require a physical IO from disk. If the number of blocks is modest relative to the cache sizes, another execution of the same SQL can take advantage of the now-warm caches and show much lower latency.

You describe the query as using a timestamp. If the timestamp moves forward in time and the amount of data per period stays equal, the latency in the two scenarios (cold and warm caches) should remain stable. If the amount of data increases over time, so that more blocks have to be read because more rows need scanning to get the result, the latency will increase.
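To make the first point concrete, here is a minimal sketch of getting a plan with buffer statistics. The table and column names ('orders', 'id', 'status', 'created_at') are made-up stand-ins for your actual schema:

    -- show the chosen plan plus buffer hits/reads per plan node
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, status, created_at
    FROM orders
    WHERE created_at >= now() - interval '1 hour';

In the output, 'Buffers: shared hit=...' counts blocks found in the database buffer cache, and 'read=...' counts blocks that had to come from outside it (operating system cache or disk).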
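If such a query really needs only those few columns, a covering index could look like this (hypothetical names again; the INCLUDE clause requires PostgreSQL 11 or later):

    -- key on the filter column, non-key payload columns via INCLUDE
    CREATE INDEX orders_created_at_cov
        ON orders (created_at)
        INCLUDE (id, status);

With this index in place, the plan for the query above can switch to 'Index Only Scan using orders_created_at_cov', provided the visibility map is current.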
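Keeping the visibility map current is a matter of vacuuming. Autovacuum usually takes care of this, but you can do it explicitly and check the bookkeeping afterwards (hypothetical table name again):

    -- update the visibility map so index only scans can skip the heap
    VACUUM (VERBOSE) orders;

    -- when was the table last vacuumed, and how many dead tuples remain
    SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'orders';

In EXPLAIN ANALYZE output, the 'Heap Fetches' counter of an index only scan shows how often the table still had to be visited; after a vacuum it should be low.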
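To observe the caching effect described above from inside the database, one option is the pg_buffercache extension (it ships with PostgreSQL but must be installed per database; 'orders' is still the hypothetical table):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- count buffer cache blocks currently holding the table
    SELECT count(*) AS cached_blocks
    FROM pg_buffercache b
    WHERE b.relfilenode = pg_relation_filenode('orders')
      AND b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database());

Running the query once, checking this count, and running it again shows the warm-cache effect. The operating system cache is not visible from within the database; tools like fincore can show that layer.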
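For the last point, whether latency grows because the data set grows, it helps to track the size of the table and its indexes over time (hypothetical table name once more):

    -- if these numbers keep growing, more blocks exist to be read
    SELECT pg_size_pretty(pg_table_size('orders'))   AS table_size,
           pg_size_pretty(pg_indexes_size('orders')) AS indexes_size;

If the amount of data per period is constant, the sizes grow but the range scanned per query stays the same; if the amount per period grows, each query has to read more blocks and latency rises.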
Frits Hoogland