On 25/09/18 00:28, Vladimir Ryabtsev wrote:
>> it is not unusual to have 1GB cache or more... and do not forget
>> to drop the cache between tests + do a sync
> I conducted several long runs of dd, so I am sure that these
> numbers are fairly accurate. However, what worries me is that I
> test sequential read speed, while during my experiments Postgres
> might need to read from random places, thus reducing the real
> read speed dramatically. I have a feeling that this can be the
> reason.
> I also reviewed the import scripts and found that the import was
> done in DESCENDING order of IDs. It was done that way to get the
> most recent records sooner; maybe it caused some inefficiency in
> the storage... But again, it was the same for both ranges.
>> - how big is your index?
> pg_table_size('articles_pkey') = 1561 MB
>> - how big is the table?
> pg_table_size('articles') = 427 GB
> pg_table_size('pg_toast.pg_toast_221558') = 359 GB
Since you have a very big TOAST table and you are using spinning
disks, I think that increasing the block size will bring benefits.
(Partitioning is not a bad idea either.)
If my understanding of TOAST is correct, then if your data fits in
blocks of, let's say, 16 or 24 KB, one block retrieval from
Postgres will result in fewer seeks on the disk and less chance
that the data gets scattered over your disk. (A very quick and
dirty calculation shows your average record size is about 17 KB.)
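For what it is worth, that back-of-the-envelope figure can be
reproduced with something along these lines (a rough sketch; it
relies on the reltuples estimate being reasonably fresh):

  -- rough average on-disk footprint per row,
  -- heap + TOAST + indexes together
  SELECT pg_total_relation_size('articles')::numeric
         / NULLIF(reltuples, 0)::numeric AS avg_bytes_per_row
  FROM pg_class
  WHERE relname = 'articles';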
One thing you might want to have a look at is, again, the RAID
controller and your OS. You might want to have all of them aligned
on the same block size, or maybe have the Postgres block size be a
multiple of what the OS and RAID controller use.
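For reference, the block size your build uses can be checked from
SQL; note it is a compile-time choice (the --with-blocksize
configure option), not a runtime setting:

  SHOW block_size;  -- 8192 on a stock build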
>> - given the size of shared_buffers, almost 2M blocks should
>> fit, but you say 2 consecutive runs still are hitting the disk.
>> That's strange indeed since you are using way more than 2M
>> blocks.
> TBH, I cannot say I understand your calculations with number of
> blocks...
shared_buffers = 15GB IIRC (the justpaste link is gone)
15 * 1024 * 1024 = 15728640 KB
using 8 KB blocks = 1966080 total blocks
If you query shared_buffers you should get the same total number
of available blocks.
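Something like this ('setting' is reported in units of the block
size, so for a 15GB shared_buffers it should read 1966080):

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name = 'shared_buffers';
  -- setting = 1966080, unit = 8kB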
> But to clarify: consecutive runs with the SAME parameters do NOT
> hit the disk, only the first one does; subsequent ones read only
> from the buffer cache.
I am a bit confused... every query you pasted contains 'read':

  Buffers: shared hit=50 read=2378

and 'read' means you are reading from disk (or the OS cache). Or
not?
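One way to tell the two apart is track_io_timing: when it is on,
EXPLAIN also reports the time actually spent waiting for reads, so
near-zero I/O timings on a large 'read' count would point to the
OS cache rather than the disk. A sketch (changing the setting may
require superuser, and the WHERE clause is just a placeholder for
your test query):

  SET track_io_timing = on;
  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM articles WHERE article_id BETWEEN 1 AND 1000;
  -- look for a line like: I/O Timings: read=...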
>> - As Laurenz suggested (VACUUM FULL), you might want to move
>> data around. You can try also a dump + restore to narrow the
>> problem to data or disk
> I launched VACUUM FULL, but it ran very slowly; according to my
> calculation it might take 17 hours. I will try to copy the data
> into another table with the same structure, or spin up another
> server, and let you know.
cool, that should also clarify if the reverse order matters or not
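If VACUUM FULL is too slow, a plain ordered copy may be quicker
and exercises the same thing (a sketch, with the table and column
names assumed from your earlier posts):

  CREATE TABLE articles_copy AS
    SELECT * FROM articles
    ORDER BY article_id;  -- ascending, to undo the DESC import

Since this rewrites the heap and the TOAST data from scratch,
comparing read speeds on the copy against the original should
separate a data-layout problem from a disk problem.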
>> - You might also want to try to see the disk graph of Windows
>> while you are running your tests. It can show you if data (and
>> good to know how much) is actually being fetched from disk or
>> not.
> I wanted to do so, but I don't have access to the Hyper-V
> server; I will try to request credentials from the admins.
> A couple more observations:
> 1) The result of my experiment is almost unaffected by other
> server load. Another user was running a query (over this table)
> with a read speed of ~130 MB/s, while my query read at 1.8-2
> MB/s.
> 2) iotop shows a higher IO % (~93-94%) with the slower read
> speed (though it is not quite clear what this field means). The
> process from the example above had ~55% IO at 130 MB/s, while my
> process had ~93% at ~2 MB/s.
I think that is because you are looking at the 'IO' column, which
indicates (from the manual) '..the percentage of time the
thread/process spent [..] while waiting on I/O.'
regards,
fabio pardi