On 25/09/18 00:28, Vladimir Ryabtsev wrote:
>> it is not unusual to have 1GB cache or more... and do not forget
>> to drop the cache between tests + do a sync
> I conducted several long runs of dd, so I am sure that these
> numbers are fairly accurate. However, what worries me is that I
> test sequential read speed, while during my experiments Postgres
> might need to read from random places, thus reducing the real
> read speed dramatically. I have a feeling that this can be the
> reason.
> I also reviewed the import scripts and found that the import was
> done in DESCENDING order of IDs. It was done that way to get the
> most recent records sooner; maybe it caused some inefficiency in
> the storage... But again, it was the same for both ranges.
>> - how big is your index?
> pg_table_size('articles_pkey') = 1561 MB
>> - how big is the table?
> pg_table_size('articles') = 427 GB
> pg_table_size('pg_toast.pg_toast_221558') = 359 GB
Since you have a very big TOAST table and you are using spinning
disks, I think that increasing the block size will bring benefits.
(Partitioning is not a bad idea either.)
If my understanding of TOAST is correct, then if your data fits in
blocks of, let's say, 16 or 24 KB, one block retrieval from
Postgres will result in fewer seeks on the disk and less chance
that the data gets scattered over your disk. (A very quick and
dirty calculation shows your average record size is about 17 KB.)
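For what it is worth, that back-of-the-envelope figure can be
reproduced with something along these lines (a rough sketch; it
relies on the reltuples estimate being reasonably fresh):

  -- rough average on-disk footprint per row,
  -- heap + TOAST + indexes together
  SELECT pg_total_relation_size('articles')::numeric
         / NULLIF(reltuples, 0)::numeric AS avg_bytes_per_row
  FROM pg_class
  WHERE relname = 'articles';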
One thing you might want to have a look at is, again, the RAID
controller and your OS. You might want to have all of them aligned
on the same block size, or maybe have the Postgres block size be a
multiple of what the OS and RAID controller use.
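For reference, the block size your build uses can be checked from
SQL; note it is a compile-time choice (the --with-blocksize
configure option), not a runtime setting:

  SHOW block_size;  -- 8192 on a stock build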
>> - given the size of shared_buffers, almost 2M blocks should
>> fit, but you say 2 consecutive runs still are hitting the disk.
>> That's strange indeed since you are using way more than 2M
>> blocks.
> TBH, I cannot say I understand your calculations with number of
> blocks...
shared_buffers = 15GB IIRC (the justpaste link is gone)
15 * 1024 * 1024 = 15728640 KB
using 8 KB blocks = 1966080 total blocks
If you query shared_buffers you should get the same total number
of available blocks.
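Something like this ('setting' is reported in units of the block
size, so for a 15GB shared_buffers it should read 1966080):

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name = 'shared_buffers';
  -- setting = 1966080, unit = 8kB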
> But to clarify: consecutive runs with the SAME parameters do NOT
> hit the disk, only the first one does; subsequent ones read only
> from the buffer cache.
I am a bit confused... every query you pasted contains 'read':

  Buffers: shared hit=50 read=2378

and 'read' means you are reading from disk (or the OS cache). Or
not?
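One way to tell the two apart is track_io_timing: when it is on,
EXPLAIN also reports the time actually spent waiting for reads, so
near-zero I/O timings on a large 'read' count would point to the
OS cache rather than the disk. A sketch (changing the setting may
require superuser, and the WHERE clause is just a placeholder for
your test query):

  SET track_io_timing = on;
  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM articles WHERE article_id BETWEEN 1 AND 1000;
  -- look for a line like: I/O Timings: read=...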
>> - As Laurenz suggested (VACUUM FULL), you might want to move
>> data around. You can try also a dump + restore to narrow the
>> problem to data or disk
> I launched VACUUM FULL, but it ran very slowly; according to my
> calculation it might take 17 hours. I will try to copy the data
> into another table with the same structure, or spin up another
> server, and let you know.
cool, that should also clarify if the reverse order matters or not
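If VACUUM FULL is too slow, a plain ordered copy may be quicker
and exercises the same thing (a sketch, with the table and column
names assumed from your earlier posts):

  CREATE TABLE articles_copy AS
    SELECT * FROM articles
    ORDER BY article_id;  -- ascending, to undo the DESC import

Since this rewrites the heap and the TOAST data from scratch,
comparing read speeds on the copy against the original should
separate a data-layout problem from a disk problem.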
>> - You might also want to try to see the disk graph of Windows
>> while you are running your tests. It can show you if data (and
>> good to know how much) is actually being fetched from disk or
>> not.
> I wanted to do so, but I don't have access to the Hyper-V
> server; I will try to request credentials from the admins.
> A couple more observations:
> 1) The result of my experiment is almost unaffected by other
> server load. Another user was running a query (over this table)
> with a read speed of ~130 MB/s, while my query read at 1.8-2
> MB/s.
> 2) iotop shows a higher IO % (~93-94%) with the slower read
> speed (though it is not quite clear what this field means). The
> process from the example above had ~55% IO at 130 MB/s, while my
> process had ~93% at ~2 MB/s.
I think that is because you are looking at the 'IO' column, which
indicates (from the manual) '..the percentage of time the
thread/process spent [..] while waiting on I/O.'
regards,
fabio pardi