Re: [HACKERS] No heap lookups on index

On Wed, Jan 18, 2006 at 10:11:26PM -0500, Bruce Momjian wrote:
> Glen Parker wrote:
> > Tom Lane wrote:
> > >>What ever happened to grouped heap reads, i.e. building a list of tuples 
> > >>from the index, sorting in heap order, then reading the heap in a batch? 
> > > 
> > > 
> > > Done in 8.1.  I'm uncertain whether Scott knows about that ...
> > 
> > That's GREAT news!  Is that the "Bitmap Scan" item in the what's new 
> > list (http://www.postgresql.org/docs/whatsnew)?  I didn't even notice it 
> 
> Yes.

But note that some recent testing indicated that even if you read a file
in sequential order, just skipping over random sections, once you're
reading roughly 5% of the file you might as well read the entire thing,
so how much this helps may be questionable. (That thread was about using
block sampling instead of row sampling for ANALYZE.)
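
To put that ~5% figure in perspective (a back-of-envelope sketch of mine,
not from the cited thread): with 8K blocks, reading 5% of them in file
order leaves an average gap of only about 152KB between reads, which is
comparable to a single track or the kernel's readahead window on many
systems, so the disk sweeps over nearly the same distance either way.

```c
#define BLOCKSIZE 8192

/* Average run of skipped bytes between successive reads when 'perc'
 * percent of BLOCKSIZE-byte blocks are read in file order.
 * At perc=5 this is 8192 * 95 / 5 = 155648 bytes, i.e. ~152KB. */
static double avg_gap_bytes(int perc)
{
    return (double)BLOCKSIZE * (100 - perc) / perc;
}
```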

I suspect the issue is that rotational delay is becoming just as
'damaging' as track-to-track seek delay. If that's true, the only way to
improve things would be to order reads taking both seek time and
rotational position into account. In theory the drive itself could do
this, though I don't know whether any actually do.
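
A rough sanity check on that guess (using typical datasheet figures, not
measurements from this thread): a 7200 RPM drive has about 4.2ms average
rotational latency, while track-to-track seeks are commonly quoted around
1ms, so once the head is near the right cylinder, rotation dominates.

```c
/* Average rotational latency in milliseconds: the target sector is on
 * average half a revolution away, and one revolution takes 60000/rpm
 * milliseconds. 7200 RPM -> ~4.17ms; 15000 RPM -> 2.0ms. */
static double avg_rot_latency_ms(double rpm)
{
    return 60000.0 / rpm / 2.0;
}
```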

If my guess is correct then random reads may not be that much more
expensive than a sequential read that skips large chunks of the file.
This is because most files will cover a fairly small number of tracks,
so head positioning time will be minimal compared to rotational delay.
It would be interesting to modify the test code that was posted (see
attached) so that it read randomly instead of just skipping random
amounts.
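
For what it's worth, here's a hypothetical sketch of that modification
(names and structure are mine, not from the thread): collect the block
offsets, Fisher-Yates shuffle them, then read the first perc% of them in
their shuffled order, so every read lands at a random position.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define BLOCKSIZE 8192

/* In-place Fisher-Yates shuffle of n block offsets. */
static void shuffle(off_t *off, long n)
{
    for (long i = n - 1; i > 0; i--) {
        long j = random() % (i + 1);
        off_t tmp = off[i]; off[i] = off[j]; off[j] = tmp;
    }
}

/* Read 'perc' percent of the BLOCKSIZE-byte blocks of an open file in
 * random order; returns the number of blocks actually read, -1 on OOM. */
static long read_blocks_shuffled(int fd, off_t size, int perc)
{
    long nblocks = size / BLOCKSIZE;
    long toread = nblocks * perc / 100;
    char buf[BLOCKSIZE];
    off_t *off = malloc(nblocks * sizeof(off_t));
    long i;

    if (off == NULL)
        return -1;
    for (i = 0; i < nblocks; i++)
        off[i] = (off_t)i * BLOCKSIZE;
    shuffle(off, nblocks);
    for (i = 0; i < toread; i++) {
        if (lseek(fd, off[i], SEEK_SET) < 0 ||
            read(fd, buf, BLOCKSIZE) < 0)
            break;
    }
    free(off);
    return i;
}
```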

Just for grins, I ran seqtest.c a number of times with various
percentages and file sizes. Results are also attached...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>

#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE 8192

int main(int argc, char *argv[])
{
  char *fn;
  int fd;
  int perc;
  struct stat statbuf;
  struct timeval tv1,tv2;
  off_t size, offset;
  char buf[BLOCKSIZE];		/* one block-sized read buffer */
  int b_toread, b_toskip, b_read=0, b_skipped=0;
  long us;

  if (argc < 3) {
    fprintf(stderr, "usage: %s filename percent\n", argv[0]);
    exit(1);
  }
  fn = argv[1];
  perc = atoi(argv[2]);

  fd = open(fn, O_RDONLY);
  if (fd < 0 || fstat(fd, &statbuf) < 0) {
    perror(fn);
    exit(1);
  }
  size = statbuf.st_size;
  
  size = size/BLOCKSIZE*BLOCKSIZE;	/* round down to a whole number of blocks */
  
  gettimeofday(&tv1, NULL);

  srandom(getpid()^tv1.tv_sec^tv1.tv_usec);

  b_toread = size/BLOCKSIZE*perc/100;
  b_toskip = size/BLOCKSIZE-b_toread;

  for(offset=0;offset<size;offset+=BLOCKSIZE) {
    if (random()%(b_toread+b_toskip) < b_toread) {	/* pick exactly b_toread of the remaining blocks */
      lseek(fd, offset, SEEK_SET);
      read(fd, buf, BLOCKSIZE);
      b_toread--;
      b_read++;
    } else {
      b_toskip--;
      b_skipped++;
    }
  }
  
  gettimeofday(&tv2, NULL);
  
  us = (tv2.tv_sec-tv1.tv_sec)*1000000 + (tv2.tv_usec-tv1.tv_usec);
  
  fprintf(stderr,
	  "Reading %d%% (%d/%d blocks %ld bytes) total time %ldus MB/s %.2f effective MB/s %.2f\n",
	  perc,
	  b_read, b_read+b_skipped, size,
	  us,
	  (double)b_read*BLOCKSIZE/us,
	  (double)size/us
	  );
  exit(0);
}
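
Incidentally, the selection rule in the loop above is exact sequential
sampling: each block is read with probability (blocks still to read) /
(blocks remaining), which guarantees exactly perc% of the blocks get
read regardless of seed. A hypothetical helper of mine that mirrors the
loop makes that easy to check:

```c
#include <stdlib.h>

/* Mirror of seqtest.c's selection loop: counts how many of 'total'
 * blocks get selected when 'want' must be chosen. Once toread hits 0
 * nothing more is selected; once toskip hits 0 everything remaining is,
 * so the result is always exactly 'want'. */
static int sample_count(int total, int want, unsigned seed)
{
    int toread = want, toskip = total - want, selected = 0;

    srandom(seed);
    for (int i = 0; i < total; i++) {
        if (random() % (toread + toskip) < toread) {
            toread--;
            selected++;
        } else
            toskip--;
    }
    return selected;
}
```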

Run via...

foreach file ( 10M 100M 1G 10G )
foreach? foreach pct ( 1 2 3 4 5 6 7 8 9 10 )
foreach? ../a.out $file $pct >> & test.txt
foreach? ../clearmem 730; sleep 3
foreach? end
foreach? end

a.out is the compiled seqtest.c; clearmem is this:

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to clear out a chunk of memory and zero it. Useful for flushing disk buffers
 */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    /* Allocate (and zero) argv[1] megabytes to push cached file data
     * out of RAM. Note calloc may hand back lazily-mapped zero pages,
     * so this is only a rough way to flush the OS cache. */
    if (argc < 2 || !calloc(atoi(argv[1]), 1024*1024)) {
        printf("Error allocating memory.\n");
        return 1;
    }
    return 0;
}

Reading 1% (12/1280 blocks 10485760 bytes) total time 74907us MB/s 1.31 effective MB/s 139.98
Reading 2% (25/1280 blocks 10485760 bytes) total time 133133us MB/s 1.54 effective MB/s 78.76
Reading 3% (38/1280 blocks 10485760 bytes) total time 187906us MB/s 1.66 effective MB/s 55.80
Reading 4% (51/1280 blocks 10485760 bytes) total time 158496us MB/s 2.64 effective MB/s 66.16
Reading 5% (64/1280 blocks 10485760 bytes) total time 176840us MB/s 2.96 effective MB/s 59.30
Reading 6% (76/1280 blocks 10485760 bytes) total time 170083us MB/s 3.66 effective MB/s 61.65
Reading 7% (89/1280 blocks 10485760 bytes) total time 184551us MB/s 3.95 effective MB/s 56.82
Reading 8% (102/1280 blocks 10485760 bytes) total time 173368us MB/s 4.82 effective MB/s 60.48
Reading 9% (115/1280 blocks 10485760 bytes) total time 196054us MB/s 4.81 effective MB/s 53.48
Reading 10% (128/1280 blocks 10485760 bytes) total time 178013us MB/s 5.89 effective MB/s 58.90
Reading 1% (128/12800 blocks 104857600 bytes) total time 855065us MB/s 1.23 effective MB/s 122.63
Reading 2% (256/12800 blocks 104857600 bytes) total time 1262796us MB/s 1.66 effective MB/s 83.04
Reading 3% (384/12800 blocks 104857600 bytes) total time 1569894us MB/s 2.00 effective MB/s 66.79
Reading 4% (512/12800 blocks 104857600 bytes) total time 1790379us MB/s 2.34 effective MB/s 58.57
Reading 5% (640/12800 blocks 104857600 bytes) total time 1808079us MB/s 2.90 effective MB/s 57.99
Reading 6% (768/12800 blocks 104857600 bytes) total time 3608305us MB/s 1.74 effective MB/s 29.06
Reading 7% (896/12800 blocks 104857600 bytes) total time 1845969us MB/s 3.98 effective MB/s 56.80
Reading 8% (1024/12800 blocks 104857600 bytes) total time 1877635us MB/s 4.47 effective MB/s 55.85
Reading 9% (1152/12800 blocks 104857600 bytes) total time 1961489us MB/s 4.81 effective MB/s 53.46
Reading 10% (1280/12800 blocks 104857600 bytes) total time 2604093us MB/s 4.03 effective MB/s 40.27
Reading 1% (1280/128000 blocks 1048576000 bytes) total time 8759689us MB/s 1.20 effective MB/s 119.70
Reading 2% (2560/128000 blocks 1048576000 bytes) total time 14724686us MB/s 1.42 effective MB/s 71.21
Reading 3% (3840/128000 blocks 1048576000 bytes) total time 16214562us MB/s 1.94 effective MB/s 64.67
Reading 4% (5120/128000 blocks 1048576000 bytes) total time 19091129us MB/s 2.20 effective MB/s 54.92
Reading 5% (6400/128000 blocks 1048576000 bytes) total time 22287356us MB/s 2.35 effective MB/s 47.05
Reading 6% (7680/128000 blocks 1048576000 bytes) total time 20911989us MB/s 3.01 effective MB/s 50.14
Reading 7% (8960/128000 blocks 1048576000 bytes) total time 26735770us MB/s 2.75 effective MB/s 39.22
Reading 8% (10240/128000 blocks 1048576000 bytes) total time 26432167us MB/s 3.17 effective MB/s 39.67
Reading 9% (11520/128000 blocks 1048576000 bytes) total time 23488590us MB/s 4.02 effective MB/s 44.64
Reading 10% (12800/128000 blocks 1048576000 bytes) total time 27407314us MB/s 3.83 effective MB/s 38.26
Reading 1% (12800/1280000 blocks 10485760000 bytes) total time 91128747us MB/s 1.15 effective MB/s 115.07
Reading 2% (25600/1280000 blocks 10485760000 bytes) total time 142194305us MB/s 1.47 effective MB/s 73.74
Reading 3% (38400/1280000 blocks 10485760000 bytes) total time 184682509us MB/s 1.70 effective MB/s 56.78
Reading 4% (51200/1280000 blocks 10485760000 bytes) total time 204736943us MB/s 2.05 effective MB/s 51.22
Reading 5% (64000/1280000 blocks 10485760000 bytes) total time 217606651us MB/s 2.41 effective MB/s 48.19
Reading 6% (76800/1280000 blocks 10485760000 bytes) total time 231965339us MB/s 2.71 effective MB/s 45.20
Reading 7% (89600/1280000 blocks 10485760000 bytes) total time 236010971us MB/s 3.11 effective MB/s 44.43
Reading 8% (102400/1280000 blocks 10485760000 bytes) total time 243517092us MB/s 3.44 effective MB/s 43.06
Reading 9% (115200/1280000 blocks 10485760000 bytes) total time 250622714us MB/s 3.77 effective MB/s 41.84
Reading 10% (128000/1280000 blocks 10485760000 bytes) total time 245938205us MB/s 4.26 effective MB/s 42.64

decibel@xxxxxx[8:51]~/tmp:47>dd if=10G of=/dev/null bs=8k
1280000+0 records in
1280000+0 records out
10485760000 bytes transferred in 253.023843 secs (41441786 bytes/sec)
decibel@xxxxxx[9:01]~/tmp:48>dd if=10G of=/dev/null bs=1m
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 252.158686 secs (41583973 bytes/sec)
decibel@xxxxxx[9:07]~/tmp:49>
