On Wed, Jan 18, 2006 at 10:11:26PM -0500, Bruce Momjian wrote:
> Glen Parker wrote:
> > Tom Lane wrote:
> > >>What ever happened to grouped heap reads, i.e. building a list of tuples
> > >>from the index, sorting in heap order, then reading the heap in a batch?
> > >
> > > Done in 8.1. I'm uncertain whether Scott knows about that ...
> >
> > That's GREAT news! Is that the "Bitmap Scan" item in the what's new
> > list (http://www.postgresql.org/docs/whatsnew)? I didn't even notice it
> Yes.

But note that some recent testing indicated that even if you read a file in sequential order, just skipping over random sections, as soon as you hit the point where you're reading ~5% of the file you might as well just read the entire thing, so how much this actually helps is open to question. (That thread was about using block sampling instead of row sampling for ANALYZE.)

I suspect the issue is that rotational delay is becoming just as 'damaging' as track-to-track seek delay. If that's true, the only way to improve things would be to order reads taking both track seek time and rotational position into account. Theoretically the drive could do this, though I don't know if any actually do.

If my guess is correct, then random reads may not be that much more expensive than a sequential read that skips large chunks of the file. This is because most files will cover a fairly small number of tracks, so head-positioning time will be minimal compared to rotational delay. It would be interesting to modify the test code that was posted (see attached) so that it reads blocks in random order instead of just skipping random amounts; a sketch of such a modification follows the attached source.

Just for grins, I ran seqtest.c a number of times, using various percentages and file sizes. Results also attached...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
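To make the technique in the quoted exchange concrete: an index scan produces heap block numbers in index order, and the 8.1 bitmap scan sorts them so the heap is visited in a single forward pass. A minimal sketch of that idea, assuming nothing about the real PostgreSQL code (BlockNumber and fetch_heap_block here are illustrative stand-ins, not actual internals):

	/*
	 * Sketch of "grouped heap reads": collect the heap block numbers an
	 * index scan produced, sort them, then fetch the heap pages in
	 * ascending physical order.  BlockNumber and fetch_heap_block() are
	 * hypothetical names for illustration only.
	 */
	#include <stdlib.h>

	typedef unsigned int BlockNumber;

	static int
	cmp_block(const void *a, const void *b)
	{
		BlockNumber	ba = *(const BlockNumber *) a;
		BlockNumber	bb = *(const BlockNumber *) b;

		return (ba > bb) - (ba < bb);
	}

	void
	grouped_heap_read(BlockNumber *blocks, size_t nblocks,
					  void (*fetch_heap_block)(BlockNumber))
	{
		/* Sorting turns scattered fetches into one forward sweep,
		 * so the disk head never has to move backward. */
		qsort(blocks, nblocks, sizeof(BlockNumber), cmp_block);
		for (size_t i = 0; i < nblocks; i++)
			fetch_heap_block(blocks[i]);
	}

The question the testing below raises is how much that forward sweep actually buys once the blocks are spread thinly across the file.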
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE 8192

int
main(int argc, char *argv[])
{
	char		*fn;
	int			fd;
	int			perc;
	struct stat	statbuf;
	struct timeval tv1, tv2;
	off_t		size, offset;
	char		buf[BLOCKSIZE];	/* was "char *buf[BLOCKSIZE]": an array of
								 * pointers, several times larger than the
								 * one block actually read into it */
	int			b_toread, b_toskip, b_read = 0, b_skipped = 0;
	long		us;

	if (argc < 3)
	{
		fprintf(stderr, "usage: %s file percent\n", argv[0]);
		exit(1);
	}
	fn = argv[1];
	perc = atoi(argv[2]);

	fd = open(fn, O_RDONLY);
	if (fd < 0)
	{
		perror("open");
		exit(1);
	}
	fstat(fd, &statbuf);
	size = statbuf.st_size;
	size = size / BLOCKSIZE * BLOCKSIZE;	/* round down to whole blocks */

	gettimeofday(&tv1, NULL);
	srandom(getpid() ^ tv1.tv_sec ^ tv1.tv_usec);

	b_toread = size / BLOCKSIZE * perc / 100;
	b_toskip = size / BLOCKSIZE - b_toread;

	for (offset = 0; offset < size; offset += BLOCKSIZE)
	{
		/* Selection sampling: read each block with probability
		 * (blocks still to read)/(blocks remaining), which selects
		 * exactly perc% of the blocks, uniformly at random. */
		if (random() % (b_toread + b_toskip) < b_toread)
		{
			lseek(fd, offset, SEEK_SET);
			read(fd, buf, BLOCKSIZE);
			b_toread--;
			b_read++;
		}
		else
		{
			b_toskip--;
			b_skipped++;
		}
	}

	gettimeofday(&tv2, NULL);
	us = (tv2.tv_sec - tv1.tv_sec) * 1000000 + (tv2.tv_usec - tv1.tv_usec);
	fprintf(stderr,
		"Reading %d%% (%d/%d blocks %ld bytes) total time %ldus MB/s %.2f effective MB/s %.2f\n",
		perc, b_read, b_read + b_skipped, (long) size, us,
		(double) b_read * BLOCKSIZE / us,
		(double) size / us);
	exit(0);
}
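The random-order modification suggested above could look something like the following. This is an untested sketch meant as a drop-in replacement for the for-loop in seqtest.c; it reuses seqtest.c's fd, buf, size, b_toread, b_toskip, b_read, and b_skipped, and needs C99 for the mid-function declarations:

	/* Select perc% of the block offsets as before, shuffle them
	 * (Fisher-Yates), then read them in random order rather than
	 * sweeping forward and skipping. */
	int		 nblocks = size / BLOCKSIZE;
	off_t	*offsets = malloc(sizeof(off_t) * nblocks);
	int		 n = 0, i, j;
	off_t	 tmp;

	/* Same selection sampling as the original loop. */
	for (i = 0; i < nblocks; i++)
	{
		if (random() % (b_toread + b_toskip) < b_toread)
		{
			offsets[n++] = (off_t) i * BLOCKSIZE;
			b_toread--;
		}
		else
		{
			b_toskip--;
			b_skipped++;
		}
	}

	/* Fisher-Yates shuffle so the reads happen in random order. */
	for (i = n - 1; i > 0; i--)
	{
		j = random() % (i + 1);
		tmp = offsets[i];
		offsets[i] = offsets[j];
		offsets[j] = tmp;
	}

	for (i = 0; i < n; i++)
	{
		lseek(fd, offsets[i], SEEK_SET);
		read(fd, buf, BLOCKSIZE);
		b_read++;
	}
	free(offsets);

Comparing its timings against the skip-forward version would show directly how much the forward-only access pattern is worth.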
Run via...

foreach file ( 10M 100M 1G 10G )
foreach? foreach pct ( 1 2 3 4 5 6 7 8 9 10 )
foreach? ../a.out $file $pct >>& test.txt
foreach? ../clearmem 730; sleep 3
foreach? end
foreach? end

../a.out is the compiled seqtest.c. clearmem is this:

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to clear out a chunk of memory and zero it. Useful for
 * flushing disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	/* calloc()ing argv[1] megabytes of zeroed memory pressures the OS
	 * into evicting cached file pages. */
	if (!calloc(atoi(argv[1]), 1024 * 1024))
		printf("Error allocating memory.\n");
	return 0;
}

Reading 1% (12/1280 blocks 10485760 bytes) total time 74907us MB/s 1.31 effective MB/s 139.98
Reading 2% (25/1280 blocks 10485760 bytes) total time 133133us MB/s 1.54 effective MB/s 78.76
Reading 3% (38/1280 blocks 10485760 bytes) total time 187906us MB/s 1.66 effective MB/s 55.80
Reading 4% (51/1280 blocks 10485760 bytes) total time 158496us MB/s 2.64 effective MB/s 66.16
Reading 5% (64/1280 blocks 10485760 bytes) total time 176840us MB/s 2.96 effective MB/s 59.30
Reading 6% (76/1280 blocks 10485760 bytes) total time 170083us MB/s 3.66 effective MB/s 61.65
Reading 7% (89/1280 blocks 10485760 bytes) total time 184551us MB/s 3.95 effective MB/s 56.82
Reading 8% (102/1280 blocks 10485760 bytes) total time 173368us MB/s 4.82 effective MB/s 60.48
Reading 9% (115/1280 blocks 10485760 bytes) total time 196054us MB/s 4.81 effective MB/s 53.48
Reading 10% (128/1280 blocks 10485760 bytes) total time 178013us MB/s 5.89 effective MB/s 58.90

Reading 1% (128/12800 blocks 104857600 bytes) total time 855065us MB/s 1.23 effective MB/s 122.63
Reading 2% (256/12800 blocks 104857600 bytes) total time 1262796us MB/s 1.66 effective MB/s 83.04
Reading 3% (384/12800 blocks 104857600 bytes) total time 1569894us MB/s 2.00 effective MB/s 66.79
Reading 4% (512/12800 blocks 104857600 bytes) total time 1790379us MB/s 2.34 effective MB/s 58.57
Reading 5% (640/12800 blocks 104857600 bytes) total time 1808079us MB/s 2.90 effective MB/s 57.99
Reading 6% (768/12800 blocks 104857600 bytes) total time 3608305us MB/s 1.74 effective MB/s 29.06
Reading 7% (896/12800 blocks 104857600 bytes) total time 1845969us MB/s 3.98 effective MB/s 56.80
Reading 8% (1024/12800 blocks 104857600 bytes) total time 1877635us MB/s 4.47 effective MB/s 55.85
Reading 9% (1152/12800 blocks 104857600 bytes) total time 1961489us MB/s 4.81 effective MB/s 53.46
Reading 10% (1280/12800 blocks 104857600 bytes) total time 2604093us MB/s 4.03 effective MB/s 40.27

Reading 1% (1280/128000 blocks 1048576000 bytes) total time 8759689us MB/s 1.20 effective MB/s 119.70
Reading 2% (2560/128000 blocks 1048576000 bytes) total time 14724686us MB/s 1.42 effective MB/s 71.21
Reading 3% (3840/128000 blocks 1048576000 bytes) total time 16214562us MB/s 1.94 effective MB/s 64.67
Reading 4% (5120/128000 blocks 1048576000 bytes) total time 19091129us MB/s 2.20 effective MB/s 54.92
Reading 5% (6400/128000 blocks 1048576000 bytes) total time 22287356us MB/s 2.35 effective MB/s 47.05
Reading 6% (7680/128000 blocks 1048576000 bytes) total time 20911989us MB/s 3.01 effective MB/s 50.14
Reading 7% (8960/128000 blocks 1048576000 bytes) total time 26735770us MB/s 2.75 effective MB/s 39.22
Reading 8% (10240/128000 blocks 1048576000 bytes) total time 26432167us MB/s 3.17 effective MB/s 39.67
Reading 9% (11520/128000 blocks 1048576000 bytes) total time 23488590us MB/s 4.02 effective MB/s 44.64
Reading 10% (12800/128000 blocks 1048576000 bytes) total time 27407314us MB/s 3.83 effective MB/s 38.26

Reading 1% (12800/1280000 blocks 10485760000 bytes) total time 91128747us MB/s 1.15 effective MB/s 115.07
Reading 2% (25600/1280000 blocks 10485760000 bytes) total time 142194305us MB/s 1.47 effective MB/s 73.74
Reading 3% (38400/1280000 blocks 10485760000 bytes) total time 184682509us MB/s 1.70 effective MB/s 56.78
Reading 4% (51200/1280000 blocks 10485760000 bytes) total time 204736943us MB/s 2.05 effective MB/s 51.22
Reading 5% (64000/1280000 blocks 10485760000 bytes) total time 217606651us MB/s 2.41 effective MB/s 48.19
Reading 6% (76800/1280000 blocks 10485760000 bytes) total time 231965339us MB/s 2.71 effective MB/s 45.20
Reading 7% (89600/1280000 blocks 10485760000 bytes) total time 236010971us MB/s 3.11 effective MB/s 44.43
Reading 8% (102400/1280000 blocks 10485760000 bytes) total time 243517092us MB/s 3.44 effective MB/s 43.06
Reading 9% (115200/1280000 blocks 10485760000 bytes) total time 250622714us MB/s 3.77 effective MB/s 41.84
Reading 10% (128000/1280000 blocks 10485760000 bytes) total time 245938205us MB/s 4.26 effective MB/s 42.64

decibel@xxxxxx[8:51]~/tmp:47>dd if=10G of=/dev/null bs=8k
1280000+0 records in
1280000+0 records out
10485760000 bytes transferred in 253.023843 secs (41441786 bytes/sec)
decibel@xxxxxx[9:01]~/tmp:48>dd if=10G of=/dev/null bs=1m
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 252.158686 secs (41583973 bytes/sec)
decibel@xxxxxx[9:07]~/tmp:49>
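To spell out the comparison the dd runs invite: "effective MB/s" is just the total file size divided by elapsed time, counting skipped blocks as covered. For the 10% run on the 10G file that works out to

	10485760000 bytes / 245.94 s  ~= 42.6 MB/s (effective)

while dd reading the entire file sequentially manages

	10485760000 bytes / 253.02 s  ~= 41.4 MB/s

so skipping 90% of the blocks bought essentially nothing over just reading the whole file, consistent with the ~5% crossover mentioned above.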