Re: Vacuum Verbose output

On Monday 31 October 2005 22:59, Tom Lane wrote:
> Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> writes:
> > On Mon, 2005-10-31 at 16:34, Tomeh, Husam wrote:
> >> Pre-allocating space will prevent extending the datafile during
> >> loading massive data (batch processing) and would improve the overall
> >> batch write performance.
> >
> > Have you got any file system benchmarks that back up this assertion?  I
> > would love to see something that shows one way or the other if that
> > really makes any difference.
>
> Barring some pretty solid evidence, you're unlikely to attract any
> enthusiasm among pghackers for this sort of thing.  We are generally
> disinclined to reinvent functionality that properly belongs to the
> kernel or filesystem layer.  "Oracle does it" cuts no ice in this
> connection, because Oracle is designed around a twenty-year-old
> assumption that the database is smarter than the kernel, and the world
> has changed a lot since then.
>
> In short: show us some numbers that prove this is worth our attention.
>

I'm not terribly excited about the idea, but it might be worth hearing a
better argument. (FWIW, I think this is somewhat debunkable too, but it gives
one something to think about.)

"PostgreSQL, unlike other commercial databases, does not allow database files to
pregrow to certain sizes. So if you are loading multiple tables via different
connections, there are two things that hurt scalability: one is the semaphore
locking it needs to perform IO to the database files, and the second is file
fragmentation, since it creates all tables in the same file system and grows
them as needed. So if both tables are being loaded, then both files are growing
at the "same" time, which typically is serialized, as blocks are allocated to
each of the files one at a time, which means they will be dispersed and not
contiguous. How does this hurt? Well, if you do total row scans and compare the
times, you can easily see huge degradations. (I have seen about 50% degradation.)
This means you have to load 1 table at a time. However, if there was a way to
increase the space for the tables (pre-grow them), then it would be a bit
easier to load multiple tables simultaneously. (Of course the semaphore
problem is still there, and that needs to be more granular also.) Duh.. I
forgot the workaround here.. TABLESPACES are finally available in PostgreSQL
8. But the semaphore problems still exist, and pre-growing files will still
help a lot since "growing" the files will be in your "1" process connection
timeline."

taken from an interesting post at 
http://blogs.sun.com/roller/page/jkshah?anchor=postgres_what_needs_to_be
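
For anyone who wants to experiment with the idea, here is a rough sketch of
what "pre-growing" a file means at the OS level. It is only an illustration,
not how PostgreSQL or the quoted post does anything: the helper name
preallocate and the sizes are made up, and it assumes a Unix-like system where
Python's os.posix_fallocate is available (it falls back to appending zeros
otherwise).

```python
import os


def preallocate(path: str, size_bytes: int) -> int:
    """Make sure at least size_bytes of space is reserved for path.

    Hypothetical helper for illustration only. Returns the file size
    after the call. Existing data is never overwritten or truncated.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if hasattr(os, "posix_fallocate"):
            # Ask the OS to reserve blocks for the byte range
            # [0, size_bytes) in a single call.
            os.posix_fallocate(fd, 0, size_bytes)
        else:
            # Fallback (assumption: no fallocate support): append zeros
            # until the file reaches the requested size.
            current = os.fstat(fd).st_size
            os.lseek(fd, current, os.SEEK_SET)
            chunk = b"\0" * 65536
            remaining = size_bytes - current
            while remaining > 0:
                remaining -= os.write(fd, chunk[:remaining])
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Calling preallocate("/some/scratch/file", 64 * 1024 * 1024) before a bulk
write, and timing the load and a later sequential read with and without it, is
one way to get the kind of numbers asked for upthread. Whether the reserved
blocks end up contiguous is up to the filesystem, so results will vary.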

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
