All fair questions...
Thank you for your detailed response!
On 1/4/2013 11:03 PM, Jeff Janes wrote:
On Friday, January 4, 2013, AJ Weber wrote:
Hi all,
I have a table that has about 73mm rows in it and growing.
How big is the table in MB? Its indexes?
Not sure on this. Will see if pgAdmin tells me.
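For reference, PostgreSQL's built-in size functions will answer this
without pgAdmin. A minimal sketch, using "order_details" as a stand-in
for the real table name:

    -- Heap, index, and total on-disk sizes for one table.
    SELECT pg_size_pretty(pg_relation_size('order_details'))       AS heap_size,
           pg_size_pretty(pg_indexes_size('order_details'))        AS index_size,
           pg_size_pretty(pg_total_relation_size('order_details')) AS total_size;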
...
The server has 12GB RAM, 4 cores, but is shared with a big
webapp running in Tomcat -- and I only have a RAID1 disk to work
on. Woe is me...
By a RAID1 disk, do you mean two disks in a RAID1
configuration, or a single RAID1 composed of an unspecified
number of disks?
Often spending many thousands of dollars in DBA time can save
you from having to buy many hundreds of dollars in hard drives.
:) On the other hand, often you end up having to buy the extra
disks anyway, after all.
I mean I have two disks in a RAID1 configuration. The server is
currently in a whitebox datacenter and I have zero control over the
hardware, so adding disks is unfortunately out of the question. I
completely understand the comment, and would love to have a larger
SAN available to me that I could configure...I just don't and have
no way of getting one anytime soon.
Anyway, this table is going to continue to grow, and it's used
frequently (Read and Write).
Are all rows in the table read and written with equal vigor,
or are there hot rows and cold rows that can be recognized based
on the row's values?
No, I could probably figure out a way to set up an "archive" or
"older" section of the data that is updated much less frequently.
Deletes are rare. Inserts/Updates, yes. Selects on existing rows
-- very frequent.
From what I
read, this table is a candidate to be partitioned for
performance and scalability. I have tested some scripts to
build the "inherits" tables with their constraints and the
trigger/function to perform the work.
Am I doing the right thing by partitioning this?
Probably not. Or at least, you haven't given us the
information to know. Very broadly speaking, well-implemented
partitioning makes bulk loading and removal operations take less
IO, but makes normal operations take more IO, or if lucky
leaves it unchanged. There are exceptions, but unless you can
identify a very specific reason to think you might have one of
those exceptions, then you probably don't.
I know you can't believe everything you read, but I thought I saw
some metrics about when a table's size exceeds some fraction of
available RAM, or when it approaches 100mm rows, it becomes a strong
candidate for partitioning.
Do you have a natural partitioning key? That is, is there a
column (or expression) which occurs as a selective component in
the where clause of almost all of your most IO-consuming SQL and
DML? If so, you might benefit from partitioning on it. (But in
that case, you might be able to get most of the benefits of
partitioning, without the headaches of it, just by revamping
your indexes to include that column/expression as their leading
field).
If you don't have a good candidate partitioning key, then
partitioning will almost surely make things worse.
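As an illustration of the leading-field suggestion, with hypothetical
table and column names, assuming order_id is the column that shows up
in nearly every WHERE clause:

    -- Put the would-be partitioning key first so the index itself
    -- narrows the scan the way partition exclusion would.
    CREATE INDEX order_details_order_id_idx
        ON order_details (order_id, detail_id);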
The table is a "detail table" to its master records. That is, it's
like an order-details table where it will have a 1-n rows joined to
the master ("order") table on the order-id. So I can partition it
based on the order number pretty easily (which is a bigint, btw).
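For concreteness, a minimal sketch of the inheritance-style
partitioning under discussion; child table names and order_id ranges
are hypothetical:

    -- Parent stays empty; children carry non-overlapping CHECK
    -- constraints so constraint_exclusion can skip partitions.
    CREATE TABLE order_details_p00 (
        CHECK (order_id >= 0        AND order_id < 10000000)
    ) INHERITS (order_details);

    CREATE TABLE order_details_p01 (
        CHECK (order_id >= 10000000 AND order_id < 20000000)
    ) INHERITS (order_details);

    -- Trigger function routes INSERTs on the parent to the right child.
    CREATE OR REPLACE FUNCTION order_details_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF NEW.order_id < 10000000 THEN
            INSERT INTO order_details_p00 VALUES (NEW.*);
        ELSE
            INSERT INTO order_details_p01 VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- row is already stored in the child
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_details_insert
        BEFORE INSERT ON order_details
        FOR EACH ROW EXECUTE PROCEDURE order_details_insert_trigger();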
If so, and I can afford some downtime, is dumping the table via
pg_dump and then loading it back in the best way to do this?
To do efficient bulk loading into a partitioned table, you
need to specifically target each partition, rather than
targeting with a trigger. That pretty much rules out pg_dump,
AFAIK, unless you are going to parse the dump file(s) and
rewrite them.
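A sketch of what targeting each partition directly might look like,
with the same hypothetical names (order_details_old stands in for
wherever the existing rows are staged):

    -- Load each child directly so rows skip the routing trigger;
    -- each range must match that child's CHECK constraint.
    INSERT INTO order_details_p00
        SELECT * FROM order_details_old
        WHERE order_id >= 0 AND order_id < 10000000;

    -- Or COPY per-range data files straight into each child:
    -- COPY order_details_p01 FROM '/path/to/order_details_p01.dat';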
Should I run a cluster or vacuum full after all is done?
Probably not. If a cluster after the partitioning would be
beneficial, there would be a pretty good chance you could do a
cluster *instead* of the partitioning and get the same benefit.
I did try clustering the table on the PK (which is actually 4
columns), and it appeared to help a bit. I was hoping partitioning
was going to help me even more.
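For reference, clustering on an existing index is a single statement
(index name hypothetical):

    -- Rewrites the table in index order; takes an exclusive lock
    -- for the duration, so it needs a maintenance window.
    CLUSTER order_details USING order_details_pkey;
    ANALYZE order_details;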
If you do some massive deletes from the parent table as part
of populating the children, then a vacuum full of the parent
could be useful. But if you dump the parent table, truncate it,
and reload it as partitioned tables, then vacuum full would
probably not be useful.
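A sketch of the two cases described above, with the same hypothetical
parent table name:

    -- If children were populated by DELETEing rows out of the parent,
    -- the parent is left bloated and VACUUM FULL reclaims the space:
    VACUUM FULL order_details;

    -- If the parent was dumped and emptied wholesale before the
    -- reload, there is nothing left to reclaim; ONLY skips the children:
    TRUNCATE ONLY order_details;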
Really, you need to identify your most resource-intensive
queries before you can make any reasonable decisions.
Is there a major benefit if I can upgrade to 9.2.x in some way
that I haven't realized?
If you have specific queries that are misoptimized and so are
generating more IO than they need to, then upgrading could help.
On the other hand, it could also make things worse, if a
currently well optimized query becomes worse.
Is there some new feature or optimization you're thinking about with
this comment? If so, could you please just send me a link and/or
feature name and I'll google it myself?
But, instrumentation has improved in 9.2 from 9.0, so
upgrading would make it easier to figure out just which queries
are really bad and have the most opportunity for improvement. A
little well informed optimization might obviate the need for
either partitioning or more hard drives.
This is interesting too. I obviously would like the best available
options to tune the database and the application. Is this detailed
in the release notes somewhere, and what tools could I use to take
advantage of this? (Are there new/improved details included in the
EXPLAIN statement or something?)
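For reference, the kind of instrumentation this likely refers to is
pg_stat_statements, which in 9.2 groups statements by normalized query
text, plus EXPLAIN with buffer counts (available since 9.0). A sketch,
with a hypothetical table and query:

    -- Needs shared_preload_libraries = 'pg_stat_statements' in
    -- postgresql.conf and a server restart.
    CREATE EXTENSION pg_stat_statements;

    -- Top statements by total execution time.
    SELECT query, calls, total_time, rows
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;

    -- Then dig into a suspect query's I/O behaviour.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM order_details WHERE order_id = 12345;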
Finally, if anyone has any comments about my settings listed
above that might help improve performance, I thank you in
advance.
Your default statistics target seemed low. Without knowing
the nature of your most resource intensive queries or how much
memory tomcat is using, it is hard to say more.
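For reference, the statistics target can be raised server-wide or per
column; a sketch with hypothetical names:

    -- Raise the sample size on the hot join/filter column, then
    -- refresh the statistics.
    ALTER TABLE order_details ALTER COLUMN order_id SET STATISTICS 500;
    ANALYZE order_details;

    -- Or raise the server-wide default in postgresql.conf:
    -- default_statistics_target = 100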
Tomcat uses 4G of RAM, plus we have nginx in front using a little,
and some other, smaller services running on the server in addition
to the usual gamut of Linux processes.
Cheers,
Jeff