On Wed, 14 Feb, 2024, 10:30 am Ron Johnson, <ronljohnsonjr@xxxxxxxxx> wrote:
On Tue, Feb 13, 2024 at 4:17 PM veem v <veema0000@xxxxxxxxx> wrote:
[snip] One question here: if we have defined a column with the fixed-length data type "integer", and over time the values keep growing (as with a sequence-generated PK column), and we now want to alter the column to "bigint". In that scenario, will it pad the additional bytes onto all the existing values that were originally stored with the integer type, and would that increase the existing table's storage? Please correct me if I'm wrong.
ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT; rewrites the whole table. Do yourself a favor, and start with BIGINT.
Not yet tried, but that's interesting. I was initially thinking that, since it only increases the length, it would be just a metadata change and finish within seconds.
It depends:
https://www.postgresql.org/docs/current/sql-altertable.html
"Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed. However, indexes must always be rebuilt unless the system can verify that the new index would be logically equivalent to the existing one. For example, if the collation for a column has been changed, an index rebuild is always required because the new sort order might be different. However, in the absence of a collation change, a column can be changed from text to varchar (or vice versa) without rebuilding the indexes because these data types sort identically. Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space."
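One way to see for yourself whether a given ALTER caused a rewrite: the table's relfilenode in pg_class changes whenever the on-disk file is rewritten. A rough sketch (rewrite_test is a hypothetical table name, not from the thread above):

```sql
-- relfilenode identifies the table's on-disk file; a rewrite assigns a new one.
create table rewrite_test(int_fld integer);
select relfilenode from pg_class where relname = 'rewrite_test';

alter table rewrite_test alter column int_fld set data type bigint;
-- integer -> bigint is not binary coercible, so the table is rewritten
-- and the relfilenode returned below will differ from the first query.
select relfilenode from pg_class where relname = 'rewrite_test';
```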
create table int_test(int_fld integer);
insert into int_test select * from generate_series(1, 10000, 1);
select ctid, int_fld from int_test ;
ctid | int_fld
----------+---------
(0,1) | 1
(0,2) | 2
(0,3) | 3
(0,4) | 4
(0,5) | 5
(0,6) | 6
(0,7) | 7
(0,8) | 8
(0,9) | 9
(0,10) | 10
alter table int_test alter column int_fld set data type bigint;
select ctid, int_fld from int_test ;
ctid | int_fld
----------+---------
(0,1) | 1
(0,2) | 2
(0,3) | 3
(0,4) | 4
(0,5) | 5
(0,6) | 6
(0,7) | 7
(0,8) | 8
(0,9) | 9
(0,10) | 10
update int_test set int_fld = int_fld;
select ctid, int_fld from int_test order by int_fld;
   ctid   | int_fld
----------+---------
(63,1) | 1
(63,2) | 2
(63,3) | 3
(63,4) | 4
(63,5) | 5
(63,6) | 6
(63,7) | 7
(63,8) | 8
(63,9) | 9
(63,10) | 10
Where ctid is:
https://www.postgresql.org/docs/current/ddl-system-columns.html
"ctid

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should be used to identify logical rows."
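The jump in ctid from page 0 to page 63 shows the update wrote new row versions onto fresh pages; until VACUUM reclaims the dead tuples, the heap holds both copies. A hypothetical way to watch that happen with the int_test table from above:

```sql
-- Heap size after the update includes both the old and new row versions,
-- so it is roughly double what the table needs.
select pg_size_pretty(pg_relation_size('int_test'));

-- VACUUM marks the dead tuples' space as reusable (it does not normally
-- shrink the file; VACUUM FULL would rewrite and shrink it).
vacuum int_test;
```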
But as you mentioned, it seems to be an effect of the fixed-length data type, which is why it rewrites the whole table even when we just increase the column length. Hope that won't be the case with variable-length data types.
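For at least one variable-length case that is true: since PostgreSQL 9.2, increasing the length limit of a varchar(n) column is a catalog-only change. A sketch, reusing the relfilenode check (vc_test is a hypothetical table name):

```sql
-- Raising a varchar length limit does not touch the stored data,
-- so no rewrite occurs and the relfilenode stays the same.
create table vc_test(vc_fld varchar(10));
select relfilenode from pg_class where relname = 'vc_test';

alter table vc_test alter column vc_fld set data type varchar(20);
select relfilenode from pg_class where relname = 'vc_test';  -- unchanged
```

Shrinking the limit, or changing to a different type entirely, can still force a rewrite, so it is worth testing on a scratch table first.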
-- Adrian Klaver adrian.klaver@xxxxxxxxxxx