On Wed, 3 Jan, 2024, 23:03 Adrian Klaver, <adrian.klaver@xxxxxxxxxxx> wrote:
On 1/3/24 09:24, arun chirappurath wrote:
> Hi Adrian,
>
> Thanks for your mail.
>
> Is this for all tables in the database or a subset? Yes
Yes all tables or yes just some tables?
All tables, except some which have user details.
>
> Does it need to deal with foreign key relationships? No
>
> What are the sizes of the existing data and what size sample data do you
> want to produce? 1 GB, and 1 GB of test data.
If the source data is 1GB and the test data is 1GB then there is no
sampling; you are using the data population in its entirety.
Yes. I would like to double the load and test.
Does that mean you want to take the 1GB of your existing data and double it to 2GB while maintaining
the data distribution from the original data?
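If so, a naive sketch (assuming a hypothetical table t whose primary
key is a serial/identity column) is to re-insert the existing rows,
which doubles the row count while keeping the value distribution of
the other columns:

    -- Hypothetical example: duplicate every row of t once, letting the
    -- identity column generate fresh keys for the copies.
    INSERT INTO t (col_a, col_b)
    SELECT col_a, col_b FROM t;

Note the copies are exact duplicates, so this adds volume but no new
randomness.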
Also, do we have any standard methods for sampling and generating test data?
Something like?:
https://www.postgresql.org/docs/current/sql-select.html
"TABLESAMPLE
sampling_method
( argument
[, ...] ) [
REPEATABLE ( seed
)
]
-
A
TABLESAMPLE
clause after atable_name
indicates that the specifiedsampling_method
should be used to retrieve a subset of the rows in that table. This sampling precedes the application of any other filters such asWHERE
clauses. The standard PostgreSQL distribution includes two sampling methods,BERNOULLI
andSYSTEM
, and other sampling methods can be installed in the database via extensions
- ...
- "
Read the rest of the documentation for TABLESAMPLE to get the
details.
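For instance, a minimal sketch against a hypothetical table named
orders, pulling roughly 10% of its rows with a reproducible seed:

    -- BERNOULLI scans every page and keeps each row with ~10% probability.
    -- REPEATABLE fixes the seed so repeated runs return the same sample.
    SELECT * FROM orders TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

    -- SYSTEM samples whole pages instead of individual rows; it is faster
    -- on large tables, but rows from a chosen page cluster together, so
    -- the sample is less uniform.
    SELECT * FROM orders TABLESAMPLE SYSTEM (10) REPEATABLE (42);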
>
> On Wed, 3 Jan, 2024, 22:40 Adrian Klaver, <adrian.klaver@xxxxxxxxxxx
> <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:
>
> On 1/2/24 23:23, arun chirappurath wrote:
> > Hi All,
> >
 > > Do we have any open source tools which can be used to create
 > > sample data at scale from our Postgres databases, which consider
 > > data distribution and randomness?
>
> > Regards,
> > Arun
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx