The docs for the connection pool you are using: https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html, and a screenshot of the relevant configuration params.
On Thu, May 27, 2021 at 3:24 PM Rob Sargent <robjsargent@xxxxxxxxx> wrote:
On 5/27/21 4:10 PM, Ron wrote:
On 5/27/21 4:58 PM, Rob Sargent wrote:
I have the json file which feeds the staging table and I have code which is designed to load such files. I suppose an SQL file with those 16 insert statements would approximate the app's calls. Let me give that a shot.
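Roughly what that file would contain, assuming the slices step the leading hex digit of the quasi-UUID bounds (the statement pattern and staging-table name are the ones quoted further down the thread; the bound on the last slice is a guess):

    insert into segment
      select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s
      where s.probandset_id >= '00000000-0000-0000-0000-000000000000'
        and s.probandset_id <  '10000000-0000-0000-0000-000000000000';
    insert into segment
      select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s
      where s.probandset_id >= '10000000-0000-0000-0000-000000000000'
        and s.probandset_id <  '20000000-0000-0000-0000-000000000000';
    -- ...fourteen more of the same shape, the last one starting at
    -- 'f0000000-0000-0000-0000-000000000000'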
On 5/27/21 3:08 PM, Sam Gendler wrote:
I almost wish that were true.
The same JDBC connection that is resulting in lost data? Sounds to me like you aren't connecting to the DB you think you are connecting to.
However, looking at AWS "Performance Insights" I see the SQL statements generated by my app being executed on the server. Not coincidentally, this is from the "Top SQL (10)/Load by waits" view. Now that view does NOT, in point of fact, name the database in which the SQL is running, but the rest of my environment pretty much rules out silently switching tracks on myself. I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this, and I've successfully analyzed and stored results for small starting data sets.
I just noticed the "Top database" tab on that screen: there is only one and it's the correct one.
I've reproduced the behaviour. I'm pretty convinced it's a size thing, but which part of the system is being maxed out is not clear. Not CPU, but that's the only resource "Performance Insights" mentions (suggesting this UI wasn't designed by a database person).
The loss of the staging table is most spectacular. It is filled from a file which has 7.5M records. It's clear in the tomcat logs that it was created and written to, one record read. Then the "top sql" happens:
insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?

The table in the "bulk" schema is the staging table. The params are filled in with quasi-UUIDs which grab roughly 1/16th of the data. In the stack trace on my tomcat server I get
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.

So this would have been the fourth such insert statement read from that staging table.
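A quick way to see how far the sixteen slices got is to count the segment rows that landed in each slice (a sketch; it assumes probandset_id is a uuid, or text in the same format, and that segment held nothing from this run beforehand):

    select left(probandset_id::text, 1) as slice, count(*)
    from segment
    group by 1
    order by 1;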
That table is not deleted by my code. It is renamed after the last insert into segment, by appending "_done" to the name. But that table, by either name, is nowhere to be seen on the server.
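In SQL terms the rename is just the following (the exact statement my code issues may differ slightly), and the catalog can be checked for the table under either name:

    alter table bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6"
      rename to "rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6_done";

    -- look for the staging table under either name
    select schemaname, tablename
    from pg_tables
    where schemaname = 'bulk'
      and tablename like 'rjs_GEV15_01%';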
For those scoring at home, the trouble in the tomcat log starts with
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux -
Is there any way to replicate this in psql?
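Something along these lines might do it, assuming the staging table gets re-loaded first and the sixteen insert statements are saved to a file (the file name and connection string below are placeholders):

    -- run with:  psql "<connection string>" -f replay.sql
    \set ON_ERROR_STOP on
    \timing on
    -- the staging table should show up in this listing
    \dt bulk.*
    -- replay the sixteen range inserts (sketched earlier in the thread)
    \i sixteen_inserts.sql
    -- and check whether the staging table is still there afterwards
    \dt bulk.*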
--
Angular momentum makes the world go 'round.
Attachment: Screen Shot 2021-05-27 at 3.29.47 PM.png (PNG image)