
Re: Moving delta data faster


 



On 4/3/24 22:24, yudhi s wrote:

On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:

    On 4/3/24 20:54, yudhi s wrote:
     > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver
     > <adrian.klaver@xxxxxxxxxxx> wrote:
     >
     >    > Thank you Adrian.
     >
     > And one thing I forgot to mention: this target Postgres database
     > would be on AWS RDS, whereas the source Oracle database is on
     > premise. I think we don't have the FDW extension currently in
     > place, but we can get that. I am not able to understand clearly,
     > though: do you mean export the data from the source as CSV, then
     > truncate and import on the target? And as this data will be moved
     > over the network, won't that cause slowness?
     >
     > The source database here is an Oracle database. Correct me if I am
     > wrong, but it looks like a foreign data wrapper is like a DB link.
     > Or do you mean writing a query on the target database (which can
     > be an UPSERT or MERGE) that joins the table from the source
     > database through the DB link/FDW? But my question was whether we
     > should use UPSERT or MERGE for comparing and loading the delta
     > records to the target Postgres database. I want to understand
     > which is more performant, as I have seen MERGE having performance
     > issues in the past, but I am not very sure about that.

    My motivation was to get some basic information about your setup and
    what you are trying to achieve.

    If I understand correctly you have:

    1) An Oracle database with tables that you want to copy the complete
    data from to a Postgres database. For this sort of thing COPY
    (https://www.postgresql.org/docs/current/sql-copy.html) on the
    Postgres end, using CSV data generated from the source, is probably
    the quickest bulk load method.

    2) After the initial load you want to do follow-up INSERT/UPDATEs
    based on a delta of the source tables relative to the initial load.
    This is still a bit of a mystery to me. How are you determining the
    delta: a) On the source end entirely, or b) Target relative to
    source? Also, what is the anticipated size of the delta per transfer?

    Additional information needed:

    1) Network distance between source and target?

    2) Network capacity?

    3) Expected load on both source and target servers from other
    operations?
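[For reference, the COPY-based bulk load suggested in point 1) above can be sketched as follows; the table and file names are hypothetical, and this assumes a CSV dump already exported from the Oracle side:]

```sql
-- Run from psql on the client; \copy streams the file from the client
-- machine, so the Postgres server does not need filesystem access to it.
\copy target_tab FROM '/path/to/oracle_dump.csv' WITH (FORMAT csv, HEADER true);

-- Server-side equivalent (the file must be readable by the server process):
-- COPY target_tab FROM '/srv/data/oracle_dump.csv' WITH (FORMAT csv, HEADER true);
```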


Thank you. Actually I was trying to understand how to handle the delta load after the one-time load is done. The plan is to find the delta changes based on the primary keys of the tables: if the key is found, update the record; if the key is not found, insert the row.

Basically, the select query on the source database will fetch the data within a certain time interval (based on the latest update timestamp, or the create timestamp if available, or else a full dump), put it on S3, and then from S3 it will be picked up and merged into the target Postgres database. As UPSERT and MERGE both look similar, I was wondering which we should use here for loading the delta records?
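[For context, the two options being compared look roughly like this in Postgres; table and column names are hypothetical, and MERGE requires Postgres 15 or later:]

```sql
-- Option 1: INSERT ... ON CONFLICT ("upsert"), available since 9.5.
-- Requires a unique index or constraint on the conflict target (id).
INSERT INTO target_tab (id, col1, col2)
SELECT id, col1, col2 FROM staging_tab
ON CONFLICT (id) DO UPDATE
  SET col1 = EXCLUDED.col1,
      col2 = EXCLUDED.col2;

-- Option 2: MERGE, available since Postgres 15.
-- Matches on an arbitrary join condition rather than a unique index.
MERGE INTO target_tab t
USING staging_tab s ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET col1 = s.col1, col2 = s.col2
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);
```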

S3 is not a database. You will need to be more specific about '... then from S3 it will be picked up and merged into the target postgres database.'
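[If the files land in S3 and the target is RDS, one common pattern (an assumption about the setup, not something stated in this thread) is the RDS-specific aws_s3 extension, which loads an S3 object into a staging table that the upsert/merge then reads from; bucket, key, region, and table names below are hypothetical:]

```sql
-- Sketch only: aws_s3 is available on RDS/Aurora Postgres, not stock Postgres.
-- The RDS instance also needs an IAM role granting read access to the bucket.
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

SELECT aws_s3.table_import_from_s3(
  'staging_tab',                  -- target (staging) table
  '',                             -- column list ('' means all columns)
  '(FORMAT csv, HEADER true)',    -- COPY options
  aws_commons.create_s3_uri('my-bucket', 'delta/2024-04-04.csv', 'us-east-1')
);
```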

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





