Search Postgresql Archives

Re: Moving delta data faster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/3/24 20:54, yudhi s wrote:
On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:

    On 4/3/24 13:38, yudhi s wrote:
     >   Hi All,
     >   It's postgresql database version 15.4. We have a requirement in
    which
     > we will be initially moving full table data for 3-4 tables, from
    source
     > database to target(i.e. postgres) . Maximum number of rows will be
     > ~10million rows in those tables. Then subsequently these rows
    will be
     > inserted/updated based on the delta number of rows that got
     > inserted/updated in the source database. In some cases these changed
     > data can flow multiple times per day to the downstream i.e. postgres
     > database and in other cases once daily.

    What is the source database?

    Can it be reached with a FDW?:

    https://wiki.postgresql.org/wiki/Foreign_data_wrappers
    <https://wiki.postgresql.org/wiki/Foreign_data_wrappers>

    Can the delta on the source be output as CSV?



Thank you Adrian.

And one thing i forgot to mention this target postgresql database would be on AWS RDS whereas the source Oracle databases is on premise. I think we don't have the FDW extension currently in place but we can get that. I am just not able to understand clearly  though, but do you mean export the data from source using CSV and do truncate and import on target. And as these data will be moved through the network won't that cause slowness?

The source database here is Oracle database. Correct me if wrong, it looks like foreign data wrapper is like a DB link. Or do you mean writing a query on the target database (which can be UPSERT or MERGE) but will be joining the table from the source database through the DBlink/DDW? But my question was whether we should use UPSERT or MERGE for comparing and loading the delta records to the target postgresql database. Want to understand which is more performant , as I see in the past Merge having performance issues in the past, but not very sure about that.

My motivation was to get some basic information about your setup and what you are trying to achieve.

If I understand correctly you have:

1) An Oracle database with tables that you want to copy the complete data from to a Postgres database. For this sort of thing COPY(https://www.postgresql.org/docs/current/sql-copy.html) on the Postgres end using CSV data generated from the source is probably the quickest bulk load method.

2) After the initial load you want to do follow up INSERT/UPDATEs based on a delta of the source tables relative to the initial load. This is still a bit of mystery to me. How are determining the delta: a) On the source end entirely or b) Target relative to source? Also what is the anticipated size of the delta per transfer?

Additional information needed:

1) Network distance between source and target?

2) Network capacity?

3) Expected load on both source and target servers from other operations?

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux