On 31/07/2019 at 18:02, Amol Tarte wrote:
Hi,
Before buying an impressive commercial
product you can try some additional configuration with Ora2Pg.
The only solution to improve data
migration performance is to use parallelism. The problem with
BLOBs is that they have to be converted to hex to be inserted
into a bytea. The internal function in Ora2Pg that is responsible
for this job is _escape_lob(). The other problem is that Oracle is
very slow to send the BLOBs to Ora2Pg; I don't know if commercial
products are better on this point, but I challenged Ora2Pg
against Kettle some years ago without much difference. So what
you should do first is to set the following in your ora2pg.conf:
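For example (directive names as in the ora2pg.conf documentation;
the LONGREADLEN value is only illustrative, set it to the size of
your largest BLOB):

    NO_LOB_LOCATOR   1
    LONGREADLEN      100000000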
This will force Oracle to send the full
content of the BLOB in a single pass, otherwise it will use small
chunks. The only drawback is that you have to set LONGREADLEN to
the highest BLOB size in your table to avoid a LONGTRUNC
error.
That also means that for each row
returned, DBD::Oracle (the Perl driver for Oracle) will allocate at
least LONGREADLEN bytes in memory, and we want to extract as many
rows as possible at a time. You will have understood that your 8GB
of memory limits the quantity of rows that can be exported at the
same time.
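To give a rough idea (illustrative numbers only): with LONGREADLEN
set to 100 MB, one process holding 50 BLOB rows at a time can need
around 50 x 100 MB = 5 GB of memory, and four such processes around
20 GB.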
The other point is that Oracle is slow,
so you have to parallelize the data export. Use -J 4 on the command
line to create 4 simultaneous processes for the data export.
Parallelization on the Oracle side is only possible if you have a
numeric column that can be used to split the data, using modulo 4
in this case. This is a basic implementation but it is enough in
most cases.
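Concretely, each of the 4 processes fetches only its own slice of
the table, the rows where MOD(numeric_column, 4) equals 0, 1, 2 or
3 respectively (that is the idea; the exact query Ora2Pg builds may
differ slightly).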
Converting BLOB to bytea consumes a lot
of CPU cycles too, so it is good practice to parallelize this work
as well. Use -j 2 or -j 3 for this. The number of parallel
processes should be tested, because there is a limit beyond which
you will not gain anything.
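To give an idea of the work involved, the bytea hex encoding boils
down to something like this (a simplified Perl sketch, not the
actual _escape_lob() code):

    # hex-encode binary content into PostgreSQL's bytea hex input format
    my $bytea = '\x' . unpack('H*', $blob_content);

Every byte becomes two hex characters, which is why large BLOBs
burn so much CPU.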
If you have, let's say, 32GB of memory
and 12 CPUs, you could try a command like:

ora2pg -c ora2pg.conf -J 4 -j 3 -t CLIENT_DB_AUDIT_LOG -L 500
If you have fewer resources, don't forget
that -J and -j must be multiplied to get the number of processes
that Ora2Pg will run in parallel. The -L option (DATA_LIMIT) is
used to reduce the number of rows extracted at a time. Here, with a
value of 500, it will process 50 (DATA_LIMIT/10) rows with BLOBs at
a time. If the table does not have any BLOB, it will use 500 rows
at a time. For most tables this parameter should be set to 10000 up
to 250000. If you have lots of memory the value can be higher. If
you think it is too low, you can set BLOB_LIMIT in ora2pg.conf to a
higher value.
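For example, in ora2pg.conf (values are illustrative):

    DATA_LIMIT   10000
    BLOB_LIMIT   500

With these settings, tables without BLOBs are processed 10000 rows
at a time, while tables with BLOBs use 500 rows at a time instead
of the DATA_LIMIT/10 default.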
However, Ora2Pg will show you the data
migration speed, so you can adjust all these parameters and see
whether you get some performance gain. If you want to know exactly
at which speed Oracle is able to send the data, add --oracle_speed
to the ora2pg command. Ora2Pg will only extract data from Oracle;
there will be no bytea transformation or data writing, just the
raw Oracle speed. You can do some tests with the value of the -J
option to see what the best value is. On the other side, you can
use the --ora2pg_speed option to see at which speed Ora2Pg is able
to convert the data; nothing will be written either. Use it to know
whether you gain anything from the value of the -j option. Don't
forget to do some additional tests with the BLOB_LIMIT value to see
if there is some more improvement. If someone can prove to me that
they get better performance on the Oracle data extraction side, I
will be pleased to look at their code.
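For example, reusing the command from above:

ora2pg -c ora2pg.conf -t CLIENT_DB_AUDIT_LOG -J 4 --oracle_speed
ora2pg -c ora2pg.conf -t CLIENT_DB_AUDIT_LOG -J 4 -j 3 --ora2pg_speed

The first measures pure Oracle extraction speed for different -J
values; the second adds the bytea conversion so you can judge the
effect of -j.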
I hope this will help.
Regards,
--
Gilles Darold
http://www.darold.net/