Re: how to make duplicate finding query faster?

Sachin Kumar <sachinkumaras@xxxxxxxxx> · Wed, 30 Dec 2020 18:54:14 +0530

Hi Scott,
Yes, I am checking one by one because my goal is to fail the whole upload if there is any duplicate entry and to inform the user that they have a duplicate entry in the file.
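
One way to keep the fail-the-whole-upload behaviour without issuing one query per card is to collect the card numbers first and look them up in batches. The sketch below is only an illustration of that idea: `find_duplicates`, `chunked`, `fetch_existing` and the batch size are hypothetical names and values, not something from this thread, and the Django call in the comment is an untested assumption about how `fetch_existing` could be supplied.

```python
from collections import Counter
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def find_duplicates(card_numbers, fetch_existing, batch_size=10_000):
    """Return a sorted list of card numbers that repeat in the file
    or that `fetch_existing` reports as already stored."""
    counts = Counter(card_numbers)
    # Numbers that appear more than once inside the CSV itself.
    dupes = {number for number, count in counts.items() if count > 1}
    # One lookup per batch of distinct numbers instead of one per card.
    for batch in chunked(counts, batch_size):
        dupes.update(fetch_existing(batch))
    return sorted(dupes)


# Assumption: with Django, fetch_existing could be something like
#   lambda batch: Card_Bank.objects.filter(
#       ACCOUNT_NUMBER__in=batch
#   ).values_list("ACCOUNT_NUMBER", flat=True)
```

If the returned list is non-empty, the upload can be rejected as a whole and the offending card numbers reported back to the user.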

Regards
Sachin

On Wed, Dec 30, 2020 at 6:43 PM Scott Ribe <scott_ribe@xxxxxxxxxxxxxxxx> wrote:
> On Dec 30, 2020, at 12:36 AM, Sachin Kumar <sachinkumaras@xxxxxxxxx> wrote:
>
> Hi All,
>
> I am uploading data into PostgreSQL from a CSV file and checking whether any value already exists in the DB; if it does, the upload should return a duplicate error. I am using the query below.
>
> if Card_Bank.objects.filter(Q(ACCOUNT_NUMBER=card_number)).exists():
>     flag = 2
> else:
>     flag = 1
>
> It is taking too much time; I am using 600k cards in the CSV.
>
> Kindly help me make the query faster.
>
> I am using Python, Django & PostgreSQL.
>
> --
> Best Regards,
> Sachin Kumar

Are you checking one-by-one because your goal is not to fail the whole upload that contains the duplicates, but rather to skip only the duplicates?

If that's the case, I think you'd be better off copying the CSV straight into a temp table, using a join to delete duplicates from it, then inserting the remainder into the target table, and finally dropping the temp table.
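
The temp-table steps described above can be sketched roughly as follows. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in; on PostgreSQL the staging step would normally be a `COPY` and the delete could be written as a join (`DELETE ... USING`). The table names `card_bank` and `staging`, the column `account_number`, and the function name are assumptions made for illustration only.

```python
import sqlite3


def load_skipping_duplicates(conn, card_numbers):
    """Stage the rows, remove those already in card_bank, insert the
    rest, drop the staging table, and return the number inserted.
    Repeats inside the file itself are not handled in this sketch."""
    cur = conn.cursor()
    # 1. Stage the uploaded rows (PostgreSQL would use COPY here).
    cur.execute("CREATE TEMP TABLE staging (account_number TEXT)")
    cur.executemany(
        "INSERT INTO staging (account_number) VALUES (?)",
        [(number,) for number in card_numbers],
    )
    # 2. Delete staged rows that already exist in the target table.
    cur.execute(
        "DELETE FROM staging WHERE account_number IN "
        "(SELECT account_number FROM card_bank)"
    )
    # 3. Count and insert the remainder.
    remaining = cur.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    cur.execute(
        "INSERT INTO card_bank (account_number) "
        "SELECT account_number FROM staging"
    )
    # 4. Clean up.
    cur.execute("DROP TABLE staging")
    conn.commit()
    return remaining
```

The point of the design is that the database compares the two sets in a single statement, rather than the application sending 600k separate existence checks.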

-- 

Best Regards, 
Sachin Kumar