
Re: Import large data set into a table and resolve duplicates?


 



Hi Eugene:

On Sun, Feb 15, 2015 at 6:36 PM, Eugene Dzhurinsky <jdevelop@xxxxxxxxx> wrote:
​...​
 
> Since the "dictionary" already has an index on the "series", it seems that
> patch_data doesn't need to have any index here.
> ...
> At this point "patch_data" needs to get an index on "already_exists = false",
> which seems to be cheap.

As I told you before, do not focus on the indexes too much. For bulk updates like this, going through indexes tends to be much slower than a proper sort.

The reason is locality of reference. When you do things with sorts, you make two or three nicely ordered passes over the data, using full pages. When you use indexes, you spend a lot of time walking index structures and alternating between them and the table: read index, read data, index, data, and so on (the pages are cached, but you have to switch between them anyway). Also, with your kind of data the indexes on series are going to be big, which leaves less cache available for data.
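
To illustrate the sort-based approach, here is a minimal sketch in Python. It is not what the database does internally; it only shows the shape of the idea: sort both inputs once, then walk them together in a single ordered pass. The function name and the plain lists of strings standing in for the "series" values are assumptions made for the example.

```python
from typing import Iterable, List


def missing_after_sort(dictionary_series: Iterable[str],
                       patch_series: Iterable[str]) -> List[str]:
    """Return patch values absent from the dictionary, in sorted order.

    Both inputs are sorted once and then scanned in step, so every
    element is visited sequentially instead of being probed one by one.
    """
    dict_sorted = sorted(dictionary_series)
    # Duplicates inside the patch itself are collapsed before the merge.
    patch_sorted = sorted(set(patch_series))

    missing = []
    i = 0
    for series in patch_sorted:
        # Advance the dictionary cursor until it reaches or passes `series`.
        while i < len(dict_sorted) and dict_sorted[i] < series:
            i += 1
        if i == len(dict_sorted) or dict_sorted[i] != series:
            missing.append(series)
    return missing
```

For example, with a dictionary of `["b", "a", "d"]` and a patch of `["d", "c", "a", "e", "c"]`, only `"c"` and `"e"` come back as new.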


As I said before, it depends on your data anyway. With current machines, what I would do with this problem is just write a program (Perl seems adequate for this), copy the dictionary into client memory, and read through the patch, writing out the result file and inserting the needed rows along the way. It seems it should fit in 1 GB without problems, which is not much by today's standards.
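
A rough sketch of that client-side program follows, written in Python rather than Perl purely for illustration. It assumes the dictionary maps each series to an integer id and that new ids can be handed out by the client; in a real setup the ids would more likely come from the database, and the collected new rows would be sent back with a bulk insert. The function name and data shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def resolve_patch(
    dictionary: Dict[str, int],
    patch: Iterable[str],
) -> Tuple[List[Tuple[str, int]], List[Tuple[int, str]]]:
    """Resolve every patch line against an in-memory dictionary.

    Returns (resolved, new_rows):
      resolved  one (series, id) pair per patch line, in input order
      new_rows  (id, series) rows that still have to be inserted
    The dictionary is updated in place, so a series repeated inside the
    patch is only treated as new the first time it is seen.
    """
    # Assumption: ids are dense integers and the client may allocate them.
    next_id = max(dictionary.values(), default=0) + 1
    resolved = []
    new_rows = []
    for series in patch:
        series_id = dictionary.get(series)
        if series_id is None:
            series_id = next_id
            next_id += 1
            dictionary[series] = series_id
            new_rows.append((series_id, series))
        resolved.append((series, series_id))
    return resolved, new_rows
```

The hash lookup is a single in-memory probe per patch line, so the patch is read exactly once and the database only sees the final batch of new rows.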

Regards.
Francisco Olarte.



