On 2023-02-14 17:04:51 -0500, Ben Chrobot wrote: > We have a large table (~470 million rows) with integer primary key id (not > null) on a Postgres 14.5 cluster. A third-party tool is attempting to perform a > SELECT-based full table copy in preparation for log-based sync with a query > like the following: > > SELECT "id", "other_column_a", "other_column_b", "created_at", "updated_at" > FROM "public"."my_large_table" > WHERE (("id" > ? OR "id" IS NULL)) AND (("id" <= ?)) > ORDER BY "id" LIMIT 50000; That doesn't make sense. ("id" <= ?) implies that ("id" IS NULL) is FALSE. So the where clause can be simplified to WHERE (("id" > ? OR FALSE)) AND (("id" <= ?)) and then WHERE (("id" > ?)) AND (("id" <= ?)) even without the knowledge that "id" is a primary key (and therefore can never be null). Even if the column could contain NULL values, those would never be selected. It could therefore be argued that the query as written is broken and should be fixed. OTOH it could also be argued that the optimizer should be able to perform the same simplifications as I did above and produce the same code for WHERE (("id" > ? OR "id" IS NULL)) AND (("id" <= ?)) as for WHERE (("id" > ?)) AND (("id" <= ?)). hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@xxxxxx | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment:
signature.asc
Description: PGP signature