On Wed, Feb 16, 2022 at 08:11 PM David G. Johnston wrote:
You said they are more or less the same. Problems like these tend to
hide in the "less" portion of the inequality.
On of the virtualized servers was created as a clone of the other one
(using VMware to clone the VM). So, basically, these are very equal. Of
course, they diverged over time.
Focusing on PostgreSQL, here are the differences of postgresql.conf,
comparing testing system and production system:
< work_mem = 8MB # min 64kB
---
> work_mem = 4MB # min 64kB
417c417
< #log_statement = 'all' # none, ddl, mod, all
---
> #log_statement = 'none' # none, ddl, mod, all
Both PostgreSQL server have the same roles and users, that same
extensions installed and no preloaded libraries.
Given that this isn't working as expected it doesn't make for a great
testing system. Install and initdb 14.2 on this machine and let's see
what PostgreSQL produces.
The testing system runs since 2015. I don't know whether the problem was
present from the beginning. But I don't think so, as we also have
"correct" double values in that database. Now, since "binary equality"
of the double precision values is a new requirement, we started to
notice, that (at least not) newly added (UPDATEd) values, e. g. 1.56 are
not binary equal to already present values:
Table abc, column xyz: currently (before) 1.56
UPDATE abc SET xyz = 1.56;
Table abc, column xyz: after 1.559999999
We have a trigger, that monitors such updates and it reports a changed
value, which is not correct. The problem is, that the assignment
SET xyz = 1.56
actually means
SET xyz = 1.559999999
since getting a double value from the string 1.56 yields 1.559999999.
Yes, moving to the latest PostgreSQL version might fix that error.
However, this is a customer's testing system. Actually, it is intended
to be reinstalled with Ubuntu 22.04 LTS which brings PostgreSQL 14. But
prior to that, we need to complete a project on the testing system that
requires that "binary equality" of double values.
What is the precise version of libc that is installed for one. Exact
ESX releases too.
Both VM servers run on
ESXi 6 (correct behavior)
ESXi 6.5 (misbehaving)
All machines use libc version 2.19 (libc-2.19.so).
This isn't really all that interesting a report for the project if it
only exists in one ancient system that cannot be experimented with.
Maybe it's a faulty register on that machine's CPU. There is more
double-checking and comparing that can be done here but it seems
unlikely to be productive. It is more plausible that the snowflake
machine in question just has issues and needs to be retired. Installing
a newer version of PostgreSQL on it before junking it is about the right
amount of experimental effort.
I just wanted to ask whether someone knows something about this or has
ever heard about such a behavior. You say, the snowflake machine has
issues... I don't believe in hardware issues, since it runs in VMware
and likely on many different CPUs. Isn't it more than unlikely that such
a constantly occurring error is caused by one faulty CPU (among that
many CPUs an ESX server typically has)?
And, keep in mind that strtod function works as expected from a simply C
testing program. I guess that the parsed double's value gets modified
somewhere in PostgreSQL after strtod was called. However, I do not yet
see where and why. I was hoping that someone of you could help.
Carsten