On 09/11/2017 16:46, Vincent Lefevre wrote: > On 2017-11-09 10:01:45 +0100, Mason wrote: > >>> On 08/11/17 23:13 +0200, Nil Geisweiller wrote: >>> >>>> The following >>>> >>>> --------------------------------------------------------------------- >>>> #include <iostream> >>>> #include <cmath> >>>> >>>> int main() >>>> { >>>> double v = 4.6; >>>> std::cout << "v = " << v << std::endl; >>>> std::cout << "v*100 = " << v*100 << std::endl; >>>> std::cout << "floor(v*100) = " << std::floor(v*100) << std::endl; >>>> } >>>> --------------------------------------------------------------------- >>>> >>>> outputs >>>> >>>> ------------------ >>>> v = 4.6 >>>> v*100 = 460 >>>> floor(v*100) = 459 >>>> ------------------ >>>> >>>> It that a bug? >> >> There is a boring and excruciatingly long paper floating (ha!) around: >> "What Every Computer Scientist Should Know About Floating-Point Arithmetic" >> https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html >> >> This other site tries to be more accessible: >> http://floating-point-gui.de/ >> >> The short version is that it is not possible to represent 0.1 >> (or any multiple thereof) in (fractional) base 2. > > Actually, that's mainly a language design bug, which doesn't show > the error for 460 (which is representable exactly). It's not clear to me exactly /what/ you are calling a language design bug? And in what language? C? IEEE 754? (AFAIK, C is pretty loose with the floating point spec.) >> Basically, using "IEEE 754 double-precision binary floating-point format" >> 4.6 is approximated as D=4.5999999999999996447286321199499070644378662109375 >> ( https://en.wikipedia.org/wiki/Double-precision_floating-point_format ) >> >> Thus, it is obvious why floor(D*100) equals 459. > > No, this is not obvious. The multiplication by 100 introduces a > rounding error. Thus this doesn't prove that the rounded value > D*100 is strictly less than 460. Let's take 1.7 for example. In IEEE double precision, it is represented as 3ff'b333333333333, i.e. 1 + 0.5 + sum(n=1..13, 3/16^n) = D0 1.7 * 2 = 400'b333333333333 = D1 1.7 * 8 = 402'b333333333333 = D3 D0 = 0x1.b333333333333p0; D1 = 0x1.b333333333333p1; D3 = 0x1.b333333333333p3; Scaling D1 to p3 D1 = 0x0.6ccccccccccccp3 0x1.b333333333333p3 0x0.6ccccccccccccp3 ------------------- 0x2.1ffffffffffffp3 0x2.1ffffffffffffp3 = 0x1.0fffffffffffffp4 which would be rounded up to 0x1.1p4 = 17 (exact representation) OK so the error between 17 and D1+D3 is too small to represent in a double, thus the sum is rounded up. However, if D1+D3 is stored in a longer intermediate type, such as x87 long double, then the error can be represented. Floating point is full of unintuitive surprises. $ cat fp.bc scale=60 sum=0 for (n = 1; n <= 13; n = n + 1) sum = sum + 3 / 16^n res = 1 + 0.5 + sum res res * 10 17 - res * 10 $ bc < fp.bc 1.699999999999999955591079014993738383054733276367187500000000 16.999999999999999555910790149937383830547332763671875000000000 0.000000000000000444089209850062616169452667236328125000000000 $ cat fp.c #include <stdio.h> volatile double d = 1.7; int main(void) { long double ld = (long double)d * 10; printf("%.60f %016llx\n", d, *(unsigned long long *)&d); d = d * 10; printf("%.60f %016llx\n", d, *(unsigned long long *)&d); printf("%.60Lf\n", ld); return 0; } $ gcc-7 -O2 -ffloat-store fp.c && ./a.out 1.699999999999999955591079014993738383054733276367187500000000 3ffb333333333333 17.000000000000000000000000000000000000000000000000000000000000 4031000000000000 16.999999999999999555910790149937383830547332763671875000000000 Regards.