From: Crt Mori > Sent: 20 December 2017 17:30 > To: David Laight ... > > I think this version works. > > It doesn't have the optimisation for small values. > > > > unsigned int sqrt64(unsigned long long x) > > { > > unsigned int x_hi = x >> 32; > > > > unsigned int b = 0; > > unsigned int y = 0; > > unsigned int i; > > > > for (i = 0; i < 32; i++) { > > b <<= 2; > > b |= x_hi >> 30; > > x_hi <<= 2; > > if (i == 15) > > x_hi = x; > > y <<= 1; > > if (b > y) > > b -= ++y; > > } > > return y; > > } > > Wow, thanks. This seems like a major change. Not really, it is just doing it slightly differently. The algorithm is the 'schoolboy' one that used to be taught at school. (Although I found it in a boot from the 1930s.) I compared the object code for amd64 (doesn't need to reload 'x_hi' half way through) against the original loop. They are much the same size, execution speed will depend the lengths of the dependency chains. > I did a quick run through unit tests for the sensor and the results > are way off. On the sensor I had to convert double calculations to > integer calculations and target was to get end result within 0.02 degC > (with previous approximate sqrt implementation) in sensor working > range. This now gets into 3 degC delta at least and some are way off. > It might be off because of some scaling on the other hand during the > equation (not exactly comparing sqrt implementations here). > > I will need a bit more testing of this to see if it is replaceable. You probably need to put values through both (all 3) functions to see where you are getting errors. It might be rounding, or there might be a fubar somewhere. David ��.n��������+%������w��{.n�����{��(��)��jg��������ݢj����G�������j:+v���w�m������w�������h�����٥