c - Calculate the range of double -
as part of exercise "the c programming language" trying find way calculate maximum possible float , maximum possible double on computer. technique shown below works float
s (to calculate max float) not double
:
// max float: float f = 1.0; float last_f; float step = 9.0; while(1) { last_f = f; f *= (1.0 + step); while (f == infinity) { step /= 2.0; f = last_f * (1.0 + step); } if (! (f > last_f) ) break; } printf("calculated float max : %e\n", last_f); printf("limits.h float max : %e\n", flt_max); printf("diff : %e\n", flt_max - last_f); printf("the expected value? : %s\n\n", (flt_max == last_f)? "yes":"no"); // max double: double d = 1.0; double last_d; double step_d = 9.0; while(1) { last_d = d; d *= (1.0 + step_d); while (d == infinity) { step_d /= 2.0; d = last_d * (1.0 + step_d); } if (! (d > last_d) ) break; } printf("calculated double max: %e\n", last_d); printf("limits.h double max : %e\n", dbl_max); printf("diff : %e\n", dbl_max - last_d); printf("the expected value? : %s\n\n", (dbl_max == last_d)? "yes":"no");
and results to:
calculated float max : 3.402823e+38 limits.h float max : 3.402823e+38 diff : 0.000000e+00 expected value? : yes calculated double max: 1.797693e+308 limits.h double max : 1.797693e+308 diff : 1.995840e+292 expected value? : no
it looks me still calculates using single precision in second case.
what missing?
op's approach works when calculations done wider precision float
in first case , wider double
in 2nd case.
in first case, op reports flt_eval_method == 0
float
calculations done float
, double
done double
. note float step ... 1.0 + step
double
calculation.
the below code forces calculation double , can replicate op's problem flt_evel_method==2
(use long double
internal calculations.)
volatile double d = 1.0; volatile double last_d; volatile double step_d = 9.0; while(1) { last_d = d; d *= (1.0 + step_d); while (d == infinity) { step_d /= 2.0; volatile double sum = 1.0 + step_d; d = last_d * sum; //d = last_d + step_d*last_d; } if (! (d > last_d) ) { break; } } diff : 1.995840e+292 expected value? : no
instead op should use following not form inexact sum of 1.0 + step_d
when step_d
small, rather forms exact product of step_d*last_d
. 2nd form results in more accurate calculation new d
, providing additional bit of calculation precision in d
. higher precision fp not needed employ op's approach.
d = last_d + step_d*last_d; diff : 0x0p+0 0.000000e+00 expected value? : yes
wiki
Comments
Post a Comment