c - Calculate the range of double -
as part of exercise "the c programming language" trying find way calculate maximum possible float , maximum possible double on computer. technique shown below works floats (to calculate max float) not double:
// max float: float f = 1.0; float last_f; float step = 9.0; while(1) { last_f = f; f *= (1.0 + step); while (f == infinity) { step /= 2.0; f = last_f * (1.0 + step); } if (! (f > last_f) ) break; } printf("calculated float max : %e\n", last_f); printf("limits.h float max : %e\n", flt_max); printf("diff : %e\n", flt_max - last_f); printf("the expected value? : %s\n\n", (flt_max == last_f)? "yes":"no"); // max double: double d = 1.0; double last_d; double step_d = 9.0; while(1) { last_d = d; d *= (1.0 + step_d); while (d == infinity) { step_d /= 2.0; d = last_d * (1.0 + step_d); } if (! (d > last_d) ) break; } printf("calculated double max: %e\n", last_d); printf("limits.h double max : %e\n", dbl_max); printf("diff : %e\n", dbl_max - last_d); printf("the expected value? : %s\n\n", (dbl_max == last_d)? "yes":"no"); and results to:
calculated float max : 3.402823e+38 limits.h float max : 3.402823e+38 diff : 0.000000e+00 expected value? : yes calculated double max: 1.797693e+308 limits.h double max : 1.797693e+308 diff : 1.995840e+292 expected value? : no it looks me still calculates using single precision in second case.
what missing?
op's approach works when calculations done wider precision float in first case , wider double in 2nd case.
in first case, op reports flt_eval_method == 0 float calculations done float , double done double. note float step ... 1.0 + step double calculation.
the below code forces calculation double , can replicate op's problem flt_evel_method==2 (use long double internal calculations.)
volatile double d = 1.0; volatile double last_d; volatile double step_d = 9.0; while(1) { last_d = d; d *= (1.0 + step_d); while (d == infinity) { step_d /= 2.0; volatile double sum = 1.0 + step_d; d = last_d * sum; //d = last_d + step_d*last_d; } if (! (d > last_d) ) { break; } } diff : 1.995840e+292 expected value? : no instead op should use following not form inexact sum of 1.0 + step_d when step_d small, rather forms exact product of step_d*last_d. 2nd form results in more accurate calculation new d, providing additional bit of calculation precision in d. higher precision fp not needed employ op's approach.
d = last_d + step_d*last_d; diff : 0x0p+0 0.000000e+00 expected value? : yes wiki
Comments
Post a Comment