The Persistence of Vision Raytracer (POVRay).
This is the legacy Bug Tracking System for the POVRay project. Bugs listed here are being migrated to our github issue tracker. Please refer to that for new reports or updates to existing ones on this system.
FS#272  Minor change, significant speedup in cubic polynomial solver
Opened by Simon (infoised)  Friday, 22 February 2013, 20:54 GMT
Last edited by William F Pokorny (wfpokorny)  Tuesday, 14 February 2017, 13:42 GMT

DetailsWhile familiarizing myself with the code, I found some small changes in the solve_cubic function that lead to a significant speedup. In my experience, “pow” is by far the slowest function in math.h and replacing it with simpler functions usually makes a tremendous impact on the speed (it’s an order of magnitude slower than sqrt/exp/cbrt/log). solve_cubic has a “pow” function that can be replaced by cbrt (cubic root), which is standard in ISOC99 and should be available on all systems. Separate benchmarks of solve_cubic function show this change almost doubles the speed and does not lower the accuracy. As solve_cubic is part of the solution of quartic equation, this improves the speed for many primitives. Testing with a scene containing many torus intersection tests (attached below) I still observed almost 10% speedup (Intel, 4 threads, 2 hyperthreaded cores, antialiasing on, 600×600: from 91 to 84 seconds). And this is for a torus, where a lot of time is spent in the solve_quartic and cubic solver is only called once! Similar speedup should be expected for prism, ovus, sor and blob. I do believe the cubic solver can be done without trigonometry, but that would mean changing the algorithm, introducing new bugs and requiring a lot of testing. However, the trigonometric evaluation can still be simplified (3% speedup in full torus benchmark). These changes don’t affect the algorithm at all, they are mathematically identical to the existing code, so the changes can be applied immediately. I also included other changes just as suggestions. Every change is commented and marked with [SC 2.2013]. This sadly does not speedup the sturm solver, which uses bisection and regulafalsi and looks very optimized already. The test scene I used has a lot of torus intersections from various directions (shadow rays, main rays, transmitted rays). 
Thanks for your investigation.
Unfortunately, while cbrt is indeed C99 standard, it is not C++03 standard, and is indeed unavailable on some C++ compilers (such as MS Visual C++ 2010, which is one of our main target compilers).
I understand.
It may be worth checking for its existence in ./configure (there are standard macros for that) and have a fallback like
#ifndef HAVE_CBRT_SET
inline DBL cbrt(DBL x){
return pow(x,1/3.0);
}
#endif
I used this exact same trick in another project, with a configure line
AC_CHECK_FUNC([cbrt],AC_DEFINE([HAVE_CBRT_SET],[1],[c99 cbrt function is available.]))
The trigonometric changes work in any case (but the speedup is smaller).
I'm putting this on my todo list for version 3.7.1, as we've already frozen 3.7.0 except for bugfixes. The good news is that this opens up the possibility to do a more radical change of the algorithm, so if you feel like trying to get rid of the trig functions your help would be greatly appreciated.
Well technically, trigs are unavoidable if you want a closedform solution. For double precision, I thought some iterative formulas (like 2 steps of Newton method from a reasonable initial approximation) would be quicker, but I tested some options and they are at approximately the same speed (and they are also less readable, which we don't want in this code). As we only need 3 trig functions (acos, cos and sin), it's almost as good as you can go. The main slowdown really comes from pow() in the case of a single real root.
However, I found that we can split the speed of sturm algorithm in half. The algorithm itself is ok, but the initial bracket (0,MAX_DISTANCE) is horrible  you need around 20 bisections (and regulafalsi suffers as well) to get from MAX_DISTANCE to normal distances of order 1 to 100. There are simple formulas that set the upper bound on roots, and the simplest one that I tried immediately improved the speed. On my test scene, it went from 233s to 119s.
I will keep investigating the bounding algorithm to see if I can further improve the performance, and test if it breaks anything (it shouldn't, but just in case). Ideally, the sturm solver should be quick enough to completely replace the quartic formula, we just need to be smart.
I know that this part of povray code is very old and does not rely on external libraries. As polynomial solvers are available in standard open libraries (such as boost), we could consider using them in the far future.
That's good news indeed.
Do you have numbers for a comparison of quartic solver vs. your improvement to the sturm solver?
Probably, yes. Using a wellestablished external library is prone to get us code that is better tested, better optimized, and better analyzed with respect to precision issues, and we're already using part of boost anyway.
Numbers:
In torus scene:
old sturm: 233s
new sturm: 116s
quartic: 83s
1000000 randomly generated polynomials of fourth order with coefficients between 50 and 50 (with symmetric initial bracket, so it finds all of them, not just the positive  for better comparison):
old sturm: 3.205s
new sturm: 0.830s
quartic: 0.339s
This test is not entirely realistic: in random polynomials, coefficients are more unpredictable and it is common to have no roots at all (average of 0.4 real roots per polynomial). In reality, you get less cases without real roots.
Now tracked on github as issue #236.