Task Description
While familiarizing myself with the code, I found some small changes in the solve_cubic function that lead to a significant speedup.
In my experience, “pow” is by far the slowest function in math.h and replacing it with simpler functions usually makes a tremendous impact on the speed (it’s an order of magnitude slower than sqrt/exp/cbrt/log).
solve_cubic has a “pow” function that can be replaced by cbrt (cubic root), which is standard in ISOC99 and should be available on all systems. Separate benchmarks of solve_cubic function show this change almost doubles the speed and does not lower the accuracy. As solve_cubic is part of the solution of quartic equation, this improves the speed for many primitives. Testing with a scene containing many torus intersection tests (attached below) I still observed almost 10% speedup (Intel, 4 threads, 2 hyperthreaded cores, antialiasing on, 600×600: from 91 to 84 seconds). And this is for a torus, where a lot of time is spent in the solve_quartic and cubic solver is only called once! Similar speedup should be expected for prism, ovus, sor and blob.
I do believe the cubic solver can be done without trigonometry, but that would mean changing the algorithm, introducing new bugs and requiring a lot of testing. However, the trigonometric evaluation can still be simplified (3% speedup in full torus benchmark).
These changes don’t affect the algorithm at all, they are mathematically identical to the existing code, so the changes can be applied immediately. I also included other changes just as suggestions. Every change is commented and marked with [SC 2.2013].
This sadly does not speedup the sturm solver, which uses bisection and regulafalsi and looks very optimized already.
The test scene I used has a lot of torus intersections from various directions (shadow rays, main rays, transmitted rays).
