Codec2, being a sound codec, makes frequent use of FFTs. Here's the result of gprof running over c2dec:
% cumulative self self total time seconds seconds calls us/call us/call name 83.52 3.65 3.65 28116 129.82 129.82 four1a 7.78 3.99 0.34 11246 30.23 160.05 aks_to_H 3.43 4.14 0.15 5623 26.68 156.50 aks_to_M2 2.75 4.26 0.12 599720 0.20 0.20 sample_log_amp 1.37 4.32 0.06 11246 5.34 135.15 synthesise ...You can see that four1a, the FFT routine, is taking up 3.65 seconds, and the total turns out to be 4.37! So the FFT is a great candidate for improvement, and happily the current code uses a very slow FFT, the one from Numerical Recipes in C.
With a bit of work, I swapped in the kiss FFT code. This is by no means top-notch in the speed derby, but it should be better. It has the advantage of being small and simple. Here's the same process, on the same machine using that code:
% cumulative self self total time seconds seconds calls us/call us/call name 49.43 1.30 1.30 28116 46.24 46.24 kf_work 15.97 1.72 0.42 11246 37.35 92.12 aks_to_H 9.13 1.96 0.24 28116 8.54 54.77 four1 6.46 2.13 0.17 5623 30.23 85.01 aks_to_M2 6.08 2.29 0.16 11246 14.23 69.00 synthesise ...Total time turns out to be 2.63 sec. We have to be careful to charge both kf_work and four1 against FFT, because the 'four1' routine is my bridging code. Still, this is quite the improvement.
On decode, anyway, FFT remains the big time-hog. We can improve it further, by about another 50%, if we treat all transforms as entailing real numbers. kiss FFT has a real number routine on board, but I'm having trouble interfacing this with Dave's existing code. Something's just not aligning properly.
Finally, the gold standard in C code FFT speed appears to be the FFTW library. Implementing this should improve matters greatly again, although its interface seems a bit hairy. After that, it's hardware DSP code. The N800, for example, actually has a DSP engine which could be engaged to this purpose quite well, but I can't find if or where a FFT routine is exposed usefully. In any case, the N800 is a pretty old machine: what barely works on it should do just fine on more recent hardware.