96 kHz.org
Advanced Audio Recording

Advantages of FPGAs in the field of Audio Processing

At first sight, audio processing appears to be an easy task in modern signal processing, since the sample rates are mostly low. A closer look reveals that, because of the desired accuracy (16 bits and more), considerable computing power is required to meet people's needs in signal processing, for example with physical modeling. To understand the principles, a deeper look into the architecture of FPGAs is required:

Advantages of FPGAs in Signal Processing

FPGAs can perform basic operations such as multiplication, summation and decisions much faster than microcontrollers, and they can also process many "tasks" in parallel in real time. For example, a current DSP operating at 120 MHz evaluates a 2nd order differential equation describing a sine oscillation in about half a microsecond, using sequential processing, variable handling and RAM access. In an FPGA, each single step of the equation can be processed in parallel, leading to a so-called pipeline in which all resources are free again for use in the next clock cycle. In this way many channels / voices / cases can be processed. There is only a latency of a fixed number of clock cycles. As long as the result of a calculation is not needed immediately to continue processing, tasks can thus be processed much more efficiently overall. Although the basic system frequency may be higher with DSPs or CPUs, FPGAs can easily become much faster than, for example, a DSP solution. FPGAs are best suited to applications that require parallel processing: the more channels required, the better the utilization of the FPGA.
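The sine oscillator mentioned above can be sketched in a few lines. This is a minimal model, assuming the oscillator is realized as the common second-order recurrence y[n] = 2·cos(w)·y[n-1] - y[n-2]; the text does not state the exact form, and the names `make_voice`, `step` and `render` are illustrative only. Each voice keeps its own two state values, which is why many independent voices can share one arithmetic unit in turn.

```python
import math

def make_voice(freq_hz, sample_rate_hz):
    """Return the coefficient and the two state values of one oscillator voice."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)
    # State is initialised so that the first output sample is sin(0) = 0.
    return {"coeff": coeff, "y1": math.sin(-w), "y2": math.sin(-2.0 * w)}

def step(voice):
    """One update: y[n] = coeff * y[n-1] - y[n-2] (one multiply, one subtract)."""
    y = voice["coeff"] * voice["y1"] - voice["y2"]
    voice["y2"] = voice["y1"]
    voice["y1"] = y
    return y

def render(freqs_hz, sample_rate_hz, num_samples):
    """Render all voices sample by sample, visiting the voices in turn."""
    voices = [make_voice(f, sample_rate_hz) for f in freqs_hz]
    return [[step(v) for v in voices] for _ in range(num_samples)]

if __name__ == "__main__":
    frames = render([440.0, 1000.0], 48000.0, 4)
    for frame in frames:
        print(frame)
```

In software the voices are visited one after another; in a pipelined FPGA design the multiply of one voice could overlap with the subtract of the previous one, since the voices do not depend on each other.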

The following example shows a timing comparison of FPGA and DSP for a 128-tap sequential filter (equalizer):

 

Comparison of FPGAs and DSPs - equalizer example

 

Here the DSP (left side of the table) needs 12 clock cycles to process one sample and its corresponding coefficient. All taps of the filter are processed at 120 MHz, which is fast enough to finish within the period given by the 48 kHz sampling frequency and still leave time for further operations. More taps require a higher DSP clock frequency. In contrast, the FPGA consists of combinatorial logic forming decision and calculation modules which can all operate simultaneously. This leads only to a latency of 10 clock cycles, including the RAM wait states required at this clock speed. The final result of 2400000 time steps follows from the fact that all actions are done in parallel, so only 128 clock cycles + latency are required.
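The cycle counts quoted above can be checked with simple arithmetic. The sketch below uses only the figures from the text (120 MHz, 48 kHz, 128 taps, 12 cycles per tap, 10 cycles of latency). The function names are illustrative, and it assumes the FPGA handles one tap per clock cycle, as the "128 clock cycles + latency" figure suggests.

```python
DSP_CLOCK_HZ = 120_000_000    # DSP clock from the text
SAMPLE_RATE_HZ = 48_000       # audio sampling frequency from the text
TAPS = 128                    # filter length from the text
DSP_CYCLES_PER_TAP = 12       # cycles per sample/coefficient pair on the DSP
FPGA_LATENCY_CYCLES = 10      # pipeline latency including RAM wait states

def cycles_per_sample(clock_hz, sample_rate_hz):
    """Whole clock cycles available within one sample period."""
    return clock_hz // sample_rate_hz

def dsp_filter_cycles(taps, cycles_per_tap):
    """Sequential processing: every tap costs the full per-tap cycle count."""
    return taps * cycles_per_tap

def fpga_filter_cycles(taps, latency_cycles):
    """Pipelined processing (assumed one tap per clock), latency paid once."""
    return taps + latency_cycles

if __name__ == "__main__":
    budget = cycles_per_sample(DSP_CLOCK_HZ, SAMPLE_RATE_HZ)
    dsp = dsp_filter_cycles(TAPS, DSP_CYCLES_PER_TAP)
    fpga = fpga_filter_cycles(TAPS, FPGA_LATENCY_CYCLES)
    print("DSP cycle budget per sample:", budget)      # 2500
    print("DSP cycles for the filter:  ", dsp)         # 1536
    print("DSP cycles left over:       ", budget - dsp)  # 964
    print("FPGA cycles for the filter: ", fpga)        # 138
```

With these numbers the DSP uses 1536 of its 2500 cycles per sample, so at most 208 taps would fit at 120 MHz, while the pipelined filter needs 138 clock cycles.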

A common issue with FPGAs is the latency of internal elements such as multipliers and adders, which may not be fast enough to complete their operation within one clock cycle. This can be solved by partial parallelization, as shown here for a multiplier structure:

Comparison of FPGAs and DSP - pipelining methods - Juergen Schuhmacher

Pipelining method to increase the throughput of a DSP system in FPGAs.
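The idea of splitting a slow multiplier into pipeline stages can be illustrated with a small behavioural model. This is only a sketch under assumptions: the partitioning into two partial products of 8 bits each and the class name `PipelinedMultiplier` are hypothetical and are not taken from the figure. Stage 1 forms the two partial products, stage 2 shifts and adds them, and a register sits between the stages so that a new operand pair can enter on every clock.

```python
class PipelinedMultiplier:
    """Two-stage pipelined unsigned multiplier (behavioural model)."""

    def __init__(self, half_width=8):
        self.half_width = half_width
        self.mask = (1 << half_width) - 1
        self.stage1 = None  # register holding (low partial, high partial)
        self.stage2 = None  # output register holding the finished product

    def clock(self, operands=None):
        """Advance one clock; return the output register after the edge."""
        # Compute both next-state values from the current registers first,
        # so that all registers update simultaneously as in hardware.
        next_stage2 = None
        if self.stage1 is not None:
            low, high = self.stage1
            next_stage2 = low + (high << self.half_width)
        next_stage1 = None
        if operands is not None:
            a, b = operands
            next_stage1 = (a * (b & self.mask), a * (b >> self.half_width))
        self.stage1, self.stage2 = next_stage1, next_stage2
        return self.stage2

def multiply_stream(pairs):
    """Feed one operand pair per clock and collect the products in order."""
    mul = PipelinedMultiplier()
    results = []
    for operands in list(pairs) + [None]:  # one extra clock flushes the pipe
        out = mul.clock(operands)
        if out is not None:
            results.append(out)
    return results

if __name__ == "__main__":
    print(multiply_stream([(3, 5), (1000, 2000), (65535, 65535)]))
```

Each product appears one clock after its operands were fed in, yet the unit still accepts a new pair on every clock, so the throughput stays at one multiplication per cycle while each stage has less work to finish within a cycle.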

 

Conclusion

FPGAs typically run at lower clock speeds than DSPs when the synthesis constraints aim for a balanced tradeoff between speed and area, so that not too many additional flip-flops (FFs) have to be added to achieve the desired system frequency. Usually the clock is about 3 times lower. On the other hand, FPGAs process many operations within one single step where DSPs need 2 or more, and thus come closer to the DSPs again in final data throughput. However, a ratio of 1:2 may persist at present, which holds for purely sequential operations.

But there is room for improvement. Because of the fully pipelined operation, any remaining clock cycle that is not needed to complete the operations of one channel within a sample period can be used to generate more channels in simple pipelined systems. For fully pipelined systems, the latency no longer affects the resulting number of channels. Only a further set of variables / signals is required for this, so the pipeline delay has to be balanced against the architecture width. Tweaking the internal architecture so that complex operations like filtering are done in parallel saves pipeline delay and latency and increases the used area only moderately, whereas doubling the number of voices in a DSP system requires up to double the operating frequency.
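How unused clock cycles translate into additional channels can be estimated as follows. The sketch assumes a 40 MHz FPGA clock, which is a hypothetical value derived from the "about 3 times lower" remark and the 120 MHz DSP of the equalizer example, and it reuses the 128-tap filter as the per-channel workload. The function name `channels_per_sample_period` and the way latency is charged in the non-overlapped case are assumptions for illustration.

```python
def channels_per_sample_period(clock_hz, sample_rate_hz, cycles_per_channel,
                               latency_cycles=0):
    """Channels that fit into one sample period.

    latency_cycles = 0 models a fully pipelined design, where channels follow
    each other back to back and the latency only delays the output.
    A non-zero value models a design that waits for each result (assumption).
    """
    budget = clock_hz // sample_rate_hz
    return budget // (cycles_per_channel + latency_cycles)

if __name__ == "__main__":
    fs = 48_000
    # DSP: 120 MHz, 128 taps at 12 cycles each = 1536 cycles per channel.
    print("DSP channels: ", channels_per_sample_period(120_000_000, fs, 128 * 12))
    # FPGA: assumed 40 MHz, 128 cycles per channel, fully pipelined.
    print("FPGA channels:", channels_per_sample_period(40_000_000, fs, 128))
```

Under these assumptions the DSP fits one filter channel per sample period and the FPGA fits six, even though its clock is three times lower.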

See the audio DSP synthesizer for an example of a pipelined system.

 


 

© 2009 - Jürgen Schuhmacher