FPGAProContra (2004-04-18 14:10:49)

Targeted at developers

FPGA Technology Primer: FPGA stands for Field Programmable Gate Array and refers to today's high-density Programmable Logic (PL) chips.

Visit the ChipTechnologyPrimer to learn how to recognize and categorize the available generations of Integrated Circuits.

This page discusses how to split a controlling task into subtasks that are better done in a microcontroller and subtasks that should go into an FPGA.


Where are FPGAs considered? Everywhere.

Most tasks can be solved by either a microcontroller or an FPGA. The question is usually cost (which includes development effort, development time, manufacturing cost, maintenance, cost of failures, training, etc.). Some tasks can only be solved in Programmable Logic, or even in an Application Specific Integrated Circuit (ASIC).

Many solutions are available off the shelf in very efficient form. Some are not, and a PL design can solve the problem.

It can be very cost-efficient to integrate various off-the-shelf components in one FPGA, at least compared to the cost of a custom chip design.

If one considers the long run (tens of thousands of units produced), a mature product of any design, be it uC- or FPGA-dominated, can be turned into an ASIC, reducing cost and increasing performance further (e.g. a smaller footprint; this can be very nice in the end, as only a few hundred thousand Euros are needed for development and to start manufacturing). But at that point, the flexibility of the HW part (the part that used to be in the FPGA) is limited to what is connected externally. The flexibility of the SW part is not gone, since the software is stored in flash (rarely a harddisk) and can be replaced later, at any time.


Which is better?

The question is ambiguous. It depends.

An FPGA can include a uC core, likely faster than the Mega128 but requiring external RAM and flash (http://www.opencores.org/projects.cgi/web/pavr/overview), at the cost of a large percentage of the logic area. A proven high-performance ARM microcontroller plus a parallel interface to a smaller FPGA (one that does not include a hard-to-rely-on soft core) will surely be cheaper. Unless you need a solution for space travel, in which case you will integrate the ARM core and the added logic into a custom chip.

The option of choosing a core with the limited performance of the Mega128 is complemented by the spare FPGA area, in which additional cores, specialized arithmetic logic, or state machines can be implemented.

Choosing an available processor (uC) with many times the performance, for a few extra Euros, is another option. The question boils down to the talent available to work with each technology.

FPGA design allows developers to declare internal parallelism naturally, which is good practice in the long run: that information is valuable even for a design initially targeted at sequential processing, sharing the single interrupted thread of a uC.

Which suits a given application better? This is a good question. It can be answered only if you have the specifications, and some model of how the design could be implemented.

We'll see how it adds up for GoBox/TargetSpecifications.


Let's see the different aspects that should be considered:

A majority of engineers know how to write code for general processors (uC). It is about time that engineers, even in developing countries, overcame their fear of designing anything beyond the ancient Von Neumann architecture.

The top speed of most applications can always be made higher in custom HW than in a uC implementation, provided the designed PL fits into the size limits of the chosen FPGA.

The target specification may be easily satisfied with an off-the-shelf IC; this can be either a uC or an FPGA.

If one struggles to get the desired timing in a case where better timing would yield real-world benefits (e.g. timing so sloppy that tighter timing would mean more engine power), he should consider upgrading to a better controller, or moving some tasks to HW: e.g. a TPU or an FPGA.


Why is this so?

A general-purpose processor executes instructions sequentially, one after the other.

Programmed logic can carry out many operations in parallel.

If you have code that can be executed in parallel, executing it sequentially on a general processor (time multiplexing) may not lead to equivalent results.

If you have code written for a general-purpose processor, you can decide to convert it to either a sequential or a parallel structure.
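
For illustration, here is a minimal C sketch (the names and the 4-tap size are our assumptions, not from any real design) of the same computation in a sequential and in a parallel structure. A uC runs both one step at a time; the second form maps directly onto parallel multipliers in programmed logic:

```c
#include <stdint.h>

/* Sequential structure: one multiply-accumulate per loop iteration. */
int32_t fir4_sequential(const int16_t x[4], const int16_t c[4]) {
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)x[i] * c[i];
    return acc;
}

/* Parallel structure: the four products are independent; in programmed
   logic each line would become its own multiplier, all finishing in
   one step, followed by a depth-2 adder tree. */
int32_t fir4_parallel(const int16_t x[4], const int16_t c[4]) {
    int32_t p0 = (int32_t)x[0] * c[0];
    int32_t p1 = (int32_t)x[1] * c[1];
    int32_t p2 = (int32_t)x[2] * c[2];
    int32_t p3 = (int32_t)x[3] * c[3];
    return (p0 + p1) + (p2 + p3);
}
```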

Some think that writing code for general processors is simpler, that it costs less, or that it can be made stable faster; this of course depends on the background of the designer. Development costs are a major factor. The good thing is that there is already a working solution, the MS-AVR, plus several other uC-based engine management designs. This gives those who want to acquire the skills to design superior solutions (learning VHDL or other related tools) the opportunity to invest their time in peace. So far, only the automobile makers have made working FPGA-based implementations.

So why would one choose the FPGA if a uC is easier to program?

Because:

If one uses a general processor for something that should be done in an FPGA, he'll notice that his uC is doing a large number of simple operations on a relatively small set of data (many iterations, with few transistors in use at any given time).

If one uses an FPGA for something that should be done in a general processor, he'll notice that much chip space sits inactive; on the other hand, a lot of transistors can work in parallel at any time.

Anyone can buy a general computer that keeps megabytes (in memory) available to the processor within a usec. Not every type of chip gives as many options for what kind of DRAM or flash one can connect externally and in parallel (at the same time, to individual user-selectable pins).

FPGAs at today's densities are our first experience with parallel computing. Modern 90nm logic brings hundreds of thousands of gate-equivalents into affordable reach (small if used as memory, huge if used for real processing and kept busy).


What are the requirements for an Engine Management System?

Calculations and Timing


Calculations

In an engine management system you need calculations that must be done in less than 10 msec (100/sec). Even if you can do them in 1 nsec, that will not necessarily be a measurable gain in output power compared to those calculations taking 1 msec. The reason is that you don't have much chance to interfere before that time anyway: you have to wait for the next injection cycle, for the next ignition event.
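
As a back-of-the-envelope check (a hedged sketch; the function name and the 4-stroke assumption are ours), here is where such deadlines come from:

```c
#include <stdint.h>

/* How long until the next ignition event on a 4-stroke engine?
   One 4-stroke cycle = 2 revolutions, with one spark per cylinder
   per cycle, so events/sec = rpm / 60 * cylinders / 2.
   This is the real deadline; computing faster buys nothing.   */
uint32_t ignition_period_usec(uint32_t rpm, uint32_t cylinders) {
    return 120000000UL / (rpm * cylinders);
}
/* Example: 8 cylinders at 10000 RPM -> 1500 usec between sparks;
   1 cylinder idling at 600 RPM -> 200000 usec. The 10 msec budget
   quoted above is of this order. */
```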


Processing continuous Data Streams

Ideal algorithms use the fewest resources if they process the data continuously. This opens up the capacity to implement a fuzzy design that does not pretend to know exactly what the engine reality is, but maintains a number of indicators on which further decisions are continuously based.
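
A minimal sketch of one such continuously maintained indicator, in C (the fixed-point format and the smoothing constant are illustrative assumptions, not from any real design):

```c
#include <stdint.h>

/* An exponential moving average updated per sample: no block of
   input has to be buffered before a decision can be based on it. */
typedef struct { int32_t avg_q8; } indicator_t;   /* Q8 fixed point */

static inline void indicator_update(indicator_t *ind, int16_t sample) {
    /* avg += (sample - avg) / 16, all in Q8 fixed point */
    ind->avg_q8 += (((int32_t)sample << 8) - ind->avg_q8) >> 4;
}
```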

DetonationDetection

DetonationDetection and IonSense (below) are good examples of signals with a theoretical lower limit on how much input data must be waited for before the calculation can be carried out.

Also, for both, the results are of use for purely nothing (literally!) right away. They are needed within about 6 msec, for the next spark timing of the same cylinder (10 msec is the same engine phase in the next period at 12000 RPM).

Knock detection is solved pretty well by off-the-shelf solutions; both HW and SW solutions exist. GenBoard v3 has dedicated HW for it (you're not likely to do better for this target app than Texas Instruments or Intersil did, but who knows?). This is an acoustic signal (usually 3..10 kHz), and more than one full wave must be waited for before a good calculation can be made. (I want continuous spectral analysis!?)
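
For illustration, a hedged sketch of such a calculation (this is not how GenBoard v3 or the dedicated chips work): a Goertzel filter measuring signal energy at one knock frequency. It is much cheaper than a full FFT, and its block length n shows directly why several full waves of input must be collected first:

```c
#include <math.h>
#include <stdint.h>

/* Energy at f_knock_hz in a block of n samples taken at fs_hz. */
double knock_band_energy(const int16_t *x, int n,
                         double f_knock_hz, double fs_hz) {
    double k = 2.0 * cos(2.0 * M_PI * f_knock_hz / fs_hz);
    double s1 = 0.0, s2 = 0.0;
    for (int i = 0; i < n; i++) {
        double s0 = (double)x[i] + k * s1 - s2;  /* resonator update */
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;      /* squared magnitude */
}
```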

IonSense

This is a long (n*100 usec) analog signal that must be waited for before the calculation can be done. (Has nobody tried to continuously differentiate the signal and read the desired data from the first, second and third derivatives?)
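
Since that is an open question, the following is purely hypothetical: a running first and second difference of the ion-current samples, as a starting point for reading the signal from its derivatives instead of waiting for the whole waveform:

```c
#include <stdint.h>

/* Updated once per ADC sample; d1 and d2 are the discrete first and
   second derivatives, available immediately after each sample. */
typedef struct { int32_t d1, d2, prev, prev_d1; } ion_diff_t;

static inline void ion_diff_update(ion_diff_t *s, int16_t sample) {
    int32_t d1 = (int32_t)sample - s->prev;   /* first difference  */
    s->d2 = d1 - s->prev_d1;                  /* second difference */
    s->d1 = d1;
    s->prev = sample;
    s->prev_d1 = d1;
}
```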

The system is built up of frequency-limited signals. It is meaningful to say that you need the result within 10 msec or 1 msec, or maybe less, for a given output. If we had infinite-frequency actuators, there might be a point in decreasing response times to very small values. But there is simply no such thing. Even in (hypothetical) applications where part of the fuel (GoBox/Electrolysis) is produced continuously and on demand, there are inherent delays for the gases to travel (e.g. traveling 0.5 m at 100 m/sec takes 5000 usec) and for other signals to be measured.

The simplest approach is to use one or more faster processors; e.g. a cheap PC has a processor 100..1000 times faster than the one we use to control 8-cylinder 10000 RPM engines. But for timing alone that's not needed, because more precise timing may bring only unmeasurable gains. A bigger uC is still being considered (some nice RISC like ARM, see GoBox/CPU) because programming is easier (bigger word length, less chance of incidental errors like overflow). Programming the calculations in HW has acquired the same easiness as programming a uC, thanks to the tools available today.


Processor inside FPGA

When doing arithmetic calculations in an FPGA, the compilation result is essentially the same as what the engineers of the ARM CPU got, except that you can choose which instructions you need and which you want to eliminate or extend. For example, if you need only half of the instructions, you might make a core with maybe 70% of the logic (familiarize yourself with microcode, RISC and VLIW designs, and you will find that you cannot eliminate more than about 30% of the core's transistors just by eliminating instructions). And this 70% will consume about 20 times more space in the FPGA than on the ARM.

(Visit GoBox/CPU for considerations on choosing a new uC for the next versions of GenBoard.)


Timing

In other words, TPU (Time Processing Unit) functions:

Every uC has some timers that can be used for the above; if more are needed, the set can be extended with programmed logic next to the uC. A minimal example of using one internal channel follows.
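
A hedged sketch of one such internal "TPU channel" on a Mega128-class AVR (the pin choice and the omitted prescaler/port setup are assumptions for illustration): the timer1 output-compare unit fires an interrupt at an exact timer count, so the spark edge does not depend on main-loop polling.

```c
#include <avr/io.h>
#include <avr/interrupt.h>

ISR(TIMER1_COMPA_vect) {
    PORTB &= ~_BV(PB0);       /* end of dwell: fire the coil       */
}

void schedule_spark(uint16_t fire_at_ticks) {
    OCR1A = fire_at_ticks;    /* absolute compare value            */
    TIMSK |= _BV(OCIE1A);     /* enable timer1 compare-A interrupt */
}
```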


Other applications

In other applications, such as high-speed networking, there are more tasks that favor the FPGA:

Note that FFT (like most arithmetic calculations) used to be done most efficiently in up-to-date DSPs.

Now one can choose to do them fast and parallel, or sequential and slower (less resource-intensive), in an FPGA. We can find many examples implemented in FPGAs, or already shrunk into an ASIC, such as in your digital video camera.


This is a list of what the FPGA can be better at (note that buzzwords are not design targets; they are vain unless they translate to a better overall implementation cost, which includes development, manufacturing... you know that):

~10 nanosec response time is possible (which may not be necessary to control an engine; the performance of an engine depends on other factors).

There are no cases where an FFT-based reaction is desired within 40 usec. If you have little clue about engine management, study it first. If you find something, state it, with an estimation of the benefits of faster replies, but please do not guess. As a comparison, there are actually no processed signals where a full wave takes less than 40 usec (25 kHz).

Better than 4 usec timing precision could add another 0.00% engine power :) compared to an ignition timing resolution of 4 usec, where a +/-2..3 usec average delay is common due to event conflicts (which are necessarily rare). See the arithmetic sketch below for what 4 usec means in crank degrees.
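
A sketch of the arithmetic behind that claim (the function name is ours):

```c
#include <stdint.h>

/* How many usec is one crank degree?  usec/deg = 60e6 / (rpm * 360). */
uint32_t usec_per_crankdeg(uint32_t rpm) {
    return 60000000UL / (rpm * 360UL);
}
/* At 12000 RPM one crank degree is ~13.9 usec (this returns 13), so a
   4 usec resolution is already well under a third of a crank degree. */
```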


This is a list of things which score for neither FPGA nor uC:

If you badly need the concurrency, as in a TPU (what's a TPU? see the definition above): the TPU is a small but important part of the system, because it takes care of the timing-critical stuff. Choose a uC that provides the necessary TPU channels and IO, or, if you somehow cannot find one, add programmed logic next to a microcontroller. Otherwise use only a microcontroller: any sane uC already has some TPU channels internally, called input capture and output compare; also look for PWM channels.


uC is way better than FPGA when development costs are considered:


The key question is: will there be a measurable gain from parallelization, or just added complexity?

Errors result from unpredictable behavior as new uC code additions interfere with the interrupt processing, stacks, heaps, etc. Code additions in an FPGA do not necessarily affect previously established structures, thereby increasing its overall ease of use and ability to evolve.

Added by MembersPage/MarcellGal: Please familiarize yourself with how the mainloop and interrupts work, and estimate the measurable benefits of parallelization. Numbers please, no blablah. Here's a hint: the generated power depends on ignition advance with a sin(x)-like function, x being in the range of 60..90 degrees. At 90 degrees it is flat, but engines often operate in the area where the slope is finite. 1 crank degree can gain up to about 1..2% power in the extreme case. However, the hard part is getting the calculations for where the ignition advance should be, and having the trigger support the precision mechanically (forget a cambelt-driven trigger for this), not the timing itself. We need DetonationDetection and IonSense to get close.
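
A sketch of the arithmetic in that hint (assuming the sin(x) power model exactly as stated):

```c
#include <math.h>

/* If power ~ sin(x), the relative gain from one more crank degree is
   about cos(x) * (pi/180) of peak power. */
double power_gain_per_degree(double x_deg) {
    return cos(x_deg * M_PI / 180.0) * (M_PI / 180.0);
}
/* At x = 60 deg this is ~0.0087, i.e. ~0.9% per crank degree, in line
   with the quoted 1..2% extreme case; at x = 90 deg it is ~0 (flat). */
```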


Reliability issues

Testability is a key concept for making a reliable and robust design. When reusing an IP core (such as the core of the uC) for several tasks (programs), the reused part must be tested only once. Memories are especially easy to test. When you dedicate individual HW to each task, you need to design it to be testable; this is the hardest part of HW design. Even though testability only consumes about 10% of chip area in the typical case, the effort to make a design testable is typically 50..60% of the design effort (yes: it costs more than the base function). Testability is also an issue for SW: the algorithm must be good (see JUnit in the Java land, for example). But testability is a double concern with HW: both the HW and the algorithm must be good.
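
To illustrate the SW side (the JUnit idea transplanted to C; the function and the test values are hypothetical): keep the algorithm a pure function, so it can be verified on a PC long before it runs on the uC.

```c
#include <assert.h>
#include <stdint.h>

/* A pure function: same inputs, same output, no HW dependency. */
int32_t interp_linear(int32_t x0, int32_t y0, int32_t x1, int32_t y1,
                      int32_t x) {
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
}

int main(void) {
    assert(interp_linear(0, 0, 10, 100, 5)  == 50);   /* midpoint  */
    assert(interp_linear(0, 0, 10, 100, 0)  == 0);    /* endpoints */
    assert(interp_linear(0, 0, 10, 100, 10) == 100);
    return 0;
}
```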

Why do experienced people (those who work a lot with electronics and FPGAs, and teach VHDL) say that they would not like to rely, for their daily drives or flights, on a uC or an FPGA starting up correctly at every powerup?


Note that it is impossible to design Hardware (e.g. FPGA) without Software (writing code first). One always makes a working model first.

Apparently hardware and software meet in FPGA technology: there you can reprogram infinitely often, and use software-based simulation and hardware-based debugging (which you can program yourself, routing test signals to unused pins if necessary and connecting a logic analyzer). But that is just the low-level side of things. The truth is, you always need a model first to do anything useful (SW or HW). And the model is always (except maybe for the most trivial tasks, such as address decoding) easier to run on a general processor (most of the time on a PC first) than on dedicated HW.


Note that replacing the microcontroller is not sane. The microcontroller can do mathematical calculations and table storage much more efficiently than the FPGA (think about multiplication, division (!!!!), and the eeprom, flash, ADCs and SRAM the uC provides). The only thing programmed logic (FPGA, CPLD) is better at is TPU (Time Processing Unit) functions.

This implies that a bottom-up design (one that targets the general case, without exact preliminary specifications) will find almost no users with only an FPGA. With a uC, or a uC+FPGA, that is completely different. A top-down design can sometimes be solved with just an FPGA, and it may or may not get done better than with other tool combinations, depending on the specs (if it's mostly timing: FPGA; if it's mostly calculations: uC).


Adding an FPGA next to a uC could yield gains

Note that some microcontrollers have a multichannel TPU in them, and all have some TPU channels.

I suggest the Philips LPC2119 ARM be used as the uC, the apparent winner on the GoBox/CPU page. I don't know which programmed logic is sane to use, but since GoBox has not yet revealed specifications, design targets or example algorithms to implement, it would be too early anyway. It's not clear at what point adding the programmed logic has a benefit (but it's certain that programmed logic adds 0.0 engine power gain, for a huge amount of work, before the IonSense uC implementation is finished).


Incorrect, to be removed:

If you want to implement an ALU (arithmetic logic unit, which is the most usual way to carry out large numbers of computations) in the FPGA, you will get about 1..5% of the computing power of a processor (with worse reliability figures) of the same technology and die size.

If you need a large number of computations, in most cases you can do that most efficiently in a DSP, RISC or VLIW processor, or a uC. Most tasks map very well onto powerful ALUs, but there are a very few tasks (e.g. network packet switching) that map poorly: e.g. the TPU functions of an engine management system, which is why these are always backed up by HW capabilities in any sane design.