FPGAProContra (2004-04-18 14:10:49)

Targeted at developers

FPGA Technology Primer: FPGA stands for Field Programmable Gate Array and refers to today's high-density Programmable Logic (PL) chips.

Visit the ChipTechnologyPrimer to learn how to recognize and categorize the available generations of Integrated Circuits.

This page discusses how to split a controlling task into subtasks that are better done in a microcontroller and subtasks that should go into an FPGA.


Where are FPGAs considered? Everywhere.

Most tasks can be solved by either a microcontroller or an FPGA. The question is usually cost (which includes development effort, development time, manufacturing cost, maintenance, cost of failures, training, etc.). Some tasks can only be solved in Programmable Logic, or even in an Application Specific Integrated Circuit (ASIC).

Many solutions are available off the shelf in very efficient form. Some are not, and a PL design can solve the problem.

It can be very cost-efficient to integrate various off-the-shelf components in one FPGA, at least compared to the cost of a custom chip design.

If one considers the long run (tens of thousands of units produced), a mature product of any design, be it uC- or FPGA-dominated, can be turned into an ASIC, reducing cost and increasing performance further (e.g. a smaller footprint; this can be very nice in the end, as only a few hundred thousand Euros are needed for development and to start manufacturing). But at that point, the flexibility of the HW part (the part that used to be in the FPGA) is limited to what is connected externally. The flexibility of the SW part is not gone, since the software is stored in flash (rarely a harddisk) and can be replaced later, at any time.


Which is better?

The question is ambiguous. It depends.

An FPGA can include a uC core, likely faster than the Mega128 but requiring external RAM and flash (http://www.opencores.org/projects.cgi/web/pavr/overview), at the cost of a large percentage of the logic area. A proven high-performance ARM microcontroller plus a parallel interface to a smaller FPGA (one that does not include a hard-to-rely-on soft core) will surely be cheaper. Unless you need a solution for space travel, in which case you will integrate the ARM core and the added logic into a custom chip.

The option of choosing a core with the limited performance of the Mega128 is complemented by the spare FPGA area, in which additional cores, specialized arithmetic logic, or state machines can be implemented.

Choosing an available processor (uC) with many times the performance, for a few extra Euros, is another option. The question boils down to the talent available to work with each technology.

FPGA design allows developers to declare internal parallelism naturally, which is good practice in the long run: that information is valuable even for a design initially targeted at sequential processing, sharing the single interrupted thread of a uC.

Which suits a given application better? This is a good question. It can be answered only if you have the specifications, and some model of how the design could be implemented.

We'll see how it adds up for GoBox/TargetSpecifications.


Let's see the different aspects that should be considered:

A majority of engineers know how to write code for general processors (uC). It is about time that engineers, even in developing countries, overcame their fear of designing anything beyond the ancient Von Neumann architecture.

The top speed of most applications can always be made higher in custom HW than in a uC implementation, provided the designed PL fits into the size limits of the chosen FPGA.

The target specification may be easily satisfied with an off-the-shelf IC; this can be either a uC or an FPGA.

If one struggles to get the desired timing in a case where better timing would yield real-world benefits (e.g. timing so sloppy that tighter timing would mean more engine power), he should consider upgrading to a better controller, or moving some tasks to HW: e.g. a TPU or an FPGA.


Why is this so?

A general-purpose processor executes instructions sequentially, one after the other.

Programmed logic can carry out many operations in parallel.

If you have code that can be executed in parallel, executing it sequentially on a general processor (time multiplexing) may not lead to equivalent results.

If you have code written for a general-purpose processor, you can decide to convert it to either a sequential or a parallel structure.
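
For illustration, here is a minimal C sketch (the names and the 4-tap size are our assumptions, not from any real design) of the same computation in a sequential and in a parallel structure. A uC runs both one step at a time; the second form maps directly onto parallel multipliers in programmed logic:

```c
#include <stdint.h>

/* Sequential structure: one multiply-accumulate per loop iteration. */
int32_t fir4_sequential(const int16_t x[4], const int16_t c[4]) {
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)x[i] * c[i];
    return acc;
}

/* Parallel structure: the four products are independent; in programmed
   logic each line would become its own multiplier, all finishing in
   one step, followed by a depth-2 adder tree. */
int32_t fir4_parallel(const int16_t x[4], const int16_t c[4]) {
    int32_t p0 = (int32_t)x[0] * c[0];
    int32_t p1 = (int32_t)x[1] * c[1];
    int32_t p2 = (int32_t)x[2] * c[2];
    int32_t p3 = (int32_t)x[3] * c[3];
    return (p0 + p1) + (p2 + p3);
}
```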

Some think that writing code for general processors is simpler, that it costs less, or that it can be made stable faster; this of course depends on the background of the designer. Development costs are a major factor. The good thing is that there is already a working solution, the MS-AVR, plus several other uC-based engine management designs. This gives those who want to acquire the skills to design superior solutions (learning VHDL or other related tools) the opportunity to invest their time in peace. So far, only the automobile makers have made working FPGA-based implementations.

So why would one choose the FPGA if a uC is easier to program?

Because:

If one uses a general processor for something that should be done in an FPGA, he'll notice that his uC is doing a large number of simple operations on a relatively small set of data (many iterations, with few transistors in use at any given time).

If one uses an FPGA for something that should be done in a general processor, he'll notice that much chip space sits inactive; on the other hand, a lot of transistors can work in parallel at any time.

Anyone can buy a general computer that keeps megabytes (in memory) available to the processor within a usec. Not every type of chip gives as many options for what kind of DRAM or flash one can connect externally and in parallel (at the same time, to individual user-selectable pins).

FPGAs at today's densities are our first experience with parallel computing. Modern 90nm logic brings hundreds of thousands of gate-equivalents into affordable reach (small if used as memory, huge if used for real processing and kept busy).


What are the requirements for an Engine Management System?

Calculations and Timing


Calculations

In an engine management system you need calculations that must be done in less than 10 msec (100/sec). Even if you can do them in 1 nsec, that will not necessarily be a measurable gain in output power compared to those calculations taking 1 msec. The reason is that you don't have much chance to interfere before that time anyway: you have to wait for the next injection cycle, for the next ignition event.
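
As a back-of-the-envelope check (a hedged sketch; the function name and the 4-stroke assumption are ours), here is where such deadlines come from:

```c
#include <stdint.h>

/* How long until the next ignition event on a 4-stroke engine?
   One 4-stroke cycle = 2 revolutions, with one spark per cylinder
   per cycle, so events/sec = rpm / 60 * cylinders / 2.
   This is the real deadline; computing faster buys nothing.   */
uint32_t ignition_period_usec(uint32_t rpm, uint32_t cylinders) {
    return 120000000UL / (rpm * cylinders);
}
/* Example: 8 cylinders at 10000 RPM -> 1500 usec between sparks;
   1 cylinder idling at 600 RPM -> 200000 usec. The 10 msec budget
   quoted above is of this order. */
```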


Processing continuous Data Streams

Ideal algorithms use the fewest resources if they process the data continuously. This opens up the capacity to implement a fuzzy design that does not pretend to know exactly what the engine reality is, but maintains a number of indicators on which further decisions are continuously based.
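
A minimal sketch of one such continuously maintained indicator, in C (the fixed-point format and the smoothing constant are illustrative assumptions, not from any real design):

```c
#include <stdint.h>

/* An exponential moving average updated per sample: no block of
   input has to be buffered before a decision can be based on it. */
typedef struct { int32_t avg_q8; } indicator_t;   /* Q8 fixed point */

static inline void indicator_update(indicator_t *ind, int16_t sample) {
    /* avg += (sample - avg) / 16, all in Q8 fixed point */
    ind->avg_q8 += (((int32_t)sample << 8) - ind->avg_q8) >> 4;
}
```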

DetonationDetection

DetonationDetection and IonSense (below) are good examples of signals with a theoretical lower limit on how much input data must be waited for before the calculation can be carried out.

Also, for both, the results are of use for purely nothing (literally!) right away. They are needed within about 6 msec, for the next spark timing of the same cylinder (10 msec is the same engine phase in the next period at 12000 RPM).

Knock detection is solved pretty well by off-the-shelf solutions; both HW and SW solutions exist. GenBoard v3 has dedicated HW for it (you're not likely to do better for this target app than Texas Instruments or Intersil did, but who knows?). This is an acoustic signal (usually 3..10 kHz), and more than one full wave must be waited for before a good calculation can be made. (I want continuous spectral analysis!?)
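
For illustration, a hedged sketch of such a calculation (this is not how GenBoard v3 or the dedicated chips work): a Goertzel filter measuring signal energy at one knock frequency. It is much cheaper than a full FFT, and its block length n shows directly why several full waves of input must be collected first:

```c
#include <math.h>
#include <stdint.h>

/* Energy at f_knock_hz in a block of n samples taken at fs_hz. */
double knock_band_energy(const int16_t *x, int n,
                         double f_knock_hz, double fs_hz) {
    double k = 2.0 * cos(2.0 * M_PI * f_knock_hz / fs_hz);
    double s1 = 0.0, s2 = 0.0;
    for (int i = 0; i < n; i++) {
        double s0 = (double)x[i] + k * s1 - s2;  /* resonator update */
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;      /* squared magnitude */
}
```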

IonSense

This is a long (n*100 usec) analog signal that must be waited for before the calculation can be done. (Has nobody tried to continuously differentiate the signal and read the desired data from the first, second and third derivatives?)
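
Since that is an open question, the following is purely hypothetical: a running first and second difference of the ion-current samples, as a starting point for reading the signal from its derivatives instead of waiting for the whole waveform:

```c
#include <stdint.h>

/* Updated once per ADC sample; d1 and d2 are the discrete first and
   second derivatives, available immediately after each sample. */
typedef struct { int32_t d1, d2, prev, prev_d1; } ion_diff_t;

static inline void ion_diff_update(ion_diff_t *s, int16_t sample) {
    int32_t d1 = (int32_t)sample - s->prev;   /* first difference  */
    s->d2 = d1 - s->prev_d1;                  /* second difference */
    s->d1 = d1;
    s->prev = sample;
    s->prev_d1 = d1;
}
```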

The system is built up of frequency-limited signals. It is meaningful to say that you need the result within 10 msec or 1 msec, or maybe less, for a given output. If we had infinite-frequency actuators, there might be a point in decreasing response times to very small values. But there is simply no such thing. Even in (hypothetical) applications where part of the fuel (GoBox/Electrolysis) is produced continuously and on demand, there are inherent delays for the gases to travel (e.g. traveling 0.5 m at 100 m/sec takes 5000 usec) and for other signals to be measured.

The simplest approach is to use one or more faster processors; e.g. a cheap PC has a processor 100..1000 times faster than the one we use to control 8-cylinder 10000 RPM engines. But for timing alone that's not needed, because more precise timing may bring only unmeasurable gains. A bigger uC is still being considered (some nice RISC like ARM, see GoBox/CPU) because programming is easier (bigger word length, less chance of incidental errors like overflow). Programming the calculations in HW has acquired the same easiness as programming a uC, thanks to the tools available today.


Processor inside FPGA

When doing arithmetic calculations in an FPGA, the compilation result is essentially the same as what the engineers of the ARM CPU got, except that you can choose which instructions you need and which you want to eliminate or extend. For example, if you need only half of the instructions, you might make a core with maybe 70% of the logic (familiarize yourself with microcode, RISC and VLIW designs, and you will find that you cannot eliminate more than about 30% of the core's transistors just by eliminating instructions). And this 70% will consume about 20 times more space in the FPGA than on the ARM.

(Visit GoBox/CPU for considerations on choosing a new uC for the next versions of GenBoard.)


Timing

In other words, TPU (Time Processing Unit) functions:

Every uC has some timers that can be used for the above; if more are needed, the set can be extended with programmed logic next to the uC. A minimal example of using one internal channel follows.
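
A hedged sketch of one such internal "TPU channel" on a Mega128-class AVR (the pin choice and the omitted prescaler/port setup are assumptions for illustration): the timer1 output-compare unit fires an interrupt at an exact timer count, so the spark edge does not depend on main-loop polling.

```c
#include <avr/io.h>
#include <avr/interrupt.h>

ISR(TIMER1_COMPA_vect) {
    PORTB &= ~_BV(PB0);       /* end of dwell: fire the coil       */
}

void schedule_spark(uint16_t fire_at_ticks) {
    OCR1A = fire_at_ticks;    /* absolute compare value            */
    TIMSK |= _BV(OCIE1A);     /* enable timer1 compare-A interrupt */
}
```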


Other applications

In other applications, such as high-speed networking, there are more tasks that favor the FPGA:

Note that FFT (like most arithmetic calculations) used to be done most efficiently in up-to-date DSPs.

Now one can choose to do them fast and parallel, or sequential and slower (less resource-intensive), in an FPGA. We can find many examples implemented in FPGAs, or already shrunk into an ASIC, such as in your digital video camera.


This is a list of what the FPGA can be better at (note that buzzwords are not design targets; they are vain unless they translate to a better overall implementation cost, which includes development, manufacturing... you know that):

~10 nanosec response time is possible (which may not be necessary to control an engine; the performance of an engine depends on other factors).

There are no cases where an FFT-based reaction is desired within 40 usec. If you have little clue about engine management, study it first. If you find something, state it, with an estimation of the benefits of faster replies, but please do not guess. As a comparison, there are actually no processed signals where a full wave takes less than 40 usec (25 kHz).

Better than 4 usec timing precision could add another 0.00% engine power :) compared to an ignition timing resolution of 4 usec, where a +/-2..3 usec average delay is common due to event conflicts (which are necessarily rare). See the arithmetic sketch below for what 4 usec means in crank degrees.
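
A sketch of the arithmetic behind that claim (the function name is ours):

```c
#include <stdint.h>

/* How many usec is one crank degree?  usec/deg = 60e6 / (rpm * 360). */
uint32_t usec_per_crankdeg(uint32_t rpm) {
    return 60000000UL / (rpm * 360UL);
}
/* At 12000 RPM one crank degree is ~13.9 usec (this returns 13), so a
   4 usec resolution is already well under a third of a crank degree. */
```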


This is a list of things which score for neither FPGA nor uC:

If you badly need the concurrency, as in a TPU (what's a TPU? see the definition above): the TPU is a small but important part of the system, because it takes care of the timing-critical stuff. Choose a uC that provides the necessary TPU channels and IO, or, if you somehow cannot find one, add programmed logic next to a microcontroller. Otherwise use only a microcontroller: any sane uC already has some TPU channels internally, called input capture and output compare; also look for PWM channels.


uC is way better than FPGA when development costs are considered:


The key question is: will there be a measurable gain from parallelization, or just added complexity?

Errors result from unpredictable behavior as new uC code additions interfere with the interrupt processing, stacks, heaps, etc. Code additions in an FPGA do not necessarily affect previously established structures, thereby increasing its overall ease of use and ability to evolve.

Added by MembersPage/MarcellGal: Please familiarize yourself with how the mainloop and interrupts work, and estimate the measurable benefits of parallelization. Numbers please, no blablah. Here's a hint: the generated power depends on ignition advance with a sin(x)-like function, x being in the range of 60..90 degrees. At 90 degrees it is flat, but engines often operate in the area where the slope is finite. 1 crank degree can gain up to about 1..2% power in the extreme case. However, the hard part is getting the calculations for where the ignition advance should be, and having the trigger support the precision mechanically (forget a cambelt-driven trigger for this), not the timing itself. We need DetonationDetection and IonSense to get close.
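
A sketch of the arithmetic in that hint (assuming the sin(x) power model exactly as stated):

```c
#include <math.h>

/* If power ~ sin(x), the relative gain from one more crank degree is
   about cos(x) * (pi/180) of peak power. */
double power_gain_per_degree(double x_deg) {
    return cos(x_deg * M_PI / 180.0) * (M_PI / 180.0);
}
/* At x = 60 deg this is ~0.0087, i.e. ~0.9% per crank degree, in line
   with the quoted 1..2% extreme case; at x = 90 deg it is ~0 (flat). */
```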


Reliability issues

Testability is a key concept for making a reliable and robust design. When reusing an IP core (such as the core of the uC) for several tasks (programs), the reused part must be tested only once. Memories are especially easy to test. When you dedicate individual HW to each task, you need to design it to be testable; this is the hardest part of HW design. Even though testability only consumes about 10% of chip area in the typical case, the effort to make a design testable is typically 50..60% of the design effort (yes: it costs more than the base function). Testability is also an issue for SW: the algorithm must be good (see JUnit in the Java land, for example). But testability is a double concern with HW: both the HW and the algorithm must be good.
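
To illustrate the SW side (the JUnit idea transplanted to C; the function and the test values are hypothetical): keep the algorithm a pure function, so it can be verified on a PC long before it runs on the uC.

```c
#include <assert.h>
#include <stdint.h>

/* A pure function: same inputs, same output, no HW dependency. */
int32_t interp_linear(int32_t x0, int32_t y0, int32_t x1, int32_t y1,
                      int32_t x) {
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
}

int main(void) {
    assert(interp_linear(0, 0, 10, 100, 5)  == 50);   /* midpoint  */
    assert(interp_linear(0, 0, 10, 100, 0)  == 0);    /* endpoints */
    assert(interp_linear(0, 0, 10, 100, 10) == 100);
    return 0;
}
```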

Why do experienced people (those who work a lot with electronics and FPGAs, and teach VHDL) say that they would not like to rely, for their daily drives or flights, on a uC or an FPGA starting up correctly at every powerup?


Note that it is impossible to design Hardware (e.g. FPGA) without Software (writing code first). One always makes a working model first.

Apparently hardware and software meet in FPGA technology: there you can reprogram infinitely often, and use software-based simulation and hardware-based debugging (which you can program yourself, routing test signals to unused pins if necessary and connecting a logic analyzer). But that is just the low-level side of things. The truth is, you always need a model first to do anything useful (SW or HW). And the model is always (except maybe for the most trivial tasks, such as address decoding) easier to run on a general processor (most of the time on a PC first) than on dedicated HW.


Note that replacing the microcontroller is not sane. The microcontroller can do mathematical calculations and table storage much more efficiently than the FPGA (think about multiplication, division (!!!!), and the eeprom, flash, ADCs and SRAM the uC provides). The only thing programmed logic (FPGA, CPLD) is better at is TPU (Time Processing Unit) functions.

This implies that a bottom-up design (one that targets the general case, without exact preliminary specifications) will find almost no users with only an FPGA. With a uC, or a uC+FPGA, that is completely different. A top-down design can sometimes be solved with just an FPGA, and it may or may not get done better than with other tool combinations, depending on the specs (if it's mostly timing: FPGA; if it's mostly calculations: uC).


Adding an FPGA next to a uC could yield gains

Note that some microcontrollers have a multichannel TPU in them, and all have some TPU channels.

I suggest the Philips LPC2119 ARM be used as the uC, the apparent winner on the GoBox/CPU page. I don't know which programmed logic is sane to use, but since GoBox has not yet revealed specifications, design targets or example algorithms to implement, it would be too early anyway. It's not clear at what point adding the programmed logic has a benefit (but it's certain that programmed logic adds 0.0 engine power gain, for a huge amount of work, before the IonSense uC implementation is finished).


Incorrect, to be removed:

If you want to implement an ALU (arithmetic logic unit, which is the most usual way to carry out large numbers of computations) in the FPGA, you will get about 1..5% of the computing power of a processor (with worse reliability figures) of the same technology and die size.

If you need a large number of computations, in most cases you can do that most efficiently in a DSP, RISC or VLIW processor, or a uC. Most tasks map very well onto powerful ALUs, but there are a very few tasks (e.g. network packet switching) that map poorly: e.g. the TPU functions of an engine management system, which is why these are always backed up by HW capabilities in any sane design.