Note 2 for MATH 111 at KU
This 21 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Kansas taught by a professor in Fall. Since its upload, it has received 14 views.
Date Created: 02/06/15
Hindawi Publishing Corporation
EURASIP Journal on Embedded Systems
Volume 2006, Article ID 56320, Pages 1–19
DOI 10.1155/ES/2006/56320

An Overview of Reconfigurable Hardware in Embedded Systems

Philip Garcia, Katherine Compton, Michael Schulte, Emily Blem, and Wenyin Fu

Department of Electrical and Computer Engineering, University of Wisconsin-Madison, WI 53706-1691, USA

Received 5 January 2006; Revised 7 June 2006; Accepted 19 June 2006

Over the past few years, the realm of embedded systems has expanded to include a wide variety of products, ranging from digital cameras, to sensor networks, to medical imaging systems. Consequently, engineers strive to create ever smaller and faster products, many of which have stringent power requirements. Coupled with increasing pressure to decrease costs and time-to-market, the design constraints of embedded systems pose a serious challenge to embedded systems designers. Reconfigurable hardware can provide a flexible and efficient platform for satisfying the area, performance, cost, and power requirements of many embedded systems. This article presents an overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption.

Copyright 2006 Philip Garcia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. WHY USE RECONFIGURABLE HARDWARE IN EMBEDDED SYSTEMS?

Reconfigurable hardware (RH) provides a flexible medium to implement hardware circuits. The RH resources are configurable (and generally reconfigurable) post-fabrication, allowing a single base hardware design to implement a variety of circuits. The hardware itself is composed of a set of logic and routing resources controlled by configuration memory. This memory is frequently implemented as SRAM cells, though flash RAM and other technologies are also possible. (Some FPGAs employ anti-fuses as a configuration medium [1, 2]. However, because these devices are essentially one-time programmable, they are not reconfigurable and are thus not the focus of this article.) These memory cells, and their stored values in particular, affect the functionality of both routing and logic. In the routing architecture, a cell may control whether or not two wires are electrically connected, or provide a multiplexer select input. In logic, the cell may control the function of an ALU, or implement logic equations in the form of a lookup table (LUT), which is the most common logic resource in field-programmable gate arrays (FPGAs).

Essentially, circuits are decomposed into small subfunctions implemented in LUTs or other logic resources in the RH, and the routing resources are configured to electrically connect the logic resources to match the structure of the target circuit. Writing a new set of values into the configuration memory reconfigures the hardware to implement a different circuit. Complex RH designs may also contain communication structures and processor cores that may or may not be reconfigurable.

Embedded systems often have stringent performance and power requirements, leading designers to incorporate special-purpose hardware into their designs. Hardware-based implementations avoid the instruction fetch-decode-execute overhead of traditional software execution, and use resources spatially to increase parallelism. In many embedded applications, such as multimedia, encryption, wireless communication, and others, highly repetitive parallel computations well-suited to hardware implementation represent a significant fraction of the overall computation required by the system [3, 4].

Unfortunately, application-specific integrated circuit (ASIC) implementation is not feasible or desirable for all circuits. One key problem is that the non-recurring engineering costs (NREs) of ASICs have been increasing dramatically. A mask set for an ASIC in the 90 nm process cost about $1M [5].
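As a rough illustration of why NRE matters, the sketch below computes the production volume at which an ASIC becomes cheaper than an FPGA. Only the $1M figure comes from the text (and it covers the mask set alone); the per-unit prices and the function name `break_even_volume` are hypothetical values chosen for the example, not data from the article.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Return the smallest volume at which the ASIC's total cost is
    strictly lower than the FPGA's, or None if that never happens.

    Simplified cost model (an assumption for illustration):
      ASIC total = asic_nre + volume * asic_unit_cost
      FPGA total = volume * fpga_unit_cost
    The FPGA vendor's own NRE is treated as already folded into the
    FPGA unit price.
    """
    per_unit_saving = fpga_unit_cost - asic_unit_cost
    if per_unit_saving <= 0:
        return None  # the FPGA is never more expensive per unit
    return math.floor(asic_nre / per_unit_saving) + 1


# $1M NRE is the mask-set figure quoted in the text; $5 and $25 are
# hypothetical per-unit prices.
baseline = break_even_volume(1_000_000, 5, 25)
# Doubling the NRE doubles the volume needed to justify the ASIC.
doubled_nre = break_even_volume(2_000_000, 5, 25)
print(baseline, doubled_nre)
```

Under this simplified model, a higher ASIC NRE or a smaller per-unit price gap pushes the break-even point to higher volumes, which widens the range of products for which an FPGA is the cheaper choice.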
Previously, using FPGAs as ASIC substitutes was only cost-effective in low-volume applications. FPGAs have high per-unit costs, which are essentially an amortization of the FPGA NREs themselves over all customers for those chips. However, as ASIC NREs rise and FPGAs sell in higher volumes, the ASIC NREs begin to outweigh the per-unit cost of FPGAs for higher-volume applications, shifting the balance towards FPGAs [6]. Especially considering the flexibility of RH to accommodate new circuitry for bug fixes, protocol updates, or new advances, expensive and fixed-design ASIC technology becomes less appealing.

Figure 1: Reconfigurable computing implements compute-intensive application kernels (a) as hardware in RH and the remaining code in software on a CPU (b). Run-time reconfiguration allows RH to implement circuits that would otherwise not fit simultaneously (c).

Furthermore, devices traditionally categorized as embedded systems, such as PDAs (personal digital assistants) and cellular phones, are becoming increasingly multipurpose. These systems may implement a very diverse set of applications that require the performance and power benefits of hardware implementation, such as wireless communications, cryptography, and digital audio/video. Including a fixed custom hardware accelerator for each possible application type is generally infeasible, particularly if one or more of the applications is not known at design-time. RH can act as a general hardware accelerator, implementing a variety of different computations within or across applications. Compute-intensive sections of applications can be swapped into the hardware when needed, and later swapped out to make room for other computations, a process called reconfigurable computing. Figure 1 illustrates a case where, after computations A and B are complete in hardware, they can be replaced with
computation D, potentially while computation C is still running. In effect, run-time reconfiguration allows RH to act as a virtual hardware accelerator with capacities and capabilities beyond its actual physical structure.

Low-power operation is critical to many embedded systems to improve battery life, reduce costs of operation, and even improve reliability [7]. Computations implemented in RH often dissipate less power than equivalent software running on embedded processors, since they typically can be implemented at lower clock rates and avoid the overhead associated with fetching, decoding, issuing, and committing individual instructions [8–12]. However, they also often have higher power dissipation than fixed ASIC solutions [10, 13].

Finally, the flexibility of RH can also be used to increase the fault-tolerance of designs. RH can be reconfigured to avoid hardware faults [14], whether they result from fabrication or the environment. If the fault is from fabrication, this increases product yield, decreasing costs. If the fault develops after deployment, this allows a faulty device to potentially continue normal operation. The new configuration can even be deployed remotely [14, 15] to avoid inconveniencing the consumer, or allow updates for a device that cannot be physically accessed (systems deployed in space, on the ocean floor, or at other remote or unsafe locations). Extra reconfigurable logic in a design can also allow a system to compensate if a fault occurs in a nonreconfigurable resource [16]. The fault-tolerance of RH can even extend to design faults, allowing bug fixes or even upgrades for emerging standards to increase device lifespan. Fault-tolerance advantages and techniques are discussed in greater depth in Section 4.2.

This article discusses the benefits and issues of employing RH in embedded systems designs. Section 2 lists a variety of applications implemented in embedded systems with RH. Section 3 discusses basic architectural aspects and describes several example systems. Other design issues
critical to many embedded systems are discussed in Section 4. Section 5 addresses configuration overhead, and Section 6 discusses design tools. Future issues in reconfigurable embedded computing are discussed in Section 7. For more specific technical information on RH and reconfigurable computing, as well as their use outside of embedded systems, please refer to one or more of the following surveys [10, 17–22].

2. WHAT APPLICATIONS BENEFIT FROM RH?

Initially, smaller reconfigurable devices such as PLDs and PALs were used as board-level glue logic. Similarly, RH can now be used as chip-level glue logic on systems-on-a-chip (SoCs) [25]. In particular, RH can act as a flexible communication fabric for different cores on the SoC [24–26]. This allows hardware design to proceed even if the intercomponent communication methods have not yet been finalized. This approach also improves time-to-market and design costs, because the testing of a single reconfigurable communication fabric is faster and less costly than the testing of separate communications fabrics for many different SoC designs. Furthermore, the configurable communication fabric can potentially be reconfigured if necessary to circumvent design errors in other SoC components [23, 27].

RH can also perform computations in a capacity beyond simple ASIC replacement. By reconfiguring the hardware at run-time, one or more RH structures can be reused for many different computations over time (Figure 1) [10, 20–22]. Since many embedded systems must be both high-performance and low-power, yet may also have size or flexibility constraints preventing fixed-ASIC implementation, RH provides a valuable implementation method. Furthermore, computational cores used in many applications are available as predesigned intellectual property (IP), simplifying the design process.

Software-defined radio

Telecommunications industries employ constantly evolving wireless technologies. Companies under significant pressure to deliver products before their competitors sometimes
even release products before standards are finalized. Software-defined radios (SDR) are programmable to implement a variety of wireless protocols, potentially even those not yet introduced [28–35]. Custom hardware allows many embedded systems to meet stringent power and performance requirements, particularly for small battery-powered mobile devices, but in this case the system must also be extremely flexible. A system with RH can implement parallel DSP operations with a higher degree of both performance and power efficiency than a software-only system, plus an RH system can be reconfigured for different protocols as needed.

Medical imaging

Recently, several RH-based systems and algorithms have been proposed for medical imaging [36, 37]. The ECAT HRRT PET scanner from CTI PET Systems, Inc. [36] detects abnormalities in organ systems, helping to find cancerous tumors and assisting in monitoring ongoing patient treatment. This system can dynamically reconfigure itself for setup, detection, and equipment self-diagnosis modes. One project implementing a parallel-beam backprojection for medical computer tomography on RH was able to accelerate the application 100x over a 1 GHz Pentium by implementing a custom design in RH and performing a thorough bit-precision analysis [37]. This system also scales well with additional hardware: 4x more hardware leads to 4x better performance.

Networking

RH is commonly used in network processors [38–42], which have high performance demands and inherently parallel workloads. Furthermore, networks can use many different routing protocols, and different system administrators may have varying needs at different times. RH has been used in network devices to run tasks such as packet classification [38], dynamic routing protocols [39, 40], and intrusion detection systems [42], among others. RH can also accommodate emerging network protocols through reconfiguration.

Encryption

Many encryption algorithms are well-suited to hardware implementation. Operations are generally highly parallel
and repetitive, with the same series of operations performed on each piece of data. Furthermore, these algorithms frequently use exclusive-or operations, which do not require the area and delay overhead of a complete ALU. As encryption research continues to evolve, RH can be reconfigured to implement new standards. For these reasons, encryption algorithms are a popular choice for RH implementation [9, 43, 44].

Scientific data acquisition and analysis

Scientific data-acquisition systems receive and preprocess vast quantities of data before archiving or sending the data off for further processing. These systems may be remote or inaccessible, operating on battery or solar power, yet requiring extremely high performance to handle the required volume of data. These systems are increasingly using RH to provide this performance in a flexible medium that can be changed as new approaches to data aggregation and preprocessing are researched. RH has been used in systems proposed or created for weather radar [45], seismic exploration [46], and adaptive cameras for solar study [47]. RH is also used to compress the massive volume of data prior to transmission [48].

Spacecraft

RH's low-volume cost-effectiveness and hardware flexibility make it particularly applicable to space applications, where it has been used for several missions, including Mars Pathfinder and Surveyor [49, 50]. These devices can be reconfigured to add functionality for updated mission objectives or fix design errors without requiring a space mission for repair. Spacecraft require special radiation-hardened devices that are not produced in the same volume (due to higher cost and lower demand) as standard microchips, leading designers to incorporate the functionality of many different discrete components into one or a few radiation-hardened FPGAs. Fault-tolerance issues are discussed in more depth in Section 4.2. More experimental research examines the use of genetic algorithms to design evolvable RH that can automatically adapt to needed tasks [51].
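The in-field updates described above all rest on the mechanism introduced in Section 1: rewriting configuration memory changes the circuit that the same hardware implements. The following is a minimal sketch of that idea, assuming a hypothetical software model of a 2-input lookup table. The class `Lut` and its methods are illustrative names only and do not correspond to any vendor tool or device API.

```python
class Lut:
    """Hypothetical k-input lookup table: the configuration memory holds
    one output bit for every possible combination of input values."""

    def __init__(self, num_inputs, config_bits):
        self.num_inputs = num_inputs
        self.config_bits = []
        self.configure(config_bits)

    def configure(self, config_bits):
        """Overwrite the configuration memory (the reconfiguration step)."""
        if len(config_bits) != 2 ** self.num_inputs:
            raise ValueError("need exactly one bit per input combination")
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        """Treat the input values as an address into configuration memory."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (1 if bit else 0)
        return self.config_bits[address]


# Truth-table rows are ordered by input pattern: 00, 01, 10, 11.
AND_BITS = [0, 0, 0, 1]
XOR_BITS = [0, 1, 1, 0]

lut = Lut(2, AND_BITS)
and_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

# Same "hardware", new configuration: the LUT now computes exclusive-or.
lut.configure(XOR_BITS)
xor_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

print(and_outputs, xor_outputs)
```

A real device contains many such tables plus configurable routing between them, but the principle is the same: the logic function is data stored in memory, so it can be replaced after deployment.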
Robotics

Robotic control systems often consist of a mix of hardware and software solutions to meet strict size and power demands. One military system prototype uses RH to control unmanned aerial vehicles [46]. These vehicles cannot support large payloads and must execute heavy-duty image processing algorithms. Other research focuses more generally on developing algorithms and hardware cores for robotic control and vision [46, 52, 53]. An overview of RH in robotic applications appears in [53].

Automotive

The automotive industry has embraced RH because it can implement the functionality of many different parts, reducing repair inventories. Its programmable nature also simplifies product recalls. Furthermore, FPGAs are well-suited to the increasingly complex informational and entertainment systems in newer automobiles [54, 55]. IP companies such as Drivven provide cores for many engine control systems, such as fuel injection, required by modern automobiles [56], which can be implemented in one of several FPGAs rated for automotive use.

Image and video

Digital cameras often need to implement many different image-processing operations that must operate quickly without consuming much battery power. With RH, the hardware can be reconfigured to implement whichever operation is needed [57, 58]. For systems requiring secure image transmission, the RH can also be reconfigured to perform encryption and network interfaces [57]. Some systems can also be configured to accelerate image display [57, 58], video playback [35, 59], and 3D rendering [59–61].

3. WHAT DO THESE SYSTEMS LOOK LIKE?

This section discusses the RH design and system-level integration, examining different design aspects and how they relate to embedded systems design. These topics are covered more generally in several FPGA and reconfigurable computing survey articles [10, 17–22]. Finally, the end of this section presents several specific embedded systems with RH.

3.1. Reconfigurable logic

Although commercial RH tends to contain
LUT-based or sum-of-products compute structures, these are not necessarily ideal for many embedded systems. Each configuration point in these structures contributes some level of area, delay, and power overhead, and significant flexibility of these structures may not be required if computations are limited to a particular domain. In these cases, a more specialized reconfigurable fabric can provide the necessary level of flexibility with lower overhead than a fine-grained bit-level logic structure [62–66]. However, some applications, including certain encryption algorithms, cyclic redundancy check, Reed-Solomon encoders/decoders, and convolution encoders, do require bit-level manipulations. A number of reconfigurable architectures combine fine- and coarse-grained compute structures to accommodate both computation styles [67–69]. Most frequently, this involves embedding coarse-grained structures such as multipliers and memory blocks into a conventional fine-grained fabric [70], or designing the fine-grained fabric specifically to support coarse-grained computations [63, 71].

To implement a needed circuit in RH, a CAD flow transforms its descriptions into an RH configuration. First, the circuit is synthesized, converting the circuit schematic or hardware design language (HDL) description into a structural circuit netlist. Then a technology mapper further decomposes that netlist into components matching the capabilities of the RH's basic blocks (LUTs, ALUs, etc.). Next, the placer determines which netlist components should be assigned to which physical hardware blocks, and a router decides how to best use the RH's routing fabric to connect those blocks to form the needed circuit. Finally, the CAD flow determines the specific binary values to load into the configuration bits for the determined implementation. More details on generic CAD issues for RH can be found elsewhere [21, 72].

Like fixed hardware design, the CAD flow can target different area/delay/power tradeoffs through resource selection, resource sharing, pipelining, loop
unrolling, wordlength optimization, precision estimation, and others [73–81]. CAD issues particularly applicable to embedded systems, however, include heterogeneous CAD topics [82–84], CAD tools for nonsquare RH designs incorporated into SoCs [25], power-aware CAD [84–91] (discussed further in Section 4.1), and fast CAD algorithms [92–97]. Fast CAD algorithms can move configurations to new locations on RH at run-time, or make small modifications to circuits based on run-time conditions to increase efficiency [98, 99], based on available resources [75], or potentially to provide fault-tolerance.

3.2. System-level integration

Embedded systems typically couple a traditional processor (the host) with custom hardware specifically to handle compute-intensive, highly-parallel sections of application code [100]. The processor controls the hardware and executes the parts of applications not well-suited to hardware. Reconfigurable computing systems also frequently couple RH with a processor for the same reasons, as well as to control the configuration process of the RH [10, 20–22, 101]. RH/processor coupling styles can be divided into three basic categories: RH as a functional unit on the processor data path, RH as a coprocessor, and RH as an attached processor in a heterogeneous multiprocessor system. The coupling methods are best differentiated by how, and how often, the RH and host processor(s) interact.

Reconfigurable functional units (RFUs) are very tightly coupled with a host processor. Input and output data are generally read from and written to the processor's register file [66, 71, 102–106]. These units essentially provide new instructions to an otherwise fixed instruction set architecture (ISA). In some cases, the processor itself may be implemented on reconfigurable logic, allowing significant processor customization [106, 107]. In Section 6.2 we will examine some of the design tools that help simplify the process of creating these custom-ISA processors.

If the circuits on the RH can operate for some time independently of the host
processor, a coprocessor or even heterogeneous multiprocessor coupling may be more appropriate [3, 4, 108–112]. A coprocessor may or may not share the data cache of the host processor, but generally shares the main memory. Figure 1 shows an example of a reconfigurable coprocessor that has its own path to a shared memory structure. A heterogeneous multiprocessor may contain one or more reconfigurable units, one or more embedded or general-purpose processors, and possibly other special-purpose processing elements [55, 109, 115]. Like homogeneous multiprocessor systems, heterogeneous multiprocessors may use shared memory for communication between compute nodes [24], a communication bus, or even a network architecture [115]. Synchronization and scheduling issues of these systems are similar to those of homogeneous multiprocessors.

In some cases, using one or more separate FPGA chips plus the other system circuitry would violate the area, performance, or power constraints of the embedded system. However, FPGA capacities are always increasing, so to address this problem designers can now use platform FPGAs or systems on programmable chips (SoPCs), which are large and complex enough to contain entire SoC designs, and frequently include fixed communication structures and other commonly-needed circuitry [67–69, 114]. Alternately, reconfigurable logic can be embedded within an SoC [62, 64, 115, 116] to implement one or more computations. This provides for domain-specific SoCs that can be customized to the actual applications needed by programming the reconfigurable logic appropriately. Domain-specific SoCs therefore provide higher performance and lower power consumption than a traditional FPGA structure, with some parts of the hardware implemented as standard cells or even full custom. The RH itself can even be customized to the applications needed [117]. Domain-specific SoCs facilitate highly efficient embedded systems, but with NREs that are amortized over all applications within the domain [118].

3.3. Example systems

Embedded systems with RH span a range of sizes and complexities, some using many discrete RH components, with others primarily contained in an SoPC. Many of these systems use Linux, or a modified lighter-weight Linux, as an operating system because the source code is freely available for recompilation to the custom platform. This section presents the high-level design details of a number of systems to provide a flavor of the range of systems using RH. However, this list is by no means exhaustive, as there are a great many interesting RH-based embedded systems.

One large system was designed for 3D vision [60]. This system contains an image acquisition board connected to a matrix of 56 Xilinx XC4005 FPGAs used for low-level image processing such as edge detection and edge tracking. Images preprocessed by the FPGAs are then sent to a board containing 16 DSPs for high-level image processing. This board also contains four more FPGAs used to create a reconfigurable interconnection network between the DSP chips.

Figure 2: Cam-E-leon is a dynamically reconfigurable web camera platform from IMEC [57].

Figure 3: Block diagram of CASA, an embedded radar-based hazardous weather detection system using RH [45].

Cam-E-leon (Figure 2) is another image-related embedded system, designed in particular as a dynamic web camera [57]. This system is capable of downloading new image processing algorithms from a networked server and incorporating them into the system implemented in RH. However, it is significantly smaller than the 3D vision system, using a custom FPGA board with two Xilinx Virtex XCV800 FPGAs. The FPGA board is responsible for the image processing computations. A processor board running a
Linux variant is responsible for network communication and reconfiguring the FPGAs. The camera itself is a 1.3 megapixel image sensor directly connected to the FPGA containing the camera interface. This FPGA is also responsible for image processing, while the other FPGA encrypts the image for secure transmission. All circuitry would normally have fit in one of the two FPGAs, but bandwidth concerns necessitated design partitioning between two chips.

CASA is a weather radar data acquisition and processing system used to detect hazardous conditions [45]. A block diagram is given in Figure 3. Like Cam-E-leon [57], one of the two FPGAs in CASA is dedicated to signal processing (the left FPGA in both figures), and can be updated with new functionality remotely by a networked server. In CASA, the other FPGA is responsible for communication of result data, but may also process data, depending on the configuration. An ARM-based microcontroller running Linux manages the FPGA resources. CASA also contains multibanked memory, multiple Ethernet interfaces, and analog-to-digital (A/D) converters to digitize incoming radar data. CASA can process data at sustained rates of 885 Mb/s.

Figure 4: Block-level diagrams of the system-level design (a) and the FPGA design details (b) of a facial-recognition system [119].

The Linux-based SDR application described in [55] uses a single Xilinx Virtex-4 FX FPGA in conjunction with an analog RF card, memory, and an output device (frame buffer and audio). The FPGA contains two hard embedded PowerPC cores and several soft-core components: a demodulation core, a memory controller, and an IDCT. The analog board receives the
data over a wireless network and sends it to the first processor. The first processor, coupled with the demodulation core, processes the data and writes it to main memory. The second CPU then decodes the data from memory using the IDCT core, and the resulting video and audio stream is then written to the output device.

A Linux-based reconfigurable encryption processor system also uses embedded PowerPC devices, but instead in a Virtex-II Pro [44]. In this system, the RH contains a memory controller, a bus bridge to communicate with the on-chip peripheral bus (OPB), which in turn connects to an Ethernet controller and a UART, the cryptographic engine itself, and control logic to manage the reconfiguration of the cryptographic engine. The on-chip PowerPC core communicates with these structures using the built-in processor local bus (PLB). This system can be reconfigured to implement different encryption algorithms.

One project compared several systems implementing a face tracking algorithm, including a Xilinx Spartan-II 300 FPGA-based system, a custom ASIC-based hardware system, and a software-based DSP implementation [119]. The FPGA implementation is shown in Figure 4, including a system-level block diagram (a) and details of the FPGA design (b). The FPGA contains multiple interfacing controllers for the sensors, the parallel port, and the network, and also implements a 15-node radial basis function (RBF) neural network to detect faces and recognize facial expressions. The custom hardware system also used an FPGA, but as glue logic, not a compute engine. As typically expected when comparing ASIC, FPGA, and software implementations, the software implementation had the lowest throughput (one-fifth of the ASIC), and the custom hardware had the highest. The FPGA implementation had half the throughput of the ASIC version. However, the recognition rates were higher for the more flexible solutions, with the programmable DSP achieving the highest, demonstrating a throughput/accuracy tradeoff. Both the FPGA and DSP implementations
also have the benefit that they can be modified post-deployment to implement new algorithms.

Several embedded systems use RH as custom functional units on a processor's data path. One example of this system type is a 3D facial recognition program [120] using a Stretch S5 processor [66]. This system beams an invisible light pattern on a user's face, which is then detected by cameras interfaced with the processor. By examining differences in the projected and detected light patterns, the system reconstructs a 3D model of the target face in real time. The system also contains an Ethernet link to allow the data to be sent over a network. The embedded design implemented on a 300 MHz S5 processor matched the performance of a 3 GHz PC by using RH as an application accelerator. However, this application was designed entirely in software and compiled by the Stretch compiler to a mix of software and hardware, a process completed in five person-months. Design tools for this development style are discussed further in Section 6.2.

4. WHAT ARE OTHER IMPORTANT DESIGN ISSUES?

Besides the basic choices of RH logic design and RH integration, low power, fault-tolerance, and real-time issues are also critical to embedded systems designers. Understanding the interaction between these topics and RH is important whether the designer is choosing off-the-shelf components to include in a system, choosing between completed systems, or designing a new RH fabric specifically for a particular embedded system.

4.1. Low power

Many embedded devices are battery powered, increasing the importance of power efficiency. Computations on FPGAs typically consume less power than equivalent software running on embedded processors, but more power than ASICs [10]. Studies examining the data-per-watt efficiency of FPGA-based implementations have found that they can process just under 20x more data-per-watt than a RISC-style processor for both the IDEA encryption algorithm [9] and an FIR filter operation [8]. Yet another study shows the use of RH
yielding performance increases of 45x to 135x while simultaneously reducing power consumption by up to 93% over a very-long-instruction-word-style (VLIW-style) processor [11]. To further improve RH power-efficiency, researchers have investigated energy-efficient architectures, the use of multiple supply voltages or threshold voltages, and energy-efficient mapping techniques to implement algorithms on RH.

Figure 5: Two different layout patterns for fixed-distribution dual-Vdd FPGA fabrics [88].

Several energy-efficient reconfigurable architectures have been specifically developed to reduce power dissipation. The FPGA interconnect and clock networks are responsible for most of the power dissipation in traditional FPGA architectures [121]. One proposed fine-grained FPGA structure improves energy efficiency through a hybrid interconnect structure using nearest-neighbor connections, a symmetric mesh architecture, and hierarchical connectivity to shorten and reduce the number of necessary wires [121]. This FPGA architecture also uses low-voltage circuit swing techniques and dual edge-triggered flip-flops to reduce the power dissipation from clock distribution. MONTIUM is an energy-efficient coarse-grained reconfigurable architecture designed for 16-bit DSP applications [122]. It improves power efficiency by reducing interconnect and configuration overhead, providing access to small local memories, and optimizing the RH for word-level DSP applications. The MONTIUM reconfigurable processor can implement an adaptive Viterbi algorithm using 200 times less energy than an ARM9 processor [12].

Multiple supply voltages (Vdd) or threshold
voltages (Vt) can also improve energy efficiency in RH. Reducing Vdd decreases dynamic power, while increasing Vt decreases leakage power. Since changes to Vdd and Vt also affect noise margins and circuit speed, appropriate values for Vdd and Vt must be carefully selected. Proposed designs with predefined dual-Vdd and dual-Vt fabrics use low-leakage SRAM cells and dual-Vt lookup tables that do not penalize performance, but reduce total power dissipation by 13.6% and 14.1% on average for combinational and sequential circuits, respectively [88]. An example fixed dual-Vdd FPGA layout is given in Figure 5. In dual-Vdd architectures, timing-critical circuit paths are assigned to high-Vdd logic and routing, while the remaining parts of the circuit are assigned to low-Vdd resources. Level converters preserve a signal's value when transitioning between Vdd levels. Programmable dual-Vdd architectures can provide an average power savings of 61% across various Microelectronics Center of North Carolina (MCNC) benchmarks [87]. Multiple-Vt architectures combined with low-leakage multiplexer and routing structures, gate biasing, and redundant SRAM cells can reduce leakage current by roughly 2X to 4X over FPGA implementations without any leakage reduction techniques [89]. Finally, many commercial FPGAs contain multiple clock domains to allow designers to clock critical circuit sections at fast rates and noncritical sections at slower rates, lowering overall power consumption of the design [6749].

Dual-Vdd and dual-Vt architectures require a CAD flow to choose between fast but power-hungry resources or slower but lower-power resources for circuit components [87-89]. However, CAD algorithms can also affect circuit power efficiency in existing RH designs. For example, resource selection, module disabling, parallel processing, pipelining, and algorithmic selection together improved energy efficiency of FFT and matrix multiplication algorithms [85]. A dynamic programming-based approach to map beamforming applications on a Xilinx Virtex-II Pro
reduces energy dissipation by 52% on average over a greedy algorithm [86]. Considering power implications of embedded memory blocks can reduce embedded memory dynamic power by an average of 21% and overall core dynamic power by an average of 7% [84]. Power information can also be incorporated into cost functions used for existing CAD processes. Adding an FPGA power model [91] and using power-aware algorithms throughout the CAD flow can provide 26.5% power-delay product savings [90].

4.2. Fault tolerance

Faults can be divided into two categories: permanent and transient. Fabrication faults and design faults are among the permanent faults. Transient faults, commonly called single event upsets (SEUs), are brief incorrect values resulting from external forces (terrestrial radiation, particles from solar flares, cosmic rays, and radiation from other space phenomena) altering the balance or locations of electrons, usually in a small area of the system. We discuss both categories of faults as they relate to RH in this section.

FIGURE 6: Faults (black) can be overcome by remapping affected configurations (gray) to nonfaulty areas of reconfigurable hardware.

Tolerating permanent faults is critical to maximizing device and system yields to decrease costs, and to increasing the lifespan of deployed devices. Lifespan is of particular concern when a system has been deployed to a location difficult, dangerous, or impossible to reach for repair or replacement. Space-deployed unmanned systems, for example, must be extremely fault-tolerant, as replacement/repair would be expensive and at worst impossible. RH can increase tolerance of permanent physical faults because the hardware is modifiable to potentially compensate for these faults from fabrication or other sources, within the RH (Figure 6) [14, 123] or even elsewhere in the system [16]. Yields of static FPGA devices (chips used for a single nonchanging configuration) can be increased by using application-specific test vectors to determine if a
particular faulty chip is capable of implementing a particular configuration, allowing designers to successfully use otherwise faulty chips [124, 125]. Finally, design faults are among the easiest to fix in RH, as these devices can be reprogrammed with corrected versions of the faulty circuits.

Unfortunately, although RH's value is in its flexibility, and that flexibility can increase RH's tolerance to permanent faults, it can also increase its underlying susceptibility to faults. The flexibility of RH results from the ability to control its resources based on configuration bit values, frequently stored in SRAM. These SRAM bits, along with any other hardware used to provide flexibility, such as multiplexers, tri-state buffers, and pass transistors, are additional failure points not present in ASIC-equivalent circuit implementations, and increase the chip area to present a larger target to radiation particles. Furthermore, unless the underlying RH design prevents multiple drivers to a wire, instead of relying on the design tools to prevent it, a fault in configuration memory could cause a short-circuit, damaging the device.

Using properly shielded radiation-hardened devices can minimize SEU errors. Unfortunately, these devices are expensive, difficult to find, and generally use less advanced technologies than their unshielded counterparts [14, 123]. Triple modular redundancy (TMR) can detect and correct faults in circuits implemented in FPGAs [126]. In TMR, three copies of all routing and logic resources perform the same computation, and the three vote on the correct result. The downsides of this technique include area, power, and performance overheads that are generally unacceptably high for embedded devices, and the fact that TMR cannot accommodate simultaneous errors in multiple copies [14, 127]. Other fault-tolerance techniques focus only on the configuration structure. Scrubbing reads back all of the configuration bits, compares them to the correct values, and rewrites the correct values if a discrepancy is found
[127, 128]. Checksums can also be used to detect errors in subsets of configuration information, such as a single logic block, but require additional resources to store the checksum values in the hardware [127]. Los Alamos has researched methods to decrease SEU susceptibility of RH destined for spacecraft use [129], with the goal of tolerating and recovering from SEUs without a full system restart. Continuous configuration bit polling, combined with circuit mapping techniques to make SEUs more easily visible, allows easier detection of errors in configuration data [129]. Similar work uses an SEU watchdog to reset RH after SEUs in high-radiation environments [130].

Self-testing can also be applied to RH, with the hardware split into multiple self-testing areas (STARs). Periodically, each STAR is isolated from the rest of the system for testing while the remainder of the system continues operation. Detected faults cause the system to reconfigure the application to avoid the fault without interrupting system function, and partial or entire STAR blocks can be marked as unusable [131]. This approach requires partitioning the hardware to match the STAR structure and ensuring each block is sufficiently computationally independent. Besides testing itself, RH can act as a built-in reconfigurable tester for other parts of the system, particularly for SoC devices [132].

Any fault-tolerance technique will impose additional overhead in terms of area, delay, power, or some combination of the three. One way to reduce this overhead is to apply fault-tolerance techniques selectively within the system. Hardware where faults could cause catastrophic failure (improper levels of anesthesia being delivered, an improper nitrogen/oxygen mix in a pressurized vehicle, etc.) receives the most protection, while hardware where faults cause less critical errors (a momentary glitch in an LCD display) receives less. The COFTA project uses an automatic approach to determine where duplicate-and-compare hardware and assertions should be added to
provide the same level of fault tolerance as TMR, but with 60% less area overhead [133].

4.3. Real-time support

Many embedded systems require real-time operation. Generally, there are two types of real-time deadlines: deadlines that must always be met (hard deadlines), and deadlines that must be met the majority of the time (soft deadlines) [134]. Hard deadlines represent tasks critical to system operation, causing system failure if missed. Soft deadlines are used for tasks such as video playback, where as long as the video processing generally keeps up, a few dropped frames are not critical. These requirements shift the focus of the real-time operating system (RTOS) to consider both deadline times and types, and concentrate on optimizing worst-case task execution times instead of average-case times.

In dynamically reconfigurable systems, the RTOS must take into account not only task types, deadlines, and deadline types, but also RH task resources and task configuration time [135-137]. If multiple tasks reside on the RH simultaneously, the RTOS must also consider their locations in the hardware. Generally, a configuration is tied to specific resources at specific locations on RH. However, to facilitate run-time reconfiguration, partially reconfigurable architectures with relocation allow the locations of the tasks to be moved to accommodate other tasks [137]. Issues related to configuration architectures and reconfiguration management are discussed in Section 5.

An RTOS may use preemptive scheduling of tasks onto RH [138]. For example, a soft-deadline task present on the RH may be removed to make room for a hard-deadline task. These scheduling algorithms offer tradeoffs in terms of overall system utilization and the total number of tasks that can be effectively scheduled. The OVERSOC project [135] investigates the interaction between embedded RTOSs and reconfigurable SoC platforms, and proposes a variety of methods to model reconfigurable fabrics and techniques for scheduling real-time tasks on
reconfigurable SoC platforms.

Although using RH to create a real-time system with customized hardware instructions can improve task completion ratios, most tools used to design these instructions [139, 140] focus on reducing average application execution time, when in fact worst-case time is generally more important for real-time operation. One custom instruction generator tool designed specifically for real-time systems instead selects subgraphs for custom instruction implementation to minimize worst-case task execution time [141]. Topics related to custom instruction generation for non-real-time systems are discussed in more depth in Section 6.2.

4.4. Design security

High-quality hardware cores for embedded systems are extremely useful to embedded designers, speeding the development process. However, these cores are also time-consuming and expensive to develop and verify. Furthermore, since the hardware designs frequently reside in a configuration bitstream loaded at startup or at runtime into the RH, designs can be intercepted and reverse-engineered. Therefore, design security of this intellectual property (IP) is critical to core developers, leading to encryption of configuration bitstreams [142, 143]. Both Altera and Xilinx have implemented configuration encryption in their commercial products [144, 145].

5. WHAT ABOUT CONFIGURATION OVERHEAD?

Reconfiguring hardware at runtime allows a greater number of computations to be accelerated in hardware than could be otherwise, but introduces configuration overhead, as the configuration SRAM must be loaded with new values for each reconfiguration. For separate FPGA chips, this process can take on the order of milliseconds [136], possibly overshadowing the benefits of hardware computation. This section briefly presents both hardware- and software-related aspects of managing the configuration overhead.

A straightforward strategy to reduce configuration overhead is to reduce the amount of data transferred. The structure of the logic/routing itself has an effect:
fine-grained devices provide great flexibility through a very large number of configuration points. Coarse-grained architectures by nature require fewer configuration bits because fewer choices are available. The Stretch S5 embedded processor [66], for example, is composed of 4-bit ALU structures. This architecture can be configured in less than 100 microseconds if the configuration data is located in the on-chip cache.

Partially reconfigurable RH can be selectively programmed [68, 71, 110, 111, 114, 146] instead of forcing the entire device to be reconfigured for any change (a common requirement). However, to be truly effective for run-time reconfigurable computing, the devices must also relocate and defragment configurations to avoid positioning conflicts within the hardware and fragmentation of usable resources [137, 147-149], maintaining intraconfiguration communication and connections to the outside of the RH. A page-based architecture is an alternate form of partially reconfigurable architecture that simplifies communication problems. In a page-based design, identical tiles of reconfigurable resources are connected by a communication bus, and configurations occupy some number of complete pages [150-152]. Pipeline-reconfigurable architectures have a similar quality, as each configuration stage may be assigned to any physical pipeline unit [111]. These types of organizations can also be imposed on existing FPGA architectures by dedicating part of the hardware to the required communication infrastructure [150, 153] that simplifies cross-configuration communication. Furthermore, page- or tile-based architectures would be especially useful in a system also requiring fault tolerance, as the same division used for scheduling could be used for the STARs fault-detection approach discussed in Section 4.2, and faulty pages could be avoided.

Configuration data can also be compressed [154], which is particularly useful when the RH and the configuration memory are on separate chips. When possible, on-chip configuration memory or a configuration cache can dramatically decrease configuration times [66, 155] due to shorter connections and wider communication paths. Finally, multiple configurations can be stored within the RH at the configuration points in a multicontexted device [156, 157]. These devices have several multiplexed planes of configuration information. Swapping between the loaded configurations involves simply changing which configuration plane is addressed. A key benefit of this approach is background loading of a configuration while another is active.

Software techniques such as prefetching [158] or scheduling can also reduce configuration overhead by predicting needed configurations and loading them in advance, as well as retaining configurations in a partially reconfigurable device that may be needed again in the near future. If the system operation is well-defined and known in advance, temporal partitioning and static scheduling may be sufficient [159, 160]. For other systems, the simplest approach is to load configurations as they are needed, removing one or more configurations from the RH if necessary to free sufficient resources [66, 155, 161, 162].

In more complex systems, compiler or user-inserted directives can be used to preload the configurations in order to minimize configuration overhead [155], or the configuration schedule can be determined during application compilation [163], dynamically at runtime [137, 153, 16 171], or a combination of the two [152]. Although dynamic scheduling requires some overhead to compute the schedule, this is essential if a variety of applications will execute concurrently on the hardware, breaking the static predictability of the next-needed configuration. Dynamic scheduling also raises the possibility of runtime binding of resources to either the reconfigurable logic or the host processor [168-170], and of choosing between different versions of the computation created in advance or dynamically [75, 99] based on area/speed/power tradeoffs [153, 165, 170, 172], as shown in Figure 7. This could allow an embedded device to run much faster when plugged in, and save power when operating on batteries. To facilitate this scheduling, the RH could be context-switched, saving the current state before loading a new one [66, 173, 174], possibly allowing preemptive scheduling of the resources [137].

FIGURE 7: Different implementations (fast but large, small but slower, or software) for three kernels (A, B, and C) are shown over time. Shaded areas show when kernels are not needed. In this example, one fast or two small kernels can fit in RH simultaneously.

6. WHAT TOOLS AID THE RECONFIGURABLE EMBEDDED DESIGNER?

The design of reconfigurable embedded systems, or applications for them, is frequently a complex process. Fortunately, tools can assist the designer in this process, as described in this section.

6.1. Hardware/software codesign

The reconfigurable computing hardware/software (HW/SW) codesign problem is similar to general HW/SW codesign, and in many cases FPGAs are used to demonstrate techniques even if they do not leverage run-time reconfiguration [24, 175, 176]. Design patterns [77] in many cases can apply equally well to general hardware design and hardware design for reconfigurable computing. This section primarily focuses on areas of codesign specific to embedded reconfigurable computing. More information on general HW/SW codesign can be found elsewhere [177-180].

Designers can manually HW/SW partition applications using a combination of profiling and intuition, and develop the components separately for each resource [171]. Alternately, applications can be specified in a more unified form, generally using a high-level language (HLL) such as C or Java [66, 175, 181-183], but in many cases these compilers require code annotations to specify hardware-specific information (custom bitwidths, parallelism, etc.) or only operate on a restricted subset of the language. Some compilers permit parallelism to be specified at the task level using threads [184, 185]. However, compiling hardware from a
software-style description can be difficult or inefficient due to the sequential nature of software and the spatial nature of hardware [186-188]. Some efforts have therefore focused on new ways to express computations that are more agnostic to final implementation in hardware or software, expressing instead the data flow of the application [151, 189-191].

One aspect of HW/SW codesign unique to RH is temporal partitioning [160, 171, 192, 193], the process of breaking up a single circuit or a series of computations into a set of configurations swapped in and out of the RH over time. Some systems also allow these configurations to be dynamically placed and connected to the other components on RH [162, 194].

Finally, designing an application for an embedded system with RH has the advantage that verification tools can use the RH in conjunction with software simulation and debugging to accelerate the verification process [66, 195-198]. If design errors are found, the RH can be reconfigured with a fixed design, because configuration is not a permanent process.

6.2. Processor ISA customization

Backwards-compatibility is generally far less critical to embedded systems than to general-purpose computers. This allows embedded systems designers the freedom to adapt processors' ISAs to changing needs and technologies, and makes custom compilers for such ISAs less of a burden, as embedded applications are frequently developed by the same company that develops the hardware, or one of its partners. RH allows the designers to use a single chip design to implement dramatically different ISAs by reprogramming the RH with different functionalities. Multiple design tools are available to automate this process [66, 139, 140, 199, 200]. These tools generally examine precompiled binary instruction streams and generate data flow graphs as candidates for custom instructions. Another approach is to create a compile-time list of potential configurations and their associated binary instruction graph, and at run time, detect those graphs in the
instruction stream, replacing them with the appropriate RH operations [140].

The SPREE tool [200] is a manual-assist tool that allows a designer to explore processor tradeoffs such as pipeline depth, software versus hardware implementation of components such as multiplication and division, and other design features. The tool also removes unused instructions to save area. Tool chains from Altera and Xilinx focus on SoPC platform design, with parameterizable soft-core processors manually tuned to the respective FPGA architectures, and core generators to create other common computational structures needed on SoPC designs. Developers using Stretch processors write applications in C, profile them, and choose candidate functions for RH to implement in a C variant designed to specify hardware [66, 120]. Finally, for designers wanting to create a fixed-silicon custom processor with a reconfigurable functional unit, instead of a soft-core processor implemented on an FPGA, customizable processors such as Xtensa [201] provide a base processor design and a toolset for customization. Xtensa is the base of Stretch Inc.'s commercially available reconfigurable embedded processors [66].

6.3. Automated RH design

Finally, automatic design tools can aid in the creation of the RH itself [202-204]. The Totem project focuses on the creation of automatic design tools to create coarse-grained domain-specific RH for SoCs based on the intended applications [205]. Other work investigates the use of synthesizable FPGA structures, either specifically for embedding in SoCs [23, 202], or tile-based FPGA layout generators usable either in SoCs or as stand-alone architectures [204]. This latter work created architectures in 34 person-weeks instead of 50 person-years, with only a 36% area penalty.

7. WHAT DOES THE FUTURE HOLD?

Reconfigurable hardware faces a number of challenges if it is to become commonplace in embedded systems. First, there is a Catch-22 in that because reconfigurable computing is not a common technique in
commercial hardware, it is not yet something that many embedded designers will know to consider. This problem is gradually being overcome with the introduction of reconfigurable computing in certain embedded areas, such as network routers, high-definition video servers, automobiles, wireless base stations, and medical imaging systems. Furthermore, a greater number of people are exposed to reconfigurable hardware as more universities include courses and laboratories using FPGAs. Second, the strict power limitations of many embedded systems highlight the power inefficiency of LUT-based reconfigurable hardware compared to ASIC designs. Because power concerns are intensifying in all areas of computing, research will increasingly focus on power efficiency. Efforts are already underway, with researchers studying a variety of architectural and CAD techniques to improve power dissipation in reconfigurable hardware and computing. Third, the flexibility of reconfigurable hardware that permits the fault tolerance benefits discussed in this article also increases the hardware's susceptibility to faults, due to the extra area introduced to support reconfigurability and the use of SRAM-based configuration bits. Innovative reconfigurable architectures, circuit-level design methodologies, and techniques for detecting and avoiding faults are needed to further improve the fault tolerance of reconfigurable hardware.

There are also a number of software-related issues to consider. Compiler support, while improving, is not yet at the level required for widespread adoption of embedded reconfigurable computing. In most cases, the computations to be implemented in software and the computations to be implemented in hardware must be specified separately in different languages and compiled with different toolsets. While some systems and tool suites do offer a more unified flow, these are currently less common. Continued research in effective hardware-software codesign is essential to improve the ease of application design
for embedded reconfigurable systems. Furthermore, even though the concept of OS support of reconfigurable hardware was proposed nearly a decade ago, this area remains open.

These challenges are worth addressing, as reconfigurable hardware has many advantages for embedded systems. Implementing compute-intensive applications partially or completely in hardware can dramatically improve system performance and/or decrease system power consumption. The flexibility of the hardware allows a single structure to act as an accelerator for a variety of calculations, saving the area that discrete specialized structures would otherwise require, and allowing new computations to be implemented on the hardware after fabrication. That flexibility can also be used to reduce the design and production cost of embedded system components, as one physical design can be reused for multiple different tasks, amortizing NREs. Finally, reconfigurability provides new opportunities for fault tolerance, since a design implemented in the reconfigurable hardware can be configured to avoid faulty areas of that hardware. In some cases, the reconfigurable hardware can even be configured to implement the functionality of a faulty component elsewhere in the system. For all of these reasons, reconfigurable hardware is a compelling component for embedded system design.

REFERENCES

[1] J. Greene, E. Hamdy, and S. Beal, "Antifuse field programmable gate arrays," Proceedings of the IEEE, vol. 81, no. 7, pp. 1042-1056, 1993.
[2] Actel Corporation, "Programming Antifuse Devices," Application Note, Actel, Mountain View, Calif, USA, 2005, http://www.actel.com.
[3] G. Lu, H. Singh, M. Lee, N. Bagherzadeh, F. J. Kurdahi, and E. M. C. Filho, "The MorphoSys parallel reconfigurable system," in Proceedings of 5th International Euro-Par Conference on Parallel Processing (Euro-Par '99), pp. 727-734, Toulouse, France, August-September 1999.
[4] G. Kuzmanov, G. Gaydadjiev, and S. Vassiliadis, "The MOLEN processor prototype," in Proceedings of 12th Annual IEEE Symposium on Field-Programmable Custom Computing
Machines (FCCM '04), pp. 296-299, Napa Valley, Calif, USA, April 2004.
[5] D. Pramanik, H. Kamberian, C. Progler, M. Sanie, and D. Pinto, "Cost effective strategies for ASIC masks," in Cost and Performance in Integrated Circuit Creation, vol. 5043 of Proceedings of SPIE, pp. 142-152, Santa Clara, Calif, USA, February 2003.
[6] Actel Corporation, "Flash FPGAs in the value-based market," White Paper, Tech. Rep. 5590002170, Actel, Mountain View, Calif, USA, 2005, http://www.actel.com.
[7] B. Moyer, "Low-power design for embedded processors," Proceedings of the IEEE, vol. 89, no. 11, pp. 1576-1587, 2001.
[8] A. Abnous, K. Seno, Y. Ichikawa, M. Wan, and J. Rabaey, "Evaluation of a low-power reconfigurable DSP architecture," in Proceedings of the 5th Reconfigurable Architectures Workshop (RAW '98), pp. 55-60, Orlando, Fla, USA, March 1998.
[9] O. Mencer, M. Morf, and M. J. Flynn, "Hardware software tri-design of encryption for mobile communication units," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), vol. 5, pp. 3045-3048, Seattle, Wash, USA, May 1998.
[10] R. Tessier and W. Burleson, "Reconfigurable computing and digital signal processing: a survey," Journal of VLSI Signal Processing, vol. 28, no. 1-2, pp. 7-27, 2001.
[11] A. Lodi, M. Toma, and F. Campi, "A pipelined configurable gate array for embedded processors," in Proceedings of ACM/SIGDA 11th International Symposium on Field Programmable Gate Arrays (FPGA '03), pp. 21-29, Monterey, Calif, USA, February 2003.
[12] G. K. Rauwerda, G. J. M. Smit, and P. M. Heysters, "Implementation of multi-standard wireless communication receivers in a heterogeneous reconfigurable system-on-chip," in Proceedings of the 16th ProRISC Workshop, pp. 421-427, Veldhoven, The Netherlands, November 2005.
[13] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," in Proceedings of the ACM/SIGDA 14th International Symposium on Field-Programmable Gate Arrays (FPGA '06), pp. 21-30, Monterey, Calif, USA, February 2006.
[14] P. A. Laplante, "Computing requirements for self-repairing space systems," Journal of Aerospace Computing,
Information and Communication, vol. 2, no. 3, pp. 154-169, 2005.
[15] T. Branca, How to Add Features and Fix Bugs - Remotely. Here's What You Need to Consider When Designing a Xilinx Online Application, Xilinx, 2001.
[16] C. F. Da Silva and A. M. Tokarnia, "RECASTER: synthesis of fault-tolerant embedded systems based on dynamically reconfigurable FPGAs," in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS '04), pp. 2003-2008, Santa Fe, NM, USA, April 2004.
[17] J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, "Architecture of field-programmable gate arrays," Proceedings of the IEEE, vol. 81, no. 7, pp. 1013-1029, 1993.
[18] W. H. Mangione-Smith, B. Hutchings, D. Andrews, et al., "Seeking solutions in configurable computing," IEEE Computer, vol. 30, no. 12, pp. 38-43, 1997.
[19] S. Hauck, "The roles of FPGAs in reprogrammable systems," Proceedings of the IEEE, vol. 86, no. 4, pp. 615-638, 1998.
[20] R. Hartenstein, "Trends in reconfigurable logic and reconfigurable computing," in Proceedings of the 9th IEEE International Conference on Electronics, Circuits and Systems (ICECS '02), pp. 801-808, Dubrovnik, Croatia, September 2002.
[21] K. Compton and S. Hauck, "Reconfigurable computing: a survey of systems and software," ACM Computing Surveys, vol. 34, no. 2, pp. 171-210, 2002.
[22] T. J. Todman, G. A. Constantinides, S. J. E. Wilton, O. Mencer, W. Luk, and P. Y. K. Cheung, "Reconfigurable computing: architectures and design methods," IEE Proceedings: Computers and Digital Techniques, vol. 152, no. 2, pp. 193-207, 2005.
[23] N. Kafafi, K. Bozman, and S. J. E. Wilton, "Architectures and algorithms for synthesizable embedded programmable logic cores," in Proceedings of ACM/SIGDA 11th International Symposium on Field-Programmable Gate Arrays (FPGA '03), pp. 3-11, Monterey, Calif, USA, February 2003.
[24] M. Luthra, S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, "Interface synthesis using memory mapping for an FPGA platform," in Proceedings of IEEE 21st International Conference on Computer Design: VLSI in Computers and Processors (ICCD '03), pp. 140-145, San Jose, Calif, USA, October 2003.
[25] T. Wong and S. J. E. Wilton, "Placement and
routing for non-rectangular embedded programmable logic cores in SoC design," in IEEE International Conference on Field-Programmable Technology (FPT '04), pp. 65-72, Brisbane, Australia, December 2004.
[26] L. Shannon and P. Chow, "Simplifying the integration of processing elements in computing systems using a programmable controller," in Proceedings of 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '05), pp. 63-72, Napa Valley, Calif, USA, April 2005.
[27] B. R. Quinton and S. J. E. Wilton, "Post-silicon debug using programmable logic cores," in Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT '05), pp. 241-248, Singapore, Republic of Singapore, December 2005.
[28] A. Alsolaim, J. Becker, M. Glesner, and J. Starzyk, "Architecture and application of a dynamically reconfigurable hardware array for future mobile communication systems," in Proceedings of the Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '00), pp. 205-214, Napa Valley, Calif, USA, April 2000.
[29] C. Dick and F. Harris, "FPGA implementation of an OFDM PHY," in Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 905-909, Pacific Grove, Calif, USA, November 2003.
[30] B. Mohebbi, E. C. Filho, R. Maestre, M. Davies, and F. J. Kurdahi, "A case study of mapping a software-defined radio (SDR) application on a reconfigurable DSP core," in Proceedings of 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pp. 103-108, Newport Beach, Calif, USA, October 2003.
[31] K. Sarrigeorgidis and J. M. Rabaey, "Massively parallel wireless reconfigurable processor architecture and programming," in Proceedings of 17th International Parallel and Distributed Processing Symposium (IPDPS '03), pp. 170-177, Nice, France, April 2003.
[32] C. Ebeling, C. Fisher, G. Xing, M. Shen, and H. Liu, "Implementing an OFDM receiver on the RaPiD reconfigurable architecture," IEEE Transactions on Computers, vol. 53, no. 11, pp. 1436-1448, 2004.
[33] G. K. Rauwerda, P. M. Heysters, and G. J. M. Smit, "Mapping wireless
communication algorithms onto a reconfigurable architecture," Journal of Supercomputing, vol. 30, no. 3, pp. 263-282, 2004.
[34] A. Rudra, "FPGA-based applications for software radio," RF Design Magazine, pp. 24-35, 2004.
[35] P. Ryser, "Software define radio with reconfigurable hardware and software: a framework for a TV broadcast receiver," in Embedded Systems Conference, San Francisco, Calif, USA, March 2005, httpwwwxilinxcomproductsdesign resourcesproccentralresourceproccentralresourceshtm.
[36] Altera Inc., "Altera Devices on the Cutting Edge of Medical Technology," 2000, httpwwwalteracomcorporatecust successescustomercst7CTIPEThtml.
[37] S. Coric, M. Leeser, E. Miller, and M. Trepanier, "Parallel-beam backprojection: an FPGA implementation optimized for medical imaging," in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '02), pp. 217-226, Monterey, Calif, USA, February 2002.
[38] A. Johnson and K. Mackenzie, "Pattern matching in reconfigurable logic for packet classification," in Proceedings of International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '01), pp. 126-130, Atlanta, Ga, USA, November 2001.
[39] F. Braun, J. Lockwood, and M. Waldvogel, "Protocol wrappers for layered network packet processing in reconfigurable hardware," IEEE Micro, vol. 22, no. 1, pp. 66-74, 2002.
[40] E. L. Horta, J. W. Lockwood, D. E. Taylor, and D. Parlour, "Dynamic hardware plugins in an FPGA with partial run-time reconfiguration," in Proceedings of the 39th Design Automation Conference, pp. 343-348, New Orleans, La, USA, June 2002.
[41] Lattice Semiconductor Corporation, Lattice Orca ORLI10G Datasheet, 2002.
[42] Z. K. Baker and V. K. Prasanna, "A methodology for synthesis of efficient intrusion detection systems on FPGAs," in Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '04), pp. 135-144, Napa Valley, Calif, USA, April 2004.
[43] F. Crowe, A. Daly, T. Kerins, and W. Marnane, "Single-chip FPGA implementation of a cryptographic co-processor," in Proceedings of the IEEE International
Conference on Field Programmable Technology pp 2797285 Brisbane Australia December 2004 T T7O Kwok and Y7K Kwok On the design of a self recon gurable SoPC based cryptographic engine in Pror ceedings of24th International Conference on Distributed Come puting Systems Workshops ICDCS 04 pp 8767881 Tokyo Japan March 2004 45 R l iasgiwale L Krnan A Perinkulam and R Tessier Ree con gurable data acquisition system for weather radar appli7 cations in Proceedings of48th Midwest Symposium on Cirr cuits and Systems MWSCAS 05 pp 8227825 Cincinnati Ohio USA August 2005 46 C Sanderson and D Shand FPGAs supplant processors and ASle in advanced imaging applications FPGA and Structured ASIC Journal 2005 httpwwwfpgajournalcom articles200520050104nallatechhtm 47 T R Rimmele Recent advances in solar adaptive optics in Advancements in Adaptive Optics vol 5490 ofProceedings of SPIE pp 34746 Glasgow Scotland UK June 2004 48 T Fry and S Hauck SPlHT image compression on FPGAs IEEE Transactions on Circuits and Systems for Video Technob ogy vol 15 no 9 pp 113871147 2005 49 R O Reynolds P H Smith L S Bell and H U Keller Dee sign of Mars lander cameras for Mars Path nder Mars Sure veyor 98 and Mars Surveyor 01 IEEE Transactions on In strumentation and Measurement vol 50 no 1 pp 63771 2001 50 M Ki e M Andro Q K Tran G Fujikawa and P P Chu Toward a dynamically recon gurable computing and com munication system for small spacecraft in Proceedings of the let International Communication Satellite System Conference efr Exhibit ICSSC 03 Yokohama Japan April 2003 51 A Stoica D Keymeulen C7S Lazaro WrT Li K Hayworth and R Tawel Toward oneboard synthesis and adaptation of electronic functions an evolvable hardware approach in 41 43 44 Proceedings of IEEE Aerospace Applications Conference vol 2 pp 3517357 Aspen Colo USA March 1999 52 J W Weingarten G Gruener and R Siegwart A stateeofr theeart 3D sensor for robot navigation in Proceedings of IEEERS International Conference on Intelligent Robots and Systems 
IROS 04 vol 3 pp 215572160 Sendai Japan SeptembereOctober 2004 53 W J MacLean An evaluation of the suitability of FPGAs for embedded vision systems in Proceedings of IEEE Confer ence on Computer Vision and Pattern Recognition CVPR 05 vol 3 pp 1317131 San Diego Calif USA June 2005 54 K Parnell You can take it with you on the road with Xilinx XcellJournal no 43 2002 55 K Parnell The changing face of automotive ECU design XcellJournal no 53 2005 56 Drivven Programmable Logic lP Cores for FPGA and CPLD httpwwwdrivvencomProg rammableLog39ice lPCoreshtm 2006 57 D Desmet P Avasare P Coene et al Design of CameEe leon a runetime recon gurable web camera in Embedded Processor Design Challenges Systems Architectures Modeling and Simulation SAMOS 02 vol 2268 of LNCS pp 2747 290 Springer Berlin Germany 2002 58 M Leaser S Miller and H Yu Smart camera based on recon gurable hardware enables diverse reaLtime applica7 tions in Proceedings of 12th Annual IEEE Symposium on FieldeProgrammable Custom Computing Machines FCCM 04 pp 1477155 Napa Valley Calif USA April 2004 59 J7Y Mignolet S Vernalde D Verkest and R Lauweree ins Enabling hardwareesoftware multitasking on a re con gurable computing platform for networked portable multimedia appliances in Proceedings of the International Conference on Engineering Recon gurable Systems and Algoe rithms pp 1167122 Las Vegas Nev USA June 2002 60 K M Hou E Yao X W Tu et al A recon gurable and exible parallel 3D vision system for a mobile robot in Pror ceedings of Computer Architectures for Machine Perception pp 2157221 New Orleans La USA December 1993 61 J P Durbano F E Ortiz J R Humphrey P F Curt and D W Prather FPGArbased acceleration of the 3D niteedifference timeedomain method in Proceedings of the 12th AnnualIEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 04 pp 1567163 Napa Valley Calif USA April 2004 Elixent DFA 000 RISC Accelerator Elixent Bristol England 2002 63 K LeijteneNowak and J L Van Meerbergen An FPGA are 
chitecture with enhanced datapath functionality in Proceed ings ofACMSIGDA 11th International Symposium on Field Programmable Gate Arrays FPGA 03 pp 1957204 Mon terey Calif USA February 2003 64 Silicon Hive Silicon Hive Technology Primer Phillips Elece tronics NV The Netherlands 2003 65 A G Ye and J Rose Using multiebit logic blocks and au7 tomated packing to improve eldeprog rammable gate array density for implementing datapath circuits in IEEE Inter national Conference on FieldrProgrammable Technology FPT 04 pp 1297136 Brisbane Australia December 2004 66 J M Arnold S5 the architecture and development ow of a software con gurable processor in Proceedings of the IEEE International Conference on FieldeProgrammable Technology FPT 05 pp 1217128 Singapore Republic of Singapore December 2005 62 14 EURASIP Journal on Embedded Systems 67 Altera lnc Stratix II Device Handbook Volume I Altera San Jose Calif USA 2005 68 Xilinx lnc VirteerI Pro and VirteerI Pro XPlatform FPGAs Complete Data Sheet Xilinx San Jose Calif USA 2005 69 Xilinx lnc Virtexe4 Family Overview Xilinx San Jose Calif USA 2004 70 S Haynes A Ferrari and P Cheung Flexible recon gurable multiplier blocks suitable for enhancing the architecture of FPGAs in Proceedings ofthe Custom Integrated Circuits Con ference pp 1917194 San Diego Calif USA May 1999 71 S Hauck T Fry M Hosler and I Kao The Chimaera ref con gurable functional unit in Proceedings of the 5th Annual IEEE Symposium on FieldrProgrammable Custom Computing Machines FCCM 97 pp 87796 Napa Valley Calif USA April 1997 72 V Betz I Rose and A Marquardt Architecture and CAD for DeepeSubmicron FPGAs Kluwer Academic Boston Mass USA 1999 73 K I Kum and W Sung Combined wordelength optimizae tion and highelevel synthesis of digital signal processing sys tems IEEE Transactions on ComputereAided Design ofIntee grated Circuits and Systems vol 20 no 8 pp 9217930 2001 74 G A Constantinides P Y K Cheung and W Luk The mule tiple wordlength paradigm in Proceedings of the 
9th Annual IEEE Symposium on FieldrProgrammable Custom Computing Machines FCCM 01 pp 51760 Rohnert Park Calif USA AprileMay 2001 75 U Malik K So and O Diessel Resourceeaware runetime elaboration of behavioural FPGA speci cations in Proceed ings ofIEEE International Conference on FieldrProgrammable Technology FPT 02 pp 68775 Hong Kong December 2002 76 Z Zhao and M Leeser Precision modeling of oatingepoint applications for variable bitwidth computing in Proceedings of the International Conference on Engineering of Recon ge urable Systems and Algorithms ERSA 03 pp 2087214 Las Vegas Nev USA June 2003 77 A DeHon I Adams M DeLorimier et al Designpatterns for recon gurable computing in Proceedings of the 12th Anr nual IEEE Symposium on FieldrProgrammable Custom Come puting Machines FCCM 04 pp 13723 Napa Valley Calif USA April 2004 78 K Han B L Evans and E E Swartzlander Ir Data wordlength reduction for lowepower signal processing soft ware in IEEE Workshop on Signal Processing Systems SIPS 04 pp 3437348 Austin Tex USA October 2004 79 I Park P C Diniz and K R Shesha Shayee Performance and area modeling of complete FPGA designs in the presence of loop transformations IEEE Transactions on Computers vol 53 no ll pp 142071435 2004 80 M L Chang and S Hauck Pr cis a usercentric word length optimization tool IEEE Design and Test of Computers vol 22 no 4 pp 3497361 2005 81 C Morra I Becker M AyalaeRincon and R Hartenstein FELIX using rewritingelogic for generating functionally equivalent implementations in Proceedings of International Conference on FieldrProgrammable Logic andApplications pp 25730 Tampere Finland August 2005 82 I Cong and S Xu Technology mapping for FPGAs with em bedded memory blocks in Proceedings of the ACMSIGDA International Symposium on FieldrProgrammable GateArrays FPGA 98 pp 1797188 Monterey Calif USA February 1998 83 S I E Wilton Implementing logic in FPGA memory are rays heterogeneous memory architectures in Proceedings of IEEE Symposium on 
FieldrProgrammable Custom Computing Machines FCCM 02 pp 1427147 Napa Valley Calif USA April 2002 84 R Tessier V Betz D Neto and T Gopalsamy Powerraware RAM mapping for FPGA embedded memory blocks in Proceedings of the ACMSIGDA International Symposium on FieldeProgrammable Gate Arrays FPGA 06 pp 1897198 Monterey Calif USA February 2006 85 S Choi R Scrofano V K Prasanna and I W Iang Energyeef cient signal processing using FPGAs in Proceed 7 ings of the ACMSIGDA International Symposium on Field Programmable Gate Arrays FPGA 03 pp 2257234 Mon terey Calif USA February 2003 86 I On S Choi and V K Prasanna Performance modeling of recon gurable SoC architectures and energyeef cient map ping of a class of application in Proceedings of I 1 th Annual IEEE Symposium on FieldrProgrammable Custom Computing Machines FCCM 03 pp 2417250 Napa Valley Calif USA April 2003 87 A Gayasen K Lee N Vijaykrishnan M Kandemir M I Ire win and T Tuan A dualevdd low power FPGA architecture in Proceedings of the 14th International Conference on Field ProgrammableLogic andApplications FPL 04 pp 1457157 Leuven Belgium AugusteSeptember 2004 88 F Li Y Lin L He and I Cong Lowepower FPGA us ing preede ned dualeVdddualth fabrics in Proceedings of ACMSIGDA 12th International Symposium on Field Programmable Gate Arrays FPGA 04 vol 12 pp 42750 Monterey Calif USA February 2004 89 A Rahman and V Polavarapuv Evaluation of low leakage design techniques for eld programmable gate arrays in ACMSIGDA International Symposium on Field Programmable Gate Arrays FPGA 04 vol 12 pp 23730 Monterey Calif USA February 2004 90 I Lamoureux and S I E Wilton On the interaction be tween powereaware computereaided design algorithms for eldeprogrammable gate arrays ournal of Low Power Elece tronics vol 1 no 2 pp 1197132 2005 91 K K W Poon S I E Wilton and A Yan A detailed power model for eldeprogrammable gate arrays ACM Transacr tions on Design Automation of Electronic Systems vol 10 no 2 pp 2797302 2005 92 A DeHon R Huang and I 
Wawrzynek Hardwareeassisted fast routing in Proceedings of the 10th Annual IEEE Syme posium on FieldrProgrammable Custom Computing Machines FCCM 02 pp 2057215 Napa Valley Calif USA April 2002 93 P Maidee C Ababei and K Bazargan Fast timingedriven partitioningebased placement for island style FPGAs in Pror ceedings of the 40th Design Automation Conference DAC 03 pp 5987603 Anaheim Calif USA June 2003 94 M G Wrighton andA M DeHon Hardwareeassisted simr ulated annealing with application for fast FPGA placement in ACMSIGDA 11th International Symposium on Field Programmable GateArrays FPGA 03 pp 33742 Monterey Calif USA February 2003 95 M Handa and R Vemuri Hardware assisted two dimene sional ultra fast placement in Proceedings of the Internae tional Parallel and Distributed Processing Symposium IPDPS 04 vol 18 pp 191571922 Santa Fe NM USA April 2004 Philip Garcia et al 15 96 S Li and C Ebeling QuickRoute a fast routing algorithm for pipelined architectures in Proceedings of IEEE Internar tional Conference on FieldrProgrammable Technology FPT 04 pp 73780 Brisbane Australia December 2004 97 R Lysecky F Vahid and S X D Tan A study ofthe scalae bility of onechip routing for justeinetime FPGA compilation in Proceedings of 13th Annual IEEE Symposium on Field Programmable Custom ComputingMachines FCCM 05 pp 57762 Napa Valley Calif USA April 2005 98 M Chu N Weaver K Sulimma A DeHon and I Wawrzynek Object oriented circuitegenerators in Java in Proceedings of the 6th Annual IEEE Symposium on Field Programmable Custom ComputingMachines FCCM 98 pp 1587166 Napa Valley Calif USA April 1998 99 A Derbyshire and W Luk Compiling runetime parametrise able designs in Proceedings of the IEEE International Confer ence on FieldeProgrammable Technology FPT 02 pp 44751 Hong Kong December 2002 W Wolf Computers as Components Principles ofEmbedded Computer Systems Design Morgan Kaufmann San Francisco Calif USA 2000 F Barat R Lauwereins and G Deconinck Recon gurable instruction set processors from 
a hardwaresoftware per spective IEEE Transactions on Software Engineering vol 28 no 9 pp 8477862 2002 F Razdan and M Smith A higheperformance microarchie tecture with hardwareeprog rammable functional units in Proceedings of the 27th Annual International Symposium on Microarchitecture MICRO 94 pp 1727180 San Jose Calif USA NovembereDecember 1994 103 R D Wittig and P Chow OneChip an FPGA processor with recon gurable logic in Proceedings of the IEEE Syme posium on FPGAsfor Custom Computing Machines pp 1267 135 Napa Valley Calif USA April 1996 104 I E Carrillo and P Chow The effect of recon gurable units in superscalar processors in Proceedings of the ACMSIGDA International Symposium on FieldrProgrammable GateArrays FPGA 01 pp 1417150 Monterrey Calif USA February 2001 105 B Mei S Vernalde D Verkest and R Lauwereins Design methodology for atightly coupled VLlW recon gurable ma trix architecture a case study in Proceedings of the Confer ence on Design Automation and Test in Europe DATE 04 vol 2 pp 122471229 Paris France February 2004 106 Altera lnc Nios II Processor Rq erence Handbook Altera San Jose Calif USA 2005 107 Xilinx lnc MicroBlaze Processor Reference Guide Xilinx San Jose Calif USA 2003 108 A Lawrence A Kay W Luk T Nomura and l Page Us ing recon gurable hardware to speed up product develop ment and performance in Proceedings of the 5th Internar tional Workshop on FieldeProgrammable Logic and Applicar tions FPL 95 pp 1117118 Oxford UK AugusteSeptember 1995 109 I M Rabaey A Abnous Y lchikawa K Seno and M Wan Heterogeneous recon gurable systems in IEEE Workshop on Signal Processing Systems Design and Implementation SiPS 97 pp 24734 Leicester UK November 1997 110 I R Hauser and I Wawrzynek Garp a MIPS processor with a recon gurable coprocessor in Proceedings of the 5th An nual IEEE Symposium on FieldrProgrammable Custom Come puting Machines FCCM 97 pp 12721 Napa Valley Calif USA April 1997 100 101 102 111 H Schmit D Whelihan A Tsai M Moe B Levine and R R Taylor 
PipeRench avirtualized programmable datapath in 018 Micron technology in Proceedings of the Custom In tegrated Circuits Conference pp 63766 Orlando Fla USA May 2002 112 M Bocchi C De Bartolomeis C Mucci et al A XiRisce based SoC for embedded DSP applications in Proceedings of the IEEE Custom Integrated Circuits Conference pp 5957598 Orlando Fla USA October 2004 113 R B Kujoth C7W Wang D B Gottlieb I J Cook andN P Carter A recon gurable unit for a clustered programmable recon gurable processor in Proceedings ofACMSIGDA 12th International Symposium on FieldrProgrammable Gate Arrays FPGA 04 vol 12 pp 2007209 Monterey Calif USA February 2004 114 Xilinx lnc VirteerI Platform FPGAs Complete Data Sheet Xilinx San Jose Calif USA 2004 115 Actel Corporation VariCoreTM Embedded Programmable Gate Array Core EPGATM 018pm Family Actel Mountain View Calif USA 2001 116 M2000 Press ReleaseiMay I5 2002 M2000 Bievres France 2002 117 K Compton and S Hauck Totem custom recon gurable array generation in Proceedings of the 9th Annual IEEE Syme posium on FieldrProgrammable Custom Computing Machines FCCM 01 pp 1117119 Rohnert Park Calif USA Aprilr May 2001 STMicroelectronics STMicroelectronics Introduces New Member of SPEArTM Family of Con gurable Systemeone Chip le Press Release 2005 httpusstcomstonline pressnewsyear2005p1711phtm 119 F Yang and M Paindavoine Implementation of an RBF neural network on embedded systems reaLtime face track ing and identity veri cation IEEE Transactions on Neural Networks vol 14 no 5 pp 116271175 2003 120 P Weaver and F Palma Using softwareecon g39urable pror cessors in biometric applications Industrial Embedded Sys7 tems Resource Guide pp 84786 2005 httpwwwindustrial7 embeddedcom V George Z Hui and I Rabaey The design ofa low energy FPGA in Proceedings of the International Symposium on Low Power Electronics and Design pp 1887193 San Diego Calif USA August 1999 122 P Heysters G I M Smit and E Molenkamp Energy ef ciency of the MONTlUM recon gurable tile 
processor in Proceedings of the International Conference on Engineering ofRecon gurable Systems and Algorithms ERSA 04 pp 387 44 Las Vegas Nev USA June 2004 123 G Asadi and M B Tahoori Soft error rate estimation and mitigation for SRAMebased FPGAs in Proceedings of the ACMSIGDA 13th International Symposium on Field Programmable Gate Arrays FPGA 05 pp 1497160 Mon terey Calif USA February 2005 124 Xilinx lnc EasyPath Devices Datasheet Xilinx San Jose Calif USA 2005 N Campregher P Y K Cheung G A Constantindes and M Vasilko Yield enhancements of designespeci c FPGAs in Proceedings of the ACMSIGDA International Symposium on FieldrProgrammable Gate Arrays FPGA 06 pp 937100 Monterey Calif USA February 2006 126 L Sterpone and M Violante Analysis of the robustness of the TMR architecture in SRAMrbased FPGAs IEEE Transacr tions on Nuclear Science vol 52 no 5 pp 154571549 2005 118 121 125 16 EURASIP Journal on Embedded Systems 127 P Bernardi M Sonza Reorda L Sterpone and M Violante On the evaluation of SEU sensitiVeness in SRAMrbased FR GAs in Proceedings of the 10th IEEE International Oanine Testing Symposium IOLTS 2404 pp 1157120 Madeira ls land Portugal July 2004 128 A Tiwari and K A Tomko Enhanced reliability of nite state machines in FPGA through ef cient fault detection and correction IEEE Transactions on Reliability Vol 54 no 3 pp 4597467 2005 129 P Graham M Caffrey M Wirthlin D E Johnson and N Rollins Recon gurable computing in space from current technology to recon gurable systemseonraechip in Proceed ings of the IEEE Aerospace Conference Vol 5 pp 2399L2410 Big Sky Mont USA March 2003 130 K Hasuko C Fukunaga R lchimiya et al A remote con trol system for FPGAeembedded modules in radiation en Viornments IEEE Transactions on Nuclear Science Vol 49 no 2 part 1 pp 5017506 2002 131 J Lach W H MangionerSmith and M Potkonjak Ef e ciently supporting faultetolerance in FPGAs in Proceedings of the ACMSIGDA 6th International Symposium on Field Programmable Gate Arrays FPGA 98 pp 
1057115 Mon terey Calif USA February 1998 132 N Mokhoff lnfrastruct39ure IP Seen Aiding SoC Yields EE Times July 2002 133 B P Dave and N K Jha COFTA hardwareesoftware co synthesis of heterogeneous distributed embedded systems for low overhead fault tolerance IEEE Transactions on Compute ers Vol48 no 4 pp 4177441 1999 134 J W S Liu Reaerime Systems PrenticeeHall Englewood Cliffs NJ USA 2000 135 F Verdier J PreVotet A Benkhelifa D Chillet and S Pillee ment Exploring RTOS issues with a higheleVel model of a recon gurable SoC platform in Proceedings of the Eu ropean Workshop on Recon gurable Communication Centric ReCoSoC 05 Montpellier France June 2005 B Griese E Vonnahme M Porrmann and U Ruckert Hardware support for dynamic recon guration in recon gurable SoC architectures in Proceedings of the 14th In ternational Conference on FieldeProgram mable Logic and Apr plications FPL 04 pp 8427846 LeuVen Belgium August September 2004 C Steiger H Walder and M Platzner Operating systems for recon gurable embedded platforms online scheduling ofrealetime tasks IEEE Transactions on Computers Vol 53 no ll pp 139371407 2004 138 K Danne and M Platzner Periodic reaLtime scheduling for FPGA computers in Proceedings of the 3rd Workshop on In telligentSolutions in Embedded Systems WISES 05 pp 1177 127 Hamburg Germany May 2005 P Brisk A Kaplan R Kastner and M Sarrafzadeh Instruce tion generation and regularity extraction for recon gurable processors in Proceedings of the International Conferences on Compilers Architectures and Synthesis of Embeded Systems CASES 02 pp 2627269 Grenoble France October 2002 S Yehia N Clark S Mahlke and K Flautner Exploring the design space of LUTebased transparent accelerators in In tere national Conference on Compilers Architecture and Synthesis for Embedded Systems CASES 05 pp 11721 San Francisco Calif USA September 2005 141 P Yu and T Mitra Satisfying reaLtime constraints with cuse tom instructions in Proceedings of the 3rd IEEEACMIFIP International 
Conference on HardwareSoftware Codesign and 136 137 139 140 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 Systems Synthesis CODESISSS 05 pp 1667171 New Jere sey NJ USA September 2005 T Kean Secure con guration of eld programmable gate arrays in Proceedings of 11th International Conference on FieldeProgrammable Logic and Applications FPL 01 pp 1427151 Belfast Northern Ireland UK August 2001 L Bossuet G Gogniat and W Burleson Dynamically cone gurable security for SRAM FPGA bitstreams in Proceed ings of the International Parallel and Distributed Processing Symposium IPDPS 04 pp 199572002 Santa Fe NM USA April 2004 Xilinx Inc and A Telikepalli Is Your FPGA Design Secure Xilinx San Jose Calif USA 2003 Altera lnc FPGA Design Security Solution Using Max II Der vices Altera San Jose Calif USA 2004 C R Rupp M Landguth T Garverick et al The NAPA adaptive processing architecture in Proceedings of 6th IEEE Symposium on FieldrProgrammable Custom Computing Mae chines FCCM 98 pp 28737 NapaValley Calif USA April 1998 K Bazargan R Kastner and M Sarrafzadeh Fast template placement for recon gurable computing systems IEEE Der sign and Test ofComputers Vol 17 no 1 pp 68783 2000 K Compton Z Li J Cooley S Knol and S Hauck Con ge uration relocation and defragmentation for runetime recon gurable computing IEEE Transactions on Very Large Scale Integration VLSI Systems Vol 10 no 3 pp 2097220 2002 U Malik and O Diessel On the placement and granularity of FPGA con gurations in Proceedings ofIEEE International Conference on FieldeProgrammable Technology FPT 04 pp 1617168 Brisbane Australia December 2004 G Brebner Swappable logic unit a paradigm for Virtual hardware in Proceedings of the 5th Annual IEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 97 pp 77786 Napa Valley Calif USA April 1997 E Caspi R Huang Y MarkoVskiy J Yeh J Wawrzynek and A DeHon A streaming multiethreaded model in Proceedings of the 3rd Workshop on Media and Stream Procese sors MSP 01 pp 21728 
Austin Tex USA December 2001 Y MarkoVskiy E Caspi R Huang et al Analysis of quasi static scheduling techniques in a Virtualized recon gurable machine in Proceedings of I 0th ACM International Syme posium on FieldeProgrammable Gate Arrays FPGA 02 pp 1967205 Monterey Calif USA February 2002 V Nollet I Y Mignolet T A Bartic D Verkest S Vernalde and R Lauwereins Hierarchical runetime recon guration managed by an operating system for recon gurable systems in Proceedings of the International Conference on Engineering ofRecon gurable Systems and Algorithms pp 81787 Las Vee gas NeV USA June 2003 Z Li and S Hauck Con guration compression for Virtex FPGAs in Proceedings of the 9th Annual IEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 01 pp 1477159 Rohnert Park Calif USA AprilrMay 2001 Z Li K Compton and S Hauck Con guration caching techniques for FPGA in Proceedings of 8th IEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 00 Napa Valley Calif USA April 2000 A DeHon DPGA utilization and application in Proceed ings of the ACMSIGDA International Symposium on Field Programmable Gate Arrays FPGA 96 pp 1157121 Mon terey Calif USA February 1996 Philip Garcia et al 17 157 158 159 160 161 162 163 164 165 166 167 168 169 170 S Trimberger D Carberry A Johnson and I Wong A time multiplexed FPGA in Proceedings of the 5th Annual IEEE Symposium on FieldeProgrammable Custom Computing Mar chines pp 22728 Napa Valley Calif USA April 1997 Z Li and S Hauck Con guration prefetching techniques for partial recon gurable coprocessor with relocation and defragmentation in Proceedings of 10th ACM International Symposium on FieldrProgrammable Gate Arrays FPGA 02 pp 1877195 Monterey Calif USA February 2002 R Maestre F I Kurdahi N Bagherzadeh H Singh R Here mida and M Fernandez Kernel scheduling in recon ge urable computing in Proceedings of Design Automation and Test in Europe Conference and Exhibition pp 90796 Munich Germany March 1999 K M Gajjala Puma and D Bhatia 
Temporal partitioning and scheduling data ow graphs for recon gurable compute ers IEEE Transactions on Computers vol 48 no 6 pp 5797 590 1999 G Brebner A virtual hardware operating system for the Xilr inx XC6200 in Proceedings of the 6th International Workshop on FieldeProgrammable Logic and Applications FPL 96 pp 3277336 Dermstadt Germany September 1996 I Resano D Mozos D Verkest and F Catthoor A recon ge uration manager for dynamically recon gurable hardware IEEE Design and Test of Computers vol 22 no 5 pp 4527 460 2005 A Sudarsanam M Srinivasan and S Panchanathan Rer source estimation and task scheduling for multithreaded ref con gurable architectures in Proceedings of the International Conference on Parallel and Distributed Systems ICPADS 04 pp 3237330 Newport Beach Calif USA July 2004 O Diessel H ElGindy M Middendorf H Schmeck and B Schmidt Dynamic scheduling of tasks on partially recon gi urable FPGAs IEE Proceedings Computers and Digital Tech niques vol 147 no 3 pp 1817188 2000 H Quinn L A S King M Leeser and W Meleis Run time assignment of recon gurable hardware components for image processing pipelines in 11th Annual IEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 03 pp 1737182 Napa Valley Calif USA April 2003 G Stitt R Lysecky and F Vahid Dynamic hard ware software partitioning a rst approach in Proceedings ofthe 40th Design Automation Conference DAC 03 pp 2507 255 Anaheim Calif USA June 2003 I Noguera and R Badia Multitasking on recon gurable are chitect39ures microarchitect39ure support and dynamic schedule ing ACM Transactions on Embedded Computing Systems vol 3 no 2 pp 3857406 2004 A Ahmadinia C Bobda D Koch M Majer and I Teich Task scheduling for heterogeneous recon gurable compute ers in Proceedings of the 17th Symposium on Integrated Ci cuits and Systems Design pp 22727 Pernambuco Brazil September 2004 R Lysecky and F Vahid A con gurable logic architecture for dynamic hardware software partitioning in Proceedings ofDesign 
Automation and Test in Europe Conference and Ex hibition vol 1 pp 4807485 Paris France February 2004 W Fu and K Compton An execution environment for re con gurable computing in Proceedings of the 13th Annual IEEE Symposium on FieldrProgrammable Custom Computing Machines FCCM 05 pp 1497158 NapaValley Calif USA April 2005 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 T Wiangtong P Y K Cheung and W Luk Hard ware software codesign a systematic approach targeting dataeintensive applications IEEE Signal Processing Magae zine vol 22 no 3 pp 14722 2005 P Benoit L Torres G Sassatelli M Robert and G Cambon Automatic task scheduling loop unrolling using dedicated RTR controllers in coarse grain recon gurable architectures in Proceedings of the 19th IEEE International Parallel and Dis tributed Processing Symposium IPDPS 05 p 148a Denver Colo USA April 2005 H Simmler L Levison and R Manner Multitasking on FPGA coprocessors in The International Conference on FieldeProgrammable Logic Recon gurable Computing and Applications FPL 00 pp 1217130 Villach Austria August 2000 H Kalte andM Porrmann Context saving and restoring for multitasking in recon gurable systems in Proceedings of In ternational Conference on FieldeProgrammable Logic and Apr plications FPL 05 pp 2237228 Tampere Finland August 2005 Y Li T Callahan E Darnell R Harr U Kurkure and I Stockwood Hardwareesoftware coedesign of embedded ref con gurable architectures in Proceedings of 37 th Design Aur tomation Conference DAC 00 pp 5077512 Los Angeles Calif USA June 2000 M I W Savage Z Salcic G Coghill and G Covic Ex tended genetic algorithm for codesign optimization of DSP systems in FPGAs in Proceedings of IEEE International Con ference on FieldeProgram mable Technology FPT 04 pp 2917294 Brisbane Australia December 2004 S Kumar I H Aylor B W Johnson and W A Wulf The Codesign of Embedded Systems A Uni ed HardwareSoftware Representation Springer New York NY USA 1995 M Chiodo P Giusto A Iurecska H C Hsieh A 
SangiovannirVincentelli and L Lavagno Hardware software codesign of embedded systems IEEE Micro vol 14 no 4 pp 26736 1994 R Ernst Codesign of embedded systems status and trends IEEE Design and Test of Computers vol 15 no 2 pp 45754 1998 W Wolf A decade of hardwaresoftware codesign IEEE Computer vol 36 no 4 pp 38743 2003 M Gokhale I M Stone I Arnold and M Kalinowski Streameoriented FPGA computing in the StreamseC high level language in Proceedings of the Annual IEEE Symposium on FieldrProgrammable Custom ComputingMachines FCCM 00 Napa Valley Calif USA April 2000 Synopsys lnc CoCentric System C Compiler Synopsys Mountain View Calif USA 2000 M Weinhardt and W Luk Pipeline vectorization IEEE Transactions on ComputereAided Design ofIntegrated Circuits and Systems vol 20 no 2 pp 2347248 2001 D Niehaus and D Andrews Using the multiethreaded computation model as a unifying framework for hardware software coedesign and implementation in Proceedings of the 9th International Workshop on Objecteoriented Reaerime Dependable Systems WORDS 03 p 317 Capri Italy Octoe ber 2003 B Swahn and S Hassoun Hardware scheduling for dynamic adaptability using external pro ling and hardware thread ing in Proceedings of IEEEACM International Conference on ComputereAided Design ICCAD 03 pp 58764 San Jose Calif USA November 2003 18 EURASIP Journal on Embedded Systems 186 G De Micheli Hardware synthesis from CC models in Proceedings of Design Automation and Test in Europe Confer ence and Exhibition pp 3827383 Munich Germany March 1999 187 A DeHon Very large scale spatial computing in Proceed ings of the 3rd International Conference on Unconventional Models of Computation UMC 02 pp 27737 Kobe Japan October 2002 D Andrews D Niehaus and P Ashenden Program ming models for hybrid CPUFPGA chips IEEE Computer vol 37 no 1 pp 1187120 2004 189 J P David and J D Legat A data ow oriented co design for recon gurable systems in Proceedings of the 9th Interna tional Workshop on Rapid System Prototyping pp 
2077211 Leuven Belgium June 1998 R Rinker M Carter A Patel et al An automated pro cess for compiling data ow graphics into recon gurable hardware IEEE Transactions on Very Large Scale Integration VLSI Systems vol 9 no 1 pp 1307139 2001 B Mei S Vernalde D Verkest H De Man and R Lauw ereins DRESC a retargetable compiler for coarse grained recon gurable architectures in Proceedings of IEEE Inter national Conference on Field Programmable Technology FPT 02 pp 1667173 Hong Kong December 2002 192 J M P Cardoso On combining temporal partitioning and sharing of functional units in compilation for recon gurable architectures IEEE Transactions on Computers VOl 52 no 10 pp 136271375 2003 S Banerjee E Bozorgzadeh and N Dutt Physically aware HW SW partitioning for recon gurable architectures with partial dynamic recon guration in Proceedings of the 42nd Design Automation Conference DAC 05 pp 3357340 Ana heim Calif USA June 2005 C Bobda and A Ahmadinia Dynamic interconnection of recon gurable modules on recon gurable devices IEEE De sign and Test of Computers vol 22 no 5 pp 4437451 2005 B Hutchings and B Nelson Developing and debugging FPGA applications in hardware with JHDL in Proceedings of 33rd Asilomar Conference on Signals Systems and Comput ers vol 1 pp 554558 Paci c Grove Calif USA October 1999 K A Tomko and A Tiwari Hardwaresoftware co debugging for recon gurable computing in Proceedings of the 5th IEEE International High Level Design Validation and Test Workshop HLDVT 00 pp 59763 Berkeley Calif USA November 2000 T Rissa W Luk and P Y K Cheung Automated combi nation of simulation and hardware prototyping in Proceed ings of the International Conference on Engineering of Recon gurable Systems and Algorithms ERSA 04 pp 1847193 Las Vegas Nev USA June 2004 198 G Talavera V Nollet J Y Mignolet et al Hardware software debugging techniques for recon gurable systems on chip in Proceedings of the IEEE International Conference on Industrial Technology ICIT 04 vol 3 pp 140271407 
Philip Garcia received a B.S. degree in computer engineering from Lehigh University. He also received his M.S. degree at Lehigh University, concentrating on architecture-aware database algorithms. He is currently an Electrical Engineering Ph.D. student at the University of Wisconsin-Madison, studying under the advisement of Dr. Katherine Compton. His current research is in the design of interfaces between reconfigurable hardware and general processor systems.

Katherine Compton received her B.S., M.S., and Ph.D. degrees from Northwestern University in 1998, 2000, and 2003, respectively. Since January of 2004, she has been an Assistant Professor at the University of
Wisconsin-Madison in the Department of Electrical and Computer Engineering. She and her graduate students are investigating new architectures, logic structures, integration techniques, and systems software techniques for reconfigurable computing. She serves on a number of program committees for FPGA and reconfigurable computing conferences and symposia. She is also a Member of both ACM and IEEE.

Michael Schulte received a BS degree in electrical engineering from the University of Wisconsin-Madison, and MS and PhD degrees in electrical engineering from the University of Texas at Austin. He is currently an Associate Professor at the University of Wisconsin-Madison, where he leads the Madison Embedded Systems and Architectures Group. His research interests include high-performance embedded processors, computer architecture, domain-specific systems, computer arithmetic, and reconfigurable computing. He is a Senior Member of the IEEE and the IEEE Computer Society, and an Associate Editor for the IEEE Transactions on Computers and the Journal of VLSI Signal Processing.

Emily Blem received a BS degree in Engineering and a BA degree in Mathematics from Swarthmore College. She is currently pursuing her PhD degree at the University of Wisconsin-Madison. Her research interests include computer architecture, performance analysis and modeling, and reconfigurable computing. She is a Member of the IEEE and the IEEE Computer Society.

Wenyin Fu received the BS degree from Shanghai Jiaotong University in 1999, and the MS degree in both electrical engineering and computer science from the University of Wisconsin at Madison in 2003 and 2004, respectively. His research interests center on computer architecture, embedded systems, and reconfigurable computing. He is currently working toward a PhD degree at the same university, studying with Dr. Katherine Compton.