Tuesday, January 22, 2008 |
A | B | C | D |
---|---|---|---|
Opening Ceremony 08:30 - 09:00 |
Keynote Session I 09:00 - 10:00 |
New Challenges in High Level Synthesis 10:15 - 12:20 |
Power and Thermal Modeling and Optimization 10:15 - 12:20 |
Emerging Technologies 10:15 - 11:55 |
University LSI Design Contest 10:15 - 12:20 |
Advanced Topic in Logic Synthesis 13:30 - 15:35 |
Interconnect Modeling and Simulation Techniques 13:30 - 15:35 |
Floorplanning 13:30 - 15:35 |
Special Session - Tackling Manufacturability/Variability for 32nm and Below 13:30 - 15:35 |
Routing 15:50 - 17:55 |
Interconnect, NoCs, and MPSoCs 15:50 - 17:30 |
Special Session (Panel) The Tears and Joy of Sowing and Reaping Complex SoC's 15:50 - 17:55 |
Wednesday, January 23, 2008 |
Thursday, January 24, 2008 |
A | B | C | D |
---|---|---|---|
Keynote Session III 09:00 - 10:00 |
Reliable/Testable Design Techniques 10:15 - 12:20 |
Communication and Interfaces 10:15 - 12:20 |
Power: Delivery and Reduction 10:15 - 12:20 |
Special Session (Panel) Concurrent SoC and SiP Designs 10:15 - 12:20 |
Test Generation and Test Power 13:30 - 15:35 |
Design Space Exploration 13:30 - 15:35 |
Reliability and Power Management 13:30 - 15:35 |
Designers' Forum - Low Power Chips 13:30 - 15:35 |
Analog/RF/Mixed Signal CAD 15:50 - 17:55 |
Architecture Exploration 15:50 - 17:55 |
Designers' Forum (Panel) Best Ways to Use Billions of Devices on a Chip 15:50 - 17:55 |
Tuesday, January 22, 2008 |
Title | (Keynote Address) A Brand New Wireless Day |
Author | *Jan M. Rabaey (Univ. of California, Berkeley, United States) |
Page | p. 1 |
Title | Variability-Driven Module Selection with Joint Design Time Optimization and Post-Silicon Tuning |
Author | Feng Wang, *Xiaoxia Wu, Yuan Xie (Pennsylvania State University, United States) |
Page | pp. 2 - 9 |
Keyword | module selection, design optimization, high level synthesis, delay variations |
Abstract | Increasing delay and power variations are significant challenges for designers as technology scales into the deep sub-micron (DSM) regime. Traditional module selection techniques in high level synthesis use worst-case delay/power information to perform the optimization, and therefore may be too pessimistic. In this paper, we propose a module selection algorithm that combines design-time optimization with post-silicon tuning (using adaptive body biasing) to maximize design yield. A fast and efficient method for computing performance- and power-yield gradients is developed. The post-silicon optimization is formulated as an efficient sequential conic programming problem to determine the optimal body bias distribution, which in turn affects design-time module selection. To the best of our knowledge, this is the first variability-driven high level synthesis technique that considers post-silicon tuning during design-time optimization. |
Title | Behavioral Synthesis with Activating Unused Flip-Flops for Reducing Glitch Power in FPGA |
Author | *Cheng-Tao Hsieh (Nat'l Tsing Hua Univ., Taiwan), Jason Cong, Zhiru Zhang (Univ. of California, Los Angeles, United States), Shih-Chieh Chang (Nat'l Tsing Hua Univ., Taiwan) |
Page | pp. 10 - 15 |
Keyword | low power, behavioral synthesis, FPGA |
Abstract | In this paper we discuss optimizing the interconnect power of designs implemented on FPGA platforms. In particular, we reduce the glitch power on interconnects associated with the outputs of functional units in a design. The idea is to activate unused flip-flops to block the propagation of glitches, taking advantage of the abundant flip-flops in modern FPGA structures. Since activating additional flip-flops may cause data hazards, we develop several effective behavioral synthesis techniques to prevent them. We also study the optimality of our techniques. The experimental results show that, on average, our methods reduce dynamic power by 28% on the Xilinx Virtex-II platform. |
Title | A Multicycle Communication Architecture and Synthesis Flow for Global Interconnect Resource Sharing |
Author | Wei-Sheng Huang, Yu-Ru Hong, Juinn-Dar Huang, *Ya-Shih Huang (National Chiao Tung University, Taiwan) |
Page | pp. 16 - 21 |
Keyword | multicycle communication architecture, distributed register file, interconnect, high-level synthesis, resource sharing |
Abstract | In deep submicron technology, wire delay is no longer negligible and is gradually dominating system latency. Some state-of-the-art architectural synthesis flows adopt the distributed register (DR) architecture to cope with this increasing latency. Although the DR architecture allows multicycle communication, it introduces extra overhead in interconnect resources. In this paper, we propose the Regular Distributed Register - Global Resource Sharing (RDR-GRS) architecture to enable global sharing of interconnects and registers. Based on the RDR-GRS architecture, we further define the channel and register allocation problem as a path scheduling problem of data transfers. A formal and flexible formulation of this problem is then presented and solved optimally by Integer Linear Programming (ILP). Experimental results show that RDR-GRS/ILP reduces wires by 58% and registers by 35% on average compared with previous work. |
Title | Scheduling with Integer Time Budgeting for Low-Power Optimization |
Author | Wei Jiang, Zhiru Zhang, Miodrag Potkonjak, *Jason Cong (Univ. of California, Los Angeles, United States) |
Page | pp. 22 - 27 |
Keyword | Behavioral Synthesis, Scheduling, Delay Budgeting |
Abstract | In this paper we present a mathematical programming formulation of the integer time budgeting problem for directed acyclic graphs. In particular, we formally prove that our constraint matrix has a special property that enables a polynomial-time algorithm to solve the problem optimally with a guaranteed integral solution. Our theory can be directly applied to solve a scheduling problem in behavioral synthesis with the objective of minimizing system power consumption. Given a set of scheduling constraints and a collection of convex power-delay tradeoff curves for each type of operation, our scheduler can intelligently schedule the operations to appropriate clock cycles and simultaneously select the module implementations that lead to low-power solutions. Experiments demonstrate that our proposed technique produces near-optimal results (within 6% of the optimum found by the ILP formulation) with a speedup of more than 40x. |
Title | REWIRED - Register Write Inhibition by Resource Dedication |
Author | *Pushkar Tripathi, Rohan Jain (Indian Inst. of Tech. Delhi, India), Srikanth Kurra (Oracle, India), Preeti Ranjan Panda (Indian Inst. of Tech. Delhi, India) |
Page | pp. 28 - 31 |
Keyword | behavioural synthesis, register allocation, low power |
Abstract | We propose REWIRED (REgister Write Inhibition by REsource Dedication), a technique for reducing power during high level synthesis (HLS) by selectively inhibiting the storage of function unit (FU) output data into registers. Registers are generally inferred in HLS when data produced in one clock cycle is used in a later cycle. However, when it can be established that the input registers to an FU are not changing values during a certain period, the outputs during this period can be directly read off the FU output pins without needing to store them in registers. When the life-times of such data are short, it may be possible to completely eliminate the register storage operation, thereby reducing power. We present a genetic algorithm formulation and a heuristic for maximizing the number of register stores that can be inhibited in a scheduled data flow graph (DFG) during behavioral synthesis. |
Title | An Efficient Performance Improvement Method Utilizing Specialized Functional Units in Behavioral Synthesis |
Author | *Tsuyoshi Sadakata, Yusuke Matsunaga (Kyushu University, Japan) |
Page | pp. 32 - 35 |
Keyword | Behavioral Synthesis, Specialized Functional Unit, Module Selection, Scheduling, Functional Unit Allocation |
Abstract | This paper proposes a novel behavioral synthesis method that improves the performance of synthesized circuits by efficiently utilizing specialized functional units. Almost all conventional methods cannot utilize specialized functional units efficiently under a total area constraint because of their limited flexibility in resource sharing. With the proposed method, the module selection, scheduling, and allocation problems under a total area constraint with specialized functional units can be solved in practical time. Experimental results show that the proposed method reduces the number of cycles by up to 35% and by 14% on average, in practical time. |
Title | Predictive Power Aware Management for Embedded Mobile Devices |
Author | *Young-Si Hwang, Sung-Kwan Ku, Chan-Min Jung, Ki-Seok Chung (Hanyang University, Republic of Korea) |
Page | pp. 36 - 41 |
Keyword | Low Power, Embedded System, Dynamic Power Management |
Abstract | Intelligent power management of mobile devices is becoming more important as ubiquitous computing becomes part of daily life. Power-aware system management relies on collecting and analyzing information on the status of I/O devices or processors while an application is running. However, the overhead of collecting this information in software while the system is running is so large that system performance may be severely degraded. Therefore, it is crucial to design a PMU (power management unit) that collects the information in hardware so that system performance is not degraded. In this paper, we propose a novel PMU design that collects information on I/O devices while an application is running; power-aware management is then carried out based on the collected information. Experiments with various applications have been conducted to show the effectiveness of our design. |
Title | A Dynamic-Programming Algorithm for Reducing the Energy Consumption of Pipelined System-Level Streaming Applications |
Author | N. Liveris, *H. Zhou (Northwestern University, United States), P. Banerjee (HP Labs, United States) |
Page | pp. 42 - 48 |
Keyword | energy, power gating, streaming, pipeline |
Abstract | In this paper we present a system-level technique for reducing energy consumption. The technique is applicable to pipelined applications represented as chain-structured graphs and targets the energy overhead of switching between active and sleep modes. The overhead is reduced by increasing the number of consecutive executions of the pipeline stages. The technique has no impact on the average throughput. We derive upper bounds on the number of consecutive executions and present a dynamic-programming algorithm that finds the optimal solution using these bounds. For specific cases, we derive a quality metric that can be used to trade result quality for running time. |
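The trade-off targeted by the technique above can be illustrated with a deliberately simplified energy model. This is not the authors' dynamic-programming algorithm; it assumes, hypothetically, that a stage pays a fixed energy `e_transition` per sleep/wake cycle and a fixed `e_exec` per execution, and that it runs `batch` consecutive executions per wake-up. All names and numbers are illustrative.

```python
import math


def stage_energy(n_execs: int, batch: int, e_exec: float, e_transition: float) -> float:
    """Energy of one pipeline stage under a hypothetical model.

    The stage wakes up, runs `batch` consecutive executions, then sleeps again,
    so it pays one sleep/wake transition per batch (the last batch may be partial).
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    wakeups = math.ceil(n_execs / batch)
    return n_execs * e_exec + wakeups * e_transition


# Illustrative numbers only (arbitrary energy units).
for k in (1, 2, 4, 8):
    print(k, stage_energy(120, k, e_exec=1.0, e_transition=5.0))
# prints 720.0, 420.0, 270.0 and 195.0 for k = 1, 2, 4, 8
```

Under these assumptions the transition overhead per execution falls as `e_transition / batch`. The model ignores whatever limits the batch size in practice; the abstract notes that the authors derive upper bounds on the number of consecutive executions, which this sketch does not attempt to reproduce.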
Title | Temperature-Aware MPSoC Scheduling for Reducing Hot Spots and Gradients |
Author | *Ayse Kivilcim Coskun, Tajana Simunic Rosing (Univ. of California, San Diego, United States), Keith A. Whisnant, Kenny C. Gross (Sun Microsystems, United States) |
Page | pp. 49 - 54 |
Keyword | scheduling, thermal management, reliability |
Abstract | Thermal hot spots and temperature gradients on the die need to be minimized to manufacture reliable systems while meeting energy and performance constraints. In this work, we solve the task scheduling problem for multiprocessor system-on-chips (MPSoCs) using Integer Linear Programming (ILP). The goal of our optimization is minimizing the hot spots and balancing the temperature distribution on the die for a known set of tasks. Under the given assumptions about task characteristics, the solution is optimal. We compare our technique against optimal scheduling methods for energy minimization, energy balancing, and hot spot minimization, and show that our technique achieves significantly better thermal profiles. We also extend our technique to handle workload variations at runtime. |
Title | Run-Time Power Gating of On-Chip Routers Using Look-Ahead Routing |
Author | *Hiroki Matsutani (Keio University, Japan), Michihiro Koibuchi (National Institute of Informatics, Japan), Daihan Wang, Hideharu Amano (Keio University, Japan) |
Page | pp. 55 - 60 |
Keyword | Network-on-Chips, look-ahead routing, power gating, leakage power, low power |
Abstract | Since on-chip routers in Networks-on-Chip play a key role in enabling on-chip communication between cores, they must always be ready for packet injection even when some cores are in standby mode, which results in larger standby power for routers than for cores. Run-time power gating of individual channels in a router is an attractive way to reduce the standby power of the chip without affecting on-chip communication. However, a transition between sleep and active modes incurs a performance penalty, and turning a power switch on or off dissipates overhead energy, which means that a short-term sleep actually increases power consumption. In this paper, we propose a sleep control method based on look-ahead routing that detects the arrival of packets two hops ahead, so as to hide the wake-up delay and reduce short-term sleeps of channels. Simulation results using real application traces show that the proposed method hides wake-up delays of less than five cycles and saves more leakage power than the original naive method. |
Title | Automated Techniques for Energy Efficient Scheduling on Homogeneous and Heterogeneous Chip Multi-Processor Architectures |
Author | *Sushu Zhang, Karam S. Chatha (Arizona State Univ., United States) |
Page | pp. 61 - 66 |
Keyword | low power design, chip multi-processor, scheduling, approximation algorithm |
Abstract | We address performance maximization of independent task sets under an energy constraint on chip multi-processor (CMP) architectures that support multiple voltage/frequency operating states for each core. We prove that the problem is strongly NP-hard. We propose polynomial-time 2-approximation algorithms for homogeneous and heterogeneous CMPs. To the best of our knowledge, our techniques offer the tightest bounds for energy-constrained design on CMP architectures. Experimental results demonstrate that our techniques are effective and efficient under various workloads on several CMP architectures. |
Title | Statistical Power Profile Correlation for Realistic Thermal Estimation |
Author | *Love Singhal (University of California, Irvine, United States), Sejong Oh (KAIST, Republic of Korea), Eli Bozorgzadeh (University of California, Irvine, United States) |
Page | pp. 67 - 70 |
Keyword | Power profile, Thermal-aware, temperature estimation, Clustering |
Abstract | At the system level, on-chip temperature depends on both power density and thermal coupling with neighboring regions. The problem of finding the right set of input power profile(s) for accurate temperature estimation has not been studied. Considering only average or peak power density may lead to underestimation or overestimation, respectively, of the thermal crisis. To provide more realistic temperature estimation, we propose a representation based on multiple power profiles, referred to as leader power profiles. Using the proposed statistical methods to determine the closeness between power profiles, we apply a clustering algorithm to identify the leader power profiles. We incorporate them into a thermal-aware floorplanner, and empirical results show that using a single leader power profile (average or peak) leads to 37% degradation in critical wire delay and 20% degradation in wire length compared with using multiple leader power profiles. |
Title | Reconfigurable RTD-Based Circuit Elements of Complete Logic Functionality |
Author | *Yexin Zheng, Chao Huang (Virginia Tech., United States) |
Page | pp. 71 - 76 |
Keyword | reconfigurable, RTD circuit |
Abstract | Resonant tunneling diodes (RTDs) have demonstrated promising circuit characteristics, including high-speed switching and versatile functionality with negative differential resistance (NDR). In this paper, we propose novel programmable logic elements (PLEs) that can be configured to realize all three- or four-input logic functions. These simple RTD-based circuit elements are implemented with threshold gates (TGs) and multi-threshold threshold gates (MTTGs) by employing programmable monostable-bistable logic element (MOBILE) principles. We also developed a dynamically reconfigurable scheme based on our PLE structures that facilitates nanopipelining without incurring delay overheads. |
Title | MBARC: A Scalable Memory Based Reconfigurable Computing Framework for Nanoscale Devices |
Author | Somnath Paul, *Swarup Bhunia (Case Western Reserve University, United States) |
Page | pp. 77 - 82 |
Keyword | Reconfigurable, Nano-Computing, Memory-based computing |
Abstract | We propose MBARC, a reconfigurable framework that uses memory as the primary computing element. The proposed framework leverages the reported advantages of memory array design with nanodevices, which lend themselves to fabrication in dense, regular structures. Simulation results for a set of ISCAS benchmarks show average improvements of 32% in area, 21% in delay, and 34% in energy per vector compared with a nanoscale FPGA implementation. |
Title | Moving Forward: A Non-Search Based Synthesis Method Toward Efficient CNOT-Based Quantum Circuit Synthesis Algorithms |
Author | *Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi (Amirkabir Univ. of Tech., Iran) |
Page | pp. 83 - 88 |
Keyword | Quantum Computing, Reversible Logic, CNOT-based Circuit, Matrix representation |
Abstract | Quantum information processing is in its early stages, and there is no mature method for quantum circuit synthesis. Among open research problems, quantum circuit synthesis has recently received significant attention. In this paper, we propose a new non-search-based moving forward synthesis algorithm (MOSAIC) for CNOT-based quantum circuits. In contrast with the widely used search-based methods, MOSAIC is guaranteed to produce a result and can reach a solution in far fewer steps. To evaluate the proposed algorithm, different circuits taken from the literature are used. The experimental results show the efficiency of the proposed algorithm. |
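A CNOT gate with control c and target t maps x_t to x_t XOR x_c, so any CNOT-only circuit computes an invertible linear map over GF(2), that is, an invertible 0/1 matrix. A well-known non-search way to synthesize such a circuit is Gauss-Jordan elimination over GF(2): each row operation is one CNOT, and elimination always terminates for an invertible matrix. The sketch below illustrates only that generic textbook technique; it is not MOSAIC, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Gate = Tuple[int, int]  # (control, target)


def synthesize_cnot(a: Matrix) -> List[Gate]:
    """Gauss-Jordan elimination over GF(2).

    Returns CNOT gates in circuit order implementing x -> a x (mod 2).
    Raises ValueError if `a` is singular over GF(2).
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    ops: List[Gate] = []

    def add_row(src: int, dst: int) -> None:
        # row[dst] ^= row[src] equals left-multiplying by CNOT(src -> dst).
        m[dst] = [x ^ y for x, y in zip(m[dst], m[src])]
        ops.append((src, dst))

    for col in range(n):
        if m[col][col] == 0:
            # Borrow a 1 from a lower row to create the pivot.
            pivot = next((r for r in range(col + 1, n) if m[r][col]), None)
            if pivot is None:
                raise ValueError("matrix is not invertible over GF(2)")
            add_row(pivot, col)
        for r in range(n):
            if r != col and m[r][col]:
                add_row(col, r)  # clear every other 1 in this column

    # The ops reduce a to the identity; each CNOT is self-inverse, so the
    # same ops in reverse order form a circuit that implements a.
    return ops[::-1]


def simulate(gates: List[Gate], n: int) -> Matrix:
    """Compose the circuit's matrix by applying each gate to the identity."""
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    for c, t in gates:
        m[t] = [x ^ y for x, y in zip(m[t], m[c])]
    return m


swap = [[0, 1], [1, 0]]
print(synthesize_cnot(swap))  # [(1, 0), (0, 1), (1, 0)]
```

For the two-qubit swap matrix it returns the familiar three-CNOT swap. Plain elimination like this can use O(n^2) gates and is not gate-count optimal in general.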
Title | A CAD Tool for RF MEMS Devices |
Author | *Rajesh Pande, Rajendra Patrikar (Visvesvaraya Nat'l Inst. of Tech., India) |
Page | pp. 89 - 94 |
Keyword | MEMS, CAD, FEM, RF |
Abstract | A stable, multi-energy-domain, multiscale simulation tool for microsystems has been developed. A structured design methodology is adopted for the design and optimization of an RF MEMS shunt switch and a MEMS inductor. The CAD tool is device-specific and incorporates physical parameters such as surface roughness. The tool analyzes the impact of surface roughness and also performs thermal analysis. These analyses are useful for understanding the reliability and failure mechanisms of RF MEMS components. |
Title | A 1.2GHz Delayed Clock Generator for High-speed Microprocessors |
Author | *Inhwa Jung, Moo-Young Kim, Chulwoo Kim (Korea University, Republic of Korea) |
Page | pp. 95 - 96 |
Keyword | clock generator, clock-on-demand, lock time, low-power |
Abstract | A 1.2GHz delayed clock generator capable of adjusting its clock phase according to input clock frequencies has been developed. It consists of a full-digital CMOS circuit that leads to a simple, robust, and portable IP. One-cycle lock time enables clock-on-demand circuit structures. The implemented delayed clock generator tile in 0.13um CMOS technology occupies only 0.004mm2 and operates at variable input frequencies ranging from 625MHz to 1.2GHz. |
Title | LVDS-Type On-Chip Transmission Line Interconnect with Passive Equalizers in 90 nm CMOS Process |
Author | *Akiko Mineyama, Hiroyuki Ito, Takahiro Ishii, Kenichi Okada, Kazuya Masu (Tokyo Institute of Technology, Japan) |
Page | pp. 97 - 98 |
Keyword | transmission line, on-chip interconnect, global interconnect, LVDS-type, passive equalizer |
Abstract | This paper demonstrates a low voltage differential signaling (LVDS)-type on-chip transmission line (TL) interconnect with passive equalizers to solve delay issues on global interconnects. The proposed on-chip TL interconnect can achieve 10.5 Gbps signaling and has smaller delay, smaller delay variation, and better power efficiency than conventional on-chip interconnects at high frequencies. |
Title | A Slew-Rate Controlled Output Driver with One-Cycle Tuning Time |
Author | *Young-Ho Kwak, Inhwa Jung, Chulwoo Kim (Korea University, Republic of Korea) |
Page | pp. 99 - 100 |
Keyword | slew-rate, one-cycle, low power, 0.18um |
Abstract | A low-power slew-rate-controlled output driver with an open-loop digital scheme and one-cycle lock time is presented. The proposed output driver maintains a slew rate in the range of 2.1 V/ns to 3.6 V/ns within one cycle after the enable clock is applied. It is implemented in a 0.18um CMOS process, and the control block consumes 13.7mW at 1Gbps. |
Title | A Low-Leakage Current Power 180-nm CMOS SRAM |
Author | *Tadayoshi Enomoto, Yuki Higuchi (Chuo University, Japan) |
Page | pp. 101 - 102 |
Keyword | SRAM, leakage, power, CMOS |
Abstract | A low-leakage-power 180-nm 1-Kb SRAM was fabricated. The standby leakage power of a 1-Kb memory cell array incorporating a newly developed leakage current reduction circuit, called a “self-controllable voltage level (SVL)” circuit, was only 3.7 nW, 5.4% of that of an equivalent conventional memory cell array at a VDD of 1.8 V. Meanwhile, speed remained almost unchanged, with minimal overhead in memory cell array area. |
Title | A CMOS Direct Sampling Mixer Using Switched Capacitor Filter Technique for Software-Defined Radio |
Author | *Hong Phuc Ninh, Takashi Moue, Takashi Kurashina, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 103 - 104 |
Keyword | Sampling Mixer, Switched Capacitor Filter, Software-Defined Radio |
Abstract | This paper proposes a novel direct sampling mixer (DSM) using a switched capacitor filter (SCF) for multi-band receivers. The proposed DSM has higher gain, more flexibility, and lower flicker noise than conventional circuits. The mixer for 1-segment Digital Terrestrial Television (ISDB-T) was fabricated in a 0.18um CMOS process, and measured results are presented for a sampling frequency of 800MHz. The experimental results exhibit a 430kHz signal bandwidth with 27.3dB attenuation of an adjacent interferer assumed to be at a 3MHz offset. |
Title | Small-Area CMOS RF Distributed Mixer Using Multi-Port Inductors |
Author | *Susumu Sadoshima, Satoshi Fukuda, Tackya Yammouch, Hiroyuki Ito, Kenichi Okada, Kazuya Masu (Tokyo Institute of Technology, Japan) |
Page | pp. 105 - 106 |
Keyword | mixer, cmos, uwb |
Abstract | This paper presents a novel small-area distributed mixer for ultra-wideband (UWB) receivers. The proposed mixer uses five 4-port inductors instead of fifteen 2-port inductors to shrink the circuit area. The proposed mixer achieves a conversion gain of -10dB, a noise figure of 15dB, return loss of less than -10dB from 2.3 to 6.0GHz, an IIP3 of 13.6dBm, and a circuit area of 0.51mm^2. |
Title | Dynamic Supply Noise Measurement Circuit Composed of Standard Cells Suitable for In-Site SoC Power Integrity Verification |
Author | *Yasuhiro Ogasahara, Masanori Hashimoto, Takao Onoye (Osaka University, Japan) |
Page | pp. 107 - 108 |
Keyword | measurement, power supply noise, ring oscillator |
Abstract | This paper presents an all-digital measurement circuit called “gated oscillator” for capturing waveforms of dynamic power supply noise. The gated oscillator is constructed with standard cells, and thus can be easily embedded in SoCs for design verification. The performance of the gated oscillator is verified with fabricated test chips in a 90nm process. |
Title | Duo-Binary Circular Turbo Decoder Based on Border Metric Encoding for WiMAX |
Author | *Ji-Hoon Kim, In-Cheol Park (KAIST, Republic of Korea) |
Page | pp. 109 - 110 |
Keyword | turbo code, turbo decoder, duo-binary SISO decoding, WiMAX, interleaver |
Abstract | This paper presents a duo-binary circular turbo decoder based on border metric encoding. With the proposed method, the memory size for branch memory is reduced by half and the dummy calculation is removed at the cost of the small-sized memory which holds the encoded border metrics. Based on the proposed SISO decoder and the dedicated hardware interleaver, a duo-binary circular turbo decoder is designed for the WiMAX standard using a 0.13 um CMOS process, which can support 24.26Mbps at 200MHz. |
Title | Area and Power Efficient Design of Coarse Time Synchronizer and Frequency Offset Estimator for Fixed WiMAX System |
Author | *Tae-Hwan Kim, In-Cheol Park (KAIST, Republic of Korea) |
Page | pp. 111 - 112 |
Keyword | OFDM, WiMAX, IEEE 802.16d, coarse time synchronization, carrier frequency offset estimation |
Abstract | Targeting fixed WiMAX systems, this paper presents a new architecture for coarse time synchronization and carrier frequency offset (CFO) estimation. The proposed architecture is based on a two-step approach where the data-paths are decoupled to individually optimize performance and area. Implemented with 0.13um CMOS technology, the results show that the proposed architecture has advantages of less silicon area and power consumption as well as better performance compared to the previous joint approach. |
Title | A Low-Cost Cryptographic Processor for Security Embedded System |
Author | *Ronghua Lu, Jun Han, Xiaoyang Zeng, Qing Li, Lang Mai, Jia Zhao (Fudan Univ., China) |
Page | pp. 113 - 114 |
Keyword | Security, Processor, Cryptographic, RSA, AES |
Abstract | A low-cost cryptographic processor for secure embedded systems is presented in this paper. Without the assistance of any dedicated cryptographic coprocessor, the processor is scalable and very efficient for popular cryptographic functions such as RSA/ECC, AES, and hashing. Implemented in SMIC 0.18um standard CMOS technology, the core circuit of the test chip has only about 32k gates and a maximum frequency of 200MHz, at which the 1024-bit RSA algorithm takes only 150ms and AES throughput reaches 256Mbits/s. |
Title | Multithreaded Coprocessor Interface for Multi-Core Multimedia SoC |
Author | *Shih Hao Ou, Tay-Jyi Lin, Xiang Sheng Deng, Zhi Hong Zhuo, Chih Wei Liu (Nat'l Chiao Tung Univ., Taiwan) |
Page | pp. 115 - 116 |
Keyword | Dual-core, Multithreaded |
Abstract | Modern architectures exploit task-level parallelism to improve their performance in a cost-effective manner. However, task synchronization and management are time-consuming and waste computing resources, especially on application-specific architectures such as DSPs. In this paper, we propose a smart coprocessor interface that helps offload the task management job from the MPU or DSP. In our simulations, our approach improves the overall performance of a dual-core platform by 57%. The hardware overhead of the interface is only 1.56% of the DSP core. |
Title | Parameterized Embedded In-circuit Emulator and Its Retargetable Debugging Software for Microprocessor/Microcontroller/DSP Processor |
Author | *Liang-Bi Chen, Yung-Chih Liu, Chien-Hung Chen, Chung-Fu Kao, Ing-Jer Huang (Department of Computer Science and Engineering, National Sun Yat-Sen University, Taiwan) |
Page | pp. 117 - 118 |
Keyword | In-circuit Emulation, In-circuit Emulator, Testing, Debugging, Microprocessor |
Abstract | The in-circuit emulator (ICE) is a commonly adopted microprocessor debugging technique. In this paper, a parameterized embedded in-circuit emulator and its retargetable debugging software are proposed. The parameterized embedded in-circuit emulator can be integrated into different styles of processors, such as microcontrollers, microprocessors, and DSP processors. The GUI-based debugging software makes debugging easier for the user. As a result, the time required for microprocessor debugging is reduced. |
Title | Global Optimization of Common Subexpressions for Multiplierless Synthesis of Multiple Constant Multiplications |
Author | Yuen-Hong Alvin Ho, Chi-Un Lei, *Hing-Kit Kwan, Ngai Wong (The University of Hong Kong, Hong Kong) |
Page | pp. 119 - 124 |
Keyword | common subexpression sharing, multiple constant multiplications, mixed-integer linear programming |
Abstract | In the context of multiple constant multiplication (MCM) design, we propose a novel common subexpression elimination (CSE) algorithm that models the optimal synthesis of coefficients into a 0-1 mixed-integer linear programming (MILP) problem. A time delay constraint is included for synthesis. We also propose coefficient decompositions that combine all minimal signed digit (MSD) representations and the shifted sum (difference) of coefficients. In some cases, the proposed solution space further reduces the number of adders/subtractors in the MCM synthesis. |
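In a shift-and-add multiplier for a single constant with d nonzero signed digits, d - 1 adders or subtractors are needed before any subexpression sharing, which is why signed-digit recodings matter for MCM. The sketch below computes the canonical signed digit (CSD) form, one particular minimal signed digit (MSD) representation, and that unshared adder count. It does not implement the paper's MILP-based CSE, which searches across all MSD forms and shared subexpressions; the function names are hypothetical.

```python
def csd_digits(value: int) -> list[int]:
    """Canonical signed digit (CSD) recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, such that
    value == sum(d * 2**i) and no two adjacent digits are nonzero.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1,
            # which leaves a multiple of 4 and forces the next digit to 0.
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def adders_without_sharing(digits: list[int]) -> int:
    """Adders/subtractors for one constant multiplier built from its digits:
    one fewer than the number of nonzero terms (no CSE across constants)."""
    return max(sum(1 for d in digits if d != 0) - 1, 0)


# 119 = 0b1110111 has six 1 bits in binary, but CSD gives 128 - 8 - 1.
binary = [int(b) for b in reversed(bin(119)[2:])]
print(adders_without_sharing(binary), adders_without_sharing(csd_digits(119)))  # 5 2
```

Sharing common subexpressions across several constants, as the paper does, can reduce the count further; this sketch only shows the per-constant starting point.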
Title | Decomposition Based Approach for Synthesis of Multi-Level Threshold Logic Circuits |
Author | Tejaswi Gowda, *Sarma Vrudhula (Arizona State University, United States) |
Page | pp. 125 - 130 |
Keyword | Threshold Logic, Logic Synthesis, Logic Decomposition, Nano Circuits, EDA |
Abstract | Scaling is currently the most popular technique used to improve performance metrics of CMOS circuits. This cannot go on forever because the properties that are responsible for the functioning of MOSFETs no longer hold in nano dimensions. Recent research into nano devices has shown that nano devices can be an alternative to CMOS when scaling of CMOS becomes infeasible in the near future. This is motivating the need for stable and mature design automation techniques for threshold logic, since threshold logic is the design abstraction used for most nano-devices. This paper presents a new decomposition theory that is based on the properties of threshold functions. The main contributions of this paper are: (1) a new method of algebraic factorization called min-max factorization; (2) a decomposition theory that uses this new factorization to identify and characterize threshold functions; (3) a new threshold logic synthesis methodology that uses the decomposition theory. This synthesis methodology produces circuits that are better than the previous state of the art (27% better gate count and comparable circuit depth). |
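A threshold function outputs 1 exactly when a weighted sum of its binary inputs reaches a threshold T. The sketch below evaluates such a gate and brute-forces small integer weights to test whether a given truth table is a threshold function. This exhaustive check is feasible only for a handful of inputs and is not the paper's min-max factorization or decomposition theory; the function names and the `bound` parameter are hypothetical.

```python
from itertools import product


def threshold_gate(weights, threshold):
    """Return f with f(x) = 1 iff sum(w_i * x_i) >= threshold."""
    def f(*x):
        return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)
    return f


def find_realization(truth_table, n, bound=2):
    """Search integer weights in [-bound, bound] and all useful thresholds for
    a threshold gate matching `truth_table` (dict: input tuple -> 0/1).
    Returns (weights, threshold), or None if nothing in the range matches."""
    for weights in product(range(-bound, bound + 1), repeat=n):
        # Weighted sums lie in [-bound*n, bound*n], so these thresholds
        # cover every distinct gate behaviour for the chosen weights.
        for t in range(-bound * n, bound * n + 2):
            f = threshold_gate(weights, t)
            if all(f(*x) == y for x, y in truth_table.items()):
                return weights, t
    return None


inputs = list(product((0, 1), repeat=2))
and2 = {x: x[0] & x[1] for x in inputs}
xor2 = {x: x[0] ^ x[1] for x in inputs}
print(find_realization(and2, 2) is not None)  # True: AND is a threshold function
print(find_realization(xor2, 2))              # None: XOR is not
```

XOR has no realization for any choice of weights, since it is not linearly separable; functions like it must be decomposed into networks of threshold gates, which is the problem the paper's synthesis methodology addresses.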
Title | Timing-Power Optimization for Mixed-Radix Ling Adders by Integer Linear Programming |
Author | *Yi Zhu, Jianhua Liu, Haikun Zhu, Chung-Kuan Cheng (University of California, San Diego, United States) |
Page | pp. 131 - 137 |
Keyword | prefix adder, power optimization, integer linear programming |
Abstract | This paper optimizes the timing and power consumption of mixed-radix Ling adders under physical area constraints using an integer linear programming formulation. Each cell in the prefix network may have a different radix and size, and Ling carries are incorporated. Optimal solutions are obtained by solving the proposed formulation. The experiments show that the resulting optimal structures achieve large power savings compared with traditional designs. The ASIC implementation results are superior to those produced by Synopsys Module Compiler. |
Title | Efficient Synthesis of Compressor Trees on FPGAs |
Author | Hadi Parandeh-Afshar (Univ. of Tehran, Iran), *Philip Brisk, Paolo Ienne (EPFL, Switzerland) |
Page | pp. 138 - 143 |
Keyword | Generalized Parallel Counters, Compressor Tree, FPGA |
Abstract | FPGA performance is currently lacking for arithmetic circuits. In many applications, such as digital signal and video processing, summing k > 2 integer values is the most computationally intensive part. To improve the quality of addition circuits on FPGAs, both Xilinx and Altera have augmented their basic LUT structure with dedicated circuitry for addition, including a fast carry-chain that does not suffer from routing delays. To sum k > 2 values, the most efficient method is to use a tree of binary or ternary adders. In the world of ASICs, it is well known that compressor trees outperform adder trees when summing k > 2 values; however, due to the peculiarities of FPGAs, all previous literature has reported that adder trees are faster than compressor trees. This paper shows that this conventional wisdom is actually false. A heuristic to synthesize a compressor tree onto an FPGA is presented that reduces the combinational delay through the tree by 27.5%, on average, with an area increase of approximately 5.7%.
PDF file |
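As background for the ASIC-style compressor trees mentioned above, the sketch below reduces k operands with generic 3:2 carry-save compressors (full adders applied bitwise across whole words) until two operands remain, then performs a single carry-propagate addition. It models only the arithmetic and level count, not FPGA LUT mapping or the paper's heuristic (which, per its keywords, uses generalized parallel counters); the function names are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple:
    """3:2 compressor on whole words: returns (sum, carry) with
    x + y + z == sum + carry, using no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1  # bitwise majority, shifted left
    return s, c


def compressor_tree_sum(values):
    """Sum non-negative integers with levels of 3:2 compressors followed by
    one final carry-propagate addition. Returns (total, compressor_levels)."""
    operands = list(values)
    levels = 0
    while len(operands) > 2:
        next_level = []
        # Compress complete groups of three; up to two leftovers pass through.
        full = len(operands) - len(operands) % 3
        for i in range(0, full, 3):
            next_level.extend(csa(*operands[i:i + 3]))
        next_level.extend(operands[full:])
        operands = next_level
        levels += 1
    return sum(operands), levels  # final carry-propagate addition
```

For nine operands the loop runs four 3:2 levels (9, 6, 4, 3, then 2 operands) before the single final addition, whereas an adder tree performs a carry-propagate addition at every level.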
Title | Area Recovery under Depth Constraint by Cut Substitution for Technology Mapping for LUT-Based FPGAs |
Author | *Taiga Takata, Yusuke Matsunaga (Kyushu University, Japan) |
Page | pp. 144 - 147 |
Keyword | Technology Mapping, FPGA, Logic Synthesis |
Abstract | This paper presents a post-processing algorithm, Cut Substitution, for technology mapping for LUT-based FPGAs that minimizes the area under a minimum-depth constraint. Minimizing area under a minimum-depth constraint during technology mapping appears to be NP-hard. Cut Substitution generates a locally optimal solution by eliminating redundant LUTs while the depth is maintained. The experiments show that the proposed method derives solutions whose area is 9% smaller than that of DAOmap on average.
PDF file |
Title | An Optimal Algorithm for Sizing Sequential Circuits for Industrial Library Based Designs |
Author | Sanghamitra Roy, Yu Hen Hu (Univ. of Wisconsin, Madison, United States), *Charlie Chung-Ping Chen, Shih-Pin Hung, Tse-Yu Chiang, Jiuan-Guei Tseng (Nat'l Taiwan Univ., Taiwan) |
Page | pp. 148 - 151 |
Keyword | Gate sizing, Sequential circuit, Clock skew, Feedback, Optimization |
Abstract | In this paper, we propose an optimal gate sizing and clock skew optimization algorithm for globally sizing synchronous sequential circuits. The number of constraints and variables in our formulation is linear with respect to the number of circuit components, and hence our algorithm can efficiently find the optimal solution for industrial-scale designs. To the best of our knowledge, our method is the first exact gate sizing algorithm that can handle cyclic sequential circuits. Experimental results on industrial cell libraries demonstrate that our algorithm can yield an average of 12.6% improvement in the optimal clock period by combining clock skew optimization with gate sizing. For an identical clock period, our algorithm can achieve an average of 11.3% area savings over a popular commercial synthesis tool.
PDF file |
Title | Efficient Numerical Modeling of Random Rough Surface Effects for Interconnect Internal Impedance Extraction |
Author | *Quan Chen, Ngai Wong (The University of Hong Kong, Hong Kong) |
Page | pp. 152 - 157 |
Keyword | Rough Surface, Impedance, SIE Method, Interconnects |
Abstract | This paper proposes an efficient model for numerically evaluating the impact of random surface roughness on the internal impedance for large-scale interconnect structures. The effective resistivity (ER) and effective permeability (EP) are numerically formulated to avoid the computationally prohibitive global discretization, while maintaining the model accuracy and flexibility. A modified stochastic integral equation (SIE) method is proposed to significantly speed up the computation for the mean values of ER and EP under the assumption of random surface roughness. Numerical experiments then verify the efficacy of our approach. |
PDF file |
Title | Efficient Techniques for 3-D Impedance Extraction Using Mixed Boundary Element Method |
Author | *Fang Gong, Wenjian Yu, Zeyi Wang (Dept. of Computer Science, Tsinghua University, China), Zhiping Yu (Institute of Microelectronics, Tsinghua University, China), Changhao Yan (Fudan University, China) |
Page | pp. 158 - 163 |
Keyword | parasitic extraction, preconditioner, surface integral formulation, wide-band analysis, mixed boundary element method |
Abstract | In this paper, we describe the algorithms implemented in MBEM, a program for wideband impedance extraction of complicated 3-D structures. MBEM is based on a mixed boundary element method (BEM), which reduces the number of unknowns for MQS analysis from about 7N in FastImp to 4N. Efficient techniques are proposed to handle the extra matrix multiplication, form the post-processing matrices, and solve the final linear equation system. The inaccuracy of FastImp's calculation at low frequency is also analyzed, showing that the mixed BEM eliminates it completely. Experiments on several typical 3-D structures validate the advantage of MBEM over FastImp in both accuracy and efficiency.
PDF file |
Title | Generating Stable and Sparse Reluctance/Inductance Matrix under Insufficient Conditions |
Author | *Yuichi Tanji (Kagawa University, Japan), Takayuki Watanabe (The University of Shizuoka, Japan), Hideki Asai (Shizuoka University, Japan) |
Page | pp. 164 - 169 |
Keyword | Sparse, Inductance, Reluctance, Extraction |
Abstract | This paper presents methods for generating stable and sparse reluctance/inductance matrices from an inductance matrix extracted under insufficient discretization. Until now, to generate a sparse reluctance matrix with guaranteed stability, the matrix has had to be a diagonally dominant M-matrix. Hence, repeated inductance extractions with a smaller grid size are necessary to obtain a well-defined matrix. Instead, this paper provides ideas for generating the sparse reluctance matrix even if the extracted reluctance matrix is not a diagonally dominant M-matrix, or, more precisely, even if it contains positive off-diagonal elements. This eases the extraction task greatly. Furthermore, a sparse inductance matrix is also generated using practical and sophisticated double inverse methods, which is useful for SPICE simulation, since reluctance components are still not supported in SPICE-like simulators.
PDF file |
Title | Hierarchical Krylov Subspace Reduced Order Modeling of Large RLC Circuits |
Author | Duo Li, *Sheldon X.-D. Tan (University of California, Riverside, United States) |
Page | pp. 170 - 175 |
Keyword | Model order reduction, interconnect |
Abstract | In this paper, we propose a new model order reduction approach for large interconnect circuits using hierarchical decomposition and Krylov subspace projection-based model order reduction. The new approach, called hiePrimor, first partitions a large interconnect circuit into a number of smaller subcircuits, then performs projection-based model order reduction on each subcircuit in isolation, and finally on the top-level circuit. The new approach can exploit parallel computing to speed up the reduction process. Theoretically, we show that hiePrimor can achieve the same accuracy as the flat reduction method given the same reduction order, and that it also preserves the passivity of the reduced models. We also show that partitioning is important for hierarchical projection-based reduction and that the minimum-span objective should be used to achieve the best performance for hierarchical reduction. The proposed method is suitable for reducing large global interconnects such as coupled buses, transmission lines, and large clock nets in the post-layout stage. Experimental results demonstrate that hiePrimor can be significantly faster than flat projection methods such as PRIMA and, with parallel computing, an order of magnitude faster than PRIMA, without loss of accuracy.
PDF file |
Title | Statistical Noise Margin Estimation for Sub-Threshold Combinational Circuits |
Author | *Yu Pu (Technische Universiteit Eindhoven, Netherlands), Jose Pineda de Gyvez (NXP Research Eindhoven, Netherlands), Henk Corporaal (Technische Universiteit Eindhoven, Netherlands), Yajun Ha (National University of Singapore, Singapore) |
Page | pp. 176 - 179 |
Keyword | subthreshold, reliability, noise margin
Abstract | The increasingly popular sub-threshold design style strongly calls for EDA support to estimate noise margins, the minimum functional supply voltage, and the functional yield. In this paper, we propose a fast, accurate, and statistical approach to accomplish these goals. First, we derive closed-form functions based on a new equivalent resistance model, which enables fast estimation of the noise margins of individual cells at the gate level. Second, we propose to calculate and propagate the noise margin information with an affine arithmetic model that takes into account process variations and the corresponding inter-cell correlations. Experiments with ISCAS benchmarks have shown that the new approach has an accuracy of 98.5% with respect to transistor-level Monte Carlo simulations. The new approach needs only a few seconds per input vector, in contrast to the many hours required by transistor-level DC Monte Carlo simulations. To the best of our knowledge, we are the first to provide a fast, accurate, and statistical methodology other than Monte Carlo simulation for the noise margin estimation of sub-threshold combinational circuits.
PDF file |
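Affine arithmetic, which the abstract above uses to propagate noise margins, represents each quantity as a nominal value plus a linear combination of shared noise symbols, so correlations arising from common variation sources are preserved. The minimal sketch below uses a hypothetical `AffineForm` class that supports only addition, subtraction, and scaling (the paper's noise-margin model and any nonlinear operations are not reproduced); the noise-symbol names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class AffineForm:
    """x = center + sum_i coeffs[i] * eps_i, with each eps_i in [-1, 1]."""
    center: float
    coeffs: dict = field(default_factory=dict)  # noise-symbol name -> coefficient

    def __add__(self, other: "AffineForm") -> "AffineForm":
        # Coefficients of shared noise symbols combine, preserving correlation.
        merged = dict(self.coeffs)
        for name, value in other.coeffs.items():
            merged[name] = merged.get(name, 0.0) + value
        return AffineForm(self.center + other.center, merged)

    def scale(self, a: float) -> "AffineForm":
        return AffineForm(a * self.center, {k: a * v for k, v in self.coeffs.items()})

    def __sub__(self, other: "AffineForm") -> "AffineForm":
        return self + other.scale(-1.0)

    def radius(self) -> float:
        return sum(abs(v) for v in self.coeffs.values())

    def interval(self) -> tuple:
        """Conservative bounds implied by the affine form."""
        r = self.radius()
        return (self.center - r, self.center + r)
```

For v = AffineForm(0.3, {"L": 0.02, "rdf": 0.01}), v - v collapses to the exact interval (0.0, 0.0), whereas plain interval arithmetic on [0.27, 0.33] would give roughly [-0.06, 0.06]; this is the kind of correlation-induced pessimism that affine forms avoid.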
Title | Symmetry-Aware Placement with Transitive Closure Graphs for Analog Layout Design |
Author | *Lihong Zhang (Memorial University of Newfoundland, Canada), C.-J. Richard Shi (University of Washington, United States), Yingtao Jiang (University of Nevada, United States) |
Page | pp. 180 - 185 |
Keyword | Analog integrated circuits, layout, symmetry, placement, transitive closure graph |
Abstract | A new scheme is proposed to use transitive closure graphs (TCGs) to explore the full symmetry solution space in analog layout design. We define a set of TCG symmetric-feasible conditions and show that they are extremely useful in reducing the solution space. A method is presented for generating random symmetric-feasible TCGs in O(n) time while preserving the TCG closure property. Experimental results have confirmed the effectiveness of the proposed symmetry-aware TCG placement algorithm.
PDF file |
Title | Constraint-Free Analog Placement with Topological Symmetry Structure |
Author | *Qing Dong, Shigetoshi Nakatake (Univ. of Kitakyushu, Japan) |
Page | pp. 186 - 191 |
Keyword | placement, analog layout, symmetry, regularity, sequence-pair |
Abstract | In analog circuits, blocks need to be placed symmetrically to satisfy device matching. Unlike existing constraint-driven approaches, the proposed topological symmetry structure enables us to generate a symmetrical placement without any constraints. Simulated annealing is used as the optimization framework, and we propose new move operations that keep the placement's topological symmetry. By inserting dummy blocks, we present a physically skewed symmetry structure that partially allows asymmetry, improving the placement's area and wire length. In addition, we incorporate regularity into the evaluation of placements. Experiments showed that our approach generated placements with complete topological symmetry without much compromise in chip area and wire length, compared to placements with no symmetry.
PDF file |
Title | TCG-Based Multi-Bend Bus Driven Floorplanning
Author | Tilen Ma, *Evangeline F. Y. Young (The Chinese Univ. of Hong Kong, Hong Kong) |
Page | pp. 192 - 197 |
Keyword | Floorplanning, Algorithm, Bus Planning |
Abstract | In this paper, the problem of bus driven floorplanning is addressed. Given a set of modules and bus specifications, a floorplan solution including the bus routes is generated with the floorplan area and total bus area minimized. Some previous works have addressed this problem with restricted bus shapes of 0-bend, 1-bend or 2-bend [1]. In this paper, however, we address bus driven floorplanning without any limitations on the shapes of the buses. We solve this problem with a simulated annealing based floorplanner using the Transitive Closure Graph (TCG) representation. Experimental results show that we can improve over [1] significantly in terms of both run time and quality, since there is more flexibility in routing the buses and complex shape validation steps are not needed. For data sets with buses connecting a large number of blocks, our approach can still generate high quality solutions effectively, while the approach in [1] of restricting to 2-bend buses often cannot give any feasible solutions.
PDF file |
Title | Large-Scale Fixed-Outline Floorplanning Design Using Convex Optimization Techniques |
Author | *Chaomin Luo, Miguel F. Anjos (Univ. of Waterloo, Canada), Anthony Vannelli (Univ. of Guelph, Canada) |
Page | pp. 198 - 203 |
Keyword | fixed-outline floorplanning, convex optimization, second-order cone programming, relative position matrix , wirelength minimization |
Abstract | A two-stage optimization methodology is proposed to solve the fixed-outline floorplanning problem, a global optimization problem for wirelength minimization. In the first stage, an attractor-repeller convex optimization model provides the relative positions of the modules on the floorplan. The second stage places and sizes the modules using second-order cone optimization. A Voronoi diagram is employed to obtain a planar graph and thus a relative position matrix to connect the two stages. Overlap-free and deadspace-free floorplans are achieved in a fixed outline, and floorplans with any specified percentage of whitespace can be produced. Experimental results on GSRC benchmarks demonstrate that we obtain significant improvements on the best results known in the literature for these benchmarks. Most importantly, our methodology provides greater improvement over other floorplanners as the number of modules increases.
PDF file |
Title | Bus-Aware Microarchitectural Floorplanning |
Author | Dae Hyun Kim, *Sung Kyu Lim (Georgia Institute of Technology, United States) |
Page | pp. 204 - 208 |
Keyword | floorplanning, bus |
Abstract | In this paper we present the first bus-aware microarchitectural floorplanner. Our goal is to study the impact of bus routability on other important floorplanning objectives, including area, performance, power, and thermal behavior. We developed a fast performance-aware bus routing algorithm, which is integrated into the floorplanning engine to ensure routability while optimizing other, conflicting objectives. Our experiments on high-performance processors show that we obtain 100% routability at the cost of a minimal increase in the area, performance, and power objectives under a thermal constraint.
PDF file |
Title | LP Based White Space Redistribution for Thermal Via Planning and Performance Optimization in 3D ICs |
Author | *Xin Li, Yuchun Ma, Xianlong Hong, Sheqin Dong (Tsinghua Univ., China), Jason Cong (Univ. of California, Los Angeles, United States) |
Page | pp. 209 - 212 |
Keyword | 3D ICs, performance, thermal via, floorplanning |
Abstract | Thermal issues are a critical challenge in 3D IC design. Incorporating thermal vias into 3D ICs is a promising way to mitigate thermal issues by lowering the thermal resistance between device layers. However, it is usually difficult to find enough space at target regions to insert thermal vias. In this paper, we propose a novel analytical algorithm to re-allocate white space in 3D ICs to facilitate via insertion. Experimental results show that after reallocating white space, the number of thermal vias and the total wirelength could be reduced by 14% and 2%, respectively. The results also show that white space distribution with via planning alone degrades performance by 9%, while the performance-aware via planning method can reduce the number of thermal vias by 60% while keeping performance nearly unchanged.
PDF file |
Title | (Invited Paper) Predictive Models and CAD Methodology for Pattern Dependent Variability |
Author | *Nishath Verghese, Richard Rouse, Philippe Hurat (Cadence Design Systems, United States) |
Page | pp. 213 - 218 |
Abstract | Lithography, etch, and stress are dominant effects impacting the functionality and performance of designs at 65nm and below. This paper discusses the pattern-dependent variability caused by these effects and describes a model-based approach to extracting this variability. A methodology to gauge the extent of this pattern-dependent variability for standard cells is presented, based on the difference in transistor parameters when the cell is analyzed in different contexts. A full-chip methodology that addresses the delay change due to systematic variation is introduced and used to analyze and repair a 65nm digital design.
PDF file |
Title | (Invited Paper) Technology Modeling and Characterization Beyond the 45nm Node |
Author | *Sani R. Nassif (IBM, United States) |
Page | p. 219 |
PDF file |
Title | (Invited Paper) Synergistic Physical Synthesis for Manufacturability and Variability in 45nm Designs and Beyond |
Author | *David Z. Pan, Minsik Cho (University of Texas, Austin, United States) |
Page | pp. 220 - 225 |
Abstract | Nanometer IC designs are increasingly challenged by manufacturing closure, i.e., being fabricated with high product yield, mainly due to aggressive technology scaling and increasing process/environmental variations. Recognizing the importance of addressing manufacturability for higher yield and tolerance to variations during design, there has recently been a surge of research activity from both academia and industry. In this paper, we survey the key activities in synergistic physical synthesis and shed light on some future research directions.
PDF file |
Title | MaizeRouter: Engineering an Effective Global Router |
Author | *Michael D. Moffitt (IBM Austin Research Lab, United States) |
Page | pp. 226 - 231 |
Keyword | global routing |
Abstract | In this paper, we present MaizeRouter, winner of the inaugural 2007 Global Routing Contest. MaizeRouter reflects a significant leap in progress over existing publicly-available tools, and draws upon simple yet powerful edge-based operations (including extreme edge shifting, a technique aimed at congestion reduction, and edge retraction, a counterpart to extreme edge shifting that reduces unnecessary wirelength). These algorithmic contributions are built upon a framework of interdependent net decomposition, and permit a broad search space that previous algorithms have been unable to achieve. |
PDF file |
Title | A New Global Router for Modern Designs |
Author | *Jhih-Rong Gao, Pei-Ci Wu (Synopsys, Taiwan), Ting-Chi Wang (Nat'l Tsing Hua Univ., Taiwan) |
Page | pp. 232 - 237 |
Keyword | Global Routing |
Abstract | In this paper, we present a new global router, NTHU-Route, for modern designs. NTHU-Route is based on iterative rip-ups and reroutes, and several techniques are proposed to enhance our global router. These techniques include (1) a history based cost function which helps to distribute overflow during iterative rip-ups and reroutes, (2) an adaptive multi-source multi-sink maze routing method to improve the wirelength of maze routing, (3) a congested region identification method to specify the order for nets to be ripped up and rerouted, and (4) a refinement process to further reduce overflow when iterative history based rip-ups and reroutes reach a bottleneck. Compared with two state-of-the-art works on ISPD98 benchmarks, NTHU-Route outperforms them in both overflow and wirelength. For the much larger designs from the ISPD07 benchmark suite, our solution quality is better than or comparable to the best results reported in the ISPD07 routing contest.
PDF file |
Title | Routability Driven Modification Method of Monotonic Via Assignment for 2-Layer Ball Grid Array Packages |
Author | *Yoichi Tomioka, Atsushi Takahashi (Tokyo Inst. of Tech., Japan) |
Page | pp. 238 - 243 |
Keyword | ball grid array, monotonic, package, routing |
Abstract | Ball Grid Array packages, in which I/O pins are arranged in a grid array pattern, provide many connections between chips and a printed circuit board, but routing them manually is time-consuming. We propose a fast routing method for 2-layer Ball Grid Array packages to support designers. Our method distributes wires evenly on the top layer and increases the completion ratio of nets by iteratively improving the via assignment.
PDF file |
Title | Ordered Escape Routing Based on Boolean Satisfiability |
Author | Lijuan Luo, *Martin D.F. Wong (University of Illinois at Urbana-Champaign, United States) |
Page | pp. 244 - 249 |
Keyword | escape routing, Boolean satisfiability |
Abstract | Routing for high-speed boards is largely a time-consuming manual task today. In this paper we consider the ordered escape routing problem, which is a key problem in board-level routing. No existing approach to this problem is guaranteed to find a routing solution even when one exists. We present an algorithm that solves this problem exactly based on Boolean satisfiability. Experimental results on escape routing problems from industry show that our algorithm performs well.
PDF file |
Title | MeshWorks: An Efficient Framework for Planning, Synthesis and Optimization of Clock Mesh Networks |
Author | *Anand Rajaram, David Z. Pan (University of Texas at Austin, United States) |
Page | pp. 250 - 257 |
Keyword | Clock, Mesh, CTS |
Abstract | A leaf-level clock mesh is known to be very tolerant to variations [1]. However, its use is limited to a few high-end designs because of the high power/resource requirements and lack of automatic mesh synthesis tools [2]. Most existing works on clock mesh [1], [3]–[7] either deal with semi-custom design or perform optimizations on a given clock mesh. However, the problem of obtaining a good initial clock mesh has not been addressed. Similarly, the problem of achieving a smooth tradeoff between skew and power/resources has not been addressed adequately. In this work, we present MeshWorks, the first comprehensive automated framework for planning, synthesis and optimization of clock mesh networks with the objective of addressing the above issues. Experimental results suggest that our algorithms can achieve an additional reduction of 26% in buffer area, 19% in wirelength and 18% in power, compared to the recent work of [7] with similar worst case maximum frequency under variation.
PDF file |
Title | Interconnect Modeling for Improved System-Level Design Optimization |
Author | Luca Carloni (Columbia University, United States), Andrew B. Kahng, Swamy Muddu (University of California, San Diego, United States), Alessandro Pinto (University of California, Berkeley, United States), *Kambiz Samadi, Puneet Sharma (University of California, San Diego, United States) |
Page | pp. 258 - 264 |
Keyword | System-Level, Network-on-Chip, Interconnect Delay, Modeling |
Abstract | Accurate modeling of delay, power, and area of interconnections early in the design phase is crucial for efficient system-level optimization. Models presently used in system-level optimizations, such as network-on-chip (NoC) synthesis, are inaccurate in the presence of deep-submicron effects. In this paper, we propose new, highly accurate models for delay and power in buffered interconnects; these models are usable by system-level designers for existing and future technologies. We present a general and transferable methodology to construct our models from a wide variety of reliable sources (Liberty, LEF/ITF, ITRS, PTM, etc.). The modeling infrastructure, and a number of characterized technologies, are available as open-source. Our models comprehend key interconnect circuit and layout design styles, and a power-efficient buffering technique that overcomes unrealities of previous delay-driven buffering techniques. We show that our models are significantly more accurate than previous models for global and intermediate buffered interconnects in 90nm and 65nm foundry processes, essentially matching signoff analyses. We also integrate our models in an automatic NoC topology synthesis tool and show that the more accurate modeling significantly affects the optimal/achievable architectures synthesized by the tool. The increased accuracy afforded by our models enables system-level designers to obtain better assessments of the achievable performance/power/area tradeoffs for (communication-centric aspects of) system design, with negligible setup and overhead burdens.
PDF file |
Title | NoCOUT : NoC Topology Generation with Mixed Packet-Switched and Point-to-Point Networks |
Author | Jeremy Chan, *Sri Parameswaran (Univ. of New South Wales, Australia) |
Page | pp. 265 - 270 |
Keyword | NoC, Topology, Generation |
Abstract | Networks-on-Chip (NoC) have been widely proposed as the future communication paradigm for use in next-generation System-on-Chip. In this paper, we present NoCOUT, a methodology for generating an energy optimized application specific NoC topology which supports both point-to-point and packet-switched networks. The algorithm uses a prohibitive greedy iterative improvement strategy to explore the design space efficiently. A system-level floorplanner is used to evaluate the iterative design improvements and provide feedback on the effects of the topology on wire length. The algorithm is integrated within a NoC synthesis framework with characterized NoC power and area models to allow accurate exploration for a NoC router library. We apply the topology generation algorithm to several test cases including real-world and synthetic communication graphs with both regular and irregular traffic patterns, and varying core sizes. Since the method is iterative, it is possible to start with a known design to search for improvements. Experimental results show that many different applications benefit from a mix of "on chip networks" and "point-to-point networks". With such a hybrid network, we achieve approximately 25% lower energy consumption (with a maximum of 37%) than a state of the art min-cut partition based topology generator for a variety of benchmarks. In addition, the average hop count is reduced by 0.75 hops, which would significantly reduce the network latency.
PDF file |
Title | Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications |
Author | *Gunar Schirner, Andreas Gerstlauer, Rainer Dömer (University of California, Irvine, United States) |
Page | pp. 271 - 276 |
Keyword | software synthesis, Hardware dependent Software, TLM, system level design |
Abstract | Increasing software content in embedded systems and SoCs drives the demand to automatically synthesize software binaries from abstract models. This is especially critical for Hardware dependent Software (HdS) due to the tight coupling. In this paper, we present our approach to automatically synthesize HdS from an abstract system model. We synthesize driver code, interrupt handlers and startup code. We furthermore automatically adjust the application to use RTOS services. We target traditional RTOS-based multi-tasking solutions, as well as a pure interrupt-based implementation (without any RTOS). Our experimental results show the automatic generation of final binary images for six real-life target applications and demonstrate significant productivity gains due to automation. Our HdS synthesis is an enabler for efficient MPSoC development and rapid design space exploration. |
PDF file |
Title | Application-Specific Network-on-Chip Architecture Synthesis Based on Set Partitions and Steiner Trees |
Author | *Shan Yan, Bill Lin (University of California, San Diego, United States) |
Page | pp. 277 - 282 |
Keyword | Network-on-Chip, communication architecture synthesis, custom topology synthesis, Rectilinear Steiner Tree |
Abstract | This paper considers the problem of synthesizing application-specific Network-on-Chip (NoC) architectures. We propose two heuristic algorithms called CLUSTER and DECOMPOSE that can systematically examine different set partitions of communication flows, and we propose Rectilinear-Steiner-Tree (RST) based algorithms for generating an efficient network topology for each group in the partition. Different evaluation functions suited to the implementation backend and the corresponding implementation technology can be incorporated into our solution framework to evaluate the implementation cost of the set partitions and RST topologies generated. In particular, we experimented with an implementation cost model based on the power consumption parameters of a 70nm process technology where leakage power is a major source of energy consumption. Experimental results on a variety of NoC benchmarks showed that our synthesis results can on average achieve a 6.92× reduction in power consumption over the best standard mesh implementation. To further gauge the effectiveness of our heuristic algorithms, we also implemented an exact algorithm that enumerates all distinct set partitions. For the benchmarks where exact results could be obtained, our CLUSTER and DECOMPOSE algorithms on average can achieve results within 1% and 2% of exact results, with execution times all under 1 second, whereas the exact algorithms took as much as 4.5 hours.
PDF file |
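The exact baseline described above enumerates all distinct set partitions of the communication flows. Their number grows as the Bell numbers (1, 1, 2, 5, 15, 52, 203, ... for 0 to 6 flows), which helps explain why exact search can take hours while heuristics finish in under a second. The generic recursive enumerator below is an illustrative sketch; the function name and flow labels are hypothetical, and the paper's cost evaluation and RST construction are not modeled.

```python
from typing import Iterator, List, Sequence


def set_partitions(items: Sequence) -> Iterator[List[list]]:
    """Yield every partition of `items` into non-empty, unordered groups,
    each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new group of its own.
        yield [[first]] + partition
```

With four hypothetical flows such as `["f1", "f2", "f3", "f4"]` the generator yields 15 partitions; an exact synthesis loop would evaluate the cost of every one of them, which quickly becomes infeasible as the flow count grows.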
Title | (Invited Paper) Floating-Point Reconfiguration Array Processor for 3D Graphics Physics Engine |
Author | *Hoonmo Yang (Core Logic, Republic of Korea) |
Page | p. 283 |
PDF file |
Title | (Invited Paper) Super-K: A SoC for Single-chip Ultra Mobile Computer |
Author | *Xu Cheng (Peking University, China) |
Page | p. 284 |
PDF file |
Title | (Panel Discussion) The Tears and Joy of Sowing and Reaping Complex SoC's |
Author | Moderator: Ing-Jer Huang (Nat'l Sun Yat-Sen Univ., Taiwan), Panelists: Youn-Long Lin (Nat'l Tsing Hua Univ./Global UniChip, Taiwan), Hoonmo Yang (Core Logic, Republic of Korea), Toshihiro Hattori (Renesas Technology, Japan), Ahmed Jarraya (CEA-LETI, MINATEC, France), Xu Chen (Peking Univ., China) |
Wednesday, January 23, 2008 |
Title | (Keynote Address) The Evolution of SoC Platform According to the New Mobile Paradigm |
Author | Ki-Soo Hwang (Core Logic, Republic of Korea) |
Page | p. 285 |
PDF file |
Title | Statistical Gate Delay Model for Multiple Input Switching |
Author | *Takayuki Fukuoka, Akira Tsuchiya, Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 286 - 291 |
Keyword | Statistical timing, Multiple input switching, Process variation |
Abstract | In this paper, we propose a method for calculating gate delay for SSTA (Statistical Static Timing Analysis) that considers MIS (Multiple Input Switching). Most SSTA approaches assume a single input switching model and ignore the effect of MIS on gate delay. MIS occurs when multiple inputs of a gate switch nearly simultaneously; ignoring it therefore causes errors in the MAX operation of SSTA. We propose a statistical gate delay model that considers MIS. We verify the proposed method with SPICE-based Monte Carlo simulations, and the experimental results show that the proposed method reduces the error caused by ignoring MIS.
PDF file |
Title | Non-Gaussian Statistical Timing Models of Die-to-Die and Within-Die Parameter Variations for Full Chip Analysis |
Author | *Katsumi Homma, Izumi Nitta, Toshiyuki Shibuya (Fujitsu Labs., Japan) |
Page | pp. 292 - 297 |
Keyword | Statistical Timing Analysis, die-to-die variations, within-die variations |
Abstract | Statistical Static Timing Analysis (SSTA) is a method that calculates circuit delay statistically under process parameter variations, both die-to-die (D2D) and within-die (WID). In this paper, we model WID parameter variations individually for each cell and interconnect line in a chip, while D2D variations are governed by a single variation shared across the chip. We propose a new method of computing a full-chip delay distribution considering both D2D and WID parameter variations. Experimental results on actual chip designs show that the proposed method is more accurate than previous methods.
Title | Non-Gaussian Statistical Timing Analysis Using Second-Order Polynomial Fitting |
Author | Lerong Cheng (Univ. of California, Los Angeles, United States), *Jinjun Xiong (IBM, United States), Lei He (Univ. of California, Los Angeles, United States) |
Page | pp. 298 - 303 |
Keyword | Timing, Statistical |
Abstract | In the nanometer manufacturing regime, process variation causes significant uncertainty in circuit performance verification. Statistical static timing analysis (SSTA) has thus been developed to estimate the timing distribution under process variation. However, most existing SSTA techniques have difficulty handling non-Gaussian variation distributions and the nonlinear dependency of delay on variation sources. To solve this problem, we first propose a new method to approximate the max operation of two non-Gaussian random variables through second-order polynomial fitting. We then present new non-Gaussian SSTA algorithms under two types of variational delay models: a quadratic model and a semi-quadratic model (i.e., a quadratic model without crossing terms). All atomic operations (such as max and sum) of our algorithms are performed by closed-form formulas, so they scale well to large designs. Experimental results show that, compared to Monte Carlo simulation, our approach predicts the mean, standard deviation, and skewness within 1%, 1%, and 5% error, respectively. Our approach is more accurate and also 20x faster than the most recent method for non-Gaussian and nonlinear SSTA. |
Title | A Capacitive Boosted Buffer Technique for High-Speed Process-Variation-Tolerant Interconnect in UDVS Application |
Author | Saihua Lin, *Yu Wang, Rong Luo, Huazhong Yang (Tsinghua Univ., China) |
Page | pp. 304 - 309 |
Keyword | interconnect, buffer, process variation |
Abstract | In this paper, we propose a new capacitive boosted buffer technique that can be used in high-speed interconnect for ultra-dynamic voltage scaling (UDVS) applications while mitigating the process variation effect. The circuit is simple and fully compatible with digital CMOS technology. Implemented in a standard 0.18 µm CMOS technology, the circuit is shown to be applicable to both sub-threshold and above-threshold circuits without the problem of short-circuit current. Simulation results demonstrate that the proposed buffer is more robust to load, process, voltage, and temperature (PVT) variations. When applied to a simple H-tree clock network, the proposed buffer can reduce the skew by 5.5× compared to the traditional buffer. |
Title | Static Timing: Back to Our Roots |
Author | Ruiming Chen, Lizheng Zhang, Vladimir Zolotov, Chandu Visweswariah, *Jinjun Xiong (IBM, United States) |
Page | pp. 310 - 315 |
Keyword | Statistical Timing Methodology, pessimism reduction, spatial correlation modeling, Incremental Timing |
Abstract | Existing static timing methodologies apply various techniques to address increasingly large process variations. These techniques include multi-corner timing, on-chip variation (OCV) derating coefficients, and path-based common path pessimism removal (CPPR) procedures. However, they destroy the linear run-time and incrementality of classical static timing. The major contribution of this work is an efficient statistical timing methodology with comprehensive modeling of process variations that at the same time retains those key benefits. Our methodology is compatible with existing characterization methods and scales well to large chip designs. To achieve this goal, three techniques are developed: (1) building the statistical delay model from existing multi-corner library characterization; (2) modeling spatial correlation in a scalable manner; and (3) avoiding the time-consuming CPPR procedure by removing common path pessimism in the clock network with an incremental block-based technique. Experimental results on industrial 90 nm ASIC designs show that the proposed timing methodology correctly handles all types of process variation, achieves high correlation with traditional multi-corner timing with a more than 4x speedup, and is a vehicle for pessimism reduction. |
Title | Synthesis and Design of Parameter Extractors for Low-Power Pre-computation-Based Content-Addressable Memory Using Gate-Block Selection Algorithm |
Author | *Jui-Yuan Hsieh, Shanq-Jang Ruan (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 316 - 321 |
Keyword | CAM, low-power, pre-computation, gate-block selection algorithm, synthesis |
Abstract | Content-addressable memory (CAM) is frequently used in applications that require high-speed searches, such as lookup tables, databases, associative computing, and networking, because its parallel comparison reduces search time and thus improves application performance. Although parallel comparison results in fast searches, it also significantly increases power consumption. In this paper, we propose a gate-block selection algorithm that can synthesize a suitable parameter extractor for the pre-computation-based CAM (PB-CAM) to improve efficiency for specific applications such as embedded systems. Experimental results show that, compared with the 1's count approach, our approach reduces the number of comparison operations for specific data types by 19.24% to 27.42%. We used Synopsys Nanosim to estimate power consumption in a TSMC 0.35 µm CMOS process. Compared to the 1's count PB-CAM, our proposed PB-CAM achieves a power reduction of 17.72% to 21.09% for specific data types. |
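The 1's count baseline mentioned above can be illustrated with a small behavioral model: each stored word's population count acts as its pre-computed parameter, and a search performs full-width comparisons only against entries whose parameter equals that of the key. This is a minimal software sketch under that assumption, not a circuit and not the proposed gate-block selection algorithm; the function names are hypothetical.

```python
def ones_count(word: int) -> int:
    """Parameter extractor of the 1's count PB-CAM: the number of set bits."""
    return bin(word).count("1")


def pb_cam_search(stored: list[int], key: int) -> tuple[list[int], int]:
    """Return (indices of matching entries, number of full-width comparisons)."""
    key_param = ones_count(key)
    matches: list[int] = []
    comparisons = 0
    for index, word in enumerate(stored):
        # In hardware the parameter is pre-computed and stored with each entry;
        # it is recomputed here only to keep the model short.
        if ones_count(word) != key_param:
            continue  # parameters differ, so the entry cannot match: skip the comparison
        comparisons += 1
        if word == key:
            matches.append(index)
    return matches, comparisons


entries = [0b1010, 0b0111, 0b0110, 0b1100]
print(pb_cam_search(entries, 0b0110))  # ([2], 3)
```

Here three of the four entries share the key's 1's count, so filtering saves only one comparison; a parameter extractor tailored to the data distribution, which is what the paper targets, aims to shrink that candidate set further.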
Title | Block Cache for Embedded Systems |
Author | *Dominic Hillenbrand, Jörg Henkel (University of Karlsruhe (TH), Germany) |
Page | pp. 322 - 327 |
Keyword | cache, on chip memory, embedded systems, system on chip, memory bandwidth |
Abstract | We present a new method, which we call block caches, that automatically uses on-chip memory for blocks of instructions that are dynamically scheduled at runtime, in order to increase performance and reduce power consumption. Block caches can already outperform instruction caches of the same size. We provide initial data and insights into the automated use of block caches and their respective online and offline phases. |
Title | A Compiler-in-the-Loop Framework to Explore Horizontally Partitioned Cache Architectures |
Author | *Aviral Shrivastava (Arizona State University, United States), Ilya Issenin, Nikil Dutt (University of California, Irvine, United States) |
Page | pp. 328 - 333 |
Keyword | embedded, compiler, processor, cache, energy |
Abstract | Horizontally Partitioned Caches (HPCs) are a promising architectural feature for reducing the energy consumption of the memory subsystem. However, the energy reduction obtained using HPC architectures is very sensitive to the HPC parameters. Therefore, it is very important to explore the HPC design space and carefully choose the HPC parameters that result in minimum energy consumption for the application. Moreover, since the compiler has a significant impact on the energy consumption of the memory subsystem in HPC architectures, it is extremely important to include the compiler when deciding the HPC design parameters. While there have been no previous approaches to HPC design exploration, existing cache design space exploration (DSE) methodologies do not include compiler effects during DSE. In this paper, we present a Compiler-in-the-Loop (CIL) DSE methodology to explore and decide the HPC design parameters. Our experimental results on an HP iPAQ h4300-like memory subsystem running benchmarks from the MiBench suite demonstrate that CIL DSE can discover HPC configurations with up to 80% lower energy consumption than the HPC configuration in the iPAQ. In contrast, traditional simulation-only exploration can discover HPC design parameters that result in only 57% memory subsystem energy reduction. Finally, our hybrid CIL DSE heuristic saves 67% of the exploration time compared to exhaustive exploration, while providing the maximum possible energy savings on our set of benchmarks. |
Title | Fast, Quasi-Optimal, and Pipelined Instruction-Set Extensions |
Author | *Ajay K. Verma, Philip Brisk, Paolo Ienne (EPFL, Switzerland) |
Page | pp. 334 - 339 |
Keyword | Instruction Set Extension, Integer Linear Programming |
Abstract | Nowadays, many customised embedded processors offer the possibility of speeding up an application by implementing it using Application-Specific Functional Units (AFUs). However, AFUs must satisfy certain constraints on the number of read and write ports between the AFU and the processor register file. Due to these restrictions, the size and complexity of AFUs remain small. Recently, however, some work has relaxed the register file port constraints by serialising register file access (i.e., by allowing multi-cycle reads and writes). This makes the problem of selecting the best AFUs significantly more complex. Most previous approaches solve this problem in two stages, i.e., first selecting AFUs under some higher I/O constraints and then serialising them under the actual register file port constraints. Not only are these methods complex, but they also lead to suboptimal solutions. In this paper, we formulate the AFU selection problem as an Integer Linear Program and solve it optimally. We show experimentally that our methodology produces significantly better results than state-of-the-art techniques. |
Title | Load Scheduling: Reducing Pressure on Distributed Register Files for Free |
Author | *Mei Wen, Nan Wu, Maolin Guan, Chunyuan Zhang (National University of Defense Technology, China) |
Page | pp. 340 - 345 |
Keyword | VLIW, distributed register files |
Abstract | In this paper, we describe load scheduling, a novel method that uses residual resources to balance load among register files. Load scheduling can reduce register pressure for clustered VLIW processors with distributed register files without increasing the VLIW schedule length. We have implemented load scheduling in the compiler for the Imagine and FT64 stream processors. The results show that the proposed technique effectively reduces the number of variables spilled to memory and can even eliminate spilling. The algorithm presented in this paper is extremely efficient in embedded processors with limited register resources, because it improves register utilization instead of increasing the number of registers required. |
Title | DPlace2.0: A Stable and Efficient Analytical Placement Based on Diffusion |
Author | Tao Luo, *David Z. Pan (University of Texas at Austin, United States) |
Page | pp. 346 - 351 |
Keyword | Placement |
Abstract | Nowadays, a placement problem often involves multi-million objects and excessive fixed blockages. We present a new global placement algorithm that scales well to modern large-scale circuit placement problems. We simulate the natural diffusion process to spread cells smoothly over the placement region, and use both analytical and discrete techniques to improve the wire length. Although any analytical wire length technique can be used in our new framework, with the quadratic wire length model the Hessian of our formulation is extremely sparse compared with conventional formulations, which brings a 24x speedup to the quadratic solver. We also propose a wire linearization technique that transforms the quadratic star model into HPWL exactly. The overall runtime of our tool is close to that of the fastest placement tool in the existing literature and significantly better than the others. Meanwhile, we obtain wire length results competitive with the best known ones: the average total wire length is 2.2% higher than that of mPL6, and 0.2%, 3.1%, and 9.1% better than those of FastPlace3.0, APlace2.0, and Capo10.2, respectively. |
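HPWL (half-perimeter wirelength), the target of the wire linearization technique above, is the half-perimeter of the bounding box of each net's pins, summed over all nets. A minimal sketch, assuming each net is given as a list of (x, y) pin coordinates; the function name is hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Pin = Tuple[float, float]


def hpwl(nets: Iterable[Sequence[Pin]]) -> float:
    """Total half-perimeter wirelength over all nets."""
    total = 0.0
    for pins in nets:
        if len(pins) < 2:
            continue  # a single-pin net contributes no wire length
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        # Width plus height of the pins' bounding box.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [[(0, 0), (3, 1), (1, 4)], [(2, 2), (5, 2)]]
print(hpwl(nets))  # 10.0
```

Because HPWL depends only on the extreme pins of each net, it is piecewise linear and non-differentiable, which is why analytical placers often optimize a smooth or quadratic proxy and relate it back to HPWL.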
Title | Total Power Optimization Combining Placement, Sizing and Multi-Vt Through Slack Distribution Management |
Author | Tao Luo (Univ. of Texas, Austin, United States), David Newmark (Advanced Micro Devices, United States), *David Z. Pan (Univ. of Texas, Austin, United States) |
Page | pp. 352 - 357 |
Keyword | power, leakage, gate sizing, threshold voltage |
Abstract | Power dissipation is quickly becoming one of the most important limiters in nanometer IC design, because leakage increases exponentially as technology scales down. However, power and timing are often conflicting objectives during optimization. In this paper, we propose a novel total power optimization flow under performance constraints. Instead of using placement, gate sizing, and multiple-Vt assignment techniques independently, we combine them through the concept of slack distribution management to maximize the potential for power reduction. We propose to use linear programming (LP) based placement and geometric programming (GP) based gate sizing formulations to improve the slack distribution, which helps to maximize the total power reduction during the Vt-assignment stage. Our formulations include important practical design constraints, such as slew, noise, and short-circuit power, which were often ignored previously. We tested our algorithm on a set of industrial-strength, manually optimized circuits from a multi-GHz 65nm microprocessor and obtained very promising results. To the best of our knowledge, this is the first work that systematically combines placement, gate sizing, and Vt swapping for total power (and in particular leakage) management. |
Title | An Innovative Steiner Tree Based Approach for Polygon Partitioning |
Author | Yongqiang Lu, *Qing Su, Jamil Kawa (Synopsys, United States) |
Page | pp. 358 - 363 |
Keyword | Minimal Steiner tree, Polygon partition, Minimal Partition tree |
Abstract | As device technology continues to scale past 65nm, the number of geometries added by the heavy application of resolution enhancement techniques (RET) continues to grow. This is a direct consequence of 193nm lithography having to suffice for tighter geometries at every new node. As a result, issues associated with mask data preparation (MDP), such as complexity, run time, and quality, are growing in severity. As a major core step in MDP, polygon partitioning converts complex layout shapes into trapezoids suitable for mask writing. The partitioning run time and the quality of the resulting polygon partitions directly impact the cost, integrity, and quality of the written mask. In this work, we introduce an innovative approach to the polygon partition quality problem that constructs a variant of the Steiner minimal tree: the minimal partition tree (MPT). We prove the equivalence between the MPT and the optimal polygon partition. The search space for the MPT is further reduced to improve the efficiency of MPT algorithms. Finally, a generic MPT algorithm flow and a linear-time heuristic algorithm based on it are proposed. Experimental results show that this new approach and the associated algorithm solve polygon partitioning problems with very promising, high-quality results. |
Title | An MILP-Based Wire Spreading Algorithm for PSM-Aware Layout Modification |
Author | *Ming-Chao Tsai, Yung-Chia Lin, Ting-Chi Wang (Department of Computer Science, National Tsing Hua University, Taiwan) |
Page | pp. 364 - 369 |
Keyword | PSM, MILP, wire spreading, RET |
Abstract | Phase shifting mask (PSM) is a promising resolution enhancement technique used in deep sub-wavelength lithography for VLSI fabrication. However, applying the PSM technique requires the layout to be free of phase conflicts. In this paper, we present an MILP-based layout modification algorithm that solves the phase conflict problem by wire spreading. Unlike existing layout modification methods, which first solve the phase conflict problem by removing edges from the layout-associated conflict graphs and then try to revise the layout to match the resulting conflict graphs, our algorithm simultaneously considers the phase conflict problem and the feasibility of modifying the layout. The experimental results indicate that, without increasing the chip size, the phase conflict problem can be tackled well with minimal perturbation to the layout. |
Title | Low Power Clock Buffer Planning Methodology in F-D Placement for Large Scale Circuit Design |
Author | *Yanfeng Wang, Qiang Zhou, Yici Cai (Tsinghua University, China), Jiang Hu (Texas A&M University, United States), Xianlong Hong, Jinian Bian (Tsinghua University, China) |
Page | pp. 370 - 375 |
Keyword | low power, buffer planning, F-D placement |
Abstract | Traditionally, clock network layout is performed after cell placement. This methodology faces a serious problem in nanometer IC designs, where designers tend to use huge clock buffers for robustness against variations: clock buffers are often placed far from their ideal locations to avoid overlapping logic cells. As a result, both power dissipation and timing are degraded. To solve this problem, we propose a low power clock buffer planning methodology that is integrated with cell placement. A Bin-Divided Grouping algorithm is developed to construct a virtual buffer tree, which can explicitly model the clock buffers in placement. The virtual buffer tree is dynamically updated during placement to reflect changes in latch locations. To reduce power dissipation, latch clamping is incorporated into the clock buffer planning. The experimental results show that our method reduces clock power by 21% on average. |
Title | Power Grid Analysis Benchmarks |
Author | *Sani R. Nassif (IBM, United States) |
Page | pp. 376 - 381 |
Keyword | Power Grid Analysis, Benchmarks |
Abstract | Benchmarks are an immensely useful research tool, since they allow rapid and clear comparison between different approaches to solving CAD problems. Recent experience in the placement and routing areas suggests that the ready availability of realistic, industrial-size benchmarks can energize research in a given area and can even lead to significant breakthroughs. To this end, we are making a number of power grid analysis benchmarks publicly available. These are all drawn from real designs and vary over a reasonable range of size and difficulty, thereby making studies of algorithm complexity possible. This paper documents the format of the various benchmarks and gives details for accessing them. |
Title | (Invited Paper) In-band Mobile Digital TV Transmission Technology for Advanced Television Systems Committee |
Author | *Junehee Lee (Samsung Electronics, Republic of Korea) |
Page | p. 382 |
Title | (Invited Paper) In-Vehicle Vision Processors for Driver Assistance Systems |
Author | *Shorin Kyo, Shin’ichiro Okazaki (NEC Corp., Japan) |
Page | pp. 383 - 388 |
Title | (Invited Paper) Multi-Core DSP for Base Stations: Large and Small |
Author | *Doug Pulley (picoChip, Great Britain) |
Page | pp. 389 - 391 |
Title | (Invited Paper) 1-cc Computer Using UWB-IR for Wireless Sensor Network |
Author | *Tatsuo Nakagawa, Masayuki Miyazaki, Goichi Ono, Ryosuke Fujiwara, Takayasu Norimatsu, Takahide Terada (Hitachi, Japan), Akira Maeki, Yuji Ogata, Shinsuke Kobayashi, Noboru Koshizuka, Ken Sakamura (YRP Ubiquitous Networking Lab., Japan) |
Page | pp. 392 - 397 |
Abstract | An ultra-small, high-data-rate, low-power 1-cc computer (OCCC) with a UWB-IR (ultra-wideband impulse-radio) transceiver was developed for wireless sensor networks. Thanks to bare-chip implementation and a flexible printed circuit board, the computer measures only 1 cm³. To achieve a 10-Mbps data rate, a mid-range 32-bit microcontroller with both a bus interface and a USB 2.0 controller was selected. Low-power techniques are applied, such as switching the microcontroller to standby mode during wait times by using an external real-time clock, shutting down power to halted circuits, and finely controlling the UWB-IR transceiver status. The effect of these low-power techniques is verified by measuring the time history of the OCCC's current consumption. It was confirmed that the OCCC can provide wireless communication at a transmission rate of 258 kbps over a distance of 30 m. |
Title | Verifying Full-Custom Multipliers by Boolean Equivalence Checking and an Arithmetic Bit Level Proof |
Author | *Udo Krautz, Markus Wedler, Wolfgang Kunz (University Kaiserslautern, Germany), Kai Weber, Christian Jacobi, Matthias Pflanz (IBM, Germany) |
Page | pp. 398 - 403 |
Keyword | formal verification |
Abstract | In this paper we describe a methodology to formally verify highly optimized multipliers. We define a multiplier description language which abstracts from low-level optimizations and which can model a wide range of common implementations at a structural and arithmetic level. The correctness of the created model is established by bit level transformations matching the model against a standard multiplication specification. The model is also translated into a gate netlist to be compared with the full-custom implementation of the multiplier by standard equivalence checking. |
Title | A Symbolic Approach for Mixed-Signal Model Checking |
Author | *Alexander Jesser, Lars Hedrich (University of Frankfurt a.M., Germany) |
Page | pp. 404 - 409 |
Keyword | Formal Verification, Model Checking, Mixed-Signal, multi terminal binary decision diagram |
Abstract | In this paper, we introduce MScheck, a novel symbolic model checker for mixed-signal circuits. MScheck is capable of combining the continuous behavior typical of analog designs with the discrete behavior of the digital domain for formal verification. Timing information of both domains is stored symbolically in multi-terminal binary decision diagrams (MTBDDs) throughout the verification procedure. The effectiveness of our approach is demonstrated on a phase-locked loop (PLL) by formally verifying the locking property. |
Title | Faster Projection Based Methods for Circuit Level Verification |
Author | *Chao Yan, Mark Greenstreet (University of British Columbia, Canada) |
Page | pp. 410 - 415 |
Keyword | Formal Verification, Reachability Analysis, ODE |
Abstract | As VLSI fabrication technology progresses to 65nm feature sizes and smaller, transistors no longer operate as ideal switches. This motivates the verification of digital circuits using continuous models. Recently, we showed how such verification can be performed using projection-based methods. However, the verification was slow, requiring nearly four CPU days to verify a nine-transistor toggle flip-flop. Here, we describe improvements to the reachability algorithms and optimizations of the software architecture. These produce a 15x reduction in computation time and significant reductions in over-approximation errors. With these changes, the same toggle flip-flop can be verified in a few hours, making formal verification a viable alternative to circuit simulation. |
Title | A Debug Probe for Concurrently Debugging Multiple Embedded Cores and Inter-Core Transactions in NoC-Based Systems |
Author | Shan Tang, *Qiang Xu (The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 416 - 421 |
Keyword | Post-silicon validation, Debug probe, Transaction, NoC |
Abstract | Existing SoC debug techniques mainly target bus-based systems. They are not readily applicable to emerging systems that use a Network-on-Chip (NoC) as the on-chip communication scheme. In this paper, we present the detailed design of a novel debug probe (DP) inserted between the core under debug (CUD) and the NoC. With embedded configurable triggers, delay control, and a timestamping mechanism, the proposed DP is very effective for inter-core transaction analysis as well as for controlling the debug processes of embedded cores. Experimental results demonstrate the functionality of the proposed DP and report its area overhead. |
Title | A Fast Two-Pass HDL Simulation with On-Demand Dump |
Author | *Kyuho Shim (Pusan Nat'l Univ., Republic of Korea), Youngrae Cho, Namdo Kim (Samsung Electronics, Republic of Korea), Hyuncheol Baik, Kyungkuk Kim, Dusung Kim (Pusan Nat'l Univ., Republic of Korea), Jaebum Kim, Byeongun Min, Kyumyung Choi (Samsung Electronics, Republic of Korea), Maciej Ciesielski (Logic-Mill Technology LLC, United States), Seiyang Yang (Pusan Nat'l Univ., Republic of Korea) |
Page | pp. 422 - 427 |
Keyword | Simulation |
Abstract | Simulation-based functional verification is characterized by two inherently conflicting targets: signal visibility and simulation performance. Achieving a proper trade-off between these two targets is of paramount importance. Even though HDL simulators are the most widely used verification platform at the RTL and gate level, their major drawback is low performance in verifying complex SoCs, especially when high visibility into the design under verification is required. This paper presents a new, fast simulation method as an effective way to achieve both high simulation speed and full signal visibility. It is based on an original two-pass simulation approach. During the first pass, with the simulation running at full speed, a set of design states is saved periodically at predetermined checkpoints. During the second pass, another simulation is performed starting from any of the saved checkpoints, providing 100% signal visibility for debugging. Our method differs from the traditional simulation snapshot approach in the amount of design state saved and the way it is saved. Experimental results show a significant speedup compared to existing traditional simulation methods while maintaining 100% visibility. |
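The two-pass idea can be illustrated on a toy deterministic design: the first pass runs without dumping and saves the complete state every few cycles, and the second pass restores the latest checkpoint at or before the cycle of interest and re-simulates from there with full visibility. This is a minimal sketch under those assumptions; the 8-bit LFSR standing in for the design, the checkpoint interval, and all function names are hypothetical, and a real HDL simulator must save far richer state than a single integer.

```python
def step(state: int) -> int:
    """One clock cycle of a toy 8-bit Galois LFSR standing in for the design."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB8  # feedback taps
    return state


def first_pass(init: int, cycles: int, interval: int) -> dict[int, int]:
    """Fast pass: no waveform dump, only the full state every `interval` cycles."""
    checkpoints = {0: init}
    state = init
    for cycle in range(1, cycles + 1):
        state = step(state)
        if cycle % interval == 0:
            checkpoints[cycle] = state
    return checkpoints


def second_pass(checkpoints: dict[int, int], start: int, end: int) -> list[int]:
    """Replay from the latest checkpoint at or before `start`, dumping cycles start..end."""
    base = max(c for c in checkpoints if c <= start)
    state = checkpoints[base]
    dump = []
    for cycle in range(base, end + 1):
        if cycle >= start:
            dump.append(state)  # full visibility inside the debug window
        state = step(state)
    return dump


def full_trace(init: int, cycles: int) -> list[int]:
    """Reference: dump every cycle in a single slow pass."""
    trace = [init]
    state = init
    for _ in range(cycles):
        state = step(state)
        trace.append(state)
    return trace


cps = first_pass(init=0x01, cycles=1000, interval=100)
window = second_pass(cps, start=437, end=460)
print(len(cps), len(window))  # 11 24
```

The replay cost is bounded by the checkpoint interval plus the window length, which is the trade-off a real tool tunes between checkpoint storage and second-pass latency.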
Title | Hybrid Solid-State Disks: Combining Heterogeneous NAND Flash in Large SSDs |
Author | *Li-Pin Chang (National Chiao Tung University, Taiwan) |
Page | pp. 428 - 433 |
Keyword | flash memory, storage system, file system, embedded system |
Abstract | This paper presents a hybrid approach to large SSDs. The idea is to offset the drawbacks of SLC flash and MLC flash with each other's advantages. The technical issues in designing a hybrid SSD pertain to data placement and wear leveling over heterogeneous NAND flash. Our experimental results show that a hybrid SSD improves over a conventional SSD by 4.85 times in terms of average response time. Average throughput and energy consumption are improved by 17% and 14%, respectively. |
Title | Enabling Run-Time Memory Data Transfer Optimizations at the System Level with Automated Extraction of Embedded Software Metadata Information |
Author | *Alexandros Bartzas (Democritus University of Thrace, Greece), Miguel Peon-Quiros (Universidad Complutense de Madrid, Spain), Stylianos Mamagkakis, Francky Catthoor (IMEC vzw, Belgium), Dimitrios Soudris (Democritus University of Thrace, Greece), Jose Manuel Mendias (Universidad Complutense de Madrid, Spain) |
Page | pp. 434 - 439 |
Keyword | metadata, embedded systems, DMA, profiling, dynamic data |
Abstract | Information about the run-time behavior of software applications is crucial for enabling system-level optimizations for embedded systems. This embedded software metadata information is especially important today, because several complex multi-threaded applications are mapped onto the memory of a single embedded system. Each thread is triggered at run time by different input events that cannot be predicted at design time. New methods and tools are needed to automatically profile and analyze the dynamic data access behavior of simultaneously executing threads in order to enable memory data transfer optimizations. In this paper, we propose such a method and tool, which extract the software metadata information necessary to enable these data transfer optimizations at the system level. We assess the effectiveness of our approach with results for 5 real-life software applications using 7 real-life run-time input traces. |
Title | Automatic Re-Coding of Reference Code into Structured and Analyzable SoC Models |
Author | Pramod Chandraiah, *Rainer Dömer (University of California, Irvine, United States) |
Page | pp. 440 - 445 |
Keyword | Specification Modeling, Structural hierarchy, System Level Design Languages, Code transformations, Architectural Exploration |
Abstract | The quality of the input system model has a direct bearing on the effectiveness of system exploration and synthesis tools. Given a well-structured system model, today's tools are effective in generating efficient implementations. However, readily available reference C code is not conducive to system synthesis, as it lacks the structure and analyzability needed by the design flow. Usually, reference C code is manually converted into a SoC model by applying the necessary transformations. The types of transformations depend on the underlying design flow and tools. Proper structural hierarchy is one essential feature needed for architectural exploration. In this paper, we provide automatic C code transformations that encapsulate functions and insert structural hierarchy to create well-structured and analyzable SoC models. Our automatic transformations, combined with the interactive application of the designer's knowledge and experience, enable faster creation of structural hierarchy in C models and hence significantly reduce the overall design time. |
Title | Action Coverage Formulation for Power Optimization in Body Sensor Networks |
Author | Hassan Ghasemzadeh, *Eric Guenterberg, Katherine Gilani, Roozbeh Jafari (University of Texas, Dallas, United States) |
Page | pp. 446 - 451 |
Keyword | Body Sensor Networks, Wearable Embedded Systems, Physical Movement Monitoring, Power Optimization, Classification |
Abstract | Advances in technology have led to the development of various lightweight sensory devices that can be woven into the physical environment of our daily lives. Such systems enable on-body and mobile health-care monitoring. Our interest lies particularly in movement monitoring platforms that operate with inertial sensors. In this paper, we propose a power optimization technique that considers the sensing coverage problem from a collaborative signal processing perspective. We introduce compatibility graphs and describe how they can be used for power optimization. The problem we outline can be transformed into an NP-hard problem. Therefore, we propose an ILP formulation to attain a lower bound on the solution, as well as a fast greedy technique. Alongside this, we introduce a system for dynamically activating and deactivating sensor nodes in real time. Finally, we demonstrate the effectiveness of our techniques on data collected from several subjects. |
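As a generic illustration of the greedy side of such coverage formulations, the sketch below runs the textbook greedy set-cover heuristic: it repeatedly activates the sensor node that covers the most still-uncovered actions. This is a swapped-in classical technique, not the compatibility-graph method of the paper, and the node names, action names, and coverage sets are hypothetical.

```python
def greedy_cover(coverage: dict[str, set[str]], actions: set[str]) -> list[str]:
    """Activate sensor nodes until every action is covered (textbook greedy set cover)."""
    uncovered = set(actions)
    chosen: list[str] = []
    while uncovered:
        # Node that newly covers the most actions; max() keeps the first maximum,
        # so iterating over sorted names breaks ties alphabetically.
        best = max(sorted(coverage), key=lambda n: len(coverage[n] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            raise ValueError(f"actions cannot be covered: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


coverage = {
    "waist": {"sit", "stand", "walk"},
    "ankle": {"walk", "run"},
    "wrist": {"reach", "sit"},
    "thigh": {"sit", "stand", "run"},
}
actions = {"sit", "stand", "walk", "run", "reach"}
print(greedy_cover(coverage, actions))  # ['thigh', 'ankle', 'wrist']
```

Greedy set cover is fast but carries only a logarithmic approximation guarantee, which is one reason an exact ILP is useful as a reference point when evaluating it.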
Title | Dynamic Scheduling of Imprecise-Computation Tasks in Maximizing QoS under Energy Constraints for Embedded Systems |
Author | *Heng Yu, Bharadwaj Veeravalli, Yajun Ha (National University of Singapore, Singapore) |
Page | pp. 452 - 455 |
Keyword | RT Embedded Systems, Scheduling, Imprecise-Computation, DVS |
Abstract | In designing energy-aware CPU scheduling algorithms for real-time embedded systems, dynamic slack reclamation techniques significantly improve system Quality-of-Service (QoS) and energy efficiency. However, the few existing schemes in this domain either demand high complexity or achieve only limited QoS. In this paper, we present a novel low-complexity runtime scheduling algorithm for tasks modeled with Imprecise Computation (IC). The target is to maximize system QoS under energy constraints. Our proposed algorithm, named Gradient Curve Shifting (GCS), is able to decide the best allocation of slack cycles arising at runtime with very low complexity. We study both linear and concave QoS functions associated with IC-modeled tasks, on non-DVS and DVS processors. Furthermore, we apply the intra-task DVS technique to tasks and achieve up to 18% higher system QoS than the conventional “optimal” solution, which is based on inter-task DVS. |
PDF file |
Title | Architecture-level Thermal Behavioral Characterization For Multi-Core Microprocessors |
Author | Duo Li, *Sheldon X.-D. Tan (Univ. of California, Riverside, United States), Murli Tirumala (Intel, United States) |
Page | pp. 456 - 461 |
Keyword | Thermal, Behavioral modeling, Multi-core |
Abstract | In this paper, we investigate a new architecture-level thermal characterization problem from a behavioral modeling perspective to address emerging thermal analysis and optimization problems in high-performance multi-core microprocessor design. We propose a new approach, called ThermPOF, to build thermal behavioral models from measured architecture-level thermal and power information. ThermPOF first builds the behavioral thermal model using the generalized pencil-of-function (GPOF) method. Then, to effectively model transient temperature changes, we propose two new schemes to improve GPOF. First, we apply logarithmic-scale sampling instead of traditional linear sampling to better capture the temperature changing characteristics. Second, we modify the extracted thermal impulse response so that the poles extracted by GPOF are guaranteed to be stable without accuracy loss. To further reduce the model size, Krylov-subspace-based model order reduction is performed to reduce the order of the models in state-space form. Experimental results on a practical quad-core microprocessor show that the generated thermal behavioral models match the measured data very well. |
PDF file |
Title | Full-Chip Thermal Analysis for the Early Design Stage via Generalized Integral Transforms |
Author | *Pei-Yu Huang, Chih-Kang Lin, Yu-Min Lee (National Chiao Tung University, Taiwan) |
Page | pp. 462 - 467 |
Keyword | Thermal analysis, generalized integral transforms |
Abstract | The capability of predicting the temperature profile is critically important for circuit timing estimation, leakage reduction, power estimation, hotspot avoidance, and reliability in modern IC design. This paper presents an accurate and fast analytical full-chip thermal simulator for early-stage temperature-aware chip design. Using the technique of generalized integral transforms (GIT), our method can accurately estimate the full-chip temperature distribution with a very small number of truncation points of the bases in the spatial domain. We also develop a fast Fourier transform (FFT)-like algorithm to evaluate the temperature distribution efficiently. Experimental results confirm that our GIT-based analyzer achieves an order of magnitude speedup compared with a highly efficient Green's function based method. |
PDF file |
Title | A Stochastic Local Hot Spot Alerting Technique |
Author | Hwisung Jung, *Massoud Pedram (University of Southern California, United States) |
Page | pp. 468 - 473 |
Keyword | hot spot, Markov decision process, Kalman filter, thermal alert |
Abstract | With increasing variability in the behavior of manufactured nano-scale devices and dramatic changes in on-chip power density, timely identification of hot spots on a chip has become a challenging task. This paper addresses the questions of how and when to identify hot spots and issue hot spot alerts. These are important questions because temperature reports from thermal sensors may be erroneous, noisy, or arrive too late to enable effective application of thermal management mechanisms to avoid chip failure. This paper thus presents a stochastic technique for identifying and reporting local hot spots under probabilistic conditions induced by uncertainty in the chip junction temperature and the system power state. More specifically, it introduces a stochastic framework for estimating the chip temperature and the power state of the system based on a combination of Kalman Filtering (KF) and a Markov Decision Process (MDP) model. Experimental results demonstrate the effectiveness of the framework and show that the proposed technique alerts about thermal threats accurately and in a timely fashion in spite of noisy or sometimes erroneous readings by the temperature sensor. |
PDF file |
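The Kalman filtering step named in the hot spot alerting abstract above can be illustrated with a minimal scalar sketch. It assumes a random-walk temperature model with hand-picked process-noise variance `q` and sensor-noise variance `r`. The class `ScalarKalman`, the helper `alert`, and the fixed-threshold rule are hypothetical and omit the MDP power-state model, so this is not the authors' framework.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Scalar Kalman filter for a random-walk temperature model (illustrative assumption)."""
    estimate: float  # current temperature estimate
    variance: float  # variance of the current estimate
    q: float         # assumed process-noise variance per step
    r: float         # assumed sensor-noise variance

    def update(self, reading: float) -> float:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        prior_var = self.variance + self.q
        # Correct: blend prediction and noisy reading using the Kalman gain.
        gain = prior_var / (prior_var + self.r)
        self.estimate += gain * (reading - self.estimate)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate


def alert(kf: ScalarKalman, readings, threshold: float):
    """Return the first step whose filtered estimate exceeds threshold, or None."""
    for step, reading in enumerate(readings):
        if kf.update(reading) > threshold:
            return step
    return None
```

With a large `r`, a single spurious 100-degree reading barely moves the estimate and raises no alert. A sustained rise still crosses the threshold after a number of steps set by the gain.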
Title | Design Rule Optimization of Regular layout for Leakage Reduction in Nanoscale Design |
Author | Anupama R. Subramaniam (Arizona State University, United States), Ritu Singhal, *Chi-Chao Wang, Yu Cao (Arizona State University, United States) |
Page | pp. 474 - 479 |
Keyword | Design Rule, Optimization, RDR, NRG leakage, manufacturability |
Abstract | The non-rectilinear gate (NRG) effect caused by sub-wavelength lithography dramatically increases the leakage current, by more than 15X. To mitigate this penalty, we have developed a systematic procedure to optimize key layout parameters in regular layout with minimum area and speed overhead. As demonstrated in 65nm technology, the optimization of regular layout achieves more than 70% reduction in leakage under NRG, with an area penalty of ~10% and marginal impact on circuit speed and active power. |
PDF file |
Title | Investigation of Diffusion Rounding for Post-Lithography Analysis |
Author | Puneet Gupta (Univ. of California, Los Angeles, United States), Andrew B. Kahng (Univ. of California, San Diego, United States), *Youngmin Kim (Univ. of Michigan, Ann Arbor, United States), Saumil Shah (Blaze-DFM, United States), Dennis Sylvester (Univ. of Michigan, Ann Arbor, United States) |
Page | pp. 480 - 485 |
Keyword | diffusion rounding, gate variability, DFM, lithography simulation, non-rectilinear |
Abstract | Due to aggressive scaling of device feature size to improve circuit performance in the sub-wavelength lithography regime, both diffusion and poly gate shapes are no longer rectilinear. Diffusion rounding occurs most notably where the diffusion shapes are not perfectly rectangular, including common L and T-shaped diffusion layouts to connect to power rails. This paper investigates the impact of the non-rectilinear shape of diffusion (i.e., sloped diffusion or diffusion rounding) on circuit performance (delay and leakage). Simple weighting function models for Ion and Ioff to account for the diffusion rounding effects are proposed, and compared with TCAD simulation. Our experiments show that diffusion rounding has an asymmetric characteristic for Ioff due to the differing significance of source/drain junctions on device threshold voltage. Therefore, we can model Ion and Ioff as a function of slope angle and direction. The proposed models match well with TCAD simulation results, with less than 2% and 6% error in Ion and Ioff, respectively. |
PDF file |
Title | (Panel Discussion) Are System Level EDA Tools/Methodologies Coming? |
Author | Moderator: Ren-Song Tsay (Nat'l Tsing Hua Univ., Taiwan), Panelists: Raul Camposano (Xoomsys, Tajikistan), Toshihiro Hattori (Renesas Technology, Japan), Austin Kim (Samsung Electronics, Republic of Korea), Howard Mao (Springsoft, Taiwan), Sri Parameswaran (Univ. of New South Wales, Australia) |
Title | Pessimism Reduction in Coupling-Aware Static Timing Analysis Using Timing and Logic Filtering |
Author | *Debasish Das (Northwestern Univ., United States), Kip Killpack, Chandramouli Kashyap, Abhijit Jas (Intel, United States), Hai Zhou (Northwestern Univ., United States) |
Page | pp. 486 - 491 |
Keyword | static timing analysis, crosstalk, algorithm |
Abstract | With continued scaling of technology into nanometer regimes, the impact of coupling-induced delay variations is significant. While several coupling-aware static timers have been proposed, their results are often pessimistic, with many false failures. We present an integrated iterative approach based on timing filtering and logic filtering to reduce pessimism. We use a realistic coupling model based on arrival times and slews and show that non-iterative pessimism reduction algorithms proposed in previous research can potentially give non-conservative timing results. On a functional block from an industrial 65nm microprocessor, our algorithm showed a maximum pessimism reduction of 11.18% of cycle time over converged timing filtering analysis that does not consider logic constraints. |
PDF file |
Title | A Fast Incremental Clock Skew Scheduling Algorithm for Slack Optimization |
Author | *Kui Wang, Hao Fang, Hu Xu, Xu Cheng (Microprocessor Research Center of Peking University, China) |
Page | pp. 492 - 497 |
Keyword | clock schedule, semi-synchronous circuits, useful skew, timing analysis |
Abstract | We propose a fast clock skew scheduling algorithm that minimizes the clock period and enlarges the slacks of timing-critical paths. To reduce the runtime of the timing analysis engine, our algorithm allows the sequential graph to be partially extracted, and the runtime of the algorithm itself is almost linear in the size of the extracted sequential graph. Experimental results show that its runtime is less than a minute for a design with more than ten thousand flip-flops. |
PDF file |
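Clock skew scheduling is commonly posed as a system of difference constraints on the clock arrival times. The sketch below shows that textbook formulation, not the incremental algorithm of the paper above. It assumes zero setup and hold times and non-negative path delays, and it represents each combinational path as a hypothetical `(launch, capture, dmin, dmax)` tuple. It checks whether a given period is feasible with Bellman-Ford and binary-searches the minimum period.

```python
from typing import List, Tuple

# (launch flip-flop, capture flip-flop, min path delay, max path delay): hypothetical format.
Path = Tuple[int, int, float, float]


def feasible(n: int, paths: List[Path], period: float) -> bool:
    """True if clock arrival times exist that meet every setup and hold constraint.

    Each constraint t_v - t_u <= w becomes an edge u -> v of weight w. The system is
    feasible iff the constraint graph has no negative cycle (Bellman-Ford check).
    Setup and hold times are assumed to be zero.
    """
    edges = []
    for i, j, dmin, dmax in paths:
        edges.append((j, i, period - dmax))  # setup: t_i + dmax <= t_j + period
        edges.append((i, j, dmin))           # hold:  t_i + dmin >= t_j
    dist = [0.0] * n  # implicit source connected to every node with weight 0
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True  # converged: dist is one valid schedule (up to an offset)
    return False  # still relaxing after n rounds: negative cycle, so infeasible


def min_period(n: int, paths: List[Path], tol: float = 1e-6) -> float:
    """Binary-search the smallest feasible clock period, within tol."""
    # With non-negative dmin, zero skew is always feasible at the largest path delay.
    lo, hi = 0.0, max(dmax for _, _, _, dmax in paths)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, paths, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For two flip-flops in a loop with path delays 8 and 4, zero skew needs a period of 8. Useful skew lowers it to the loop average of 6, because no hold constraint binds in that example.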
Title | Clock Tree Synthesis with Data-Path Sensitivity Matching |
Author | *Matthew R. Guthaus (University of California Santa Cruz, United States), Dennis Sylvester (University of Michigan, United States), Richard B. Brown (University of Utah, United States) |
Page | pp. 498 - 503 |
Keyword | clock tree synthesis, variability, skew |
Abstract | This paper investigates methods for minimizing the impact of process variation on clock skew using buffer and wire sizing. While most papers on clock trees ignore data-path circuit variations and most papers on data-path circuit optimization disregard clock tree variation, we consider both. Using both clock and data-path variations together, we present a novel sensitivity-matching algorithm that allows clock tree skews to be intentionally correlated with data-path sensitivities to ameliorate timing violations due to variation. Our statistical tuning shows an improvement in terms of expected clock skew and clock skew variation over previously published robust algorithms. |
PDF file |
Title | Buffered Clock Tree Synthesis for 3D ICs Under Thermal Variations |
Author | Jacob Minz (Synopsys, United States), Xin Zhao, *Sung Kyu Lim (Georgia Inst. of Tech., United States) |
Page | pp. 504 - 509 |
Keyword | thermal-aware optimization, clock, 3D IC |
Abstract | In this paper, we study the buffered clock tree synthesis problem under thermal variations for 3D IC technology. Our major contribution is the Balanced Skew Theorem, which provides a theoretical background to efficiently construct a buffered 3D clock tree that minimizes and balances the skew values under two distinct non-uniform thermal profiles. Our clock tree synthesis algorithm named BURITO (Buffered Clock Tree With Thermal Optimization) first constructs a 3D abstract tree under the wirelength vs via-congestion tradeoff. This abstract tree is then embedded, buffered, and refined under the given non-uniform thermal profiles so that the temperature-dependent skews are minimized and balanced. Experimental results show that our algorithms significantly reduce and perfectly balance clock skew values with minimal wirelength overhead. |
PDF file |
Title | A Delay Model for Interconnect Trees Based on ABCD Matrix |
Author | *Guofei Zhou, Li Su, Depeng Jin, Lieguang Zeng (Tsinghua University, China) |
Page | pp. 510 - 513 |
Keyword | delay estimation, interconnect, VLSI |
Abstract | This paper presents a method that improves the accuracy of interconnect delay estimation: the first two moments are obtained from the ABCD matrix, and a stable model is developed to incorporate the effects of transport delay into the delay estimate. Simulation results show that the method matches the accuracy of traditional methods when the rise time delay is much longer than the transport delay, and is more accurate when the two are of the same order. |
PDF file |
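The ABCD-matrix moment computation mentioned in the interconnect delay abstract above can be sketched for the simplest case: an unbranched RC ladder with an open far end. Each segment is a series resistance followed by a shunt capacitance. Cascading their ABCD matrices gives A(s), and with no load current H(s) = 1/A(s), so the first moment m1 = -a1 is the negative of the Elmore delay. Polynomials are truncated after s^2. Branching trees, the paper's transport-delay model, and all function names here are assumptions outside the abstract.

```python
from typing import List, Sequence, Tuple

# Polynomials in s, truncated after s^2: [c0, c1, c2].
Poly = List[float]
Matrix = Tuple[Tuple[Poly, Poly], Tuple[Poly, Poly]]
ORDER = 3


def const(v: float) -> Poly:
    return [v, 0.0, 0.0]


def pmul(a: Poly, b: Poly) -> Poly:
    """Product of two truncated polynomials (terms above s^2 are dropped)."""
    out = [0.0] * ORDER
    for i in range(ORDER):
        for j in range(ORDER - i):
            out[i + j] += a[i] * b[j]
    return out


def padd(a: Poly, b: Poly) -> Poly:
    return [x + y for x, y in zip(a, b)]


def matmul(m: Matrix, n: Matrix) -> Matrix:
    """2x2 matrix product with polynomial entries (cascade of two-ports)."""
    return tuple(
        tuple(padd(pmul(m[i][0], n[0][j]), pmul(m[i][1], n[1][j])) for j in range(2))
        for i in range(2)
    )


def rc_segment(r: float, c: float) -> Matrix:
    """ABCD matrix of a series resistance r followed by a shunt capacitance c."""
    series = ((const(1.0), const(r)), (const(0.0), const(1.0)))
    shunt = ((const(1.0), const(0.0)), ([0.0, c, 0.0], const(1.0)))
    return matmul(series, shunt)


def ladder_moments(segments: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Moments m1, m2 of H(s) = V_far / V_in for an RC ladder with an open far end.

    With A(s) = 1 + a1*s + a2*s^2 + ..., H(s) = 1/A(s) = 1 - a1*s + (a1^2 - a2)*s^2 + ...
    """
    total: Matrix = ((const(1.0), const(0.0)), (const(0.0), const(1.0)))
    for r, c in segments:
        total = matmul(total, rc_segment(r, c))
    _, a1, a2 = total[0][0]
    return -a1, a1 * a1 - a2
```

For two segments with R1 = 2, C1 = 3, R2 = 5, C2 = 7, the sketch gives m1 = -55. This matches the Elmore delay R1(C1 + C2) + R2C2 = 55.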
Title | Analytical Model for the Impact of Multiple Input Switching Noise on Timing |
Author | Rajeshwary Tayade (University of Texas at Austin, United States), *Sani Nassif (IBM, United States), Jacob Abraham (University of Texas at Austin, United States) |
Page | pp. 514 - 517 |
Keyword | multiple input switching, dynamic variability, path delay estimation |
Abstract | The timing models used in current Static Timing Analysis tools characterize gate delays only for single input switching events. It is well known that the temporal proximity of signals arriving at different inputs causes significant variation in gate delay. This variation in delay needs to be accounted for when selecting the critical paths of a circuit. In this paper, a detailed analysis of Multiple Input Switching (MIS) behavior is presented that leads to a simple analytical model, which can be used to estimate gate delay with MIS noise. The model requires minimal additional characterization effort and can be employed in a statistical timing engine. The dynamic delay variability of a path caused by MIS noise can be accurately estimated using the proposed model. |
PDF file |
Title | Determination of Optimal Polynomial Regression Function to Decompose On-Die Systematic and Random Variations |
Author | *Takashi Sato, Hiroyuki Ueyama, Noriaki Nakayama, Kazuya Masu (Tokyo Institute of Technology, Japan) |
Page | pp. 518 - 523 |
Keyword | process variation, log-likelihood estimate, AIC, model selection |
Abstract | A procedure that decomposes measured parametric device variation into systematic and random components is studied by treating the decomposition as the selection of the most suitable model for describing the on-die spatial variation trend. To maximize model predictability, the corrected Akaike information criterion, a log-likelihood-based estimate, is adopted. Depending on the on-die contours of the underlying systematic variation, the necessary and sufficient complexity of the systematic regression model is determined objectively and adaptively. The proposed procedure is applied to 90-nm threshold voltage data, and low-order polynomials are found to describe the systematic variation very well. The procedure thus facilitates the design of cost-effective variation monitoring circuits as well as the appropriate model determination of on-die variation. |
PDF file |
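The corrected Akaike information criterion (AICc) named in the abstract above can be used to pick a polynomial order as follows. This minimal sketch assumes Gaussian residuals, a one-dimensional trend (x, y) instead of the paper's two-dimensional on-die data, and NumPy's `polyfit`/`polyval`. The functions `aicc` and `select_degree` are illustrative names, not the authors' code. The sketch uses AIC = n ln(RSS/n) + 2k and AICc = AIC + 2k(k+1)/(n - k - 1), where k counts the polynomial coefficients plus the noise variance.

```python
import numpy as np


def aicc(rss: float, n: int, k: int) -> float:
    """Corrected AIC for a least-squares fit with Gaussian residuals.

    k counts the estimated parameters; the constant term of the log-likelihood is dropped.
    Requires n - k - 1 > 0.
    """
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


def select_degree(x: np.ndarray, y: np.ndarray, max_degree: int) -> int:
    """Return the polynomial degree in 0..max_degree with the smallest AICc."""
    best_degree, best_score = 0, np.inf
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        rss = float(residuals @ residuals)
        k = d + 2  # d + 1 polynomial coefficients plus the noise variance
        score = aicc(rss, len(x), k)
        if score < best_score:
            best_degree, best_score = d, score
    return best_degree
```

The 2k(k+1)/(n - k - 1) correction grows quickly as k approaches n. With few measurement sites, this pushes the choice toward the low-order polynomials that the abstract reports.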
Title | Within-Die Process Variations: How Accurately Can They Be Statistically Modeled? |
Author | Brendan Hargreaves, Henrik Hult, *Sherief Reda (Brown Univ., United States) |
Page | pp. 524 - 530 |
Keyword | process variations, statistical modeling |
Abstract | Within-die process variations arise during integrated circuit (IC) fabrication in the sub-100nm regime. These variations are of paramount concern because they cause the performance of ICs to deviate from their designers’ original intent. These deviations reduce the parametric yield and revenues from integrated circuit fabrication. In this paper, we provide a complete treatment of the subject of within-die variations. We propose a scan-chain based system, vMeter, to extract within-die variations in an automated fashion. We implement our system on a sample of 90nm chips and collect within-die variation data. We then propose a number of novel statistical analysis techniques that accurately model the within-die variation trends and capture the spatial correlations. We propose the use of maximum-likelihood techniques to find the parameters required to fit the model to the data. The accuracy of our models is statistically verified through residual analysis and variograms. Using our modeling technique, we propose a procedure to generate synthetic within-die variation patterns that mimic real silicon data. |
PDF file |
Title | Chebyshev Affine Arithmetic Based Parametric Yield Prediction Under Limited Descriptions of Uncertainty |
Author | Jin Sun, Yue Huang (The University of Arizona, United States), Jun Li (Anova Solutions, United States), *Janet M. Wang (The University of Arizona, United States) |
Page | pp. 531 - 536 |
Keyword | Chebyshev Affine Arithmetic, Process Variations, Limited Description of Uncertainty, Dependency Bounds |
Abstract | In modern circuit design, it is difficult to provide reliable parametric yield prediction since the real distribution of process data is hard to measure. Most existing approaches are not able to handle the uncertain distribution properties of the process data. Other approaches are inadequate in considering correlations among the parameters. This paper suggests a new approach that not only takes care of the correlations among distributions but also provides a low-cost and efficient computation scheme. The proposed method approximates the parameter variations with Chebyshev Affine Arithmetic (CAA) to capture both the uncertainty and the nonlinearity in Cumulative Distribution Functions (CDF). The CAA-based probabilistic representation describes both fully and partially specified process and environmental parameters. Thus, we are able to predict probability bounds for leakage consumption under unknown dependency assumptions among variations. The end result is chip-level parametric yield estimation based on leakage prediction. The experimental results demonstrate that the new approach provides reliable bound estimation while leading to a 20% yield improvement compared with interval analysis. |
PDF file |
Title | Distribution Arithmetic for Stochastical Analysis |
Author | *Markus Olbrich, Erich Barke (Leibniz University of Hannover, Germany) |
Page | pp. 537 - 542 |
Keyword | stochastic, robustness, arithmetic |
Abstract | This paper presents a novel arithmetic which allows calculations with fluctuating values. Given the distributions of initial random variables, the moments (such as expected value, variance and higher moments) of any calculated variable can be determined. Our approach is not limited to normal distributions and works with linear and nonlinear functions. Correlations between variables are taken into account automatically by the arithmetic. Examples show the accuracy and runtimes compared to Monte Carlo simulation. |
PDF file |
Title | Handling Partial Correlations in Yield Prediction |
Author | Sridhar Varadan (Texas A&M University, United States), *Janet Wang (University of Arizona, United States), Jiang Hu (Texas A&M University, United States) |
Page | pp. 543 - 548 |
Keyword | yield |
Abstract | In the nanometer regime, IC designs have to consider the impact of process variations, which is often indicated by manufacturing/parametric yield. This paper investigates a yield model: the probability that the values of multiple manufacturing/circuit parameters meet certain targets. This model can be applied to predict CMP (Chemical-Mechanical Planarization) yield. We focus on the difficult cases with a large number of partially correlated variations. To predict the yield for these difficult cases efficiently, we propose two techniques: (1) application of Orthogonal Principal Component Analysis (OPCA); (2) hierarchical adaptive quadrisection (HAQ). Systematic variations are also included in our model. Compared to previous work, the OPCA-based method reduces the error in yield estimation from 17.1%-21.1% to 1.3%-2.8% with a 4.6X speedup. The HAQ technique reduces the error to 4.1%-5.6% with a 6X-9.4X speedup. |
PDF file |
Title | (Invited Paper) Reliability-Aware Design for Nanometer-Scale Devices |
Author | *David Atienza (EPFL, Switzerland), Giovanni De Micheli (LSI/EPFL, Switzerland), Luca Benini (DEIS/UNIBO, Italy), José L. Ayala, Pablo G. Del Valle (DACYA/UCM, Spain), Michael DeBole, Vijay Narayanan (CSE/PSU, United States) |
Page | pp. 549 - 554 |
Abstract | Continuous transistor scaling due to improvements in CMOS devices and manufacturing technologies is increasing processor power densities and temperatures, thus creating challenges in maintaining manufacturing yield rates and device reliability over expected lifetimes at the latest nanometer-scale dimensions. In fact, new system and processor microarchitectures require new reliability-aware design methods and exploration tools that can face these challenges without significantly increasing manufacturing cost, reducing system performance, or imposing large area overheads due to redundancy. |
PDF file |
Title | (Invited Paper) An Industrial Perspective of Power-aware Reliable SoC Design |
Author | *Soo-Kwan Eo, Sungjoo Yoo, Kyu-Myung Choi (Samsung Electronics, Republic of Korea) |
Page | pp. 555 - 557 |
PDF file |
Title | (Panel Discussion) How to Design Cool Chips for Hot Products |
Author | Moderator: Massoud Pedram (Univ. of Southern California, United States), Panelists: Giovanni De Micheli (EPFL, Switzerland), Jan Rabaey (Univ. of California, Berkeley, United States), Sookwan Eo (Samsung Electronics, Republic of Korea) |
Thursday, January 24, 2008 |
Title | (Keynote Address) The Future of Semiconductor Industry - A Foundry's Perspective |
Author | *F. C. Tseng (TSMC, Taiwan) |
Page | p. 558 |
PDF file |
Title | Soft Error Rate Reduction Using Redundancy Addition and Removal |
Author | *Kai-Chiang Wu, Diana Marculescu (Carnegie Mellon University, United States) |
Page | pp. 559 - 564 |
Keyword | SER Reduction, Soft Error, RAR, Redundancy Addition and Removal, Reliability |
Abstract | Due to current technology scaling trends such as shrinking feature sizes and reducing supply voltages, circuits have become more susceptible to radiation-induced transient faults (soft errors). Soft errors, which have been a great concern in memories, are now a main factor in the reliability degradation of logic circuits. In this paper, we propose a novel framework based on redundancy addition and removal (RAR) for soft error rate (SER) reduction. Several metrics and constraints are introduced to guide our proposed framework towards SER reduction in an efficient manner. Experimental results show that up to 70% reduction in output failure probability can be achieved with relatively low area overhead. |
PDF file |
Title | Localized Random Access Scan: Towards Low Area and Routing Overhead |
Author | *Yu Hu, Xiang Fu, Xiaoxin Fan (Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China), Hideo Fujiwara (Graduate School of Information Science, Nara Institute of Science and Technology, Japan) |
Page | pp. 565 - 570 |
Keyword | Random Access Scan, Design-for-Testability |
Abstract | Conventional random access scan (RAS) designs incur high hardware overhead. In this paper, we present a localized RAS architecture (LRAS) to address this issue. A novel scan cell structure, which has fewer transistors than a multiplexer-type scan cell, is proposed to eliminate the global test enable signal and to localize the row and column enable signals. Experimental results demonstrate that LRAS has less area overhead than scan-chain-based designs, while outperforming the state-of-the-art RAS scheme in routing overhead. |
PDF file |
Title | A Design-for-Diagnosis Technique for Diagnosing Both Scan Chain Faults and Combinational Circuit Faults |
Author | Fei Wang, *Yu Hu, Huawei Li, Xiaowei Li (Chinese Academy of Sciences, China) |
Page | pp. 571 - 576 |
Keyword | scan chain diagnosis, design for diagnosis, logic diagnosis, scan chain, design for testability |
Abstract | The die area consumed by scan chains and scan control circuitry can range from 15% to 30%, and scan chain failures account for almost 50% of chip failures. Because the conventional diagnosis process usually relies on fault-free scan chains, scan chain faults may disable the diagnostic process, leaving a large failure area to time-consuming failure analysis. In this paper, a design-for-diagnosis (DFD) technique is proposed to diagnose faulty scan chains precisely and efficiently. Moreover, with the assistance of the proposed technique, the conventional logic diagnostic process can be carried out even with faulty scan chains. The proposed approach is entirely compatible with conventional scan-based design. Previously proposed software-based diagnostic methods for conventional scan designs can still be applied to our design. Experiments on ISCAS'89 benchmark circuits demonstrate the efficiency of the proposed DFD technique. |
PDF file |
Title | GECOM: Test Data Compression Combined with All Unknown Response Masking |
Author | *Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda Univ., Japan) |
Page | pp. 577 - 582 |
Keyword | scan test, test data compression, X-masking, ATPG |
Abstract | This paper introduces GECOM technology, a novel test compression method that seamlessly integrates test GEneration, test COmpression (i.e., integrated compression of scan stimulus and masking bits), and Masking of all unknown scan responses for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified bits (either 1s or 0s) in scan slices for compression, while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where $c = \lceil \log_2 N \rceil + 2$, are required. GECOM ATPG generates test patterns for the GECOM architecture so that not only can the scan inputs be efficiently compressed, but all unknown responses are also masked. Experimental results on both benchmark circuits and real industrial designs indicate the effectiveness of the proposed GECOM technique. |
PDF file |
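The tester-channel count c = ⌈log2 N⌉ + 2 quoted in the GECOM abstract above is easy to evaluate. The helper below has a hypothetical name and computes only the stated formula, not the GECOM architecture. It uses an integer bit length rather than a floating-point logarithm.

```python
def gecom_tester_channels(num_scan_chains: int) -> int:
    """Return c = ceil(log2(N)) + 2 for N internal scan chains (formula from the abstract)."""
    if num_scan_chains < 1:
        raise ValueError("N must be at least 1")
    # For N >= 1, (N - 1).bit_length() equals ceil(log2(N)) without floating point.
    return (num_scan_chains - 1).bit_length() + 2
```

For example, a circuit with 1000 internal scan chains needs ⌈log2 1000⌉ + 2 = 10 + 2 = 12 tester channels, and 1025 chains need 13.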
Title | Mixed Integer Linear Programming-Based Optimal Topology Synthesis of Cascaded Crossbar Switches |
Author | *Minje Jun (Yonsei Univ., Republic of Korea), Sungjoo Yoo (Samsung Electronics, Republic of Korea), Eui-Young Chung (Yonsei Univ., Republic of Korea) |
Page | pp. 583 - 588 |
Keyword | cascaded crossbar, topology synthesis, MILP, bus |
Abstract | We present a topology synthesis method for high-performance System-on-Chip (SoC) design. Our method provides an optimal topology of the on-chip communication network for given bandwidth, latency, frequency, and/or area constraints. The optimal topology consists of multiple crossbar switches, some of which can be connected in a cascaded fashion for higher clock frequency and/or area efficiency. Compared to previous works, the major contribution of our work is the exactness of the solution in two respects. First, our solving method is exact because it employs mixed integer linear programming (MILP). Second, we generalize the crossbar switch representation in MILP so that the optimal topology can combine crossbar switches of arbitrary sizes. The experimental results show that the topologies optimized for clock frequency (area) give up to 37.3% (12.7%) improvements compared to conventional single large crossbar switch networks for two industrial-strength SoC designs. |
PDF file |
Title | Automatic Interface Synthesis Based on the Classification of Interface Protocols of IPs |
Author | *ChangRyul Yun (Agency for Defense Development, Republic of Korea), DongSoo Kang (Chungnam National University, Republic of Korea), YoungHwan Bae, HanJin Cho (ETRI, Republic of Korea), KyoungSon Jhang (Chungnam National University, Republic of Korea) |
Page | pp. 589 - 594 |
Keyword | interface synthesis, protocol classification, IP reuse |
Abstract | In System-on-Chip (SoC) design, an IP-based design methodology is used to reduce design time. Interface circuit design is one of the most essential factors in IP-based design. However, generating interface circuits is not easy because IPs have various characteristics. For example, one IP may send only one outstanding address in a burst, while another IP may need one address for each transfer in a burst. IPs also use different clock frequencies or different data widths. The interface protocols of each IP must be analyzed to consider and resolve these differences during synthesis. In this paper, we categorize the various interface protocols and use a synthesis algorithm that selects the appropriate structure based on the categorization, the clock frequencies, and the data width differences of the IPs. Our experiments show that we can automatically generate interface circuits for IPs with different clocks, different data widths, and no address concepts. By comparing the synthesis results for several IP pairs, the experiments also show the pros and cons of two alternative structures, namely, a product-FSM-based structure and an FSMD-like structure. |
PDF file |
Title | The Shining Embedded System Design Methodology Based on Self Dynamic Reconfigurable Architectures |
Author | C. A. Curino, *L. Fossati, V. Rana, F. Redaelli, M. D. Santambrogio, D. Sciuto (Politecnico di Milano, Italy) |
Page | pp. 595 - 600 |
Keyword | Reconfiguration, Embedded System, Design flow |
Abstract | The design of complex Systems-on-Chip based on reconfigurable architectures still lacks a generalized methodology that allows both the automatic derivation of a complete system solution able to fit into the final device and mixed hardware-software solutions that exploit partial reconfiguration capabilities. The Shining methodology organizes the input specification of a complex System-on-Chip design into three different components: hardware, reconfigurable hardware, and software, each handled by a dedicated sub-flow. A communication model guarantees reliable and seamless interfacing of the various components. The developed system, stand-alone or OS-based, is architecture-independent. The Shining flow reduces system development time, easing the design of complex hardware/software reconfigurable applications. |
PDF file |
Title | Robust On-Chip Bus Architecture Synthesis for MPSoC Under Random Tasks Arrival |
Author | *Sujan Pandey (NXP Semiconductors Research, Netherlands), Rolf Drechsler (University of Bremen, Germany) |
Page | pp. 601 - 606 |
Keyword | On-chip bus synthesis, Synthesis for robustness |
Abstract | A major trend in modern system-on-chip design is growing system complexity, which results in a sharp increase of communication traffic on on-chip bus architectures. In a real-time embedded system, the task arrival rate, inter-task arrival time, and size of data to be transferred are not uniform over time. This is due to the partial reconfiguration of an embedded system to cope with dynamic workload. In this context, traditional application-specific bus architectures may fail to meet real-time constraints. Thus, to incorporate the random behavior of on-chip communication, this work proposes an approach to synthesize an on-chip bus architecture that is robust for given distributions of random tasks. The randomness of communication tasks is characterized by three main parameters: average task arrival rate, average inter-task arrival time, and data size. During synthesis, the on-chip bus requirement is guided by the worst-case performance need, while dynamic voltage scaling is used to save energy when the workload is low or the timing slack is high. This, in turn, results in effective utilization of communication resources under variable workload. |
PDF file |
Title | A Multi-Processor NoC Platform Applied on the 802.11i TKIP Cryptosystem |
Author | *Jung-Ho Lee, Sung-Rok Yoon, Kwang-Eui Pyun, Sin-Chong Park (ICU, Republic of Korea) |
Page | pp. 607 - 610 |
Keyword | NoC, MPSoC, TKIP |
Abstract | Since 2001, there have been a myriad of papers on the systematic analysis of Multi-Processor Systems-on-Chip (MPSoC) and Networks-on-Chip (NoC). Nevertheless, there are only a few practical applications of them. Until now, the main interest of researchers has been to adapt NoC to communication-intensive multimedia systems such as H.263. This paper instead attempts to extend the domain of the NoC platform to a wireless security algorithm (TKIP), because its inter-component transaction pattern has characteristics well suited to NoC. The paper explains the operational sequence of the algorithm on the chosen architecture and briefly illustrates the important NoC building blocks (Network Interface, Router). |
PDF file |
Title | A Unified Methodology for Power Supply Noise Reduction in Modern Microarchitecture Design |
Author | Michael Healy, Fayez Mohamood, Hsien-Hsin S. Lee, *Sung Kyu Lim (Georgia Institute of Technology, United States) |
Page | pp. 611 - 616 |
Keyword | power noise, dynamic control, floorplanning |
Abstract | In this paper, we present a novel design methodology to combat the ever-aggravating high-frequency power supply noise (di/dt) in modern microprocessors. Our methodology integrates microarchitectural profiling for noise-aware floorplanning, dynamic runtime noise control to prevent unsustainable noise emergencies, and decap allocation, all to produce a design for the average-case current consumption scenario. The dynamic controller contributes a microarchitectural technique that eliminates occurrences of the worst-case noise scenario; thus our method focuses on average-case noise behavior. |
PDF file |
Title | Heuristic Power/Ground Network and Floorplan Co-Design Method |
Author | *Xiaoyi Wang, Jin Shi, Yici Cai, Xianlong Hong (Tsinghua University, China) |
Page | pp. 617 - 622 |
Keyword | Floorplan, IR drop, P/G network optimization |
Abstract | There is a trend toward considering power supply integrity at an early design stage to improve design quality. In this paper, we propose a novel algorithm that optimizes the floorplan together with the P/G network. Compared with previous methods, our algorithm searches the floorplan space more efficiently and therefore leads to better results. We also propose a smart heuristic method to build the P/G mesh grid with an optimized topology. Experimental results show that our method speeds up the floorplanning process by about 10 times and reduces the routing area of the P/G network while maintaining floorplan quality and P/G integrity. |
PDF file |
Title | Vertical Via Design Techniques for Multi-Layered P/G Networks |
Author | *Shuai Li, Jin Shi, Yici Cai, Xianlong Hong (Tsinghua University, China) |
Page | pp. 623 - 628 |
Keyword | P/G, multi-layered, via |
Abstract | In multi-layered power/ground (P/G) networks, vertical vias are usually placed at intersections between metal wires of adjoining layers to connect the whole network. In this paper, an in-depth study of vertical via design is presented. First, we present an efficient heuristic algorithm based on sensitivity analysis to optimize via allocation at an early design stage. Compared with equal allocation, our algorithm reduces the worst voltage drop by 8.43% on average while using the same number of vias or even fewer. In addition, the adjoint network method is used and significantly improves the efficiency of our algorithm. Next, we demonstrate that cross-layer vias, which link metal wires of nonadjacent layers, are powerful in eliminating “hot” areas that suffer from large voltage drop on the bottom layer. A similar heuristic algorithm is also developed for adding cross-layer vias. |
PDF file |
Title | Statistical Mixed Vt Allocation of Body-Biased Circuits for Reduced Leakage Variation |
Author | Jinseob Jeong, *Seungwhun Paik, Youngsoo Shin (KAIST, Republic of Korea) |
Page | pp. 629 - 634 |
Keyword | Statistical, Mixed Vt, Body Biasing |
Abstract | Leakage current is susceptible to variation in transistor parameters and in the environment, such as temperature, which results in a wide spread in the leakage distribution. The spread can be reduced by body biasing: reverse body bias for overly leaky dies and forward body bias for overly slow dies. We investigate body biasing of mixed Vt circuits. We show that conventional body biasing is limited in reducing the leakage variation of mixed Vt circuits, because low- and high-Vt devices do not track each other and their body biasing sensitivities differ. We present an alternative body biasing scheme that targets compensating the die-to-die variation of low Vt. Under this scheme, the within-die profiles of low- and high-Vt devices, which are needed for statistical allocation of mixed Vt, become wider and thus differ from the original ones. We present an analytical procedure to derive the new within-die profiles. Experiments with a 45-nm predictive model show that the spread in leakage can be reduced to 4.5 on average, as opposed to 9.4 with conventional body biasing of mixed Vt circuits. |
PDF file |
Title | Exploring High-Speed Low-Power Hybrid Arithmetic Units at Scaled Supply and Adaptive Clock-Stretching |
Author | Swaroop Ghosh, *Kaushik Roy (Purdue Univ., United States) |
Page | pp. 635 - 640 |
Keyword | Low Power, adder, high speed, hybrid, robust |
Abstract | In this paper, we explore various arithmetic units for possible use in high-speed, high-yield ALU design at scaled supply voltage with variable-latency operation. We demonstrate that careful modification of existing arithmetic units makes them more suitable for supply voltage scaling with tolerable area overhead. Simulation results on different adder and multiplier topologies show 18-60% improvement in power with a 2-8% increase in die area at iso-yield. We also extend our studies to the design of low-power, high-yield multipliers. These optimized low-power datapath units can be used to construct a low-power and robust ALU. |
PDF file |
Title | (Panel Discussion) Concurrent SoC and SiP Designs |
Author | Moderator: Wei-Chung Lo (ITRI, Taiwan), Panelists: C. P. Hung (ASE, Taiwan), Lung Chu (Cadence Design Systems, United States), Joungho Kim (KAIST, Republic of Korea), Epan Wu (VIA Technologies, Taiwan) |
Title | Circuit Lines for Guiding the Generation of Random Test Sequences for Synchronous Sequential Circuits |
Author | Irith Pomeranz (Purdue University, United States), *Sudhakar M. Reddy (University of Iowa, United States) |
Page | pp. 641 - 646 |
Keyword | random test sequences, synchronous sequential circuits |
Abstract | A procedure proposed earlier for improving the fault coverage of a random primary input sequence modifies the input sequence so as to avoid repeated synchronization of state variables. We show that in addition to the values of state variables, it is also important to consider repeated setting of other lines to the same values. A procedure and experimental results are presented to demonstrate the improvements in fault coverage of random primary input sequences when the values of selected lines are considered. |
PDF file |
Title | A New Low Energy BIST Using A Statistical Code |
Author | *Sunghoon Chun, Taejin Kim, Sungho Kang (Yonsei University, Republic of Korea) |
Page | pp. 647 - 652 |
Keyword | BIST, low energy test, test data compression |
Abstract | To tackle the increased switching activity during test operation, this paper proposes a new built-in self-test (BIST) scheme for low energy testing that uses a statistical code and a new technique to skip unnecessary test sequences. More generally, the goal of this technique is to minimize the total power consumption during a test and to allow at-speed testing in order to achieve high fault coverage. The effectiveness of the proposed low energy BIST scheme was validated on a set of ISCAS ’89 benchmark circuits with respect to test data volume and energy saving. |
PDF file |
Title | On Reducing Both Shift and Capture Power for Scan-Based Testing |
Author | Jia Li (Chinese Academy of Sciences, China), *Qiang Xu (The Chinese Univ. of Hong Kong, Hong Kong), Yu Hu, Xiaowei Li (Chinese Academy of Sciences, China) |
Page | pp. 653 - 658 |
Keyword | test power, shift power, capture power, scan-based testing |
Abstract | Power consumption in scan-based testing is a major concern nowadays. In this paper, we present a new X-filling technique, namely LSC-filling, to reduce both shift power and capture power during scan tests. The basic idea is to use as few X-bits as possible to keep the capture power under the peak power limit of the circuit under test (CUT), while using the remaining X-bits to reduce shift power and thereby cut down the CUT’s average power consumption during scan tests as much as possible. In addition, by carefully selecting the X-filling order, our technique achieves lower capture power than existing methods. Experimental results on ISCAS’89 benchmark circuits show the effectiveness of the proposed methodology. |
PDF file |
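The LSC-filling paper above builds on X-filling, i.e., choosing values for the don't-care (X) bits of a test cube. As a rough illustration of how the fill choice affects shift power, here is a minimal sketch of generic adjacent fill (each X copies the nearest specified bit to its left), a common baseline for shift-power reduction; it is not the paper's LSC-filling, and the plain transition count is only an assumed proxy for shift power.

```python
def adjacent_fill(cube: str) -> str:
    """Fill each 'X' with the most recent specified bit to its left.

    Leading X's copy the first specified bit, or '0' if the cube is all X.
    """
    specified = [c for c in cube if c != "X"]
    last = specified[0] if specified else "0"
    out = []
    for c in cube:
        if c == "X":
            out.append(last)
        else:
            last = c
            out.append(c)
    return "".join(out)


def transitions(vector: str) -> int:
    """Count adjacent bit changes, a rough (assumed) proxy for scan shift power."""
    return sum(a != b for a, b in zip(vector, vector[1:]))


cube = "1XX0XX1X"
filled = adjacent_fill(cube)
print(filled, transitions(filled))  # 11100011 2
```

Adjacent fill minimizes transitions in the shifted vector but ignores capture power, which is why the paper reserves some X-bits for keeping capture power under the peak limit.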
Title | Robust Test Generation for Power Supply Noise Induced Path Delay Faults |
Author | *Xiang Fu, Huawei Li, Yu Hu, Xiaowei Li (Chinese Academy of Sciences, China) |
Page | pp. 659 - 662 |
Keyword | Power Supply Noise, Robust Test generation |
Abstract | In deep sub-micron designs, the delay caused by power supply noise (PSN) can no longer be ignored. This paper proposes a PSN-induced path delay fault (PSNPDF) model; such faults should be tested to enhance chip quality. Based on precise timing analysis, we also propose a robust test generation technique for PSNPDFs. The concept of a timing window is introduced into the PSNPDF model: if two devices in the same feed region simultaneously switch in the same direction, their current waveforms overlap and excessive PSN is produced. Experimental results on ISCAS’89 circuits show that test generation can be completed in a few seconds. |
PDF file |
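The timing-window condition in the PSNPDF abstract above (same feed region, same switching direction, overlapping switching times) can be expressed as a simple pairwise check. The sketch below is illustrative only: the `SwitchEvent` record, its field names, and the sample times are hypothetical, and the paper's actual timing analysis is more detailed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SwitchEvent:
    gate: str
    region: str    # hypothetical power-feed region identifier
    rising: bool   # switching direction
    start: float   # earliest switching time (ps), hypothetical
    end: float     # latest switching time (ps), hypothetical


def windows_overlap(a: SwitchEvent, b: SwitchEvent) -> bool:
    """Closed intervals [start, end] overlap."""
    return a.start <= b.end and b.start <= a.end


def noisy_pairs(events):
    """Pairs in the same feed region that switch the same way with overlapping windows."""
    return [
        (a.gate, b.gate)
        for a, b in combinations(events, 2)
        if a.region == b.region and a.rising == b.rising and windows_overlap(a, b)
    ]


events = [
    SwitchEvent("g1", "R0", True, 10, 30),
    SwitchEvent("g2", "R0", True, 25, 40),
    SwitchEvent("g3", "R0", False, 20, 35),
    SwitchEvent("g4", "R1", True, 12, 28),
]
print(noisy_pairs(events))  # [('g1', 'g2')]
```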
Title | Test Vector Chains for Increased Targeted and Untargeted Fault Coverage |
Author | Irith Pomeranz (Purdue University, United States), *Sudhakar M. Reddy (University of Iowa, United States) |
Page | pp. 663 - 666 |
Keyword | n-detections, test generation |
Abstract | We introduce the concept of test vector chains, which allows us to obtain new test vectors from existing ones through single-bit changes without any test generation effort. We demonstrate that a test set T0 has a significant number of test vector chains that are effective in increasing the numbers of detections of target faults, i.e., faults targeted during the generation of T0, as well as untargeted faults, i.e., faults that were not targeted during the generation of T0. |
PDF file |
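The test vector chain idea above derives new vectors from existing ones through single-bit changes, with no test generation effort. The exact chain construction is defined in the paper; the sketch below only illustrates the underlying operation under an assumed, simplified definition: listing all single-bit neighbors of a vector, and walking from one existing vector to another by flipping one differing bit at a time. Both function names are hypothetical.

```python
def single_bit_neighbors(vector: str) -> list[str]:
    """All vectors that differ from `vector` in exactly one bit position."""
    flip = {"0": "1", "1": "0"}
    return [vector[:i] + flip[b] + vector[i + 1:] for i, b in enumerate(vector)]


def chain_between(src: str, dst: str) -> list[str]:
    """Illustrative chain: flip the differing bits of src one at a time until dst."""
    if len(src) != len(dst):
        raise ValueError("vectors must have equal length")
    chain = [src]
    current = list(src)
    for i, (a, b) in enumerate(zip(src, dst)):
        if a != b:
            current[i] = b
            chain.append("".join(current))
    return chain


print(single_bit_neighbors("010"))    # ['110', '000', '011']
print(chain_between("0000", "1010"))  # ['0000', '1000', '1010']
```

Each intermediate vector would then be fault-simulated to see whether it adds detections of targeted or untargeted faults.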
Title | Parallel Fault Backtracing for Calculation of Fault Coverage |
Author | *Raimund Ubar, Sergei Devadze, Jaan Raik, Artur Jutman (Tallinn University of Technology, Estonia) |
Page | pp. 667 - 672 |
Keyword | fault simulation, combinational circuits, stuck-at faults, critical path, Boolean differentials |
Abstract | An improved method for calculating fault coverage with parallel fault backtracing in digital circuits with a scan path is proposed. The method is based on structurally synthesized BDDs (SSBDDs), which represent gate-level circuits at a higher, macro level, where macros represent subnetworks of gates. A topological analysis is carried out to generate an efficient model for fault backtracing that minimizes the repeated calculations caused by reconvergent fanouts. The algorithm is equivalent to exact critical path tracing. Because of the parallelism and the higher-abstraction-level modeling, the speed of analysis was considerably increased. Experimental data show that the new method achieves a considerable speed-up over the previous similar approach, and that its fault analysis is several times faster than current state-of-the-art commercial fault simulators. |
PDF file |
Title | ReSP: A Non-Intrusive Transaction-Level Reflective MPSoC Simulation Platform for Design Space Exploration |
Author | Giovanni Beltrame (European Space Agency, Netherlands), Cristiana Bolchini, *Luca Fossati, Antonio Miele, Donatella Sciuto (Politecnico di Milano, Italy) |
Page | pp. 673 - 678 |
Keyword | MPSoC, SystemC, Python, Simulation, reliability |
Abstract | This paper presents ReSP, a multi-processor simulation platform based on SystemC and Python (which provides the platform with reflective capabilities). The designer can easily specify the architecture of a system, simulate it, and perform automatic analysis on it. The overhead associated with the Python intermediate layer is around 1%. The advantages of our approach are: (a) easy integration of external IPs, (b) fine-grained simulation control, and (c) effortless integration of tools for system analysis and design space exploration. |
PDF file |
Title | Collaborative Hardware/Software Partition of Coarse-Grained Reconfigurable System Using Evolutionary Ant Colony Optimization |
Author | *Dawei Wang, Sikun Li, Yong Dou (College of Computer Science, National University of Defense Technology, China) |
Page | pp. 679 - 684 |
Keyword | Collaborative Design, Reconfigurable Computing, System-on-Chips, Hardware/Software Partitioning, Ant Colony Optimization |
Abstract | The flexibility, performance, and cost effectiveness of reconfigurable architectures have led to their widespread use in embedded applications. Reconfigurable system design is very complex, requiring experts from multiple fields to collaborate on application algorithm design, hardware/software co-design, and system decisions. However, existing reconfigurable system design methods and environments support only hardware/software co-design and ignore collaboration between experts from different fields. This paper presents a collaborative partitioning approach for coarse-grained reconfigurable system design using evolutionary ant colony optimization. We create a distributed collaborative design environment for system decision engineers, software designers, hardware designers, and application algorithm developers. The method not only exploits the advantages of ant colony optimization in searching for globally optimal solutions, but also provides a framework for experts from multiple fields to work collaboratively. Experimental results show that the method improves the quality and speed of hardware/software partitioning for coarse-grained reconfigurable system design. |
PDF file |
Title | Design Space Exploration for a Coarse Grain Accelerator |
Author | *Farhad Mehdipour, Hamid Noori (Kyushu University, Japan), Morteza Saheb Zamani (Amirkabir University of Technology, Iran), Koji Inoue, Kazuaki Murakami (Kyushu University, Japan) |
Page | pp. 685 - 690 |
Keyword | extensible processor, design space exploration, reconfigurable accelerator |
Abstract | In the design process of a reconfigurable accelerator employed in an embedded system, a multitude of parameters may result in remarkable complexity and a large design space. Design space exploration, as an alternative to the quantitative approach, can be employed to find the right balance between the different design parameters. In this paper, a hybrid approach is introduced that analytically explores the design space of a coarse grain accelerator and quantitatively determines a suitable design point by exploiting data extracted from applications. It also provides the flexibility to take into account new design constraints as well as new characteristics of applications. Furthermore, this methodological approach reduces design time and results in a design point that satisfies the design goals. |
PDF file |
Title | Efficient Symbolic Multi–Objective Design Space Exploration |
Author | *Martin Lukasiewycz, Michael Glaß, Christian Haubelt, Jürgen Teich (University of Erlangen-Nuremberg, Germany) |
Page | pp. 691 - 696 |
Keyword | design space exploration, pseudo-boolean solver, multi-objective, symbolic |
Abstract | Nowadays many design space exploration tools are based on Multi–Objective Evolutionary Algorithms (MOEAs). Besides their advantages, MOEAs have one important drawback: they might fail in design spaces containing only a few feasible solutions, and they are often afflicted with premature convergence, i.e., the same design points are revisited again and again. Exact methods, especially Pseudo-Boolean solvers (PB solvers), seem to be a solution. However, as typical design spaces are multi–objective, there is a need for multi–objective PB solvers. In this paper, we formalize the problem of design space exploration as a multi–objective 0–1 ILP. We propose (1) a heuristic approach based on PB solvers and (2) a complete multi–objective PB solver based on a backtracking algorithm that incorporates the non–dominance relation from multi–objective optimization and is restricted to linear objective functions. First results from applying our novel multi–objective PB solver to synthetic problems show its effectiveness in small design spaces as well as in large design spaces containing only a few feasible solutions. For non–linear and large problems, the proposed heuristic approach outperforms common MOEA approaches. Finally, a real-world example from the automotive area emphasizes the efficiency of the proposed algorithms. |
PDF file |
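The non-dominance relation mentioned in the multi-objective PB solver abstract above is the standard Pareto criterion. A minimal sketch, assuming every objective is minimized and design points are given as tuples of objective values (the sample points are hypothetical), shows the relation and a brute-force filter for the non-dominated set; the paper embeds this relation inside a backtracking solver rather than filtering an explicit list.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Points not dominated by any other point (the Pareto front).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, latency) pairs for five candidate designs.
designs = [(3, 5), (2, 7), (4, 4), (3, 6), (5, 5)]
print(non_dominated(designs))  # [(3, 5), (2, 7), (4, 4)]
```

The quadratic scan is fine for illustration; it does not reflect how an exact solver avoids enumerating all feasible points.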
Title | Scalable Unified Dual-Radix Architecture for Montgomery Multiplication in GF(P) and GF(2^n) |
Author | *Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan) |
Page | pp. 697 - 702 |
Keyword | Elliptic curve cryptography, dual-radix, Montgomery multiplication, scalability, unified |
Abstract | Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), a type of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and must be scalable, since the bit length of the operands varies with the security level. Also, ECC is performed in $GF(P)$ or $GF(2^n)$, so unified multiplier architectures for $GF(P)$ and $GF(2^n)$ are needed. However, in previous works, changing the frequency or using a dual-radix architecture is necessary to deal with the delay-time difference between the $GF(P)$ and $GF(2^n)$ circuits of the multiplier, because the critical path of the $GF(P)$ circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in $GF(P)$ and $GF(2^n)$. The proposed architecture unifies $4$ parallel radix-$2^{16}$ multipliers in $GF(P)$ and a radix-$2^{64}$ multiplier in $GF(2^n)$ into a single unit. Applying a lower radix to the $GF(P)$ multiplier shortens its critical path and makes it possible to compute operands in both fields using the same multiplier at the same frequency, so clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in $GF(P)$ reduces the extra clock cycles introduced by the dual-radix approach. Consequently, the proposed architecture computes a $256$-bit $GF(P)$ Montgomery multiplication in $0.23\,\mu s$. |
PDF file |
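For readers unfamiliar with the operation that the dual-radix architecture above accelerates, here is a minimal software sketch of textbook Montgomery multiplication (REDC) in $GF(P)$ with $R = 2^k$. It works on whole integers in a single reduction step; the paper's hardware instead processes radix-$2^{16}$ words in parallel and also covers $GF(2^n)$, neither of which is modeled here. The small modulus is only for demonstration.

```python
def montgomery_setup(p: int, k: int):
    """Return R = 2**k and p' = -p^{-1} mod R (requires odd p and R > p)."""
    R = 1 << k
    if p % 2 == 0 or R <= p:
        raise ValueError("need odd p and 2**k > p")
    p_prime = (-pow(p, -1, R)) % R  # modular inverse via pow (Python 3.8+)
    return R, p_prime


def mont_mul(a: int, b: int, p: int, k: int, p_prime: int) -> int:
    """Return a * b * R^{-1} mod p for 0 <= a, b < p (REDC)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * p_prime) & mask  # chosen so that t + m*p is divisible by R
    u = (t + m * p) >> k               # exact division by R; u < 2p
    return u - p if u >= p else u


p, k = 97, 8  # toy modulus; ECC uses 160-521 bit primes
R, p_prime = montgomery_setup(p, k)
a, b = 15, 20
a_m, b_m = (a * R) % p, (b * R) % p            # convert to Montgomery form
prod_m = mont_mul(a_m, b_m, p, k, p_prime)     # = a*b*R mod p
print(mont_mul(prod_m, 1, p, k, p_prime))      # 9, i.e. 15*20 mod 97
```

The benefit is that the reduction uses only shifts and masks by the power of two $R$ instead of a division by $P$, which is what makes the operation attractive in hardware.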
Title | Optimal Allocation and Placement of Thermal Sensors for Reconfigurable Systems and Its Practical Extension |
Author | *ByungHyun Lee, Taewhan Kim (Seoul National University, Republic of Korea) |
Page | pp. 703 - 707 |
Keyword | thermal sensor, allocation, placement, optimization |
Abstract | Dynamic monitoring of the thermal behavior of hardware resources using thermal sensors is very important for keeping system operation safe and reliable. This work proposes an effective solution to the problem of thermal sensor allocation and placement for reconfigurable systems at the post-manufacturing stage. Specifically, we define the sensor allocation and placement problem (SAPP) and propose a solution that formulates SAPP as a unate-covering problem (UCP) and solves it optimally. We then provide an extended solution to handle a practical design issue in which the hardware resources at specific array locations needed for sensor implementation have already been used up by the application logic. Experimental results using MCNC benchmarks show that our proposed technique uses 19.7% fewer sensors on average to monitor hotspots than bisection-based approaches. |
PDF file |
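The thermal sensor abstract above casts sensor allocation and placement as a unate-covering problem: choose the fewest candidate sensor sites (columns) so that every hotspot (row) is covered. A minimal exact sketch, assuming a toy instance in which each hypothetical site covers a given set of hotspots, searches covers by increasing size; real UCP solvers use branch-and-bound with reduction rules rather than this exhaustive search, which only scales to tiny instances.

```python
from itertools import combinations


def min_unate_cover(columns: dict[str, set[str]], rows: set[str]) -> list[str]:
    """Smallest set of columns (sensor sites) whose union covers all rows (hotspots).

    Exhaustive search by increasing cover size: exact, but exponential.
    """
    names = sorted(columns)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if set().union(*(columns[c] for c in combo)) >= rows:
                return list(combo)
    raise ValueError("rows cannot be covered")


# Hypothetical instance: which hotspots each candidate sensor site can monitor.
sites = {
    "s1": {"h1", "h2"},
    "s2": {"h2", "h3"},
    "s3": {"h3", "h4"},
    "s4": {"h1", "h4"},
}
print(min_unate_cover(sites, {"h1", "h2", "h3", "h4"}))  # ['s1', 's3']
```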
Title | Exploring Power Management in Multi-Core Systems |
Author | Reinaldo Bergamaschi (IBM T.J. Watson Research Center, United States), Guoling Han (University of California, Los Angeles, United States), Alper Buyuktosunoglu (IBM T.J. Watson Research Center, United States), Hiren Patel (Virginia Tech, United States), Indira Nair, *Gero Dittmann, Geert Janssen (IBM T.J. Watson Research Center, United States), Nagu Dhanwada (IBM EDA Laboratory, United States), Zhigang Hu, Pradip Bose, John Darringer (IBM T.J. Watson Research Center, United States) |
Page | pp. 708 - 713 |
Keyword | dynamic voltage and frequency scaling (DVFS), power management, multi-core systems modeling, performance and power simulation |
Abstract | Power dissipation has become a critical design metric in microprocessor-based system design. In a multi-core system, running multiple applications, power and performance can be dynamically traded off using an integrated power management (PM) unit. This PM unit monitors the performance and power of each core and dynamically adjusts the individual voltages and frequencies in order to maximize system performance under a given power budget (usually set by the operating system). This paper presents a performance and power analysis methodology, featuring a simulation model for multi-core systems that can be easily reconfigured for different scenarios and a PM infrastructure for the exploration and analysis of PM algorithms. Two algorithms have been implemented: one for discrete and one for continuous power modes based on non-linear programming. Extensive experiments are reported, illustrating the effect of power management both at the core and the chip level. |
PDF file |
Title | Dependability, Power, and Performance Trade-Off on a Multicore Processor |
Author | *Toshinori Sato (Kyushu University, Japan), Toshimasa Funaki (Kyushu Institute of Technology, Japan) |
Page | pp. 714 - 719 |
Keyword | power consumption, dependability, multicore processors, trade-off design, soft errors |
Abstract | As deep submicron technologies advance, we face new challenges, such as power consumption and soft errors. A naïve technique, which utilizes emerging multicore processors and relies upon thread-level redundancy to detect soft errors, is power hungry: it consumes at least twice as much power as a conventional single-threaded processor. This paper investigates a trade-off between dependability and power on a multicore processor named the multiple clustered core processor (MCCP). It proposes adapting processor resources according to the requested performance. A new metric to evaluate the trade-off between dependability, power, and performance is also proposed: the product of the soft error rate and the popular energy-delay product, which we name the energy, delay, and upset rate product (EDUP). Detailed simulations show that the MCCP exploiting the adaptable technique improves the EDUP by up to 21% compared with one exploiting the naïve technique. |
PDF file |
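The EDUP metric defined in the abstract above is the soft error rate multiplied by the energy-delay product, with lower values being better. The sketch below shows how a relative EDUP improvement would be computed; the normalized numbers are hypothetical and are not taken from the paper.

```python
def edup(soft_error_rate: float, energy: float, delay: float) -> float:
    """Energy, delay, and upset rate product: SER x energy x delay (lower is better)."""
    return soft_error_rate * energy * delay


# Hypothetical normalized values, for illustration only (not results from the paper).
baseline = edup(soft_error_rate=1.0, energy=2.0, delay=1.0)  # e.g. redundant threads
adaptive = edup(soft_error_rate=1.2, energy=1.4, delay=1.1)  # e.g. adapted resources
improvement = 1 - adaptive / baseline
print(f"{improvement:.1%}")  # 7.6%
```

Because the metric is a product, a design can accept a somewhat higher soft error rate or delay and still improve EDUP if it saves enough energy.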
Title | High Performance Current-Mode Differential Logic |
Author | Ling Zhang (Univ. of California, San Diego, United States), Jianhua Liu (Altera, United States), Haikun Zhu (Qualcomm, United States), *Chung-Kuan Cheng (Univ. of California, San Diego, United States), Masanori Hashimoto (Osaka Univ., Japan) |
Page | pp. 720 - 725 |
Keyword | VLSI circuit design, differential logic, current-mode logic |
Abstract | This paper presents a new logic style, named Current-Mode Differential Logic (CMDL), that achieves both high operating speed and low power consumption. Inspired by low-voltage swing (LVS) logic, CMDL uses a shunt resistor at the differential output to obtain a constant low-swing signal without the need to reset low. Furthermore, conditional shunt transistors are used for the internal nodes to prevent high-voltage swing, thus entirely eliminating the power-hungry clocked reset network of LVS circuits. We show that CMDL is suitable for a high-end microprocessor integer core by providing three datapath modules implemented in CMDL. Our simulation results indicate that, operating at speeds comparable to LVS logic, CMDL circuits can achieve up to 50% reduction of the delay-power product compared to CMOS logic and LVS logic. In addition, CMDL reduces the power consumption of LVS by up to 40%. |
PDF file |
Title | NBTI Induced Performance Degradation in Logic and Memory Circuits: How Effectively Can We Approach a Reliability Solution? |
Author | Kunhyuk Kang, Saakshi Gangwal, Sang Phill Park, *Kaushik Roy (Purdue Univ., United States) |
Page | pp. 726 - 731 |
Keyword | Reliability, NBTI, Temporal degradation |
Abstract | This paper evaluates the severity of negative bias temperature instability (NBTI) degradation in two major circuit applications: random logic and memory arrays. For improved lifetime stability, we propose and select efficient reliability-aware circuit design methodologies. Simulation results obtained at the 65nm PTM node show that NBTI-induced degradation in random logic is considerably lower than that of a single transistor. As a result, simple delay guard-banding can efficiently mitigate the impact of NBTI in random logic. On the other hand, NBTI degradation in memory has a much more severe effect: especially when combined with the impact of random process variation, NBTI can dramatically reduce the READ stability of memory cells. Hence, aggressive design techniques such as stand-by VDD scaling or adaptive body biasing (ABB) are required in memory applications to minimize the impact of NBTI. |
PDF file |
Title | (Invited Paper) Reaching the Limits of Low Power Design |
Author | J. S. Hobbs, *T. W. Williams (Synopsys, United States) |
Page | pp. 732 - 735 |
Abstract | As process technologies continue to shrink, and feature demands continue to increase, more and more capabilities are being pushed into smaller and smaller packages. But are we finally reaching the point where power density limitations make this trend no longer sustainable? What advanced techniques are in use today, and on the horizon, to address this? Are we limited only to hardware techniques, or can these power limitation issues be addressed with smarter software development? And how do we handle verification of these complex implementations? This paper explores possible methods for improving the "power capacity" of power sensitive designs. |
PDF file |
Title | (Invited Paper) Software-Cooperative Power-Efficient Heterogeneous Multi-Core for Media Processing |
Author | *Hiroaki Shikano, Masaki Ito, Kunio Uchiyama, Toshihiko Odaka (Hitachi, Japan), Akihiro Hayashi, Takeshi Masuura, Masayoshi Mase, Jun Shirako, Yasutaka Wada, Keiji Kimura, Hironori Kasahara (Waseda Univ., Japan) |
Page | pp. 736 - 741 |
Abstract | A heterogeneous multi-core processor (HMCP) architecture, which integrates general purpose processors (CPU) and accelerators (ACC) to achieve high performance as well as low power consumption with the support of a parallelizing compiler, was developed. The evaluation was performed using an MP3 audio encoder on a simulator that accurately models the HMCP. It showed that 16-frame encoding on the HMCP with four CPUs and four ACCs yielded a 24.5-fold speed-up over sequential execution on one CPU. Furthermore, power saving by the compiler reduced the energy consumption of the encoding to 0.17 J, a reduction of 28.4%. |
PDF file |
Title | (Invited Paper) Experiences of Low Power Design Implementation and Verification |
Author | *Shi-Hao Chen, Jiing-Yuan Lin (Global Unichip, Taiwan) |
Page | pp. 742 - 747 |
Abstract | In this paper, we present the experiences of some low power solutions that have been successfully implemented in 90nm/65nm production tape-outs. We also focus on power gating design, an effective low leakage solution, and present the experiences of power switch planning, optimization, and verification. Dynamic IR drop is an important issue in low power design, which may reduce the logic gate noise margins and result in functional or timing failures. We will present a low cost but effective methodology for dynamic IR drop prevention and fixing. |
PDF file |
Title | (Invited Paper) Low Power Architecture and Design Techniques for Mobile Handset LSI Medity™ M2 |
Author | *Shuichi Kunie, Takefumi Hiraga, Tatsuya Tokue, Sunao Torii, Taku Ohsawa (NEC, Japan) |
Page | pp. 748 - 753 |
Abstract | This paper presents the low power architecture and design techniques for the mobile handset LSI Medity™ M2. M2 is a second-generation mobile handset LSI which integrates a Digital baseband and Application processor on a chip. M2 is capable of supporting 3.2 Mbps HSDPA, WCDMA communications, and rich, high-resolution multimedia applications, while power consumption is kept almost the same as in its predecessor chip M1. To reduce power consumption, M2 adopts hardware management clock control schemes, Multiple Vt transistors, an On-chip Power Switch, and Back-bias control. Preliminary measurement results show the design to work very well. |
PDF file |
Title | An Efficient, Fully Nonlinear, Variability-Aware Non-Monte-Carlo Yield Estimation Procedure with Applications to SRAM Cells and Ring Oscillators |
Author | *Chenjie Gu, Jaijeet Roychowdhury (University of Minnesota, United States) |
Page | pp. 754 - 761 |
Keyword | Yield estimation, non-Monte-Carlo, SRAM, Ring oscillator |
Abstract | Failures and yield problems due to parameter variations have become a significant issue for sub-90-nm technologies. As a result, CAD algorithms and tools that provide designers the ability to estimate the effects of variability quickly and accurately are being urgently sought. The need for such tools is particularly acute for static RAM (SRAM) cells and integrated oscillators, for such circuits require expensive and high-accuracy simulation during design. We present a novel technique for fast computation of parametric yield. The technique is based on efficient, adaptive geometric calculation of probabilistic hypervolumes subtended by the boundary separating pass/fail regions in parameter space. A key feature of the method is that it is far more efficient than Monte-Carlo, while at the same time achieving better accuracy in typical applications. The method works equally well with parameters specified as corners, or with full statistical distributions; importantly, it scales well when many parameters are varied. We apply the method to an SRAM cell and a ring oscillator and provide extensive comparisons against full Monte-Carlo, demonstrating speedups of 100-1000X. |
PDF file |
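The yield estimation abstract above compares its non-Monte-Carlo method against full Monte-Carlo. To make that baseline concrete, here is a minimal Monte-Carlo yield sketch under assumed conditions: two i.i.d. standard-normal parameters and a hypothetical linear pass/fail boundary, chosen so the exact yield is known analytically. It illustrates only the baseline, not the paper's hypervolume technique.

```python
import random
from statistics import NormalDist


def mc_yield(passes, n_params: int, n_samples: int, seed: int = 0) -> float:
    """Fraction of sampled parameter vectors (i.i.d. standard normal) that pass."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        params = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
        hits += passes(params)  # bool counts as 0 or 1
    return hits / n_samples


# Hypothetical pass/fail boundary: the circuit passes when d1 + d2 < 2.
def spec(params):
    return params[0] + params[1] < 2.0


estimate = mc_yield(spec, n_params=2, n_samples=100_000)
exact = NormalDist(0.0, 2 ** 0.5).cdf(2.0)  # d1 + d2 ~ N(0, sqrt(2))
print(f"MC {estimate:.4f} vs exact {exact:.4f}")
```

The statistical error of the estimate shrinks only as $1/\sqrt{N}$, and every sample would be a full circuit simulation in practice, which is the cost the paper's method aims to avoid.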
Title | Analog Circuit Simulation Using Range Arithmetics |
Author | *Darius Grabowski, Markus Olbrich, Erich Barke (Leibniz University of Hannover, Germany) |
Page | pp. 762 - 767 |
Keyword | Simulation, affine arithmetic |
Abstract | The impact of parameter variations in integrated analog circuits is usually analyzed by Monte Carlo methods with a high number of simulation runs. The few approaches based on interval arithmetic were not successful due to tremendous overapproximation. In this paper, we describe an innovative approach to computing transient and DC simulations of nonlinear analog circuits with symbolic range representations that keep correlation information and hence have very limited overapproximation. The methods are based on affine and quadratic arithmetic. Ranges are represented by unique symbols so that linear correlation information is preserved. We demonstrate the feasibility of the methods with simulation results on complex analog circuits. |
PDF file |
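The range arithmetic abstract above relies on the key property of affine arithmetic: each independent source of uncertainty gets its own noise symbol, so correlations survive linear operations. The minimal sketch below implements only addition, negation, and subtraction (nonlinear operations such as multiplication need an extra noise symbol and are omitted). The `Affine` class is a hypothetical illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field
from itertools import count

_symbol_ids = count()  # fresh noise-symbol identifiers


@dataclass
class Affine:
    """x0 + sum_i x_i * eps_i, with every noise symbol eps_i ranging over [-1, 1]."""
    center: float
    terms: dict = field(default_factory=dict)

    @classmethod
    def from_interval(cls, lo: float, hi: float) -> "Affine":
        # Each independent input range gets its own new noise symbol.
        return cls((lo + hi) / 2, {next(_symbol_ids): (hi - lo) / 2})

    def __add__(self, other: "Affine") -> "Affine":
        terms = dict(self.terms)
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + v  # shared symbols combine (and may cancel)
        return Affine(self.center + other.center, terms)

    def __neg__(self) -> "Affine":
        return Affine(-self.center, {k: -v for k, v in self.terms.items()})

    def __sub__(self, other: "Affine") -> "Affine":
        return self + (-other)

    def interval(self) -> tuple:
        radius = sum(abs(v) for v in self.terms.values())
        return (self.center - radius, self.center + radius)


x = Affine.from_interval(-1.0, 1.0)
print((x - x).interval())  # (0.0, 0.0): the correlation is kept
# Plain interval arithmetic would give [-1, 1] - [-1, 1] = [-2, 2].
```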
Title | LTCC Spiral Inductor Modeling, Synthesis, and Optimization |
Author | *Tuck-Boon Chan, Hsin-Chia Lu, Jun-Kuei Zeng, Charlie Chung-Ping Chen (National Taiwan University, Taiwan) |
Page | pp. 768 - 771 |
Keyword | LTCC, inductor, synthesis, optimization |
Abstract | In RF/microwave circuit design, inductor design is one of the most difficult and time-consuming tasks due to the tedious trial-and-error optimization process. This paper brings forward a fast and accurate spiral inductor synthesis method that automatically generates the physical layout of inductors according to electrical specifications. Combining a substrate-aware PEEC model with an optimal nonlinear optimization engine, our modeling and synthesis strategies have been extensively verified with 3D solvers and show less than 6% error relative to measurement results. |
PDF file |
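The following is a simplified illustration of specification-to-layout synthesis. It substitutes the closed-form modified Wheeler expression for square planar spirals (Mohan et al.) for the paper's substrate-aware PEEC model, and it substitutes a brute-force grid search for the paper's optimization engine. The trace width `w`, spacing `s`, and search grids are hypothetical inputs. The geometry relation d_in = d_out - 2nw - 2(n-1)s assumes a square spiral.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def wheeler_square_spiral(n, d_out, d_in):
    """Modified Wheeler estimate for a square planar spiral, in henries.

    This is a stand-in closed-form model, not the PEEC model from the paper.
    """
    K1, K2 = 2.34, 2.75  # published coefficients for a square layout
    d_avg = 0.5 * (d_out + d_in)
    rho = (d_out - d_in) / (d_out + d_in)  # fill ratio
    return K1 * MU0 * n * n * d_avg / (1.0 + K2 * rho)


def synthesize(target_h, w, s, d_out_grid, n_grid):
    """Exhaustive search for (turns, outer diameter) closest to a target inductance.

    Returns (relative_error, n, d_out, d_in), or None if no geometry fits.
    """
    best = None
    for n in n_grid:
        for d_out in d_out_grid:
            d_in = d_out - 2 * n * w - 2 * (n - 1) * s  # square-spiral geometry
            if d_in <= 0:
                continue  # too many turns for this outer diameter
            err = abs(wheeler_square_spiral(n, d_out, d_in) - target_h) / target_h
            if best is None or err < best[0]:
                best = (err, n, d_out, d_in)
    return best


if __name__ == "__main__":
    grid = [i * 1e-6 for i in range(100, 401, 5)]  # 100 um to 400 um outer diameter
    err, n, d_out, d_in = synthesize(2e-9, w=10e-6, s=5e-6, d_out_grid=grid, n_grid=range(1, 9))
    print(f"n={n}, d_out={d_out * 1e6:.0f} um, d_in={d_in * 1e6:.0f} um, error={err:.2%}")
```

In a real flow, the inner model call would be replaced by an accurate substrate-aware solver, and the grid search would be replaced by a proper nonlinear optimizer over width, spacing, and turns.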
Title | Symmetry Constraint based on Mismatch Analysis for Analog Layout in SOI Technology |
Author | *Jiayi Liu, Sheqin Dong, Xianlong Hong, Yibo Wang, Ou He (Tsinghua University, China), Satoshi Goto (Waseda University, Japan) |
Page | pp. 772 - 775 |
Keyword | mismatch, analog, symmetry, SOI |
Abstract | Conventional techniques for mismatch elimination, such as geometric symmetry and common-centroid placement, can only eliminate systematic mismatch and do little to reduce random mismatch and thermally induced mismatch. As VLSI technology advances, random mismatch is becoming increasingly serious. Moreover, in silicon-on-insulator (SOI) technology, the self-heating effect leads to severe thermally induced mismatch. Therefore, in this paper, we first propose a new model that estimates the combined effect of random mismatch and thermally induced mismatch through mismatch analysis and SPICE simulation. To address the differing sensitivities of different symmetry pairs, an automatic classification tool and a configurable optimization process are also introduced. All of these are integrated into the floorplanning process. Experimental results demonstrate the effectiveness of our method. |
PDF file |
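To make "random plus thermally induced mismatch" concrete, the following first-order estimate may help. It is not the model proposed in the paper. The random part uses the well-known Pelgrom area law, sigma(dVT) = A_VT / sqrt(W*L). The thermal part is treated as a deterministic offset equal to dVT/dT times the temperature difference between the two devices. All numeric values (`a_vt`, `dvt_dt`, `delta_t`, and the k-sigma factor) are assumed for illustration.

```python
import math


def random_mismatch_sigma(a_vt, w, l):
    """Pelgrom area law: sigma(dVT) = A_VT / sqrt(W * L). Units: V*m, m, m -> V."""
    return a_vt / math.sqrt(w * l)


def pair_mismatch(a_vt, w, l, delta_t, dvt_dt, k=3.0):
    """Worst-case |dVT| estimate for a matched pair (hypothetical combination rule).

    delta_t: temperature difference between the two devices in K, e.g. from SOI self-heating.
    dvt_dt:  assumed threshold-voltage temperature coefficient in V/K.
    k:       number of sigmas of random mismatch to cover.
    """
    thermal = abs(dvt_dt * delta_t)  # deterministic offset from the thermal gradient
    return thermal + k * random_mismatch_sigma(a_vt, w, l)


if __name__ == "__main__":
    a_vt = 3.5e-3 * 1e-6  # assumed 3.5 mV*um, expressed in V*m
    sigma = random_mismatch_sigma(a_vt, w=2e-6, l=0.5e-6)
    total = pair_mismatch(a_vt, 2e-6, 0.5e-6, delta_t=5.0, dvt_dt=-1.5e-3)
    print(f"random sigma = {sigma * 1e3:.2f} mV, worst-case |dVT| = {total * 1e3:.2f} mV")
```

The sketch shows why symmetry alone is insufficient: geometric symmetry cancels neither the random term nor a temperature difference between devices. Placing sensitive pairs where the thermal gradient is small reduces `delta_t`.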
Title | SPKM: A Novel Graph Drawing Based Algorithm for Application Mapping onto Coarse-Grained Reconfigurable Architectures |
Author | *Jonghee Yoon (Seoul National University, Republic of Korea), Aviral Shrivastava (Arizona State University, United States), Sanghyun Park, Minwook Ahn (Seoul National University, Republic of Korea), Reiley Jeyapaul (Arizona State University, United States), Yunheung Paek (Seoul National University, Republic of Korea) |
Page | pp. 776 - 782 |
Keyword | Reconfigurable, Mapping, CGRA, Compiler |
Abstract | Recently, coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their efficiency and flexibility. While many CGRAs have demonstrated impressive performance improvements, the effectiveness of CGRA platforms ultimately hinges on the compiler. Existing CGRA compilers do not model the details of the CGRA architecture. As a result, they i) fail to map some applications even when a mapping exists, and ii) use too many processing elements (PEs) to map an application. In this paper, we model several CGRA details in our compiler and develop a graph-mapping-based approach (SPKM) for mapping applications onto CGRAs. On randomly generated graphs, our technique maps on average 4.5X more applications than previous approaches, while using fewer CGRA rows in 62% of the cases, without any penalty in mapping time. We observe similar results on a suite of benchmarks collected from the Livermore Loops, Multimedia, and DSPStone benchmarks. |
PDF file |
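To illustrate the mapping problem the abstract describes, the following sketch checks whether a data-flow graph (DFG) placement is legal on a simple CGRA model. It is only a validity checker; the SPKM algorithm itself is not shown. The architecture model is an assumption: a rows x cols mesh of PEs, one operation per PE, and data passed only between the four nearest neighbours. Real CGRAs may add diagonal links, routing through PEs, or multi-cycle schedules.

```python
def check_mapping(dfg_edges, placement, rows, cols):
    """Check a DFG placement on a hypothetical rows x cols mesh of PEs.

    dfg_edges: list of (producer, consumer) operation pairs.
    placement: dict mapping each operation to a (row, col) PE coordinate.
    Returns (valid, rows_used); rows_used is None when the placement is invalid.
    """
    cells = list(placement.values())
    if len(set(cells)) != len(cells):
        return False, None  # two operations share one PE
    for r, c in cells:
        if not (0 <= r < rows and 0 <= c < cols):
            return False, None  # placed outside the array
    for u, v in dfg_edges:
        (r1, c1), (r2, c2) = placement[u], placement[v]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False, None  # no direct mesh link between these PEs
    return True, len({r for r, _ in cells})


if __name__ == "__main__":
    diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    ok = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
    print(check_mapping(diamond, ok, rows=4, cols=4))  # legal, uses 2 rows

    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    bad = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
    print(check_mapping(triangle, bad, rows=4, cols=4))  # a -> c is not adjacent
```

Under this pure 4-neighbour model, an odd cycle such as the triangle can never be placed, because the mesh is bipartite. This is the kind of architectural detail a compiler has to model to decide correctly whether a mapping exists, and counting the rows used gives one measure of how compact a valid mapping is.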
Title | Block Remap with Turnoff: A Variation-Tolerant Cache Design Technique |
Author | *Mohammed Abid Hussain (Int'l Inst. of Information Tech., Hyderabad, India), Madhu Mutyam (Indian Inst. of Tech. Madras, India) |
Page | pp. 783 - 788 |
Keyword | process variations, data caches, performance, leakage energy |
Abstract | As feature sizes shrink, the effects of process variations are becoming increasingly predominant. Memory components such as on-chip caches are more susceptible to such variations because of their high density and small transistors. In this paper, we propose a variation-tolerant design technique for on-chip data caches affected by process variations. In our technique, we selectively turn off a few blocks after rearranging them so that all sets contain an almost equal number of blocks affected by process variations. We show that our technique significantly reduces the performance loss and leakage energy consumption caused by process variations. |
PDF file |
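The rearrangement idea can be sketched as follows. Variation-affected blocks are dealt across sets round-robin, so that per-set counts differ by at most one, and then those blocks are turned off. This sketch is hypothetical and not the authors' hardware mechanism. It assumes a remap table is free to assign any physical block to any set, and it turns off every affected block, whereas the paper turns off only selected blocks.

```python
def remap_blocks(faulty, num_sets, ways):
    """Redistribute variation-affected blocks evenly across cache sets.

    faulty: list of booleans, one per physical block (num_sets * ways entries).
    Returns sets, where sets[s] lists the physical block ids assigned to set s.
    """
    assert len(faulty) == num_sets * ways
    bad = [i for i, f in enumerate(faulty) if f]
    good = [i for i, f in enumerate(faulty) if not f]
    sets = [[] for _ in range(num_sets)]
    # Deal affected blocks round-robin so per-set counts differ by at most one
    for j, blk in enumerate(bad):
        sets[j % num_sets].append(blk)
    # Fill the remaining ways of every set with healthy blocks
    healthy = iter(good)
    for s in sets:
        while len(s) < ways:
            s.append(next(healthy))
    return sets


def usable_ways(sets, faulty):
    """Usable associativity per set once affected blocks are turned off."""
    return [sum(1 for b in s if not faulty[b]) for s in sets]


if __name__ == "__main__":
    num_sets, ways = 4, 4
    faulty = [False] * (num_sets * ways)
    for b in range(4):  # all affected blocks sit in set 0 before remapping
        faulty[b] = True
    sets = remap_blocks(faulty, num_sets, ways)
    print(usable_ways(sets, faulty))  # every set keeps 3 of 4 ways
```

Without remapping, set 0 in this example would lose all four ways while the other sets lose none. Balancing the affected blocks means each set loses at most one way, which keeps the performance loss spread evenly.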
Title | ORB: An On-Chip Optical Ring Bus Communication Architecture for Multi-Processor Systems-on-Chip |
Author | *Sudeep Pasricha, Nikil Dutt (University of California, Irvine, United States) |
Page | pp. 789 - 794 |
Keyword | on-chip communication architectures, optical interconnects, MPSoC, performance, power analysis |
Abstract | As application complexity continues to increase, multi-processor systems-on-chip (MPSoCs) with tens to hundreds of processing cores are becoming the norm. While computational cores have become faster with each successive technology generation, communication between them has become a bottleneck that limits overall chip performance. On-chip optical interconnects can overcome this bottleneck by replacing electrical wires with optical waveguides. In this paper, we propose an on-chip communication architecture for next-generation MPSoCs based on an optical ring bus (ORB). ORB uses an optical ring waveguide to replace global pipelined electrical interconnects while preserving the interface with today’s bus protocol standards such as AMBA AXI. We present experiments showing that ORB has the potential to provide superior performance (more than 2×) and significantly lower power consumption (a reduction of more than 10×) compared to traditional pipelined, all-electrical bus-based communication architectures, for technology nodes from 65 nm to 22 nm. |
PDF file |
Title | Webpage-Based Benchmarks for Mobile Device Design |
Author | *Marc Somers, JoAnn M. Paul (Virginia Tech., United States) |
Page | pp. 795 - 800 |
Keyword | webpage modeling, utilization, benchmarks, mobile computing |
Abstract | By investigating the content, structure, and usage of webpages, we observe that webpages represent a fundamentally different standard for performance evaluation of computer designs. We found that specialized architectures customized to webpage content can improve performance by up to 70% over a homogeneous multiprocessor, with a further 25% improvement when individual user preferences are also considered. Thus, a new form of benchmark suite is required, based on the rapidly evolving and divergent content of information exchanged via webpages on mobile devices. |
PDF file |
Title | (Panel Discussion) Best Ways to Use Billions of Devices on a Chip |
Author | Moderator: Grant Martin (Tensilica, United States), Panelists: Deming Chen (Univ. of Illinois, Urbana-Champaign, United States), Nikil Dutt (Univ. of California, Irvine, United States), Joerg Henkel (Karlsruhe Univ., Germany), Kyungho Kim (Samsung Electronics, Republic of Korea), Kazutoshi Kobayashi (Kyoto Univ., Japan) |
Page | pp. 801 - 802 |
Abstract | We all know that Moore's law is good for at least a few more generations of silicon process, and this will give rise to many integrated circuits with billions of transistors. As of 2007, the leading 45 nm processors being announced are approaching a billion transistors. But how can we best use these devices in the future? Integrating more and more features and functions onto SoCs may not be the optimal use of all these billions of resources. Indeed, even obtaining working devices at 45, 32, 22, and 16 nm may require new architectures and new structures. |
PDF file |
Title | (Invited Paper) VEBoC: Variation and Error-Aware Design for Billions of Devices on a Chip |
Author | Shoaib Akram, Scott Cromar, Gregory Lucas, Alexandros Papakonstantinou, *Deming Chen (University of Illinois, Urbana-Champaign, United States) |
Page | pp. 803 - 808 |
Abstract | Chips with billions of devices are around the corner, and deep submicron (DSM) technology scaling will continue for at least another decade. Meanwhile, designers also face severe on-chip parameter variations, soft/hard errors, and high leakage power. Using these billions of devices to deliver power-efficient, high-performance, yet error-resilient computation is a challenging task. In this paper, we present some of our perspectives on addressing these critical issues. |
PDF file |
Title | (Invited Paper) Quo Vadis, BTSoC (Billion Transistor SoC)? |
Author | *Nikil Dutt (University of California, Irvine, United States) |
Page | p. 809 |
Abstract | Billion Transistor Systems-on-Chip (BTSoCs) present designers with a classic case of the “embarrassment-of-riches” syndrome: with so many devices at one’s disposal, designers may be tempted to integrate functionality willy-nilly, with no strategic rethinking of what this level of integration can both afford and achieve. While many advocate “business-as-usual” (including ad-hoc integration of functionality to achieve application-specific or domain-dependent designs), I believe BTSoCs present us with some opportunities for a paradigm shift in the architectural strategies and design processes for such complex chips. |
PDF file |
Title | (Invited Paper) Best Ways to use Billions of Devices on a Wireless Mobile SoC |
Author | *KyungHo Kim (Samsung Electronics, Republic of Korea) |
Page | p. 810 |
Abstract | Rapid growth in Information Technology (IT) over the last decade has given us unimaginable possibilities and great convenience in our lives, e.g. high-speed Internet, mobile TV, high-definition digital TV, 3D gaming, mobile multimedia players, Ultra-Mobile PCs, and so on. Nowadays, the technology is even exceeding market demands. Apple produced a 160 GB iPod that stores 40,000 songs. Samsung showed the world's first mobile phone with a 10-megapixel camera. HSDPA-based video telephony service was commercialized in Korea. |
PDF file |
Title | (Invited Paper) Best Ways to Use Billions of Devices on a Chip - Error Predictive, Defect Tolerant and Error Recovery Designs |
Author | *Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 811 - 812 |
Abstract | Error rates in LSIs are increasing in accordance with Moore's law. Now is the time to start incorporating error-tolerant design methodologies. This paper introduces sources of failures in semiconductor devices, levels of dependability required by different device applications, and some circuit-level techniques to detect faults or recover from them after shipping. |
PDF file |