## AER Circuits, Systems, and Tools

Bernabe Linares-Barranco

Sevilla Microelectronics Institute (IMSE) - Spanish Research Council (CSIC)

Instituto de Microelectrónica de Sevilla CICA, Av. Reina Mercedes s/n, 41012 Sevilla, SPAIN

Phone: 34-95-5056670 Fax: 34-95-5056686 E-mail: bernabe@imse.cnm.es



#### Outline

- Introduction: AER, a technology for building large scalable neuromorphic systems
- Some useful circuits:
  - calibration
  - LVDS interface
- Some example systems at IMSE:
  - spatial contrast retina
  - mixed-mode convolution chip
  - fully digital convolution chip
- HW Tools from Sevilla: some FPGA-based PCBs
  - example use in CAVIAR
- SW Tool: Behavioral Matlab Simulator
  - Example 1: neocognitron emulation
  - Example 2: texture classification

#### Conventional Vision Sensing/Processing/Recognition





- Feature Extraction Stages
- Feature Combination Stages
- Classification/Decision Stages

# **Biology**



Recognition Delay < 150ms

*feedforward, 1 spike/neuron*



Simon Thorpe Nature 1996









#### Serre & Poggio (MIT)

#### Ventral Stream Model for Immediate Recognition



- projection field processing
- short-range & dense for first layers
- long-range & sparse for later layers
- hard-wired for first layers
- plastic for later layers
- first layers: massive 2D filtering for different angles and scales
- first layers: basic feature extraction
- later layers: grouping of features -> abstractions

#### AER (Address Event Representation)







### **Feature Extraction**

(AER Convolution Chip)



Matrix of integrators in the receiver chip

# **CORTICAL TISSUE**





- Events are routed to neighbors through local on-chip routing tables
- Any arbitrary multi-layer feed-forward + feed-back hierarchy can be programmed
- LVDS links allow low-power high-speed event traffic
- Each tile could be a 128x128 programmable kernel convolution chip with local re-routing and remapping capability
- Hundreds of convolution chips can fit on a 'Cortical Tissue' PCB

#### Computing Power of one such Cortical Tissue PCB

- 120 chips & 436 AER inter-chip links
- Each chip 128x128 neurons and kernel up to 128x128
- Total of 2M neurons
- Total of 32G synapses
- If each AER link requires 30ns per AE:
  - 14Geps (interchip)
  - 238 Tconnections/sec
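The figures on this slide follow from simple arithmetic. The sketch below reproduces them under two assumptions that are my reading of the slide rather than something it states: every neuron sees one full 128x128 kernel, and every inter-chip event updates one full kernel's worth of connections.

```python
# Back-of-the-envelope check of the Cortical Tissue PCB figures.
CHIPS = 120
LINKS = 436
NEURONS_PER_CHIP = 128 * 128   # 128x128 pixel array per chip
KERNEL_SIZE = 128 * 128        # largest kernel, assumed fully used
EVENT_TIME_S = 30e-9           # 30 ns per address event on each link

neurons = CHIPS * NEURONS_PER_CHIP
synapses = neurons * KERNEL_SIZE
events_per_sec = LINKS / EVENT_TIME_S           # all links busy in parallel
connections_per_sec = events_per_sec * KERNEL_SIZE

print(f"neurons:       {neurons / 1e6:.2f} M")
print(f"synapses:      {synapses / 1e9:.1f} G")
print(f"event rate:    {events_per_sec / 1e9:.1f} Geps")
print(f"connections/s: {connections_per_sec / 1e12:.0f} T")
```

These are peak numbers: they assume all 436 links carry events back to back.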

#### Calibration in Neuromorphic Cells

- large arrays
- small cell area
- very low currents (nano- to picoamps)
- high inter-pixel mismatch:

 $\sigma \approx 10-20\% \implies 6\sigma \approx 60-120\%$ 





#### **Compact Calibration Circuit**

[IEEE Trans. Neural Networks, Sep. 2003]

• Ladder-based digi-MOS:



3N + 1 unit transistors

N = number of bits







one point calibration









#### For Higher Precision



#### New Concept based on parallel/series MOS association

[Galup-Montoro et al., IEEE JSSC 1994]

From EKV/ACM models:

$$I_{DS} = \frac{W}{L} [f(V_G, V_S) - f(V_G, V_D)]$$

• Generic:

- parallel: $\left(\frac{W}{L}\right)_{eq} = \left(\frac{W}{L}\right)_{A} + \left(\frac{W}{L}\right)_{B}$
- series: $\left(\frac{W}{L}\right)_{eq} = \frac{\left(\frac{W}{L}\right)_{A}\left(\frac{W}{L}\right)_{B}}{\left(\frac{W}{L}\right)_{A} + \left(\frac{W}{L}\right)_{B}}$

Consequently,

• Series association with equal $W$: $L_{eq} = \sum L_i$
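The association rules can be checked numerically. This is a minimal sketch: the helper names `parallel` and `series` are mine, the n-transistor series form is just the two-transistor rule applied repeatedly, and the sample sizes are borrowed from the 5-bit digi-MOS slide purely as numbers. It does not claim those five transistors are wired in series in the actual circuit.

```python
def parallel(*aspect_ratios):
    """Equivalent W/L of transistors in parallel: aspect ratios add."""
    return sum(aspect_ratios)

def series(*aspect_ratios):
    """Equivalent W/L of transistors in series: reciprocals add."""
    return 1.0 / sum(1.0 / r for r in aspect_ratios)

W = 2.0                                # um, same width for every device
lengths = [3.0, 1.8, 1.8, 1.0, 0.7]    # um, sample values only
ratios = [W / L for L in lengths]

eq_series = series(*ratios)
L_eq = W / eq_series                   # equal W in series, so lengths add
print(f"series W/L = {eq_series:.4f}, L_eq = {L_eq:.2f} um")
print(f"parallel W/L = {parallel(*ratios):.4f}")
```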









#### 4-bit Monte Carlo Simulation

sub-pA current mirror





- we don't need nice precise stairs, but good coverage
- we like down-steps
- we like randomness
- we use the same $W = 2\mu m$ and $L = \{3.0, 1.8, 1.8, 1.0, 0.7\}$ for a 5-bit digi-MOS



#### And we want two additional features:

- FEATURE-1: no recalibration when changing operating current
- FEATURE-2: take maximum advantage of calibration range: B>A but B~A



#### FEATURE-1: no recalibration

• for simple current mirror





#### FEATURE-1: no recalibration

• by adding peripheral translinear tuning:



$$I_{oi} = \frac{I_1 I_2}{I_3} (2 + g(w_{cal}))$$

- $I_3$ is constant, so the current through branch $I_{i3}$ is constant
- $M_1$ has a bias condition similar to that of $M_3$, so $I_1$ is also kept constant
- $M_2$ has a bias condition similar to that of $M_4$, so $I_{oi}$ is scaled by changing only $I_2$
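Why this gives "no recalibration" can be seen numerically. The sketch below assumes the ideal case in which the calibration term $g(w_{cal})$ does not depend on $I_2$; the per-pixel `g_values` and the current levels are hypothetical.

```python
def output_current(i1, i2, i3, g_cal):
    """I_oi = (I1 * I2 / I3) * (2 + g(w_cal)), as on the slide."""
    return i1 * i2 / i3 * (2.0 + g_cal)

g_values = [-0.4, 0.0, 0.3]    # hypothetical calibration terms of three pixels
i1 = i3 = 10e-9                # held constant by the peripheral circuit

low = [output_current(i1, 1e-9, i3, g) for g in g_values]     # I2 = 1 nA
high = [output_current(i1, 100e-9, i3, g) for g in g_values]  # I2 = 100 nA

# Every pixel scales by the same factor, so the calibration words stay valid.
ratios = [h / l for h, l in zip(high, low)]
print(ratios)
```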

#### FEATURE-2: approach A and B





#### Experimental prototype CMOS 0.35um

• single current source calibrated at 10nA



### Experimental prototype CMOS 0.35um

• DAC: five current sources calibrated at 10nA, 5nA, 2.5nA, 1.25nA, 625pA and 16°C



#### Another Translinear Tuning Circuit [TCAS-II, in Press]



- achieves higher precision (7-bit) using a 5-bit circuit
- degrades more rapidly when changing bias conditions

# The bit serial LVDS AER interface

- Several options are possible:
  - Transmitting data and clock by different physical paths.
  - Recovering the clock using a PLL-based circuit.
  - Extracting the clock from the received data (e.g. using Manchester coding).



# The bit serial LVDS AER interface

- In AER links we will need:
  - Keeping the receiver synchronized in the silent periods.
  - Detecting a new address start.
  - Implementing a fast and robust synchronization scheme.



# The bit serial LVDS AER interface (IV)

- Fast synchronization is a must in AER links because the events are generated in an asynchronous way.
- A Manchester coding scheme allows the receiver to recover the clock directly from the data flow.
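A minimal sketch of Manchester coding shows why the clock can be recovered from the data: every bit carries a mid-bit transition. The polarity used here (1 as low-then-high, 0 as high-then-low) is an assumption, since the slides do not say which convention the link uses, and the 8-bit `address` is an arbitrary example.

```python
def manchester_encode(bits):
    """Encode each bit as two half-bit line levels with a mid-bit transition."""
    line = []
    for b in bits:
        line.extend((0, 1) if b else (1, 0))   # assumed polarity
    return line

def manchester_decode(line):
    """Recover the bits; a pair without a transition is a coding violation."""
    if len(line) % 2:
        raise ValueError("incomplete bit period")
    bits = []
    for first, second in zip(line[0::2], line[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits

address = [1, 0, 1, 1, 0, 0, 1, 0]     # arbitrary example address
line = manchester_encode(address)
decoded = manchester_decode(line)
print(line)
print(decoded == address)
```

Because a valid bit period never holds two equal half-bit levels, an idle or broken line shows up as a coding violation instead of being read as data.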



### The bit serial LVDS AER interface (V).



# Transmitter circuit



# Receiver circuit

• The only requirement for the CDR design is that five delay elements must introduce a delay between  $T_b/2$  and  $T_b$ .



# **Receiver circuitry**

• A Delay Locked Loop is used to fix the delay introduced by the inverters. The phase of the reference clock is compared with that of a 360°-delayed version of it, and the control voltage of the delay elements is adjusted according to the phase error.



## Burst mode operation (II)



## Simulation



- ST 90nm CMOS
- 50cm cat5E UTP cable
- 5cm microstrip traces
- LVDS pads
- ESD protection circuits
- LVDS drivers available from ST 90nm library
- connectors
- simulated for all technology process corners
- temperature range 0-80°C
- 5% variation in Supply Voltage

# Simulation results

Signals involved in the clock recovery when the loop is locked





- Retina with AER output
- Output frequency proportional to instantaneous *Spatial Contrast*
- *Spatial Contrast* computation not limited to nearest neighbors
- Fully Asynchronous output (no frames)
- low mismatch (FPN)



 $OutputFreq = f(I_{photo}(x, y), I_{neighbours})$ 



use of diffusers

- each pixel decides when to generate an event
- there is no global periodic reset (no frames)



in-pixel calibration

### **Calibration Technique**

Active Current Generation



- Active current sources, controlled digitally
- Can be used as a Current DAC

### **Digi-MOS**







### Example Layout for 0.35µm CMOS 5-bit digi-MOS



• unit transistor  $W = L = 3\mu m$ 



#### **Spatial Contrast Computation**

Michelson Contrast:  $I_{cont}(x, y) = I_{ref} \frac{I_{photo}(x, y) - I_{avg}(x, y)}{I_{photo}(x, y) + I_{avg}(x, y)}$ 

Weber Contrast: 
$$I_{cont}(x, y) = I_{ref} \frac{I_{photo}(x, y) - I_{avg}(x, y)}{I_{avg}(x, y)}$$

Simple Ratio Contrast: 
$$I_{cont}(x, y) = I_{ref} \frac{I_{avg}(x, y)}{I_{photo}(x, y)}$$
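The three definitions behave differently around uniform illumination. The sketch below evaluates them for one pixel; the current values are hypothetical and $I_{ref}$ is normalized to 1.

```python
def michelson_contrast(i_photo, i_avg, i_ref=1.0):
    return i_ref * (i_photo - i_avg) / (i_photo + i_avg)

def weber_contrast(i_photo, i_avg, i_ref=1.0):
    return i_ref * (i_photo - i_avg) / i_avg

def ratio_contrast(i_photo, i_avg, i_ref=1.0):
    return i_ref * i_avg / i_photo

# Hypothetical pixel whose photocurrent is 20% above its neighbourhood average.
i_photo, i_avg = 1.2e-9, 1.0e-9
for name, fn in [("Michelson", michelson_contrast),
                 ("Weber", weber_contrast),
                 ("ratio", ratio_contrast)]:
    print(f"{name:9s} contrast: {fn(i_photo, i_avg):.4f}, "
          f"uniform scene: {fn(i_avg, i_avg):.4f}")
```

Michelson and Weber contrast are zero for a uniform scene, while the simple ratio rests at $I_{ref}$ and needs only a multiplication and a division.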

### Calibrating for Mismatch

Sums/Subtractions & Multiplications/Divisions:

$$I_{o} = I_{1} \frac{I_{2} - I_{3}}{I_{4}} \rightarrow I_{o} + \Delta_{o} = (I_{1} + \Delta_{1}) \frac{(I_{2} + \Delta_{2}) - (I_{3} + \Delta_{3})}{(I_{4} + \Delta_{4})}$$

To first order in the relative errors $\Delta_k / I_k$:

$$I_{o} + \Delta_{o} \approx \frac{I_{1}I_{2}}{I_{4}} \left(1 + \frac{\Delta_{1}}{I_{1}} + \frac{\Delta_{2}}{I_{2}} - \frac{\Delta_{4}}{I_{4}}\right) - \frac{I_{1}I_{3}}{I_{4}} \left(1 + \frac{\Delta_{1}}{I_{1}} + \frac{\Delta_{3}}{I_{3}} - \frac{\Delta_{4}}{I_{4}}\right)$$

The two terms carry different error factors, so a single correction cannot cancel both.

**Only Multiplications/Divisions:**

$$I_{o} = I_{1}\frac{I_{2}}{I_{4}} \rightarrow I_{o} + \Delta_{o} = (I_{1} + \Delta_{1})\frac{(I_{2} + \Delta_{2})}{(I_{4} + \Delta_{4})}$$

$$I_{o} + \Delta_{o} \approx \frac{I_{1}I_{2}}{I_{4}}\left(1 + \frac{\Delta_{1}}{I_{1}} + \frac{\Delta_{2}}{I_{2}} - \frac{\Delta_{4}}{I_{4}}\right)$$

All errors collapse into one multiplicative factor: only one calibration current per pixel
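A numeric check makes the difference concrete. Here `d1..d4` are relative errors (each $\Delta_k$ divided by its nominal current), with hypothetical values of a few percent; the function names are mine.

```python
def product_gain(d1, d2, d4):
    """Exact gain error of I1*I2/I4 given relative errors d1, d2, d4."""
    return (1 + d1) * (1 + d2) / (1 + d4)

def difference_gain(i2, i3, d1, d2, d3, d4):
    """Exact gain error of I1*(I2 - I3)/I4; it depends on the signals I2, I3."""
    return (1 + d1) * (i2 * (1 + d2) - i3 * (1 + d3)) / ((1 + d4) * (i2 - i3))

d1, d2, d3, d4 = 0.05, -0.03, 0.04, 0.02   # hypothetical mismatch

exact = product_gain(d1, d2, d4)
first_order = 1 + d1 + d2 - d4
trim = 1 / exact                 # one per-pixel correction restores the nominal
print(f"product only: exact {exact:.5f}, first order {first_order:.5f}")

gain_a = difference_gain(2.0, 1.0, d1, d2, d3, d4)
gain_b = difference_gain(3.0, 1.0, d1, d2, d3, d4)
print(f"with subtraction: {gain_a:.4f} vs {gain_b:.4f} (signal dependent)")
```

In the product-only case the error is a fixed per-pixel factor, so one trim value corrects every input. With a subtraction the factor changes with the signal, which is why the multiplication/division form needs just one calibration current.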













$$f_{cont}(x, y) = \frac{I_{ref}}{C(V_{reset} - V_{ref})} \frac{I_{avg}(x, y)}{I_{photo}(x, y)}$$
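The expression reads as an integrate-and-fire relation: the contrast current charges $C$ across $V_{reset} - V_{ref}$ once per output event. A minimal sketch with purely illustrative component values (they are not taken from the chip):

```python
def contrast_frequency(i_ref, cap, v_reset, v_ref, i_avg, i_photo):
    """f_cont = I_ref / (C * (V_reset - V_ref)) * I_avg / I_photo."""
    return i_ref / (cap * (v_reset - v_ref)) * (i_avg / i_photo)

# Illustrative values only: 100 pA reference, 100 fF capacitor, 2.5 V swing.
I_REF, CAP, V_RESET, V_REF = 100e-12, 100e-15, 3.0, 0.5

f_uniform = contrast_frequency(I_REF, CAP, V_RESET, V_REF, 1e-9, 1e-9)
f_bright = contrast_frequency(I_REF, CAP, V_RESET, V_REF, 1e-9, 1.25e-9)
print(f"uniform scene: {f_uniform:.0f} Hz, pixel 25% brighter: {f_bright:.0f} Hz")
```

With the ratio contrast, a pixel brighter than its surround fires below the resting rate and a darker one fires above it.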







### CMOS test prototype in AMS 0.35µm

| Parameter                                  | Value                                   |
|--------------------------------------------|-----------------------------------------|
| array size                                 | 32 x 32                                 |
| pixel size                                 | 58μm x 56μm                             |
| pixel components                           | 104 transistors + 1 capacitor           |
| photodiode quantum efficiency              | 0.34 @ 450nm                            |
| fill factor                                | 3%                                      |
| pixel current consumption                  | 20nA @ 1keps, 1nA @ standby             |
| matching before calibration (indoor light) | 57%                                     |
| matching after calibration (indoor light)  | 6.6%                                    |
| contrast sensitivity                       | 10 Hz / % relative contrast @ 400Hz DC  |
| range of diffusers                         | ~10 pixels                              |
| noise standard deviation                   | ~6% fluctuation of spike rate           |
| dark current                               | ~500fA                                  |
| handshaking cycle                          | 15ns/ev (shorting Ack and Rqst)         |


#### Pixel Layout







### Uncalibrated

#### indoor illumination

#### bright illumination



### Calibrated for indoor



### **AER Convolution Chip**












### **Pixel Calibration**



• pixel current pulses may range from  $\sim 1$  pA to  $\sim 1$  µA

### Fabricated Chip



### Pixel Layout



- pixel size 90µm x 90µm
- 364 transistors + 1 capacitor
- kernel weight resolution: 4 bit
- calibration register resolution: 5 bit
- interpixel mismatch (after calibration): < 2%



















### PCB with 4 32x32 Conv. Chips + Event Routing








#### *kernel* {-3,+3,+7}







Short Frame Time (0.05ms)

Long Frame Time (150ms)

# Fully Digital Convolution Chip (I): Architecture

- Array of pixels (Digital)
- Random Access Memory (Kernel programmed)
- Horizontal Shift Block
- 2's complement
- Synchronous controller (input communication)
- AER block (output communication)
- Configuration registers



# Fully Digital Convolution Chip (II): The Pixel

- Arithmetic unit
- Accumulator (18 bits)
- Comparator
- AER output communication
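The per-event operation of such a pixel array can be sketched in software. This is a behavioral illustration only: the threshold value, the reset-to-zero rule and the signed output events are assumptions, the 18-bit accumulator width is not modeled, and `process_event` is a name of my choosing.

```python
def process_event(state, kernel, x, y, sign, threshold):
    """Add the kernel, centred on (x, y), to the accumulators.

    Returns the output events (x, y, sign) of pixels that crossed the threshold.
    """
    rows, cols = len(state), len(state[0])
    half_r, half_c = len(kernel) // 2, len(kernel[0]) // 2
    out_events = []
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            r, c = y + i - half_r, x + j - half_c
            if not (0 <= r < rows and 0 <= c < cols):
                continue                      # kernel clipped at array edge
            state[r][c] += sign * weight      # arithmetic unit + accumulator
            if abs(state[r][c]) >= threshold:  # comparator
                out_events.append((c, r, 1 if state[r][c] > 0 else -1))
                state[r][c] = 0               # assumed reset after firing
    return out_events

state = [[0] * 5 for _ in range(5)]
kernel = [[0, 1, 0],
          [1, 2, 1],
          [0, 1, 0]]
first = process_event(state, kernel, 2, 2, +1, threshold=4)
second = process_event(state, kernel, 2, 2, +1, threshold=4)
print(first, second)
```

No frame is ever assembled: each incoming address event updates only the pixels covered by the kernel, and output events leave as soon as an accumulator crosses the threshold.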



## Fully Digital Convolution Chip (III): Layout





Photograph of the fabricated chip

### **Experimental Results (I): Single Chip Configuration**



Input image



Measured output

High-Pass Kernel



Ideal output

### **Experimental Results (II): Multichip Configuration**



### **Experimental Results (III): Multichip Configuration**



Input image





Gabor

edge-extraction



### PCI-AER



- sequences AEs from the computer out to the AER port
- transforms video-frames to AER in real-time (uses rate coding)
- captures and timestamps AEs from the AER port into the computer
- peak rate 15Meps, sustained rate 10Meps.
- FPGA: Spartan-II
- performs bus mastering
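Rate coding a frame into address events can be illustrated with a deliberately simple scheme: each pixel emits a number of events proportional to its grey level, evenly spaced over the frame time. This is a generic sketch, not the board's firmware algorithm; `frame_to_events` and its parameters are hypothetical.

```python
def frame_to_events(frame, max_events_per_pixel, frame_time_us):
    """Convert an 8-bit grey-level frame into a time-sorted list of (t, x, y)."""
    events = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            n = value * max_events_per_pixel // 255   # event count ~ intensity
            for k in range(n):
                t = (k + 0.5) * frame_time_us / n     # evenly spaced in time
                events.append((t, x, y))
    events.sort()                                     # serialize on one AER bus
    return events

frame = [[0, 255],
         [128, 64]]
events = frame_to_events(frame, max_events_per_pixel=4, frame_time_us=1000)
for t, x, y in events:
    print(f"t={t:6.1f} us  address=({x},{y})")
```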

### **USB-AER**



- USB connection to computer
- *sequencer*: either frame-AER or recorded AE-player (up to 25Meps)
- *monitor*: either AER-frame or timestamping and data-logging (up to 25Meps)
- data logging/playing up to 512 Kevents (very useful for multi-lab experiments)
- *mapper* (stand-alone mode) 25Meps; mappings from 1-1 up to 1-8
- VGA output
- firmware loaded through USB or MMC/SD card

### Splitter/Merger



- uses a CPLD as the communication center
- splitter: 1 to 4
- merger: 4 to 1
- reconfigured by jumpers
- delay introduced: 20ns

### **USB2AER**



- uses high speed USB2 (up to 6Meps between AER-port and computer)
- only functionalities: AE monitor & AE sequencer (AER-port to/from computer-USB2)
- monitoring & sequencing can be simultaneous
- no FPGA, just a CPLD (timestamping) and a microcontroller (for USB traffic management)
- USB powered
- compatible with jAER viewers and Matlab

### AER-Robot



- for controlling motors directly from an AER bus
- each PCB has 4 motor connectors



### The CAVIAR Vision System













- 4-layer system
- 45k neurons
- up to 5M synapses
- 12Geps
- 1-3ms latency for tracking
- scalable without performance degradation



### Latency Measurement


# **CORTICAL TISSUE**





- Potentially very high computational power: 2M neurons, 32G synapses, 238 Tconn/sec
- How to reconfigure?
- What hierarchies and structures?
- What kernels?
- We need theories for implementing desired functionalities (hopefully before the HW is available)

#### MATLAB based AER Behavioral Simulator






%First, we declare sources to the system
% SOURCES SOURCES DATA
sources {1} {data1}

%Next, we declare priorities
priorities {0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2}

%Next, we declare the processing modules

| %NAME      | IN-CHANNELS | OUT-CHANNELS | PARAMS    | STATES   |
|------------|-------------|--------------|-----------|----------|
| splitter   | {1}         | {2,4}        | {params1} | {state1} |
| h_sobel    | {2}         | {3}          | {params2} | {state2} |
| imrotate90 | {4}         | {5}          | {params3} | {state3} |
| h_sobel    | {5}         | {6}          | {params4} | {state4} |
| imrotate90 | {6}         | {7}          | {params5} | {state5} |
| merger     | {3,7}       | {8}          | {params6} | {state6} |
| ack        | {8}         | {}           | {params7} | {state7} |

| PreRqst | Rqst   | Ack    | x  | y  | sign |
|---------|--------|--------|----|----|------|
| 250000  | 250000 | 250010 | 45 | 29 | 1    |
| 291600  | 291650 | 291660 | 44 | 29 | -1   |
| 291650  | 291700 | 291710 | 43 | 30 | 1    |
| 399750  | 399750 | 399760 | 45 | 32 | 1    |
| 399800  | 399800 | 399810 | 44 | 42 | 1    |
| 399830  | 399850 | 399860 | 43 | 28 | -1   |
| 399900  | 399900 | 399910 | 23 | 40 | 1    |
| 399950  | 399950 | 399960 | 9  | 38 | 1    |
| 400000  | 0      | -1     | 2  | 41 | 1    |
| 400050  | 0      | -1     | 3  | 42 | -1   |
| 400100  | 0      | -1     | 23 | 5  | 1    |
| 400150  | 0      | -1     | 26 | 32 | 1    |
| 400200  | 0      | -1     | 44 | 28 | 1    |
| 400250  | 0      | -1     | 45 | 30 | 1    |
| 400300  | 0      | -1     | 45 | 34 | -1   |
Simulator flow:

- Read Netlist & Conf.
- Find channel with 1st PreRqst
- Call AER module
- Write events on out channels
- Find channel with next PreRqst
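The event-driven flow can be sketched in a few lines. This is a Python illustration of the idea, not the Matlab simulator itself: the netlist topology mirrors the example above, but the module behaviors are stand-ins (`passthrough` replaces `h_sobel`, `splitter`, `merger` and `ack`), and the fixed `module_delay`, the 64-pixel array size and the omission of priorities are all assumptions.

```python
import heapq
from itertools import count

def passthrough(ev):
    return [ev]

def rotate90(ev, size=64):
    t, x, y, sign = ev
    return [(t, y, size - 1 - x, sign)]       # rotate the address by 90 degrees

# (module, input channels, output channels)
NETLIST = [
    (passthrough, {1}, [2, 4]),      # splitter
    (passthrough, {2}, [3]),         # stand-in for h_sobel
    (rotate90,    {4}, [5]),         # imrotate90
    (passthrough, {5}, [6]),         # stand-in for h_sobel
    (rotate90,    {6}, [7]),         # imrotate90
    (passthrough, {3, 7}, [8]),      # merger
    (passthrough, {8}, []),          # ack: consumes events
]

def simulate(source_events, source_channel=1, module_delay=10):
    tie = count()                    # tie-breaker keeps heap entries comparable
    queue = [(ev[0], next(tie), source_channel, ev) for ev in source_events]
    heapq.heapify(queue)
    consumed = []
    while queue:
        # Find the channel holding the earliest pending request.
        t, _, channel, ev = heapq.heappop(queue)
        module, _, outs = next(m for m in NETLIST if channel in m[1])
        if not outs:
            consumed.append(ev)
            continue
        # Call the module and write its events on the output channels.
        for out_ev in module(ev):
            stamped = (t + module_delay,) + out_ev[1:]
            for ch in outs:
                heapq.heappush(queue, (stamped[0], next(tie), ch, stamped))
    return consumed

result = simulate([(0, 45, 29, 1)])
print(result)
```

A priority queue keyed on request time plays the role of "find channel with next PreRqst": the simulator always advances the earliest pending event, whichever channel it sits on.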

### Multi-Chip Multi-Layer Processing Systems: Neocognitron & Convolutional Neural Networks





K. Fukushima 1969

Applied to handwritten character recognition

Example: Simplified Neocognitron



- 4 layers, 68 convolution modules
- inputs 16x16 b&w pixels
- 7 output categories

#### Large kernels









### **Texture Classification**



#### Detecting People & displaying using jAER



## Conclusions

- AER has high potential for building complex neurocortical hierarchies.
- A variety of AER sensors are available.
- With present day technology it is feasible to build programmable & reconfigurable "Cortical Tissues" with millions of neurons, billions of synapses, and Tconn/sec.
- We need to develop knowledge for optimally configuring, programming, and training such systems.