Introduction

In this lab, you will develop a pipelined ASM chart for the DCT chip, derive a pipelined VHDL model from it, and estimate its performance.
In Lab 2, the most sequential version of our DCT chip took approximately 208,000 ns. Its ASM chart shows that the loop iterations are the performance bottleneck. We therefore looked at three optimization techniques to speed up the computation:
- Loop Unrolling: increase the parallelism in the model.
- Chaining: use the idle time of fast components to reduce the number of states in the loop.
- Multicycling: spend more than one state on slow components, so that short-delay components can run faster with a shorter clock cycle.
These techniques share a shortcoming. While we compute P = A * B, the memory-read components are idle; while we compute Sum = Sum + P, the multiplier is idle. The loop starts a new iteration only after it finishes the last state of the current iteration. Loop pipelining increases the concurrency in the ASM chart: the next pair of matrix elements (A and B) to be multiplied does not have to wait until the previous pair has finished. While one set of data is being added to the sum (Sum = Sum + P), the next set can be multiplied (P = A * B). Based on this observation, we can potentially speed up our DCT computation.
However, a real pipelined design must be developed with care. In this lab, we look at techniques for modeling pipelines in VHDL. The VHDL code for the sequential design is also provided below. You are responsible for improving the performance of the present ASM chart by developing a new pipelined ASM chart and its VHDL model.
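
To make the idle-unit observation concrete, the sketch below counts clock cycles for the matrix multiplication loop with and without pipelining. It is a Python behavioral model, not VHDL, and it rests on simplifying assumptions of our own: 512 iterations (8 × 8 × 8 multiply-accumulates), four stages that each take exactly one cycle, and one dedicated functional unit per stage.

```python
N_ITER = 512    # 8 x 8 x 8 multiply-accumulate iterations in the matrix example
N_STAGES = 4    # read A and B, multiply, accumulate, store/check done (assumed)


def sequential_cycles(n_iter: int, n_stages: int) -> int:
    """A new iteration starts only after the previous one leaves its last state."""
    return n_iter * n_stages


def pipelined_cycles(n_iter: int, n_stages: int) -> int:
    """One iteration enters per cycle; the last needs n_stages - 1 more cycles to drain."""
    return n_iter + n_stages - 1


def unit_utilization(n_iter: int, total_cycles: int) -> float:
    """Fraction of cycles a unit is busy, if it works one cycle per iteration."""
    return n_iter / total_cycles


if __name__ == "__main__":
    for name, cycles in (("sequential", sequential_cycles(N_ITER, N_STAGES)),
                         ("pipelined", pipelined_cycles(N_ITER, N_STAGES))):
        print(f"{name:>10}: {cycles} cycles, each unit busy "
              f"{unit_utilization(N_ITER, cycles):.1%} of the time")
```

Under these assumptions the sequential loop takes 2048 cycles and the pipelined loop 515, so each functional unit goes from busy 25% of the time to just over 99%. The actual gain for the DCT chip depends on its own stages, delays and clock period.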

Sequential Design

Figure 1: ASM Chart for the Sequential Design
Figure 2: Illustration of Indexing

As shown, mapping the behavioral model to the most sequential design is straightforward: we allocate one clock cycle to each set of operations that do not depend on each other.
The source listing shows the VHDL process that models the ASM chart. The operations associated with each block of the ASM chart are grouped into a state, and a variable "State" keeps track of the current state. The process is activated on every rising edge of the clock, and only the operations of the current state are executed in that clock cycle.
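
The sketch below mirrors that process structure in Python: each pass through the `while` loop stands for one rising clock edge, a `state` variable selects which operations run, and only those operations execute. The four state names, the i/j/k loop order and the single `count` index (0 to 511) are assumptions chosen for illustration, not a transcription of the handout's VHDL.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def sequential_matmul(a: Matrix, b: Matrix, n: int = 8) -> Tuple[Matrix, int]:
    """Multiply two n x n matrices one state per clock cycle; return (c, cycles)."""
    c = [[0] * n for _ in range(n)]
    i = j = k = 0                 # loop indices, k innermost (assumed order)
    A = B = P = Sum = 0           # datapath registers named as in the handout
    count = 0                     # iteration counter, 0 .. n**3 - 1
    state, done, cycles = "S1", False, 0
    while not done:
        cycles += 1               # each pass models one rising clock edge
        if state == "S1":         # read operands from memory
            A, B = a[i][k], b[k][j]
            state = "S2"
        elif state == "S2":       # multiply
            P = A * B
            state = "S3"
        elif state == "S3":       # accumulate; restart the sum when k = 0
            Sum = P if k == 0 else Sum + P
            state = "S4"
        else:                     # S4: store result, test for completion, advance indices
            if k == n - 1:
                c[i][j] = Sum
            if count == n ** 3 - 1:
                done = True
            else:
                count += 1
                i, j, k = count // (n * n), (count // n) % n, count % n
                state = "S1"
    return c, cycles


if __name__ == "__main__":
    a = [[r + col for col in range(8)] for r in range(8)]
    identity = [[int(r == col) for col in range(8)] for r in range(8)]
    c, cycles = sequential_matmul(a, identity)
    print(c == a, cycles)         # prints: True 2048 (4 states x 512 iterations)
```

Python assignments take effect immediately, much like VHDL variables inside a process. A signal assigned in a clocked process would instead update only after the clock edge, which matters once several stages share registers.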

More Optimizations

As discussed in class, several optimization techniques can be applied to the sequential design to speed it up. In this lab, we apply loop pipelining. Loop pipelining can be combined with the optimizations from the previous lab, such as unrolling and chaining, to improve performance further.
The pipeline for the matrix multiplication example has four stages:
- Stage 1: A = a[i][k], B = b[k][j]
- Stage 2: P = A * B
- Stage 3: if (k = 0) then Sum = P, else Sum = P + Sum
- Stage 4: if (k = 7) then c[i][j] = Sum; if (count = 511) then done = 1

Figure 3: ASM chart for pipelined matrix multiplication

Figure 3 shows the ASM chart for the pipelined matrix multiplication. Note how the states of the non-pipelined design become the stages of the pipelined model. This ASM chart does not, however, describe the complete pipelined design: it shows only the stages and the computation done in each stage when the pipeline is full. It does show the use of extra counters, namely count1, count2 and count3. Because the pipeline holds four sets of data at once, we need to keep track of the indices of each set, and these extra registers hold them. The pipeline timing diagram makes this clearer.
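
The Python sketch below shows one way count1, count2 and count3 can carry each data set's index down the pipeline. It is a cycle-level behavioral model, not the handout's VHDL, and it assumes that count packs the indices as (i·8 + j)·8 + k, that count1 to count3 hold the index of the data currently in stages 2 to 4, and that the controller states S1 to S7 enable the stages as in the timing diagram. Stages are evaluated from last to first so that each stage reads the values its predecessor produced in the previous cycle, as clocked registers would provide.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[int]]

# Stages enabled in each controller state: S1-S3 fill, S4 is the full pipeline,
# S5-S7 flush (hypothetical encoding of the ASM chart and timing diagram).
ACTIVE: Dict[str, Set[int]] = {
    "S1": {1}, "S2": {1, 2}, "S3": {1, 2, 3}, "S4": {1, 2, 3, 4},
    "S5": {2, 3, 4}, "S6": {3, 4}, "S7": {4},
}
NEXT: Dict[str, str] = {"S1": "S2", "S2": "S3", "S3": "S4", "S5": "S6", "S6": "S7"}


def split(count: int, n: int) -> Tuple[int, int, int]:
    """Assumed index packing: count = (i * n + j) * n + k."""
    return count // (n * n), (count // n) % n, count % n


def pipelined_matmul(a: Matrix, b: Matrix, n: int = 8) -> Tuple[Matrix, int]:
    """Four-stage pipelined multiply of two n x n matrices; return (c, cycles)."""
    last = n ** 3 - 1
    if last < 3:
        raise ValueError("sketch assumes at least 4 iterations (n >= 2)")
    c = [[0] * n for _ in range(n)]
    A = B = P = Sum = 0
    count = 0                          # index of the next data set to read
    count1 = count2 = count3 = 0       # index of the data in stages 2, 3, 4
    state, done, cycles = "S1", False, 0
    while not done:
        cycles += 1
        active = ACTIVE[state]
        if 4 in active:                # stage 4: store result, test for completion
            i, j, k = split(count3, n)
            if k == n - 1:
                c[i][j] = Sum
            if count3 == last:
                done = True
        if 3 in active:                # stage 3: accumulate, restart when k = 0
            k = split(count2, n)[2]
            Sum = P if k == 0 else Sum + P
            count3 = count2
        if 2 in active:                # stage 2: multiply
            P = A * B
            count2 = count1
        if 1 in active:                # stage 1: read operands
            i, j, k = split(count, n)
            A, B = a[i][k], b[k][j]
            count1 = count
            count += 1
        # Controller: stay in S4 until the last data set has entered stage 1.
        if state == "S4":
            state = "S5" if count > last else "S4"
        elif state in NEXT:
            state = NEXT[state]
    return c, cycles


if __name__ == "__main__":
    a = [[r + col for col in range(8)] for r in range(8)]
    identity = [[int(r == col) for col in range(8)] for r in range(8)]
    c, cycles = pipelined_matmul(a, identity)
    print(c == a, cycles)              # prints: True 515
```

The evaluation order is the design choice that matters here. Stage 4 must read Sum before stage 3 overwrites it, and stage 3 must read P before stage 2 overwrites it. In a VHDL process, using signals for these registers gives the same effect regardless of statement order.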
Figure 4: Timing Diagram for pipelined matrix multiplication example

Figure 4 gives the timing diagram for the pipelined matrix multiplication example. Note how the first three states, S1, S2 and S3, fill the pipeline, how all four stages are active in state S4, which repeats until the last set of data has entered the pipeline, and how states S5, S6 and S7 flush the pipeline.
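
A timing diagram of this shape can be generated mechanically. With one new data set entering per cycle, the data set in stage s during cycle t is iteration t - s (counting iterations from 0 and cycles and stages from 1). The Python sketch below builds the table row by row; the state naming (fill states, one repeated full state, flush states) follows the description above, while the function and its output format are our own illustration.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, List[Optional[int]]]


def timing_diagram(n_iter: int, n_stages: int = 4) -> List[Row]:
    """Rows of (cycle, state, occupancy); occupancy[s] is the iteration in stage s+1, or None."""
    if n_iter < n_stages:
        raise ValueError("sketch assumes the pipeline fills completely")
    rows: List[Row] = []
    for t in range(n_iter + n_stages - 1):      # t = 0 is the first cycle
        occupancy = [t - s if 0 <= t - s < n_iter else None for s in range(n_stages)]
        if t < n_stages - 1:                    # filling: S1 .. S3 for 4 stages
            state = f"S{t + 1}"
        elif t < n_iter:                        # every stage busy: S4
            state = f"S{n_stages}"
        else:                                   # flushing: S5 .. S7
            state = f"S{n_stages + 1 + t - n_iter}"
        rows.append((t + 1, state, occupancy))
    return rows


if __name__ == "__main__":
    for cycle, state, occ in timing_diagram(6):
        cells = " ".join(f"{x:>3}" if x is not None else "  -" for x in occ)
        print(f"cycle {cycle:>2}  {state}  {cells}")
```

With 512 iterations the table has 515 rows: three fill cycles, 509 cycles in S4 and three flush cycles.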

Assignment

- Study the ASM chart and the VHDL code for the sequential design given in the handout and explained in lecture. Make sure you understand the ASM chart, the reservation chart and the VHDL code for the matrix multiplication example.
- Redo the ASM chart for the DCT generator, applying loop pipelining to improve performance.
- Derive the VHDL model from your pipelined ASM chart.
- Estimate the performance of your design.
- Your report: the write-up should include the following:
  - Your pipelined ASM chart and a brief interpretation of it.
  - The complete source listing of the VHDL model derived from your pipelined ASM chart.
  - The estimated performance of your design and a waveform that verifies it. For example, if you estimate the loop computation to take 10,000 ns, submit a waveform showing that the "Done" signal goes from 0 to 1 at approximately 10,000 ns.
  - Any assumptions you made.
  - An estimate of the number of hours you spent inside and outside the lab, including writing the report.
- Put your name and student ID on all submissions.
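
For the performance estimate, a rough rule is: Done rises at about (reset time) + (number of clock cycles) × (clock period). The helper below applies that rule and checks a simulated Done edge against it. The 20 ns clock, the zero default reset offset and the 2% tolerance for "approximately" are hypothetical values; the exact offset depends on how your test bench drives the clock and reset.

```python
def estimated_done_time_ns(cycles: int, clock_period_ns: float,
                           reset_offset_ns: float = 0.0) -> float:
    """Rough time at which Done rises: reset time plus one clock period per state."""
    return reset_offset_ns + cycles * clock_period_ns


def matches_waveform(estimate_ns: float, measured_ns: float,
                     rel_tolerance: float = 0.02) -> bool:
    """True if the simulated Done edge is within rel_tolerance of the estimate."""
    return abs(measured_ns - estimate_ns) <= rel_tolerance * estimate_ns


if __name__ == "__main__":
    # Hypothetical numbers: 515 cycles (512 iterations through a 4-stage pipeline,
    # i.e. 512 + 4 - 1) and a 20 ns clock.
    estimate = estimated_done_time_ns(515, 20.0)
    print(estimate)                               # prints 10300.0
    print(matches_waveform(estimate, 10320.0))    # a Done edge read off the waveform
```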

Source Listing

The source listing for the pipelined implementation of the matrix multiplication example is given below. The file matrix.vhd multiplies two 8 × 8 matrices that have been initialized with arbitrary integers. The test bench compares the results of the matrix multiplier with the correct results. Note how states S1, S2 and S3 fill the pipeline and how states S5, S6 and S7 flush it. In state S4 the pipeline is full and all stages are active simultaneously (on different sets of data, of course).