## EE466

## VLSI Design

## Final Exam

Dec. 12, 2018. (3:10pm - 5:10pm)
Instructor: Dae Hyun Kim (daehyun@eecs.wsu.edu)

|  | Name: |  |
| :---: | :---: | :---: |
|  | WSU ID: |  |
| Problem | Points |  |
| 1 | 20 |  |
| 2 | 10 |  |
| 3 | 10 |  |
| 4 | 20 |  |
| 5 | 10 |  |
| 6 | 10 |  |
| 7 | 40 |  |
| 8 | 50 |  |
| Total | 170 |  |

* Allowed: Textbooks, cheat sheets, class notes, notebooks, calculators, watches, electronic devices.
* Not allowed: Chat apps.


## Problem \#1 (Sequential Logic, 20 points)

Schematic A below shows an explicit-pulsed D flip-flop. The signal strength of input $D$ is unknown (e.g., someone designed a circuit and its output is connected to input $D$ ).

<Schematic A>
Since we have no clue on the strength of input $D$, we suggest the following D-F/F design. We can properly size the two inverters in the dotted rectangle.

<Schematic B>
Question: Compare Schematic A and Schematic B (quantitatively and/or qualitatively) in terms of 1) setup time constraint, 2) hold time constraint, 3) clock-to-Q delay, and 4) output slew ( $\Delta V_{\bar{Q}} / \Delta t$ ) where $V_{\bar{Q}}$ is the voltage at the output node $\bar{Q}$.

1) Setup time constraint: $t_{\text {setup }, A}<>=t_{\text {setup }, B}$

The input of the transmission gate should be stable for $t_{\text {setup }}$ before a clock rising edge. Thus, $t_{\text {setup }, A}=t_{\text {setup }}$. For Schematic B, the input D should be stable for $t_{\text {setup }}+2 t_{\text {inv }}$ before a clock rising edge ( $t_{i n v}$ is the delay of an inverter).
2) Hold time constraint: $t_{\text {hold }, A}<\nabla=t_{\text {hold }, B}$

The input of the transmission gate should be stage for $t_{\text {hold }}$ after a clock rising edge. Thus, $t_{\text {hold,A }}=t_{\text {hold }}$. For Schematic B, a change at input D does not immediately lead to a change at the input value of the transmission gate. Thus, $t_{\text {hold, } B}=t_{\text {hold }}-2 t_{\text {inv }}$.
3) Clock-to-Q delay: $t_{C-Q, A}<>=t_{C-Q, B}$

It depends on the strength of input $D$. If the driver of input $D$ in Schematic $A$ is weaker than that in Schematic $\mathrm{B}, t_{C-Q, A}>t_{C-Q, B}$ will hold. If the driver of input D in Schematic A is stronger than that in Schematic $\mathrm{B}, t_{C-Q, A}<t_{C-Q, B}$ will hold.
4) Output slew: $s_{A}<>=s_{B}$

For the same reason as the answer for 3), the output slew is dependent on the strength of the driver of input D.

## Problem \#2 (DC Analysis of a Domino Logic, 10 points)

The left figure shows a domino-logic-based buffer design and the right figure shows a DC curve of the inverter in the schematic. $\mu_{n}=2 \mu_{p}$. $w$ : Transistor minimum width.



Draw (rough sketches will be accepted) three DC curves (x-axis: $V_{A}, \mathrm{y}$-axis: $V_{F}$ ) for the buffer for

1) $w_{p}=w_{n}=w$
2) $w_{p}=2 w, w_{n}=w$
3) $w_{p}=w, w_{n}=2 w$


## Problem \#3 (Sequential Logic, 10 points)

The following shows a schematic of a positive-edge triggered D-F/F.


Describe how you can estimate the hold time constraint of the F/F above. (Hold time constraint: input $D$ should be stable (should not change) for some time after a clock rising edge.)

Input D should be stable until the output of the third inverter in the inverter chain becomes 0 after each clock rising edge. Thus, the hold time constraint is approximately $3 \Delta_{\text {inv }}$ where $\Delta_{\text {inv }}$ is the delay of an inverter.

## Problem \#4 (Carry Select Adder, 20 points)

The following shows a schematic of a $2 k$-bit adder designed using $k$-bit carry select adders. The delay of a $k$-bit adder is $k \Delta_{F A}$, the delay of a $k$-bit MUX is $k \Delta_{M}$, and the delay of a two-input AND (or OR) gate is $\Delta_{M}$.


1) We are supposed to design an $N$-bit adder using carry select adders (\# groups: $\frac{N}{k}$ ). Find $k$ minimizing the delay of the $N$-bit adder (express the optimal $k$ as a function of $N, \Delta_{M}$, and $\Delta_{F A}$ ). Notice that the worst-case delay occurs at $C_{N}$ (the final carry out) or $S_{N-1: 0}$ (the final sum).

$$
\begin{gathered}
\tau=k \Delta_{F A}+\left(\frac{N}{k}-2\right) \cdot 2 \Delta_{M}+k \Delta_{M} \\
\frac{d \tau}{d k}=\Delta_{F A}-\frac{2 N \Delta_{M}}{k^{2}}+\Delta_{M}=0 \\
\therefore k=\sqrt{\frac{2 N \Delta_{M}}{\Delta_{F A}+\Delta_{M}}}
\end{gathered}
$$

2) Now, the $k$-bit adders are designed using conditional sum adders, so the delay of a $k$-bit adder is $\Delta_{M} \cdot \ln k$ instead of $k \Delta_{F A}$. Find $k$ minimizing the delay of the new $N$-bit adder (express the optimal $k$ as a function of $N$ and $\Delta_{M}$ ).

$$
\begin{gathered}
\tau=\Delta_{M} \cdot \ln k+\left(\frac{N}{k}-2\right) \cdot 2 \Delta_{M}+k \Delta_{M} \\
\frac{d \tau}{d k}=\frac{\Delta_{M}}{k}-\frac{2 N \Delta_{M}}{k^{2}}+\Delta_{M}=0
\end{gathered}
$$

$$
\therefore k=\frac{-1+\sqrt{1+8 N}}{2}
$$

## Problem \#5 (Carry Skip Adder, 10 points)

The following diagram shows a 16-bit carry-skip adder designed using 4-bit adders.


To improve the speed of the carry-skip adder, we replace the 4-bit adders (4-bit blocks designed using 4-bit ripple-carry adders) by 4-bit conditional sum adders. The delay of each multiplexer step in the conditional sum adders is $\Delta_{M}$ (so, if all the operands are available at time 0 , the delay of a 4-bit conditional sum adder is $3 \Delta_{M}$.) The delay of each 2:1 MUX in the schematic above is $\Delta_{M}$. The delay of each $p_{i: i-3}$ is $2 \Delta_{M}$. Calculate the delay of the new 16 -bit carry-skip adder.

Delay of the carry-out of any 4-bit adder block $=3 \Delta_{M}$
Delay of any $p_{i+3: i}=2 \Delta_{M}$
Delay of $c_{4}=\operatorname{MAX}\left(3 \Delta_{M}, 2 \Delta_{M}\right)+\Delta_{M}=4 \Delta_{M}$
Delay of $c_{8}=4 \Delta_{M}+\Delta_{M}=5 \Delta_{M}$
Delay of $c_{12}=5 \Delta_{M}+\Delta_{M}=6 \Delta_{M}$
Delay of $c_{16}=6 \Delta_{M}+\Delta_{M}=7 \Delta_{M}$
Delay of $s_{15: 12}=6 \Delta_{M}+\Delta_{M}=7 \Delta_{M}$
Thus, the total delay is $7 \Delta_{M}$.

## Problem \#6 (Carry Skip Adder, 10 points)

To radically improve the delay of a carry-skip adder, we design an $N$-bit carry-skip adder as follows:


We design the $k$-bit adder blocks using $k$-bit conditional sum adders where $k$ is $2 m^{i}$ ( $i$ is an integer greater than or equal to 0 ). We also design the group-propagation signal $p_{i: j}$ using OR gates hierarchically. The following shows the delays of the components:

- $k$-bit adder: $(1+\ln k) \cdot \Delta_{M}$
- $k$-bit group-propagation signal $p_{i: i-k+1}:\left(2+\ln \frac{k}{4}\right) \cdot \Delta_{M}$
- 2-bit MUX: $\Delta_{M}$

Find $m(m \geq 2)$ minimizing the total delay of an $N$-bit carry-skip adder designed using the new architecture shown above.

Delay of the carry-out of the first 2-bit adder block $=(1+\ln 2) \Delta_{M}$
Delay of $p_{1: 0}=(2+\ln 0.5) \Delta_{M}=(2-\ln 2) \Delta_{M}$
Delay of $c_{2}=\operatorname{MAX}\left((1+\ln 2) \Delta_{M},(2+\ln 0.5) \Delta_{M}\right)+\Delta_{M}=(2+\ln 2) \Delta_{M}$

Delay of the carry-out of the second $2 m$-bit adder block $=(1+\ln 2 m) \Delta_{M}=(1+\ln 2+$ $\ln m) \Delta_{M}$

Delay of $p_{2 m+1: 2}=\left(2+\ln \frac{2 m}{4}\right) \cdot \Delta_{M}=(2-\ln 2+\ln m) \cdot \Delta_{M}$
Delay of $c_{2 m+2}=\operatorname{MAX}\left((1+\ln 2+\ln m) \Delta_{M},(2-\ln 2+\ln m) \cdot \Delta_{M}\right)+\Delta_{M}=(2+\ln 2+$ $\ln m) \Delta_{M}$

Delay of the carry-out of the third $2 m^{2}$-bit adder block $=\left(1+\ln 2 m^{2}\right) \Delta_{M}=(1+\ln 2+$ $2 \ln m) \Delta_{M}$

Delay of $p_{2 m^{2}+2 m+1: 2 m+2}=\left(2+\ln \frac{2 m^{2}}{4}\right) \cdot \Delta_{M}=(2-\ln 2+2 \ln m) \cdot \Delta_{M}$
Delay of $c_{2 m^{2}+2 m+2}=\operatorname{MAX}\left((1+\ln 2+2 \ln m) \Delta_{M},(2-\ln 2+2 \ln m) \cdot \Delta_{M}\right)+\Delta_{M}=(2+$ $\ln 2+2 \ln m) \Delta_{M}$

Delay of $c_{N}=(2+\ln 2+x \ln m) \Delta_{M}$ where $x$ satisfies

$$
\begin{gathered}
2+2 m+\cdots+2 m^{x}=N \\
2+2 m+\cdots+2 m^{x}=\frac{2\left(m^{x+1}-1\right)}{m-1}=N \\
x=\frac{\ln \left(1+\frac{N(m-1)}{2}\right)}{\ln m}-1
\end{gathered}
$$

Thus, the delay of
$c_{N}=\left(2+\ln 2+\ln \left(1+\frac{N(m-1)}{2}\right)-\ln m\right) \Delta_{M}=\left(2+\ln 2+\ln \left(\frac{N m-N+2}{2 m}\right)\right) \Delta_{M}$
(The MSB sum has the same delay.)
To minimize the delay of $c_{N}, \mathrm{~m}=\mathbf{2}$.

## Problem \#7 (Prefix Adder, 40 points)

Use the following delay values:

- AND, OR, XOR: $\Delta$
- Two-level (sum-of-product) logic: $2 \Delta$
- $g_{i}=a_{i} \cdot b_{i}, p_{i}=a_{i} \oplus b_{i}$

We are designing a 1024-bit Kogge-Stone adder.

1) Represent $S_{999}$ hierarchically using group-generated and group-propagated carries ( $g_{i: k}, p_{i: k}$ ) and $c_{0}$ (primary carry-in), then compute the delay to compute $s_{999}$ assuming all the primary input signals are available at time 0 (10 points).

$$
\begin{gathered}
s_{999}=p_{999} \oplus c_{999} \\
c_{999}=g_{998: 0}+p_{998: 0} \cdot c_{0} \\
g_{998: 0}=g_{998: 487}+p_{998: 487} \cdot g_{486: 0} \\
g_{998: 487}=g_{998: 743}+p_{998: 743} \cdot g_{742: 487} \\
g_{998: 743}=g_{998: 871}+p_{998: 871} \cdot g_{870: 743} \\
g_{998: 871}=g_{998: 935}+p_{998: 935} \cdot g_{934: 871} \\
g_{998: 935}=g_{998: 967}+p_{998: 967} \cdot g_{966: 935} \\
g_{998: 967}=g_{998: 983}+p_{998: 983} \cdot g_{982: 967} \\
g_{998: 983}=g_{998: 991}+p_{998: 991} \cdot g_{990: 983} \\
g_{998: 991}=g_{998: 995}+p_{998: 995} \cdot g_{994: 991} \\
g_{998: 995}=g_{998: 997}+p_{998: 997} \cdot g_{996: 995} \\
g_{998: 997}=g_{998}+p_{998} \cdot g_{997}
\end{gathered}
$$

Delay $=\Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+\Delta=24 \Delta$
2) Represent $s_{768}$ hierarchically using group-generated and group-propagated carries ( $g_{i: k}, p_{i: k}$ ) and $c_{0}$ (primary carry-in), then compute the delay to compute $s_{768}$ assuming all the primary input signals are available at time 0 (10 points).

$$
\begin{gathered}
s_{768}=p_{768} \oplus c_{768} \\
c_{768}=g_{767: 0}+p_{767: 0} \cdot c_{0} \\
g_{767: 0}=g_{767: 256}+p_{767: 256} \cdot g_{255: 0} \\
g_{767: 256}=g_{767: 512}+p_{767: 512} \cdot g_{511: 256} \\
g_{767: 512}=g_{767: 640}+p_{767: 640} \cdot g_{639: 512} \\
g_{767: 640}=g_{767: 704}+p_{767: 704} \cdot g_{703: 640} \\
g_{767: 704}=g_{767: 736}+p_{767: 736} \cdot g_{735: 704} \\
g_{767: 736}=g_{767: 752}+p_{767: 752} \cdot g_{751: 736} \\
g_{767: 752}=g_{767: 760}+p_{767: 760} \cdot g_{759: 752} \\
g_{767: 760}=g_{767: 764}+p_{767: 764} \cdot g_{763: 760} \\
g_{767: 764}=g_{767: 766}+p_{767: 766} \cdot g_{765: 764} \\
g_{767: 766}=g_{767}+p_{767} \cdot g_{766}
\end{gathered}
$$

Delay $=\Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+\Delta=24 \Delta$
3) Calculate the total gate area to build the 1024-bit Kogge-Stone adder. Use the following area values. Notice that you should generate all sum ( $s_{1023: 0}$ ) and carry-out ( $c_{1024}$ ) signals (10 points).

- Two-input AND, OR gate: $k$
- Two-input XOR: $4 k$
- $g_{i}=a_{i} \cdot b_{i}, p_{i}=a_{i} \oplus b_{i}$

For position $j, s_{j}=p_{j} \oplus c_{j}$, so we need an XOR for each $s_{j}$. (1024*4k)
For position $j$, we should generate $g_{j}$ and $p_{j}$, so we need an AND and an XOR for each $j$. (1024*5k)

A unit merging $g_{i}, p_{i}, g_{j}, p_{j}$ needs one OR and two AND gates.
Now, let's count how many those units we need.
Level 1: $g, p_{1023: 1022}, \ldots, g, p_{1: 0}=>1023$
Level 2: $g, p_{1023: 1020}, \ldots, g, p_{2: 0}=>1022$
Level 3: $g$, $p_{1023: 1016}, \ldots, g, p_{4: 0}=>1020$
Level 4: $g, p_{1023: 1008}, \ldots, g, p_{8: 0}=>1016$
Level 5: $g, p_{1023: 992}, \ldots, g, p_{16: 0}=>1008$
Level 6: $g, p_{1023: 960}, \ldots, g, p_{32: 0}=>992$
Level 7: $g, p_{1023: 896}, \ldots, g, p_{64: 0}=>960$
Level 8: $g, p_{1023: 768}, \ldots, g, p_{128: 0}=>896$
Level 9: $g, p_{1023: 512}, \ldots, g, p_{256: 0}=>768$
Level 10: $g, p_{1023: 0}, \ldots, g, p_{512: 0}=>512$
so we need 9217 merging units. (9217*3k=27651k)
$c_{j}=g_{j-1: 0}+p_{j-1: 0} \cdot c_{0}$, so we need an AND and an OR to generate $c_{j}$. (1024*2k)
Thus, the total area $=1024 * 4 \mathrm{k}+1024 * 5 \mathrm{k}+27651 \mathrm{k}+1024 * 2 \mathrm{k}=\mathbf{3 8 , 9 1 5} \mathrm{k}$
4) When we designed a Kogge-Stone adder, we used an XOR gate to calculate $p_{i}=a_{i} \oplus b_{i}$. Prove that we can also use $a_{i}+b_{i}$ ( + is an OR operation) for $p_{i}$, i.e., prove that replacing $p_{i}=a_{i} \oplus b_{i}$ by $p_{i}=a_{i}+b_{i}$ does not change the final sum and carry-out values. (10 points).

The only difference between $a_{i}+b_{i}$ and $a_{i} \oplus b_{i}$ is when both $a_{i}$ and $b_{i}$ are 1 . For position $i$, the carry-out is $c_{i}=g_{i}+p_{i} c_{i-1}$. If the OR is used for $p_{i}, c_{i}=1+c_{i-1}=1$ if both $a_{i}$ and $b_{i}$ are 1 . If the XOR is used for $p_{i}, c_{i}=1+0=1$. Thus, they have the same carry out value. Thus, the replacement does not change the final sum and carry-out values.

## Problem \#8 (Carry Look-Ahead Adder, 50 points)

The max. fanout is 4 . Use the following delay values:

- AND, OR, XOR: $\Delta$
- Two-level (sum-of-product) logic: $2 \Delta$
- $g_{i}=a_{i} \cdot b_{i}, p_{i}=a_{i} \oplus b_{i}$

We are designing a 1024-bit carry look-ahead adder.

1) Represent $s_{999}$ hierarchically using group-generated and group-propagated carries ( $g_{i: k}, p_{i: k}$ ) and $c_{0}$ (primary carry-in), then compute the delay to compute $s_{999}$ assuming all the primary input signals are available at time 0 (10 points).

$$
\begin{gathered}
s_{999}=p_{999} \oplus c_{999} \\
c_{999}=g_{998}+p_{998} \cdot g_{997}+p_{998} \cdot p_{997} \cdot g_{996}+p_{998} \cdot p_{997} \cdot p_{996} \cdot c_{996} \\
c_{996}=g_{995: 992}+p_{995: 992} \cdot c_{992} \\
c_{992}=g_{991: 976}+p_{991: 976} \cdot g_{975: 960}+p_{991: 976} \cdot p_{975: 960} \cdot c_{960} \\
c_{960}=g_{959: 896}+p_{959: 896} \cdot g_{895: 832}+p_{959: 896} \cdot p_{895: 832} \cdot g_{831: 768}+p_{959: 896} \cdot p_{895: 832} \cdot p_{831: 768} \\
\cdot c_{768} \\
c_{768}=g_{767: 512}+p_{767: 512} \cdot g_{511: 256}+p_{767: 512} \cdot p_{511: 256} \cdot g_{255: 0}+p_{767: 512} \cdot p_{511: 256} \cdot p_{255: 0} \\
\cdot c_{0} \\
g_{255: 0}=g_{255: 192}+p_{255: 192} \cdot g_{191: 128}+p_{255: 192} \cdot p_{191: 128} \cdot g_{127: 64}+p_{255: 192} \cdot p_{191: 128} \cdot p_{127: 64} \\
\cdot g_{63: 0} \\
g_{63: 0}=g_{63: 48}+p_{63: 48} \cdot g_{47: 32}+p_{63: 48} \cdot p_{47: 32} \cdot g_{31: 16}+p_{63: 48} \cdot p_{47: 32} \cdot p_{31: 16} \cdot g_{15: 0} \\
g_{15: 0}=g_{15: 12}+p_{15: 12} \cdot g_{11: 8}+p_{15: 12} \cdot p_{11: 8} \cdot g_{7: 4}+p_{15: 12} \cdot p_{11: 8} \cdot p_{7: 4} \cdot g_{3: 0} \\
g_{3: 0}=g_{3}+p_{3} \cdot g_{2}+p_{3} \cdot p_{2} \cdot g_{1}+p_{3} \cdot p_{2} \cdot p_{1} \cdot g_{0} \\
\text { Delay }=\Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+2 \Delta+\Delta=20 \Delta
\end{gathered}
$$

2) Represent $s_{768}$ hierarchically using group-generated and group-propagated carries ( $g_{i: k}, p_{i: k}$ ) and $c_{0}$ (primary carry-in), then compute the delay to compute $s_{768}$ assuming all the primary input signals are available at time 0 (10 points).

$$
\begin{gathered}
s_{768}=p_{768} \oplus c_{768} \\
c_{768}=g_{767: 512}+p_{767: 512} \cdot g_{511: 256}+p_{767: 512} \cdot p_{511: 256} \cdot g_{255: 0}+p_{767: 512} \cdot p_{511: 256} \cdot p_{255: 0} \\
\cdot c_{0} \\
g_{255: 0}=g_{255: 192}+p_{255: 192} \cdot g_{191: 128}+p_{255: 192} \cdot p_{191: 128} \cdot g_{127: 64}+p_{255: 192} \cdot p_{191: 128} \cdot p_{127: 64} \\
\cdot g_{63: 0} \\
g_{63: 0}=g_{63: 48}+p_{63: 48} \cdot g_{47: 32}+p_{63: 48} \cdot p_{47: 32} \cdot g_{31: 16}+p_{63: 48} \cdot p_{47: 32} \cdot p_{31: 16} \cdot g_{15: 0} \\
g_{15: 0}=g_{15: 12}+p_{15: 12} \cdot g_{11: 8}+p_{15: 12} \cdot p_{11: 8} \cdot g_{7: 4}+p_{15: 12} \cdot p_{11: 8} \cdot p_{7: 4} \cdot g_{3: 0} \\
g_{3: 0}=g_{3}+p_{3} \cdot g_{2}+p_{3} \cdot p_{2} \cdot g_{1}+p_{3} \cdot p_{2} \cdot p_{1} \cdot g_{0}
\end{gathered}
$$

Delay $=\Delta+5 *(2 \Delta)+\Delta=12 \Delta$
3) Calculate the total gate area to build the 1024-bit carry look-ahead adder. Use the following area values. Notice that you should generate all sum ( $s_{1023: 0}$ ) and carry-out ( $c_{1024}$ ) signals (20 points).

- Two-input AND, OR gate: $k$
- Three-input AND, OR gate: $2 k$
- Four-input AND, OR gate: $3 k$
- Two-input XOR: $4 k$
- $g_{i}=a_{i} \cdot b_{i}, p_{i}=a_{i} \oplus b_{i}$
- For a carry look-ahead unit for $a_{i+3: i}, b_{i+3: i}, c_{i}$, use the following formulae:
o $\quad c_{i+1}=g_{i}+p_{i} \cdot c_{i}$ (for this, you need a two-input OR and a two-input AND)
o $\quad c_{i+2}=g_{i+1}+p_{i+1} \cdot g_{i}+p_{i+1} \cdot p_{i} \cdot c_{i}$ (for this, you need a three-input AND, a two-input AND, a three-input OR)
o $\quad c_{i+3}=g_{i+2}+p_{i+2} \cdot g_{i+1}+p_{i+2} \cdot p_{i+1} \cdot g_{i}+p_{i+2} \cdot p_{i+1} \cdot p_{i} \cdot c_{i}$ (for this, you need a four-input AND, three-input AND, a two-input AND, a four-input OR)
- For a group carry look-ahead unit for $g_{i+3: i}, p_{i+3: i}, c_{i}$, use the following formulae:

○ $g^{\prime}($ group generation $)=g_{i+3}+p_{i+3} \cdot g_{i+2}+p_{i+3} \cdot p_{i+2} \cdot g_{i+1}+p_{i+3} \cdot p_{i+2}$. $p_{i+1} \cdot g_{i}$ (for this, you need a two-input AND, a three-input AND, a four-input AND, a four-input OR)
o $p^{\prime}$ (group propagation) $=p_{i+3} \cdot p_{i+2} \cdot p_{i+1} \cdot p_{i}$ (for this, you need a four-input AND)
o $\quad c^{\prime}$ (group carry) $=g^{\prime}+p^{\prime} \cdot c_{i}$ (for this, you need a two-input AND and a twoinput OR)
$g, p_{i}$ for each $\mathrm{i}: 1024^{*}(\mathrm{k}+4 \mathrm{k})$
$s_{i}$ for each i: $1024^{*}(4 \mathrm{k})$
Level-1 carry look-ahead unit: 2 k for $c_{i+1}$, 5 k for $c_{i+2}$, 9 k for $c_{i+3}$, 9 k for group $g$, 3 k for group p, total 28k
\# Level-1 carry look-ahead units: 1024/4 = 256
For all the other carry look-ahead unit: 9 k for group $g$, 3 k for group $p$, and 2 k for $c^{\prime}$, total 14 k .
\# Level-2 carry look-ahead units: 1024/16 = 64
\# Level-3 carry look-ahead units: 1024/64 = 16
\# Level-4 carry look-ahead units: 1024/256 = 4
\# Level-5 carry look-ahead units: 1024/1024 = 1
The total area $=1024 * 5 \mathrm{k}+1024 * 4 \mathrm{k}+256 * 28 \mathrm{k}+85 * 14 \mathrm{k}=17,574 \mathrm{k}$

The max. fanout is 2 . Use the following delay values:

- AND, OR, XOR: $\Delta$
- Two-level (sum-of-product) logic: $2 \Delta$
- $g_{i}=a_{i} \cdot b_{i}, p_{i}=a_{i} \oplus b_{i}$

We are designing a 1024-bit carry look-ahead adder.
4) Represent $s_{999}$ hierarchically using group-generated and group-propagated carries ( $g_{i: k}, p_{i: k}$ ) and $c_{0}$ (primary carry-in), then compute the delay to compute $s_{999}$ assuming all the primary input signals are available at time 0 ( 10 points).

$$
\begin{gathered}
s_{999}=p_{999} \oplus c_{999} \\
c_{999}=g_{998}+p_{998} \cdot c_{998}(2 X) \\
c_{998}=g_{997: 996}+p_{997: 996} \cdot c_{996}(4 X) \\
c_{996}=g_{995: 992}+p_{995: 992} \cdot c_{992}(8 X) \\
c_{992}=g_{991: 960}+p_{991: 960} \cdot c_{960}(64 X) \\
c_{960}=g_{959: 896}+p_{959: 896} \cdot c_{896}(128 X) \\
c_{896}=g_{895: 768}+p_{895: 768} \cdot c_{768}(256 X) \\
c_{768}=g_{767: 512}+p_{767: 512} \cdot c_{512}(512 X) \\
c_{512}=g_{511: 0}+p_{511: 0} \cdot c_{0}(1024 X) \\
g_{511: 0}=g_{511: 256}+p_{511: 256} \cdot g_{255: 0} \\
g_{255: 0}=g_{255: 128}+p_{255: 128} \cdot g_{127: 0} \\
g_{127: 0}=g_{127: 64}+p_{127: 64} \cdot g_{63: 0} \\
g_{63: 0}=g_{63: 32}+p_{63: 32} \cdot g_{31: 0} \\
g_{31: 0}=g_{31: 16}+p_{31: 16} \cdot g_{15: 0} \\
g_{15: 0}=g_{15: 8}+p_{15: 8} \cdot g_{7: 0} \\
g_{7: 0}=g_{7: 4}+p_{7: 4} \cdot g_{3: 0} \\
g_{3: 0}=g_{3: 2}+p_{3: 2} \cdot g_{1: 0} \\
g_{1: 0}=g_{1}+p_{1} \cdot g_{0}
\end{gathered}
$$

Delay $=\Delta+17 *(2 \Delta)+\Delta=36 \Delta$

