Backend dielectric reliability simulator for microprocessor system

Chang-Chih Chen *, Fahad Ahmed, Dae Hyun Kim, Sung Kyu Lim, Linda Milor

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

A R T I C L E   I N F O

Article history:
Received 4 June 2012
Received in revised form 30 June 2012
Accepted 1 July 2012
Available online 24 July 2012

A B S T R A C T

Backend dielectric breakdown is one of the major sources of wearout for microprocessors. We present test data and a methodology to accurately estimate the lifetime for a microprocessor system due to backend dielectric breakdown. Our methodology incorporates activity in the nets surrounding each dielectric segment in the layout, temperature, and all layout spacings among parallel tracks. We analyze several layouts using our methodology and show the impact of backend dielectric wearout on microprocessor system lifetime.

1. Introduction

Each technology generation reduces the interconnect dimensions without always reducing the supply voltage in proportion. This results in higher electric fields within the backend dielectric. At the same time, as the dielectric constant ($k$) decreases to reduce parasitics, as prescribed by the International Technology Roadmap for Semiconductors, the porosity of materials must increase, at the possible cost of increasing the vulnerability of materials to breakdown. These factors combine to increase the risk of failure of chips due to backend dielectric breakdown in the newer technology nodes.

The standard approach to assess backend dielectric reliability is using process data. The typical test structure is a comb structure, as shown in Fig. 1a. In testing a comb structure, a voltage difference is applied between the two combs. The current between the combs is monitored to determine the time-to-failure ($TF$).

Test structures are stressed at high voltages and high temperatures to accelerate dielectric breakdown. Appropriate adjustments and extrapolations are made to the test results to scale them to operating conditions. In addition, corrections are needed to account for the difference between the vulnerable area of the microprocessor and that of the test structure.

The physics describing backend IC failure mechanisms has matured as a result of years of refinement to existing theories. However, the extension of these models to large and complex microprocessor systems has not proven to be straightforward. Microprocessor system reliability analysis requires techniques to extend the results gathered from small test structures to large complex microprocessors. Such an endeavor includes methods to manage the deluge of data that comes with analyzing large layouts.

The purpose of this paper is to present a methodology to assess microprocessor lifetimes based on low-k TDDB test structure lifetimes, by developing the link between data collected from test structures and the microprocessor system. We demonstrate the feasibility of our methodology by presenting results from a simulator based on the proposed methodology.

Because backend dielectric breakdown is activity and temperature dependent, our methodology includes determining the stress for each dielectric segment of a microprocessor while running benchmarks and a method to estimate the temperature distribution for a microprocessor system by using a thermal modeling tool.

The ultimate purpose of our work is to introduce backend dielectric reliability in the design of a microprocessor system, by conveying to the designer accurate estimates of processor lifetimes, including the breakdown among layers and blocks, in a designer-friendly manner. This enables a designer to make any updates in the design to enhance reliability prior to committing a design to manufacture.

In this paper, we first summarize our methodology to estimate microprocessor lifetime, based on data collected from test structures, in Section 2. Section 3 discusses our test structures and test data. In Section 4, we outline our methodology to incorporate the microprocessor geometries, temperature profile, and stress conditions in the simulator. Section 5 gives an overview of the acquisition flow for thermal and electrical stress profiles. Next, in Section 6, we study the estimated lifetimes for the microprocessor system under study from our simulator, and we conclude the paper in Section 7.

2. Backend dielectric breakdown models and microprocessor lifetime estimation

The most important reliability concerns for interconnects are electromigration [1–4], stress-induced voiding [5–7], and time-dependent dielectric breakdown (TDDB) of the backend dielectric. Our purpose is to consider only TDDB of the backend dielectric. Future work will add in these other wearout mechanisms.

* Corresponding author.
E-mail address: changchih@gatech.edu (C.-C. Chen).

© 2012 Elsevier Ltd. All rights reserved.

2.1. TDDB models

We note that models that describe backend TDDB, although they may have been initially developed for device TDDB, are of the general form [8–11]

\[
\ln \eta = A - \gamma E^m - E_a/kT
\]

(1)

where \(A\) is a constant that depends on the material properties of the dielectric, \(\gamma\) is the field acceleration factor, \(m\) is one for the \(E\) model and \(1/2\) for the \(\sqrt{E}\) model, and \(\eta\) is the characteristic lifetime. In this paper, only the \(E\) and \(\sqrt{E}\) models are considered. The electric field, \(E = V/S\), is a function of the voltage, \(V\), and the linespace between any two lines, \(S\). A typical layout has a large number of linespaces. The temperature dependence is modeled with the Arrhenius relationship in Eq. (1) [10,12], where \(k\) is the Boltzmann constant and \(T\) is the absolute temperature. The activation energy, \(E_a \propto q(\phi_h - \sqrt{qE/\pi\varepsilon})\), is field dependent, where \(q\) is the electronic charge, \(\phi_h\) is the trap barrier height, and \(\varepsilon\) is the dielectric constant.

Eq. (1) provides a correction between the electric field during use conditions and accelerated stress tests. Geometries with different line spacings scale differently to use conditions, as noted in [13,14]. Eq. (1) also provides a correction between chip operating conditions and accelerated stress conditions.
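The field correction described above can be illustrated with a minimal sketch. It assumes that the constant \(A\) and the \(E_a/kT\) term of Eq. (1) are held fixed, so that only the field term changes between test and use conditions; this is a simplification, since the activation energy is itself field dependent. The function names and all numerical values other than the 3.6 MV/cm test field are hypothetical.

```python
import math

def electric_field_mv_per_cm(voltage_v, linespace_nm):
    """E = V/S, converted to MV/cm (1 nm = 1e-7 cm)."""
    return voltage_v / (linespace_nm * 1e-7) / 1e6

def scale_eta_for_field(eta_test, e_test, e_use, gamma, m):
    """Scale a characteristic lifetime from the test field to the use field.

    From Eq. (1), with A and the E_a/kT term held fixed (an assumption):
        ln(eta_use) - ln(eta_test) = -gamma * (E_use**m - E_test**m)
    m = 1 gives the E model and m = 0.5 gives the sqrt(E) model.
    """
    return eta_test * math.exp(-gamma * (e_use ** m - e_test ** m))

# Hypothetical supply voltage, linespace, gamma and test lifetime.
e_use = electric_field_mv_per_cm(voltage_v=1.0, linespace_nm=70.0)
eta_use = scale_eta_for_field(eta_test=1.0e3, e_test=3.6, e_use=e_use,
                              gamma=4.0, m=1.0)
```

Because the use field is lower than the test field, the scaled lifetime is longer than the test lifetime for any positive \(\gamma\), and the amount of extrapolation depends on the linespace through \(E = V/S\).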

2.2. Microprocessor lifetime models

It should be noted that microprocessor systems wear out for a variety of reasons, related to both devices and interconnect. All of these wearout mechanisms act simultaneously. It is common to describe reliability mechanisms with a Weibull distribution

\[
P(T_F) = 1 - \exp \left(-\left(\frac{T_F}{\eta}\right)^\beta\right),
\]

(2)

having two parameters: the characteristic lifetime, \(\eta\), and shape parameter, \(\beta\). The characteristic lifetime is the time-to-failure at the 63% probability point, when 63% of the population have failed, and the shape parameter describes the dispersion of the failure rate population. Typically, the shape parameter is close to one. If we have a collection of \(n\) independent wearout mechanisms modeled with Weibull distributions, having parameters, \(\eta_i\), \(i = 1, \ldots, n\), and \(\beta_i\), \(i = 1, \ldots, n\), then the characteristic lifetime of the system, \(\eta_{\text{processor}}\), is the solution of [13,15,16]:

\[
1 = \sum_{i=1}^{n} (\eta_{\text{processor}}/\eta_i)^{\beta_i}
\]

(3)

Similarly [15],

\[
\beta_{\text{processor}} = \sum_{i=1}^{n} \beta_i (\eta_{\text{processor}}/\eta_i)^{\beta_i}
\]

(4)

The components in Eqs. (3) and (4) could be different wearout mechanisms, different layers of a microprocessor, different geometries within a layer, or different geometries within a layer at different temperatures. Hence, all a reliability simulator has to do is to (a) determine the characteristic lifetimes and shape parameters for all of the underlying wearout mechanisms and geometries, after all components are scaled for temperature and to use conditions with Eq. (1) and (b) apply Eqs. (3) and (4) to solve for \(\eta_{\text{processor}}\) and \(\beta_{\text{processor}}\).
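Step (b) can be sketched as follows. Eq. (3) has no closed-form solution when the shape parameters differ, but its right-hand side increases monotonically with \(\eta_{\text{processor}}\) and is at least one at the smallest \(\eta_i\), so a simple bisection suffices. The function name and the choice of bisection are illustrative assumptions, not a description of the simulator's implementation.

```python
def combine_weibull(etas, betas, iterations=200):
    """Solve Eq. (3) for eta_processor by bisection, then evaluate Eq. (4).

    etas, betas: characteristic lifetimes and shape parameters of the
    independent components, already scaled to use conditions.
    """
    def total(eta):
        return sum((eta / e) ** b for e, b in zip(etas, betas))

    # The root lies in (0, min(etas)]: total(0) = 0 and total(min(etas)) >= 1.
    lo, hi = 0.0, min(etas)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    eta_proc = 0.5 * (lo + hi)

    # Eq. (4): shape parameter of the combined distribution.
    beta_proc = sum(b * (eta_proc / e) ** b for e, b in zip(etas, betas))
    return eta_proc, beta_proc
```

For example, two identical components with \(\eta_i = 100\) and \(\beta_i = 1\) give \(\eta_{\text{processor}} = 50\) and \(\beta_{\text{processor}} = 1\), as expected when the vulnerable population is doubled.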

3. The test structures and test results

3.1. The test structures

We have designed test structures to assess the impact of linespace and area on Cu/low-k TDDB. The data set from several samples is fit with a Weibull distribution to estimate \( \eta \) and \( \beta \).

In previous work, we have shown that full chip lifetimes may be affected by irregular geometries [17]. We also take into account the impacts of those irregular geometries in this work. Fig. 2 shows the top views of these test structures, and fragments of these test structures are shown in Fig. 3.

The test structures were manufactured with an industrial 45 nm dual-damascene process and were tested at 3.6 MV/cm and at 150 °C, and a current limit of 10 \( \mu \)A between the lines was set to detect dielectric breakdown. This current limit detects hard failure. The increase in current at hard breakdown is very rapid, making the time-to-failure not very sensitive to the exact value of the current limit.

3.2. Test results

The data collected from the test structures is presented in [13,17]. We extracted \( \eta \) and \( \beta \) by fitting the data with a Weibull distribution. Once these parameters have been determined for the unit area, the relationship between characteristic lifetimes for different areas is known.
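The fitting procedure is not specified here. As one common possibility, the sketch below uses median-rank regression: the sorted failure times are assigned cumulative probabilities with Bernard's approximation, \((i - 0.3)/(n + 0.4)\), and a straight line is fit to \(\ln(-\ln(1 - P))\) versus \(\ln T_F\), which is linear under Eq. (2). The function name is hypothetical, and any data passed to it in this example would be synthetic.

```python
import math

def fit_weibull(times_to_failure):
    """Estimate (eta, beta) by least squares on the linearized Weibull CDF.

    From Eq. (2): ln(-ln(1 - P)) = beta * ln(TF) - beta * ln(eta).
    Median ranks use Bernard's approximation (an assumed choice).
    """
    t = sorted(times_to_failure)
    n = len(t)
    xs = [math.log(v) for v in t]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope of the fitted line
    eta = math.exp(x_mean - y_mean / beta)    # from intercept = -beta * ln(eta)
    return eta, beta
```

Maximum-likelihood estimation is an equally valid alternative; the regression form is shown only because it maps directly onto the usual Weibull probability plot.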

4. TDDB lifetime simulator for microprocessor systems

4.1. Vulnerable area and full processor reliability simulation

The simulator operates by determining the vulnerable length of the microprocessor layout for each linespace. The vulnerable length is defined as the length of a block of dielectric between two copper lines separated by linespace \( S \), as shown in Fig. 4. A given layout is analyzed by determining the pairs \((S, L)\) for each layer and all linespaces. The details of our methodology can be found in [13,15–17].
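To make the \((S, L)\) pairs concrete, the following is a deliberately simplified sketch for one layer. It assumes that wires are horizontal rectangles given as hypothetical tuples `(y_bottom, y_top, x_start, x_end)` in nanometres, compares every pair of wires, and ignores shielding by intermediate wires and all irregular geometries. It is not the extraction algorithm of the simulator, which is described in [17].

```python
from collections import defaultdict

def vulnerable_lengths(wires, s_max):
    """Accumulate the vulnerable length L for each linespace S <= s_max.

    wires: list of (y_bottom, y_top, x_start, x_end), integers in nm
    (a hypothetical, simplified layout representation).
    """
    table = defaultdict(int)
    for i, (b1, t1, xs1, xe1) in enumerate(wires):
        for b2, t2, xs2, xe2 in wires[i + 1:]:
            spacing = max(b1, b2) - min(t1, t2)      # gap between facing edges
            overlap = min(xe1, xe2) - max(xs1, xs2)  # length of the parallel run
            if 0 < spacing <= s_max and overlap > 0:
                table[spacing] += overlap
    return dict(table)

# Three parallel wires, 50 nm wide, on a 120 nm pitch (made-up geometry).
layer = [(0, 50, 0, 1000), (120, 170, 200, 1500), (240, 290, 0, 300)]
pairs = vulnerable_lengths(layer, s_max=100)
```

Here the two neighbouring pairs share a 70 nm linespace with parallel runs of 800 nm and 100 nm, so the table holds a single entry with \(S = 70\) nm and \(L = 900\) nm.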

Next, after feature extraction, we compute feature-level Weibull parameters and combine these to determine the full-microprocessor lifetime parameters. Let \( \eta \) be the Weibull characteristic lifetime for a test structure with vulnerable linespace \( S \) of length \( L \). Then, if the microprocessor has a vulnerable length \( L_{ij} \) associated with the same linespace, \( S \), on the \( j \)th layer, the corresponding characteristic lifetime of the portion of the layer with linespace \( S \) is [13,15]

\[
\eta_{ij} = \eta\,(L/L_{ij})^{1/\beta_{ij}},
\]

where \( \beta_{ij} \) is the Weibull shape parameter for the \( i \)th linespace in the \( j \)th layer. If there is no test structure with the linespace \( S \), \( \eta \) is found using other test structures and the field acceleration in Eq. (1).

Since each layer has many spacings, \( S \), and a microprocessor has many layers, the characteristic lifetimes and shape parameters are combined with (3) and (4).
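The length scaling used in this section can be written as a one-line function. The sketch assumes the usual Weibull scaling, in which the characteristic lifetime of a portion of the layout with vulnerable length \(L_{ij}\) equals the test-structure lifetime multiplied by \((L/L_{ij})^{1/\beta_{ij}}\); the function name and the numbers are hypothetical.

```python
def scale_eta_for_length(eta_test, length_test, length_chip, beta):
    """Scale a test-structure characteristic lifetime to a chip-level length.

    Assumes Weibull length scaling: a longer vulnerable length contains more
    potential failure sites, so its characteristic lifetime is shorter.
    """
    return eta_test * (length_test / length_chip) ** (1.0 / beta)

# Hypothetical example: a chip region with 16x the test-structure length.
eta_region = scale_eta_for_length(eta_test=100.0, length_test=1.0,
                                  length_chip=16.0, beta=2.0)
```

With \(\beta = 2\), a sixteen-fold increase in vulnerable length reduces the characteristic lifetime by a factor of four. The scaled values for all linespaces and layers are then the \(\eta_i\) that enter Eqs. (3) and (4).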

4.2. Vulnerable area and vulnerable feature extraction

We have developed our layout extraction tool using the standard object oriented programming language C++. The layout extraction flow is shown in Algorithm 1. Two inputs to the program are a layout \( L \) whose features are to be extracted and maximum line spacing, \( S_{\text{max}} \). The program then outputs a table for vulnerable areas and vulnerable features (#TLa/b, #TTa, #TTb, #PTT). The detailed explanation of Algorithm 1 is given in [17].

4.3. Temperature modeling for microprocessor

The design under study was implemented on an FPGA board. For analyzing the impact of backend dielectric wearout on a microprocessor system, we have used the well-known open-source LEON3 IP core processor [18] with superscalar abilities. The microprocessor logic units consist of a 32-bit general purpose integer unit (IU), a 32-bit multiplier (MUL), a 32-bit divider (DIV) and a memory management unit (MMU). Storage blocks include a window-based register file unit (RF), separate data (D-Cache) and instruction (I-Cache) caches and cache tag storage units (Dtags and Itags).

For modeling the temperature distribution of a microprocessor, we collected the activity of nets of the system under study, based on running a series of standard benchmarks [19] on the system and used the temperature modeling tool HotSpot [20] to estimate the temperature distribution for every single unit of the microprocessor system. Fig. 5 shows the average temperature distribution when the microprocessor system is running a set of standard benchmarks.
Including the temperature map in the layout statistics adds another dimension to the problem, because each linespace must now be considered at each temperature. The characteristic lifetimes for the microprocessor system can then be found by scaling each component to use conditions with Eq. (1) and combining the components with Eqs. (3) and (4).

Algorithm 1. The pseudocode of the layout extraction flow

The relationship between test conditions and use conditions is given in Eq. (1). However, the test structure is stressed with DC stress while the microprocessor dielectrics undergo AC stress. In prior work, each dielectric segment was assumed to be under stress with a fixed probability,

\[ a = 0.5, \]

and the characteristic lifetime was scaled as

\[ \eta_{ac} = \frac{\eta_{dc}}{a}, \]

where $\eta_{ac}$ is the characteristic lifetime under use conditions, with a probability of stress of $a$, and $\eta_{dc}$ is the characteristic lifetime under dc test conditions. Hence, in prior work, $\eta_{ac} = 2\eta_{dc}$.

In our current implementation, we compute the probability that each pair of adjacent nets has opposite voltages. Suppose that there are $n$ different probabilities of segments being under stress, $a_n$, for linespace $S$. Then the corresponding characteristic lifetime for linespace $S$ under use conditions is

\[ \eta_{ac} = \eta_{dc} \left( \sum_{n} a_n^{\beta} \right)^{-1/\beta}. \]
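The scaling from dc test conditions to use conditions can be sketched as follows. The sketch assumes that a group of segments with stress probability $a_n$ has characteristic lifetime $\eta_{dc}/a_n$, and that the groups are combined with Eq. (3) using a common shape parameter $\beta$, which gives $\eta_{ac} = \eta_{dc} (\sum_n a_n^{\beta})^{-1/\beta}$. The function names are hypothetical, and segments that are never stressed are treated as contributing no failures.

```python
import math

def eta_ac_fixed(eta_dc, a=0.5):
    """Fixed stress probability, as in prior work: eta_ac = eta_dc / a."""
    return eta_dc / a

def eta_ac_from_profile(eta_dc, stress_probs, beta):
    """Combine groups with different stress probabilities a_n.

    Assumes each group has lifetime eta_dc / a_n and a common beta, so that
    Eq. (3) gives eta_ac = eta_dc * (sum of a_n**beta) ** (-1 / beta).
    """
    total = sum(a ** beta for a in stress_probs if a > 0.0)
    if total == 0.0:
        return math.inf  # no segment is ever under stress
    return eta_dc * total ** (-1.0 / beta)
```

With a single group at $a = 0.5$ the profile-based expression reduces to $\eta_{ac} = 2\eta_{dc}$, which matches the fixed-probability case.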

In this work, instead of assuming a fixed stress probability, we collect the activity profiles of each net within the microprocessor while running benchmarks. The microprocessor system includes 195 k nets, which form around 21 million dielectric segments to be analyzed in the layout.

5. Electrical/thermal profile acquisition

The time-to-failure of TDDB is a function of device stress and the thermal profile. To get accurate lifetime results, a framework for the accurate acquisition of the spatial and temporal thermal/electrical stress of the system was constructed. Fig. 6 summarizes the electrical and thermal profile acquisition flow. For activity tracking, the hardware RTL/netlist was synthesized for emulation on an FPGA, and counters were placed at the I/O ports, which track both the state probabilities and the toggle rates of the ports during application runtime, as illustrated in Fig. 7. A standard set of benchmarks was used as the applications for the analysis.

Fig. 6. The flow for extracting electrical and thermal profiles.

The I/O activities and the gate-level netlist were then used for activity propagation to each net in the design, depending on its logic behaviour, for a complete stress/transition probability profile of the internal nodes of the microprocessor under study. Thus we have the probability of a transition occurring at any node and the probability of each state, i.e. the probability of being at logic “1”. These probabilities of logic “1” and logic “0” are what we need to compute the probability that each dielectric segment is under stress. The probability of dielectric stress for each dielectric segment can then be determined by

$$a = a_1(1 - a_2) + a_2(1 - a_1).$$

(9)

where $a$ is the probability of dielectric stress, and $a_1$ and $a_2$ are the probabilities at logic “1” of the two nets which border the dielectric segment.
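Eq. (9) is the probability that the two bordering nets are at opposite logic values when their states are treated as statistically independent. A minimal sketch, with a hypothetical function name:

```python
def dielectric_stress_probability(a1, a2):
    """Eq. (9): probability that two bordering nets hold opposite logic values.

    a1, a2: probabilities that each net is at logic "1"; the two nets are
    treated as independent, which is what the form of Eq. (9) implies.
    """
    return a1 * (1.0 - a2) + a2 * (1.0 - a1)
```

Two nets that each spend half of the time at logic “1” give $a = 0.5$, which is the fixed value assumed in prior work. A net held at “1” next to a net held at “0” gives $a = 1$, and two nets that always hold the same value give $a = 0$.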

The netlist was also used for layout generation. The RC information from the layout, together with the net activity, was used for the extraction of the power profile and the consequent thermal profile, through the power simulator [22] and the thermal simulator [20], respectively, for every single unit of the microprocessor system.

Then, using the layout, the thermal profile and the calculated probability of voltage stress, we can use device level models to characterize TDDB in every unit of the microprocessor under study to estimate the lifetime of the system.

The runtime for the TDDB simulator is the sum of the time taken to extract features from the layout, the time taken to propagate activities to each net in the design, and a constant time to evaluate Eqs. (3) and (4). The complexity of feature extraction and database extraction is $O(n)$, where $n$ is the number of features, since bucket sort is used. The complexity of extracting statistics from the features is also $O(n)$, because we scan the bucket from the bottommost element, and the maximum number of features within a fixed distance from an element is constant. The complexity of activity propagation is $O(n^2)$, where $n$ is the number of gates in the system. Lifetime is estimated in constant time. Hence, the overall complexity of the TDDB simulator is $O(n)$.

6. Estimated lifetime for the microprocessor system

A set of standard benchmarks was run on the microprocessor system under study. The microprocessor includes around 20–25 k gates, and the runtime for executing the set of standard benchmarks on the system is around 15 min. The electrical and thermal profiles, together with the lifetime models from Section 4, were then used to estimate the lifetime of each functional unit in the microprocessor system.

The microprocessor system can be broken down into two distinct groups: the storage units and the combinational logic units. The storage units include the data cache, the instruction cache, the two cache units for tag storage, and the register file. The combinational logic units include the memory management unit, the integer unit, the multiplier, and the divider.

We have estimated the lifetime of each microprocessor unit and analyzed the lifetime for every metal layer in the design technology used, as shown in Figs. 8 and 9.

The lifetime of the system under study was clearly limited by the Metal1 layer, as seen in Figs. 8 and 9. As we move up in the metal layer stack, the metal spacing also increases, resulting in an increased time-to-failure. Our analysis shows that the data-cache and the instruction-cache were the lifetime limiting units in the microprocessor. On-line reconfiguration, through redundancy allocation, was not considered here, but could improve the lifetime of these units. Among the combinational blocks, lifetime was limited by the MMU and the IU, while the MUL and the DIV blocks had relatively better lifetimes. Figs. 5 and 8 clearly suggest a strong temperature dependence of the system lifetime.

From Figs. 8 and 9, we can also see that the estimated lifetimes for the “bitcnts” benchmark are longer than the estimated lifetimes for the “basicmath” benchmark. The reason may be that the stress probability of each dielectric segment is lower when the “bitcnts” benchmark is executed on the system. The stress probability distributions for the “bitcnts” and “basicmath” benchmarks are shown in Figs. 10 and 11, respectively.

However, electrical stress also plays an extremely important role in determining the lifetime. The assumption of a fixed stress probability for each net is inaccurate, as seen in Figs. 10 and 11. Contrary to our earlier assumption of a stress probability of 0.5, most dielectric segments have stress probabilities of 0 or 1.

The use of accurate electrical stress and thermal profiles through the proposed methodology is expected to result in improved system backend lifetime estimates. The new lifetime figures, as shown in Figs. 12 and 13, indicate that the assumption of fixed activity levels [17] might lead to an underestimation in lifetime numbers of up to 35%.

7. Conclusion

This paper presents a flow to obtain the thermal and electrical stress profiles from microprocessor systems while running standard benchmarks. Taking into account the detailed thermal and electrical stress profiles, a methodology was proposed to accurately assess state-of-the-art microprocessor reliability based on the backend TDDB wearout mechanism. The methodology relies on the link between the device level wearout models and the chip layout. It takes into account the architecture through the temperature and activity profiles. Combining the wearout model, the thermal profile, and the electrical stress profile, this work provides insight into the backend TDDB-critical microprocessor functional units for the whole system through using standard benchmarks.

References


