# A 12.8-Gbaud ADC-Based Wireline Receiver With Embedded IIR Equalizer

Jae-Won Nam, Student Member, IEEE, and Mike Shuo-Wei Chen, Senior Member, IEEE

Abstract—This article demonstrates an analog-to-digital converter (ADC)-based receiver for NRZ/PAM4 modulation, featuring a time-to-digital converter (TDC)-assisted multi-bit/cycle asynchronous successive approximation register (SAR) ADC with an embedded IIR equalization filter driven by differential source followers with active gain. It re-uses the existing sampling network of time-interleaved (TI) ADCs and incorporates active G<sub>m</sub>-C integrators to form a tunable IIR equalizer response. The prototype is fabricated in 65-nm complementary metal–oxide–semiconductor (CMOS) and achieves an efficiency of 2.43 pJ/b using the 12.8-Gbaud PAM4 modulation scheme. The eight-way TI ADC measures a peak effective number of bits (ENOB) of 4.84 with a power consumption of 36.3 mW while occupying a core area of 0.24 mm<sup>2</sup>.

*Index Terms*—Complementary metal–oxide–semiconductor (CMOS), equalization, successive approximation register analog-to-digital converter (SAR ADC), switched-capacitor filter, wireline.

## I. INTRODUCTION

THE emerging trend of high throughput and flexible wireline communication systems can benefit from an analog-to-digital converter (ADC)-based receiver [1], [2], which enables a higher order pulse-amplitude modulation (PAM) scheme and reconfigurable digital equalization for different channel conditions. Recent advancements in high-speed ADCs facilitate the use of this receiver architecture [3]. One common ADC topology is the time-interleaved (TI) asynchronous successive approximation register (SAR) ADC [4]; this ADC can achieve good power efficiency, relaxed clock routing, and high speed for a medium resolution. On the other hand, the receiver front end typically uses a continuous-time linear equalizer (CTLE) and/or discrete-time feedforward equalization to eliminate unwanted intersymbol interference (ISI) effects, particularly for pre-cursors. Those equalizers can impose significant area and power overhead, as well as speed/stability constraints, especially when there is an equalization loop around the ADC, such as a mixed-signal decision feedback equalizer (DFE). In this case, the ADC conversion latency can affect the effectiveness of equalization. In this article, we explore circuit- and

Manuscript received July 1, 2019; revised September 19, 2019 and October 28, 2019; accepted November 21, 2019. Date of publication December 9, 2019; date of current version February 25, 2020. This article was approved by Associate Editor Qun Jane Gu. (*Corresponding author: Jae-Won Nam.*)

J.-W. Nam is with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: jaewon.nam@intel.com).

M. S.-W. Chen is with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA.

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2019.2956395

Fig. 1. Block diagrams of (a) digital equalizer, (b) mixed-signal equalizer, and (c) analog equalization.

architecture-level techniques to improve ADC-based receiver efficiency with discrete-time IIR equalizers: one technique uses a time-to-digital converter (TDC)-assisted multi-bit/cycle asynchronous SAR ADC, and the other embeds active G<sub>m</sub>-C-based infinite impulse response (IIR) equalization into the TI-ADC sampling network [5]. The embedded IIR filter boosts the high-frequency component and can be used to relax the remaining equalizer requirements, such as those of the front-end CTLE, feedforward equalizer (FFE), or DSP equalizer. The IIR filter response is reconfigurable via an on-chip switched capacitor bank and a tunable transconductance (G<sub>m</sub>) control. Since the feedback path of this IIR equalizer is completely in the analog domain, i.e., inside the sampling network, it does not tighten the ADC quantization latency requirement, unlike the case of a mixed-signal DFE. The proof-of-concept prototype is implemented in 65-nm complementary metal–oxide–semiconductor (CMOS). The overall power consumption, mainly comprising the IIR filter, eight-way TI-ADC, and delay-locked loop, is 62.1 mW at a 12.8-Gbaud rate with PAM4 modulation, i.e., an energy efficiency of 2.43 pJ/b.
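The quoted energy efficiency follows directly from the total power and the bit rate. The short check below is only an illustrative sketch; the function name is hypothetical, and the only inputs are the figures stated in the text (62.1 mW, 12.8 Gbaud, and 2 bits per PAM4 symbol).

```python
def energy_per_bit_pj(power_w: float, baud_rate: float, bits_per_symbol: int) -> float:
    """Energy per bit in pJ/b: total power divided by the bit rate."""
    bit_rate = baud_rate * bits_per_symbol
    return power_w / bit_rate * 1e12


# Values quoted in the text: 62.1 mW, 12.8 Gbaud, PAM4 (2 bits per symbol).
pam4_pj_per_bit = energy_per_bit_pj(62.1e-3, 12.8e9, 2)
print(f"{pam4_pj_per_bit:.2f} pJ/b")  # prints "2.43 pJ/b"
```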

The remainder of this article is organized as follows. Section II introduces the detailed motivation of the proposed IIR equalization technique. Section III describes the implementation details of the receiver architecture. Section IV presents the measurement results, and Section V provides the conclusion.

0018-9200 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 2. Frequency responses of (a) three-tap FFE, (b) passive sampler-based DTLE [6], and the proposed IIR equalizer with (c) normal, (d) asymmetric, and (e) selective frequency boost.

## II. ADC-BASED RECEIVER EQUALIZATION TECHNIQUE

Due to the improved digital computing and advanced high-speed ADCs in scaled technology, there is an increased interest in digital equalization techniques [see Fig. 1(a)]. In this architecture, an ADC must provide a relatively high dynamic range to maintain the required signal-to-noise ratio. Therefore, analog-domain equalization techniques that process un-quantized analog signals remain of interest and can be combined with DSP equalization, such as the mixed-signal equalization illustrated in Fig. 1(b). For analog equalization [see Fig. 1(c)], such as a CTLE, a degenerative RC network with a differential common-source amplifier topology is widely selected to create a high-frequency gain that compensates for a given channel loss. Furthermore, a multiple-stage gain amplifier enlarges the signal swing to utilize the full-scale input range of the ADC. A typical CTLE implementation performs only first-order channel loss compensation, and the bandwidth of the amplifier is usually limited. Consequently, it is not energy efficient for high-speed link applications. To further achieve tunability in the CTLE's frequency response, multiple switches and resistor/capacitor banks are required, which occupy extra active area. In other words, it is challenging to reconfigure the frequency response to support various channel profiles and data rates.

On the other hand, analog discrete-time equalization, such as the discrete-time linear equalizer (DTLE), automatically scales its frequency response according to the sampling frequency instead of relying on passive component values, as in CTLE. In addition, the DTLE can be embedded with TI samplers, as long as they can adequately sample the input signal. As an example, a DTLE implemented in a receiver [6] employs a master–slave sampling circuit to realize an IIR response from the continuous charge sharing between the current sample and the previously stored value. However, since the charge-sharing circuit is implemented in each TI sampler, the charge-sharing operation is N times slower than the overall sampling rate (F<sub>s</sub>), assuming an N-way TI sampler. Therefore, the achievable IIR response is constrained since the unit delay in each sampler is always at  $N/F_s$ . In other words, the achievable IIR response depends on the number of TI channels and imposes challenges for implementing a generic equalizer response.

In this work, we configure an IIR-based DTLE by combining three-tap FIR (second order) and one tap in the feedback, while all filter coefficients can be adjusted independently. Fig. 2 illustrates comparisons of the existing discrete-time equalization techniques. Similar to the conventional linear gain boosting, i.e., high-pass filter shape, we can create a large gain difference between dc and Nyquist frequency with a linearly increasing curve. However, a larger high-frequency boost results in a worse signal loss at a low frequency, as depicted in Fig. 2(a)-(c); hence, the required signal-to-noise ratio of the system ultimately determines the most suitable filter configuration. Consequently, we design the IIR



Fig. 3. Block diagram of the proposed ADC-based receiver architecture.

filter such that the numerator and denominator coefficients can be independently set, thereby resulting in more possible frequency responses for the equalizer. Note that the created IIR equalizer can compensate for the long-tail ISI scenario without using as many taps as those used in the FIR DTLE case. For example, lower signal loss at a low frequency and more boosted gain at selective high-frequency shaping can be feasible, as illustrated in Fig. 2(d) and (e). Hence, the proposed equalizer can reciprocally compensate even nonlinear channel profiles by re-configuring the equalizer to be an inverse loss function. In case different input bandwidths (data rate) of the receiver are targeted, the reconfigurable equalizer response can provide different levels of gain boosting accordingly, i.e., less gain boosting for a lower data rate case. This is possible by tuning all coefficients independently. In this prototype, a high-frequency boost of over 10 dB is provided by our proposed IIR-based DTLE. In addition, in the lab characterization, we change the IIR equalizer response based on different physical channels and data rates, which is discussed in Section IV.

## III. RECEIVER IMPLEMENTATION

### A. ADC-Based Receiver Architecture

As depicted in Fig. 3, we further improve the ADC front-end circuits to maximize circuit utilization by embedding the equalizer into the TI ADC sampler array and applying an enhanced differential source follower to provide an active gain. Consequently, the requirements of the front-end CTLE and PGA as well as that of the back-end DSP equalizer can be relaxed. Fig. 4 illustrates the proposed receiver architecture, which mainly consists of dual input buffers to minimize channel-to-channel crosstalk between TI-ADC samplers, an eight-way 5.6-b TI-ADC with the proposed embedded tunable IIR filter, and on-chip reference regulators for the ADC reference voltage (VREF). Given the external reference clock (1.6 GHz) driven by an on-chip CML buffer, a delay-locked loop circuit creates eight-phase sampling clock signals for proper TI operation. We also design various on-chip calibration circuits to compensate for comparator offsets, input buffer gain/offsets, and sampling skews in the TI-ADC clocks. The full-scale range of the ADC can be controlled by an on-chip VREF regulator from 200 to 800 $\mathrm{mV}_{\mathrm{pp}}$, and the common-mode voltage of the receiver input is provided by an on-chip regulator and on-chip 100-$\Omega$ differential termination resistors. We elaborate on the critical building blocks in the following.



Fig. 4. Simplified block diagram of the proposed TI ADC.



Fig. 5. (a) Source follower, (b) super-source follower, (c) differential source follower, and (d) proposed differential super-source follower.

### B. Proposed Differential Super-Source Follower

For a high-speed ADC buffer design, the source follower topology depicted in Fig. 5(a) is widely chosen due to its small output impedance and simple structure. Fig. 5(b) illustrates a super-source follower, which is an advanced topology that further reduces the output impedance by applying an extra regulation feedback path [7]. In order to implement a differential topology that rejects even-order harmonics, a prior work [8] inserts a cross-coupled pair at the output node of the source follower to configure a differential source follower while sacrificing the gain and bandwidth, as illustrated in Fig. 5(c). In this work, we combine the cross-coupled pair with a differential super-source follower topology, as depicted in Fig. 5(d). Interestingly, the proposed buffer circuit enhances the bandwidth beyond the conventional source follower due

| | Source follower<br>[Fig. 5(a)] | Super-source follower<br>[Fig. 5(b)] | Proposed differential super-source follower<br>[Fig. 5(d)] |
|---|---|---|---|
| DC gain<br>(G<sub>Buff</sub>) | $\frac{g_{m1}R_S}{1+g_{m1}R_S}$ | $\frac{g_{m1}R_S(1+g_{m2}R_D)}{1+g_{m1}R_S(1+g_{m2}R_D)}$ | $\frac{g_{m1}R_S(1+g_{m2}R_D)}{1+\{g_{m1}+g_{m2}R_D(g_{m1}-g_{m3})\}R_S}$ |
| R<sub>out</sub> | $R_S \parallel \frac{1}{g_{m1}}$ | $R_S \parallel \frac{1}{g_{m1}(1+g_{m2}R_D)}$ | $R_S \parallel \frac{1}{g_{m1}(1+g_{m2}R_D)} \parallel \frac{-1}{g_{m3}(g_{m2}R_D)}$ |
| GBW | $\frac{g_{m1}}{C_L}$ | $\frac{g_{m1}(1+g_{m2}R_D)}{C_L}$ | $\frac{g_{m1}(1+g_{m2}R_D)}{C_L}$ |

TABLE I Voltage Gain and Output Resistance Comparison

Note: Body effect and channel-length modulation are ignored, and the same load capacitance ($C_L$) is assumed.

to the reduced output impedance from negative feedback on the regulation path. Note that the proposed buffer can provide active voltage gain ($G_{Buff} > 0$ dB) while maintaining the same gain-bandwidth product (GBW) as the super-source follower (summarized in Table I). Therefore, we can relax the front-end PGA gain requirement with the active gain of the buffer by strengthening the cross-coupling path, i.e., M3 in Fig. 5(d), as this reduces the denominator of the gain equation and, hence, increases the dc gain. To compare the supply noise impact, we simulated the power supply rejection ratio (PSRR) under a large input signal swing and obtained a 40-dB PSRR for the conventional source follower topology. For the revised topologies, with equivalent target specifications, we observed PSRR degradations relative to the conventional source follower: -6.3 dB for the super-source follower, -22 dB for the pseudo-differential source follower, and -5.8 dB for the proposed differential super-source follower. Note that there is a significant PSRR drop in the pseudo-differential source follower topology, while the feedback path of the super-source follower mitigates this PSRR degradation. In practice, dual ADC buffers are arranged to drive the large sampling networks of the eight-way time-interleaved ADC. Each buffer drives even- and odd-channel sub-ADCs, and mismatch errors are calibrated by pMOS input transistor body voltage control with on-chip 9-b RDACs.
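The expressions in Table I can be evaluated numerically to see how the cross-coupled device M3 lifts the dc gain above unity. The sketch below is illustrative only: the device values are hypothetical and are not taken from the prototype, and the function names are introduced here for the example.

```python
def parallel(*resistances):
    """Parallel combination; a negative entry models a negative resistance."""
    return 1.0 / sum(1.0 / r for r in resistances)


def source_follower(gm1, rs):
    """DC gain and output resistance of the source follower (Table I)."""
    return gm1 * rs / (1 + gm1 * rs), parallel(rs, 1 / gm1)


def super_source_follower(gm1, gm2, rd, rs):
    """DC gain and output resistance of the super-source follower (Table I)."""
    gm_eff = gm1 * (1 + gm2 * rd)
    return gm_eff * rs / (1 + gm_eff * rs), parallel(rs, 1 / gm_eff)


def proposed_follower(gm1, gm2, gm3, rd, rs):
    """DC gain and output resistance of the proposed buffer (Table I)."""
    gain = gm1 * rs * (1 + gm2 * rd) / (1 + (gm1 + gm2 * rd * (gm1 - gm3)) * rs)
    rout = parallel(rs, 1 / (gm1 * (1 + gm2 * rd)), -1 / (gm3 * gm2 * rd))
    return gain, rout


# Hypothetical small-signal values (assumptions, not from the paper).
GM1, GM2, GM3 = 10e-3, 5e-3, 2e-3   # siemens
RD, RS = 1e3, 500.0                 # ohms

gain_sf, rout_sf = source_follower(GM1, RS)
gain_ssf, rout_ssf = super_source_follower(GM1, GM2, RD, RS)
gain_prop, rout_prop = proposed_follower(GM1, GM2, GM3, RD, RS)

for name, gain in (("SF", gain_sf), ("SSF", gain_ssf), ("proposed", gain_prop)):
    print(f"{name}: dc gain = {gain:.3f}")
```

With these assumed values, the proposed buffer exceeds unity gain because $g_{m3} g_{m2} R_D R_S > 1$, which is the condition obtained by comparing the numerator and denominator of its gain expression.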

### C. IIR Equalization

Fig. 6(a) illustrates the implementation of a single-channel four-tap discrete-time IIR equalization filter followed by a single slice of the TI-ADC. Both feedback and feedforward paths in the IIR filter employ active $G_m$-C integrators to provide signal gain. Current integration makes the filter response reconfigurable because the weight of each IIR tap can be controlled individually. In the sampling phase $\Phi_S$ [see Fig. 6(b)], the input voltage signal is sampled on $C_S$, $C_{FW}$, and $C_{FB}$ as the current tap of the IIR; the adjacent IIR taps (both feedback and feedforward taps) are created by integrating the current over a fixed time duration ($T_{IIR}$) in the $\Phi_F$ phase, as depicted in Fig. 6(c), through the $G_m$-C topology. In this prototype, the target IIR equalization response contains two delay taps in the feedforward path for the numerator of the transfer function and one delay tap in the feedback path for the denominator. Therefore, we use two extra samplers, $C_{FW}$ and $C_{FB}$; $C_{FW}$ transfers the current input sample to the next and the two-tap-later channels, and $C_{FB}$ is used to apply the current IIR output to the next channel's G<sub>m</sub>-cell input during the $\Phi_F$ phase. Note that the G<sub>m</sub>-induced current is integrated on top of the ADC's sampling capacitor ($C_S$), which not only reuses the existing S/H circuitry but also enables a tunable IIR filter by varying the transconductance (G<sub>m</sub>) value. Once the sampling and IIR current integration ($\Phi_S$ and $\Phi_F$) are completed, the sub-ADC begins the SAR conversion process with a charge redistribution of $C_S$, as depicted in Fig. 6(d). As a result, each slice of the eight-way TI-ADC with the IIR filter is associated with the TI clock signals (as shown in Fig. 7) created by an on-chip delay-locked loop circuit.

Here, we derive the relationship between the circuit design parameters and the IIR transfer function H[z], which is given by

$$H[z] = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b z^{-1}} \tag{1}$$

where the coefficients

$$a_{0} = \frac{C_{S} + C_{FB}}{C_{S} + C_{FB} + C_{P}}$$

$$a_{1} = \frac{-g_{m\alpha 1}}{C_{S} + C_{FB} + C_{P}} \cdot T_{IIR}$$

$$a_{2} = \frac{g_{m\alpha 2}}{C_{S} + C_{FB} + C_{P}} \cdot T_{IIR}$$

$$b = \frac{g_{m\beta}}{C_{S} + C_{FB} + C_{P}} \cdot T_{IIR} \qquad (2)$$

can be adjusted by $g_{m\alpha 1}$, $g_{m\alpha 2}$, and $g_{m\beta}$, respectively, as well as by the capacitors for a given current integration time (T<sub>IIR</sub>). Note that $g_{m\beta}$ mainly contributes to the gain difference between the high and low frequencies of the equalizer. In terms of stability, the pole must be placed within the unit circle, as illustrated in Fig. 8, leading to the following design constraint:

$$|b| < 1 \iff T_{\text{IIR}} < \frac{C_S + C_{\text{FB}} + C_P}{g_{m\beta}}. \tag{3}$$

This, in turn, produces a converging series, i.e., a stable response. In addition, we must ensure an IIR filtering time ( $T_{IIR}$ ) between the input sampling and the ADC conversion phase, as depicted in Fig. 9; therefore, the required timing constraint of  $T_{DEL,IIR}$  must satisfy

$$T_{\text{DEL,IIR}} = T_{\text{IIR}} + T_{\text{QNT}} < 1\,\text{UI} - T_{\text{S/H}} \tag{4}$$



Fig. 6. (a) Single slice circuit of TI-ADC with embedded IIR filter. (b) Input signal sampling phase. (c) Current integration (IIR filter operation) phase. (d) Quantization phase using SAR ADC.



Fig. 7. Timing diagram of eight-way TI clock signals.

where $T_{QNT}$ and $T_{S/H}$ are the times taken by the quantizer and the sample-and-hold circuit, respectively. To keep the clocking simple, we use the same duration for $T_{S/H}$ and $T_{IIR}$ (78.5 ps each).
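The coefficient mapping in (2), the stability condition in (3), and the resulting frequency boost of (1) can be sketched numerically as follows. All component values here are hypothetical assumptions chosen only to give a plausible response; they are not the prototype's values, and the function names are introduced for this example.

```python
import cmath
import math


def iir_coefficients(c_s, c_fb, c_p, gm_a1, gm_a2, gm_b, t_iir):
    """Map circuit parameters to (a0, a1, a2, b) following (2)."""
    c_total = c_s + c_fb + c_p
    a0 = (c_s + c_fb) / c_total
    a1 = -gm_a1 * t_iir / c_total
    a2 = gm_a2 * t_iir / c_total
    b = gm_b * t_iir / c_total
    return a0, a1, a2, b


def response(coeffs, omega):
    """Evaluate H[z] of (1) at z = exp(j*omega); omega in rad/sample."""
    a0, a1, a2, b = coeffs
    z_inv = cmath.exp(-1j * omega)
    return (a0 + a1 * z_inv + a2 * z_inv ** 2) / (1 + b * z_inv)


# Hypothetical values: capacitors in farads, transconductances in siemens.
coeffs = iir_coefficients(
    c_s=50e-15, c_fb=10e-15, c_p=5e-15,
    gm_a1=0.4e-3, gm_a2=0.1e-3, gm_b=0.25e-3,
    t_iir=78.125e-12,
)

is_stable = abs(coeffs[3]) < 1  # pole inside the unit circle, per (3)
dc_gain = abs(response(coeffs, 0.0))
nyquist_gain = abs(response(coeffs, math.pi))
boost_db = 20 * math.log10(nyquist_gain / dc_gain)

print(f"b = {coeffs[3]:.3f}, stable = {is_stable}, boost = {boost_db:.1f} dB")
```

Increasing `gm_b` in this sketch moves the pole toward the unit circle, which raises the Nyquist gain and lowers the dc gain until the bound in (3) is reached.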

For a higher baud-rate ADC-based receiver architecture, we need to increase the interleaving factor ($M_{ADC}$) of the TI-ADC. As a result, the proposed ADC-based receiver with the embedded IIR architecture will eventually be limited by either insufficient sampler settling bandwidth or insufficient IIR-filtering time. Considering sequential current integration



Fig. 8. Pole/zero plot in z-plane.



Fig. 9. Simplified block and timing diagram of the proposed IIR equalization.

in a time-interleaving order, there is an inverse relationship between  $T_{IIR}$  and  $M_{ADC}$ , as shown in the following:

$$T_{\rm IIR} < \frac{1}{M_{\rm ADC} \cdot f_{\rm subADC}} \tag{5}$$



Fig. 10. Schematic of the proposed unit IIR filter.

where $f_{subADC}$ is the sub-ADC sampling rate. This follows from the constraint that the time-interleaved current integrations of the individual slices must not overlap in the time domain. Therefore, an increased interleaving factor causes excessive power consumption in the G<sub>m</sub> cell to complete the current integration within the given T<sub>IIR</sub>.
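The bound in (5) and its effect on the required transconductance can be illustrated with a short calculation. This is a sketch under stated assumptions: the 1.6-GHz sub-ADC rate and the eight-way interleaving come from the text, whereas the feedback coefficient, the total capacitance, and the 16-way case are hypothetical.

```python
def t_iir_upper_bound(m_adc, f_sub_adc):
    """Upper bound on the IIR integration time from (5), in seconds."""
    return 1.0 / (m_adc * f_sub_adc)


def gm_for_coefficient(b, c_total, t_iir):
    """Invert b = gm * T_IIR / C_total from (2) to get the required gm."""
    return b * c_total / t_iir


F_SUB_ADC = 1.6e9   # sub-ADC sampling rate from the text (Hz)
B_TARGET = 0.3      # hypothetical feedback coefficient
C_TOTAL = 65e-15    # hypothetical C_S + C_FB + C_P (F)

bound_8 = t_iir_upper_bound(8, F_SUB_ADC)    # eight-way, as in the prototype
bound_16 = t_iir_upper_bound(16, F_SUB_ADC)  # hypothetical 16-way extension

gm_8 = gm_for_coefficient(B_TARGET, C_TOTAL, bound_8)
gm_16 = gm_for_coefficient(B_TARGET, C_TOTAL, bound_16)

print(f"8-way:  T_IIR < {bound_8 * 1e12:.3f} ps, gm > {gm_8 * 1e3:.3f} mS")
print(f"16-way: T_IIR < {bound_16 * 1e12:.3f} ps, gm > {gm_16 * 1e3:.3f} mS")
```

Doubling the interleaving factor at a fixed sub-ADC rate halves the available integration time, so the same coefficient needs twice the transconductance, which is the power penalty described above.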

In the practical circuit implementation of the $G_m$ stage (see Fig. 10), we use a revised folded-cascode amplifier with rail-to-rail input, featuring multiple input differential pairs. The coefficients of each IIR filter can be adjusted by tuning the current mirroring ratios between the tail current sources of the input pairs. The $G_m$ cell incorporates a reset switch at the output node to minimize the memory effect caused by the parasitic capacitance ($C_P$). To ensure linearity (total harmonic distortion, THD) at the 40-dB level, a source degeneration technique is applied.

The noise contributed by the embedded IIR equalizer comes mainly from the active G<sub>m</sub> cell. Since there is a feedback loop, there is a noise shaping effect, as in the DFE case. However, compared with the mixed-signal DFE approach, there is no ADC quantization noise within the feedback loop and, hence, less noise enhancement. This is because the proposed IIR operates entirely in the analog domain. According to the post-layout simulation, the proposed active IIR equalizer yields a total rms noise of 2.4-2.8 mV<sub>rms</sub>, depending on the equalizer gain configuration. In comparison with other noise sources in the receiver, the noise due to the IIR equalizer (10-dB high-frequency boost, in this case) contributes merely $\sim 10\%$ of the rms noise power within the overall noise budget. The noise power of the receiver mainly comes from the sampling clock jitter (35%) and the ADC quantization noise (20%). Here, the noise components are referred to the sampled analog signal at the quantizer input node. As the data rate goes up and the sampling clock jitter becomes the dominant noise source, the overall receiver performance will ultimately be determined by cost-efficient clock generation and distribution.

### D. Proposed ADC Topology

In this prototype, we propose a TDC-assisted multi-bit/cycle SAR ADC architecture. The multi-bit/cycle comparison is



Fig. 11. VTC curves and extra decision threshold with time delay T<sub>D</sub>.

conducted via both voltage- and time-domain comparators. For asynchronous SAR conversion [9], [10], each voltage comparator generates a ready signal to indicate whether its comparison is complete, as illustrated in Fig. 11. As the delay of the ready signal scales with the input voltage, it converts the voltage information into time with a certain voltage-to-time transfer function. The resolving speed of the strong-arm comparator can be approximated as in [11]

$$T_{\text{COMP}}(V_{\text{IN}}, V_{\text{TH}}) \propto \frac{K}{\omega_{-3 \text{ dB}}} \ln \frac{V_{FS}}{|V_{\text{IN}} - V_{\text{TH}}|} \tag{6}$$

where  $V_{IN}$  is the comparator input,  $V_{FS}$  is the full-scale voltage,  $\omega_{-3\,dB}$  is the 3-dB bandwidth of the comparator, *K* is a constant depending on the comparator structure, and  $V_{TH}$  is the comparator's VREF.

Here, we insert a delay for the individual ready signal of the comparator. This shifts the voltage-to-time transfer functions, which can be used to create three extra comparison levels through time comparisons among them; a similar approach is applied in the Flash ADC example [12], as illustrated in Fig. 12. For example, given two voltage comparators A and B with  $\pm V_{\text{TH}}$  VREFs, the corresponding voltage-to-time conversion (VTC) curves (T<sub>COMP,A</sub> and T<sub>COMP,B</sub>), including delayed ones (T<sub>COMP,Ad</sub> and T<sub>COMP,Bd</sub>), are defined as

$$T_{\text{COMP,A}} \triangleq T_{\text{COMP}}(V_{\text{IN}}, +V_{\text{TH}})$$

$$T_{\text{COMP,B}} \triangleq T_{\text{COMP}}(V_{\text{IN}}, -V_{\text{TH}})$$

$$T_{\text{COMP,Ad}} = T_{\text{COMP,A}} + T_{\text{D}} \tag{7}$$

$$T_{\text{COMP,Bd}} = T_{\text{COMP,B}} + T_{\text{D}}. \tag{8}$$

Next, we create extra decision thresholds through time interpolation at points X, Y, and Z (annotated in Fig. 11); these are derived in the following manner:

$$\begin{aligned}
T_{\text{COMP,Ad}} = T_{\text{COMP,B}} &\Rightarrow V_{\text{IN}} = -\tanh\left(\frac{\omega_{-3 \text{ dB}} T_{\text{D}}}{2K}\right) \cdot V_{\text{TH}} \\
T_{\text{COMP,A}} = T_{\text{COMP,Bd}} &\Rightarrow V_{\text{IN}} = +\tanh\left(\frac{\omega_{-3 \text{ dB}} T_{\text{D}}}{2K}\right) \cdot V_{\text{TH}} \\
T_{\text{COMP,A}} = T_{\text{COMP,B}} &\Rightarrow V_{\text{IN}} = 0.
\end{aligned} \tag{9}$$

Hence, the time-interpolation-driven voltage-domain quantization threshold ( $V_{\text{TH,time}}$ ) can be derived as shown in the following:

$$V_{\rm TH,time} = \tanh\left(\frac{\omega_{-3\,\rm dB}T_{\rm D}}{2K}\right) \cdot V_{\rm TH}. \tag{10}$$
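The closed-form thresholds in (9) and (10) can be cross-checked by solving the crossing condition numerically. The sketch below treats the proportionality in (6) as an equality with unit constant and normalizes $K/\omega_{-3\,\rm dB}$ and $V_{FS}$ to 1; these normalizations, the chosen $V_{\rm TH}$ and $T_{\rm D}$ values, and the function names are assumptions made only for illustration.

```python
import math


def t_comp(v_in, v_th, k_over_w=1.0, v_fs=1.0):
    """Comparator resolving time from (6), with unit proportionality constant."""
    return k_over_w * math.log(v_fs / abs(v_in - v_th))


def crossing(v_th, t_d, k_over_w=1.0):
    """Bisection for the V_IN at which T_COMP,A + T_D equals T_COMP,B."""
    def diff(v):
        return t_comp(v, +v_th, k_over_w) + t_d - t_comp(v, -v_th, k_over_w)

    # diff() rises monotonically from -inf to +inf across (-v_th, +v_th).
    lo, hi = -v_th * (1 - 1e-12), v_th * (1 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


V_TH = 0.2   # hypothetical comparator threshold (V)
T_D = 0.8    # hypothetical delay, in units of K / omega_3dB

numeric = crossing(V_TH, T_D)
closed_form = -math.tanh(T_D / 2) * V_TH  # first line of (9)

print(f"numeric = {numeric:.6f} V, closed form = {closed_form:.6f} V")
```

Because the crossing depends only on the ratio of the two logarithm arguments, scaling `V_TH` scales the interpolated threshold by the same factor.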



Fig. 12. (a) Proposed single-channel TDC-assisted asynchronous SAR ADC architecture and (b) asynchronous clock generation for multi-bit/cycle SAR and voltage-to-time response.

One interesting property of this time quantization is that the resultant three decision thresholds automatically scale with different VREFs (i.e.,  $\pm V_{\text{TH}}$ , as illustrated in Fig. 11), thereby allowing us to use the same time interpolation circuit for each SAR conversion. In order to employ the extra decision thresholds ( $\pm V_{\text{TH},\text{time}}$ ) as the quantization VREFs,  $V_{\text{TH},\text{time}}$  has to be one half of  $V_{\text{TH}}$ . Therefore, the required T<sub>D</sub> can be calculated as

$$T_{\rm D} = \frac{K}{\omega_{-3\,\rm dB}} \ln 3 \approx 1.099 \cdot \left(\frac{K}{\omega_{-3\,\rm dB}}\right). \tag{11}$$

Furthermore, in order to keep the DNL of the quantizer within $\pm 0.5$ LSB, $T_D$ must remain within the range between Max[$T_D$] and min[$T_D$], which are, respectively, derived as

$$\operatorname{Max}[T_{\rm D}] = \frac{K}{\omega_{-3\,\mathrm{dB}}} \ln 7 \approx 1.946 \cdot \left(\frac{K}{\omega_{-3\,\mathrm{dB}}}\right) \qquad (12)$$

$$\min[T_{\rm D}] = \frac{K}{\omega_{-3\,\rm dB}} \ln \frac{5}{3} \approx 0.511 \cdot \left(\frac{K}{\omega_{-3\,\rm dB}}\right). \quad (13)$$

Since $T_D$ affects the actual time comparison level according to PVT variations, we conducted a post-layout simulation under various PVT conditions (-40 °C/100 °C, SS/TT/FF, and 10% supply voltage variations); the error caused by $T_D$ remains within the DNL $\pm 0.1$ LSB range. In this prototype, the on-chip delay tuner (formed by a 3-b CDAC) is used for foreground calibration while observing the static performance.
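The delay values in (11) to (13) can be related to the interpolated threshold through (10). The sketch below evaluates the threshold ratio $V_{\rm TH,time}/V_{\rm TH}$ at the nominal, maximum, and minimum delays, with $T_{\rm D}$ expressed in units of $K/\omega_{-3\,\rm dB}$; the helper name is hypothetical.

```python
import math


def interpolated_threshold_ratio(td_norm):
    """V_TH,time / V_TH from (10); td_norm = omega_3dB * T_D / K."""
    return math.tanh(td_norm / 2)


td_nominal = math.log(3)      # (11)
td_max = math.log(7)          # (12)
td_min = math.log(5 / 3)      # (13)

ratio_nominal = interpolated_threshold_ratio(td_nominal)
ratio_max = interpolated_threshold_ratio(td_max)
ratio_min = interpolated_threshold_ratio(td_min)

for label, td, ratio in (
    ("nominal", td_nominal, ratio_nominal),
    ("max", td_max, ratio_max),
    ("min", td_min, ratio_min),
):
    print(f"{label}: T_D = {td:.3f} K/w, V_TH,time / V_TH = {ratio:.2f}")
```

The nominal delay places the threshold at one half of $V_{\rm TH}$, and the two limits place it at three quarters and one quarter of $V_{\rm TH}$, i.e., half of the $V_{\rm TH}/2$ step on either side.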

In this prototype, the hybrid voltage and time comparison yields ten quantization levels for each SAR conversion, as illustrated in Fig. 12(a). A single voltage-mode comparator occupies about 150 $\mu$m<sup>2</sup> of active area, while the time-domain comparator occupies less than one quarter of that.



Fig. 13. Measured ADC performances.

Due to this simple and compact layout of the time-domain comparator, we can significantly save the silicon area of the proposed quantizer. In order to accommodate the incomplete settling and conversion error during the first SAR conversion, we incorporate redundancy in this ADC by doubling the search range for the second SAR conversion. Consequently, this SAR effectively creates 50 quantization levels, i.e., 35.7-dB SQNR. The redundancy is implemented by selecting appropriate VREFs ( $\pm 4/5 \cdot$  VREF and  $\pm 4/25 \cdot$  VREF) for two-cycle SAR conversions that are generated by a separate reference capacitor DAC. Both signal and reference differential CDACs comprise 50 unit capacitors; thus, 40- and 8-unit capacitors are selected to create the corresponding VREFs for each SAR cycle.
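The level count, the SQNR, and the reference ratios quoted above are mutually consistent, as the following sketch shows. The way the 50 levels are obtained here (ten levels per cycle, two cycles, and a factor of two lost to redundancy) is an interpretation of the text rather than a statement from the authors, and the SQNR uses the standard ideal-quantizer approximation.

```python
import math
from fractions import Fraction

LEVELS_PER_CYCLE = 10    # from the text
REDUNDANCY_FACTOR = 2    # second-cycle search range doubled (from the text)
UNIT_CAPS = 50           # unit capacitors per CDAC (from the text)

# Assumed interpretation: two cycles, with half the second-cycle range overlapping.
total_levels = LEVELS_PER_CYCLE * LEVELS_PER_CYCLE // REDUNDANCY_FACTOR
resolution_bits = math.log2(total_levels)
sqnr_db = 6.02 * resolution_bits + 1.76  # ideal-quantizer approximation

first_ref = Fraction(40, UNIT_CAPS)   # fraction of VREF in the first cycle
second_ref = Fraction(8, UNIT_CAPS)   # fraction of VREF in the second cycle

print(f"levels = {total_levels}, bits = {resolution_bits:.2f}, SQNR = {sqnr_db:.1f} dB")
print(f"references = {first_ref} and {second_ref} of VREF")
```

The ratio between the two references is 5 rather than 10, which matches a second-cycle search range that is twice as wide as a non-redundant search would need.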



Fig. 14. Measured eye diagrams, channel response, and BER bathtub of the PAM4.

Fig. 12(b) illustrates the asynchronous clock generation of this ADC. Here, we design a multi-input ready signal detector logic to select the slowest ready signal among the three comparators as an indicator that all the comparisons are complete. We also apply a pre-determined time-out period to advance the SAR conversion in case any of the voltage comparators encounters metastability.
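The behavior of the ready detector with a time-out can be summarized by a very small behavioral model. This is only a sketch: the delay values and the time-out are hypothetical, and a metastable comparator is modeled as one whose ready signal never arrives.

```python
import math


def sar_step_done_time(ready_delays_ps, timeout_ps):
    """Time at which the SAR logic advances: slowest ready signal or the time-out."""
    return min(max(ready_delays_ps), timeout_ps)


TIMEOUT_PS = 60.0  # hypothetical time-out period

# Normal case: all three comparators resolve, so the slowest one sets the timing.
normal = sar_step_done_time([20.0, 35.0, 28.0], TIMEOUT_PS)

# Metastable case: one comparator never asserts ready, so the time-out takes over.
metastable = sar_step_done_time([20.0, math.inf, 28.0], TIMEOUT_PS)

print(normal, metastable)  # prints "35.0 60.0"
```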

## IV. MEASUREMENT

The silicon prototype is fabricated in a 65-nm CMOS technology with a 1.2-V supply for the clock network and 1.0 V for the analog core. Fig. 13 depicts the measurement results of the standalone ADC performance without IIR equalization and channel loss. Other than the foreground channel-to-channel sub-ADC offset/gain calibration, all circuit mismatches, including comparator offsets, signal/reference DAC gain difference, sampling skews, and input buffer offset/gain mismatches, are corrected by on-chip calibration circuits. The dynamic performances of the individual sub-ADCs and the TI-ADC at a sampling rate of 12.8 GS/s with differential 0.8-Vpp sinusoids are plotted in Fig. 13. The peak SNDR results for the sub-ADC are 30.9 and 25 dB at low (20 MHz) and high (~6.4 GHz) input frequencies, respectively. At low input frequencies, both quantization noise and comparator noise limit the receiver performance, while at high input frequencies, the increased sampling clock jitter noise and the third-order harmonic distortion from the ADC buffer constrain the achievable ENOB. The SNDR of the TI-ADC is maintained above 24 dB up to the Nyquist frequency, thereby yielding a FOM<sub>Walden</sub> of 212.5 fJ/conv.-step. The measured peak INL and DNL are +0.71/-0.87 and +0.73/-0.36 LSB, respectively.
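The ENOB and Walden figure of merit follow from the standard definitions, which the sketch below applies to the numbers quoted above. The function names are hypothetical. The 24-dB value is the stated lower bound on the TI-ADC SNDR, so the computed figure of merit is an upper estimate; the quoted 212.5 fJ/conv.-step would correspond to an SNDR slightly above 24 dB.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR using the standard 6.02N + 1.76 rule."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sample_rate, enob):
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2 ** enob * sample_rate) * 1e15


ADC_POWER_W = 36.3e-3   # TI-ADC power from the text
SAMPLE_RATE = 12.8e9    # sampling rate from the text

peak_enob = enob_from_sndr(30.9)      # peak sub-ADC SNDR from the text
nyquist_enob = enob_from_sndr(24.0)   # lower bound on TI-ADC SNDR from the text
fom_at_24_db = walden_fom_fj(ADC_POWER_W, SAMPLE_RATE, nyquist_enob)

print(f"peak ENOB = {peak_enob:.2f}")  # prints "peak ENOB = 4.84"
print(f"FoM at 24-dB SNDR = {fom_at_24_db:.0f} fJ/conv.-step")
```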

Fig. 14 illustrates that two channels with a loss of 14.6 and 9.6 dB at 6.4 GHz are used to evaluate the receiver's performance under NRZ and the PAM4 modulation scheme with a  $2^9-1$  PRBS pattern, respectively. Note that we separately optimize the IIR equalizer in order to maximize the eye openings for NRZ and PAM4 under different



Fig. 15. Measured eye diagrams before and after applying the proposed embedded IIR equalization technique.



Fig. 16. Chip micrograph.

channel conditions. Since there is no CTLE, VGA, or DSP equalization as in a typical wireline receiver, we applied a four-tap TX FFE using an arbitrary waveform generator (AWG) and an 8-dB external wide-band gain to characterize the combined performance of the IIR equalization and the ADC under PAM4 modulation. According to the measured bathtub, the RX achieves an eye opening of $\sim 0.2$ UI at a BER of $< 10^{-4}$ (limited by the maximal ADC output bit storage in our test setup), thereby yielding an efficiency of 2.43 pJ/b. Fig. 15 depicts the eye diagrams before and after the application of the proposed equalization technique at 6.4 and 12.8 Gb/s under NRZ and PAM4 modulations, respectively, without using any TX gain or FFE. The comparison with recent ADC-based receivers with a similar data rate [3], [13]-[18] and the area/power breakdown are summarized in Table II. This work consumes 62.1 mW, of which the IIR filter and TI-ADC consume 1.74 and 36.3 mW, respectively. Also, the power consumption comparison between the voltage- and time-domain comparators shows the energy saved in the proposed quantizer by reducing the number of power-hungry voltage-domain comparators. Fig. 16 illustrates the micrograph of the receiver

TABLE II Comparison Table of the State-of-the-Art ADC-Based Receivers With Similar Data Rate and Core Area/Power Breakdown

|                                    |        | This work                        |       | S. Kiran [13]<br>CICC 2018          | A. Roshan-Zamir<br>CICC 2018 [14]  | L. Wang [3]<br>ISSCC 2018 | S. Rylov [15]<br>ISSCC 2016                          | D. Cui [16]<br>ISSCC 2016 | E. Z. Tabasy [17]<br>CICC 2012 | B. Zhang [18]<br>ISSCC 2013 |
|------------------------------------|--------|----------------------------------|-------|-------------------------------------|------------------------------------|---------------------------|------------------------------------------------------|---------------------------|--------------------------------|-----------------------------|
| Technology                         |        | 65nm CMOS                        |       | 65nm CMOS                           | 65nm CMOS                          | 16nm FinFET               | 32nm SOI                                             | 28nm CMOS                 | 90nm CMOS                      | 40nm CMOS                   |
| Power supply (V)                   |        | 1.0 / 1.2                        |       | 0.9 / 1.1                           | 1.2                                | 0.9                       | 1.0                                                  | 1.0/1.5                   | 1.3                            | 1.0                         |
| Sampling rate (GS/s)               |        | 12.8                             |       | 16                                  | 28                                 | 32.1875                   | 25.6                                                 | 16                        | 1.6                            | 8.5 – 11.5                  |
| ADC Architecture                   |        | 5.6-bit<br>x8 TI-SAR             |       | 6-bit<br>x8 TI-SAR                  | 2-bit<br>x4 TI-Flash               | 6-bit<br>x8 TI-Flash      | 5-bit<br>Flash                                       | 8bit<br>x32 TI-SAR        | 6-bit<br>x16 TI-SAR            | 5-bit<br>Flash              |
| Pre-E qualization                  | RX AFE | Buffer<br>Embedded<br>IIR filter |       | CTLE+VGA<br>Embedded<br>FFE (3-tap) | CTLE<br>1-tap FIR<br>1-tap IIR DFE | CTLE<br>VGA               | CTLE<br>VGA                                          | CTLE<br>PGA               | Embedded<br>DFE                | CTLE<br>PGA                 |
|                                    | TX FFE | None                             | 4-tap | None                                | 2-tap                              | 3-tap                     | None                                                 | N/A                       | None                           | None                        |
| Post-Equalization                  |        | None                             | None  | 12-tap FFE<br>2-tap DFE             | None                               | FFE<br>0 : 0 : 8          | 8-tap FFE<br>8-tap DFE                               | N/A                       | None                           | Adaptive<br>FFE/DFE         |
| Channel Loss (dB)                  |        | 14.6                             | 9.6   | 30                                  | 20.8                               | 8.6 21.7 29.5             | 40                                                   | 32                        | N/A                            | 34                          |
| RX power<br>efficiency<br>(pJ/bit) | NRZ    | 4.85                             |       |                                     |                                    |                           | 17.7 12.1                                            |                           | 12.5                           | 18.9                        |
|                                    | PAM4   |                                  | 2.43  | 5.1875                              | 4.63                               | 1.55 2.87 4.4             |                                                      | 10                        |                                |                             |
| RX power (mW)                      |        | 62.1                             |       | 166                                 | 259                                | 100 185 284               | 453 310                                              | 320                       | 20.1                           | 195                         |
| Notes for power calculation        |        | Embedded IIR Filter<br>ADC       |       | AFE<br>ADC                          | CTLE<br>ADC<br>DSP<br>CDR          | AFE<br>ADC                | AFE<br>ADC<br>CDR<br>DSP<br>AFE<br>ADC<br>CDR<br>CDR | AFE<br>ADC                | Embedded DFE<br>ADC            | AFE<br>ADC                  |
| RX core area (mm<sup>2</sup>)      |        | 0.24                             |       | 1.58                                | 0.51                               | 0.1625                    | 0.39                                                 | 0.89                      | 0.24                           | 0.82                        |
| Number of CTLE+VGA stage           |        | (1-stage Buffer)                 |       | 2+1+1 (Buff)                        | 1                                  | 1+1                       | 2+1                                                  | 1+2                       | None                           | 1+2                         |
| ΔG (dB) <sup>†</sup>               |        | 11.5                             |       | 11                                  | 6^                                 | 6                         | 12                                                   | 7                         | N/A <sup>^</sup>               | 7~8*                        |

$^{\dagger}\Delta G$ = Gain difference between high and low frequency.

^Embedded FFE/DFE effects are not included.

\* Estimated value from published data



die occupying an active area of 1.28 mm $\times$ 0.75 mm (core: 0.24 mm<sup>2</sup>).
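The efficiency figures reported in Table II follow from dividing the total receiver power by the data rate. The short sketch below reproduces that arithmetic; the function name `energy_per_bit_pj` is hypothetical, and it assumes both modes run at 12.8 Gbaud with 1 b/symbol for NRZ and 2 b/symbol for PAM4.

```python
def energy_per_bit_pj(power_mw: float, baud_gbaud: float, bits_per_symbol: int) -> float:
    """Return receiver energy efficiency in pJ/b (hypothetical helper)."""
    data_rate_gbps = baud_gbaud * bits_per_symbol
    # mW / (Gb/s) = 1e-3 W / (1e9 b/s) = 1e-12 J/b = pJ/b
    return power_mw / data_rate_gbps


RX_POWER_MW = 62.1   # total receiver power reported in Table II
BAUD_GBAUD = 12.8    # symbol rate (assumed identical for NRZ and PAM4)

nrz = energy_per_bit_pj(RX_POWER_MW, BAUD_GBAUD, bits_per_symbol=1)
pam4 = energy_per_bit_pj(RX_POWER_MW, BAUD_GBAUD, bits_per_symbol=2)
print(f"NRZ: {nrz:.2f} pJ/b, PAM4: {pam4:.2f} pJ/b")
```

Under these assumptions, the results match the 4.85-pJ/b (NRZ) and 2.43-pJ/b (PAM4) entries for this work in Table II.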

## V. CONCLUSION

In this article, we presented an ADC-based receiver for NRZ/PAM4 modulation, featuring a TDC-assisted multi-bit/cycle asynchronous SAR ADC with an embedded discrete-time IIR equalizer. The design aims to minimize the circuit implementation overhead by reusing the existing sampling network of the TI ADCs and incorporating active $G_m$-C integrators to form a tunable IIR equalizer response, which relaxes the requirements on the front-end CTLE or back-end DSP equalizers. We conclude that, compared to a discrete-time FIR equalizer, this embedded IIR topology can potentially provide more equalization gain using fewer filter taps.
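As a rough illustration of the $\Delta G$ metric used in Table II (gain difference between high and low frequency), the sketch below evaluates a generic first-order discrete-time filter at DC and at Nyquist, with and without a feedback tap. The filter structure and the tap weights `ALPHA` and `BETA` are illustrative assumptions, not the transfer function or coefficients of the prototype.

```python
import cmath
import math


def gain(b, a, omega):
    """Magnitude of H(z) = B(z)/A(z) evaluated at z = exp(j*omega)."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv**k for k, bk in enumerate(b))
    den = sum(ak * z_inv**k for k, ak in enumerate(a))
    return abs(num / den)


def delta_g_db(b, a):
    """Gain difference in dB between Nyquist (omega = pi) and DC (omega = 0)."""
    return 20 * math.log10(gain(b, a, math.pi) / gain(b, a, 0.0))


ALPHA = 0.3  # illustrative feedforward tap weight (assumed)
BETA = 0.3   # illustrative feedback tap weight (assumed, |BETA| < 1 for stability)

# Two-tap FIR: y[n] = x[n] - ALPHA*x[n-1]
fir = delta_g_db([1.0, -ALPHA], [1.0])
# Same feedforward path plus one feedback tap:
# y[n] = x[n] - ALPHA*x[n-1] - BETA*y[n-1]
iir = delta_g_db([1.0, -ALPHA], [1.0, BETA])

print(f"FIR delta-G: {fir:.1f} dB, with feedback tap: {iir:.1f} dB")
```

With these assumed weights, the single feedback tap roughly doubles the peaking in decibels (about 5.4 dB versus about 10.8 dB), which is the kind of trade-off the conclusion alludes to.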

## ACKNOWLEDGMENT

The authors would like to thank Dr. Z. (Joe) Wu and J.-R. Guo from Intel Corporation, Santa Clara, CA, USA, for their helpful advice, and to acknowledge the helpful technical discussions with Dr. H. Hashemi and Dr. S. Chung.

#### REFERENCES

- [1] S. Palermo *et al.*, "CMOS ADC-based receivers for high-speed electrical and optical links," *IEEE Commun. Mag.*, vol. 54, no. 10, pp. 168–175, Oct. 2016.
- [2] S. Kiran, S. Cai, Y. Zhu, S. Hoyos, and S. Palermo, "Digital equalization with ADC-based receivers: Two important roles played by digital signal processing in designing analog-to-digital-converter-based wireline communication receivers," *IEEE Microw. Mag.*, vol. 20, no. 9, pp. 62–79, May 2019.
- [3] L. Wang, Y. Fu, M.-A. LaCroix, E. Chong, and A. C. Carusone, "A 64Gb/s PAM-4 transceiver utilizing an adaptive threshold ADC in 16nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 110–111.
- [4] J.-W. Nam, M. Hassanpourghadi, A. Zhang, and M. S.-W. Chen, "A 12-Bit 1.6, 3.2, and 6.4 GS/s 4-b/cycle time-interleaved SAR ADC with dual reference shifting and interpolation," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1765–1779, Jun. 2018.
- [5] J.-W. Nam and M. S.-W. Chen, "A 12.8-Gbaud ADC-based NRZ/PAM4 receiver with embedded tunable IIR equalization filter achieving 2.43pJ/b in 65nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, May 2019, pp. 1–4.

- [6] A. Manian and B. Razavi, "A 40-Gb/s 14-mW CMOS wireline receiver," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2407–2421, Sep. 2017.
- [7] G. Giustolisi, G. Palmisano, and G. Palumbo, "1.5 V power supply CMOS voltage squarer," *Electron. Lett.*, vol. 33, no. 13, pp. 1134–1135, Jun. 1997.
- [8] D. Kim, S. Park, and M. Song, "A wideband fully differential source follower," *Analog Integr. Circuits Signal Process.*, vol. 72, no. 1, pp. 155–161, Jul. 2012.
- [9] M. S.-W. Chen and R. W. Brodersen, "A 6b 600MS/s 5.3mW asynchronous ADC in 0.13μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2669–2680, Dec. 2006.
- [10] J.-W. Nam and M. S.-W. Chen, "An embedded passive gain technique for asynchronous SAR ADC achieving 10.2 ENOB 1.36-mW at 95-MS/s in 65 nm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 10, pp. 1628–1638, Oct. 2016.
- [11] H. J. M. Veendrick, "The behaviour of flip-flops used as synchronizers and prediction of their failure rate," *IEEE J. Solid-State Circuits*, vol. 15, no. 2, pp. 169–176, Apr. 1980.
- [12] J. Liu, C.-H. Chan, S.-W. Sin, U. Seng-Pan, and R. P. Martins, "A 89fJ-FOM 6-bit 3.4GS/s flash ADC with 4× time-domain interpolation," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2015, pp. 1–4.
- [13] S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S. Palermo, "A 32 Gb/s ADC-based PAM-4 receiver with 2-bit/stage SAR ADC and partially-unrolled DFE," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, May 2018, pp. 1–4.
- [14] A. Roshan-Zamir *et al.*, "A 56 Gb/s PAM4 receiver with low-overhead threshold and edge-based DFE FIR and IIR-tap adaptation in 65nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, May 2018, pp. 1–4.
- [15] S. Rylov *et al.*, "A 25Gb/s ADC-based serial line receiver in 32nm CMOS SOI," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan./Feb. 2016, pp. 56–57.
- [16] D. Cui *et al.*, "A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan./Feb. 2016, pp. 55–59.
- [17] E. Z. Tabasy, A. Shafik, S. Huang, N. Yang, S. Hoyos, and S. Palermo, "A 6b 1.6GS/s ADC with redundant cycle 1-tap embedded DFE in 90nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2012, pp. 1–4.
- [18] B. Zhang *et al.*, "A 195mW / 55mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 34–35.



**Jae-Won Nam** (S'13) received the Ph.D. degree in electrical engineering from the University of Southern California (USC), Los Angeles, CA, USA, in 2019.

From 2008 to 2012, he was a Research Staff with Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea, working on implementing mixed-signal read-out integrated circuits and systems for various sensor interfaces. In the fall of 2017, he was a Graduate Intern with Data-Center-Group, Intel Corporation, Santa Clara, CA, USA, working on the next-generation high-speed I/O architectures. He is currently an Analog Engineer with I/O Circuit Technology Team, Advanced Design Group, Intel Corporation, Hillsboro, OR, USA. His current research interests include designing low-power, high-speed, and high-resolution analog-to-digital data converters and high-speed I/O interface circuits.

Dr. Nam was a recipient of the President Award from Korea Advanced Institute of Science and Technology IT Convergence Campus (KAIST-ICC) in 2006 and the Outstanding Employee Award from ETRI in 2009 and a co-recipient of the Silver Prize from the Tenth Korea Intellectual Property Office (KIPO) Circuit Design Contest in 2009. He was a recipient of the Ph.D. Fellowship from the Viterbi-USC Graduate School of Engineering from 2012 to 2014. He received the Best Student Paper (third place) Award at IEEE Custom Integrated Circuits Conference (CICC) in 2019.



**Mike Shuo-Wei Chen** (M'6–SM'18) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1998, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA, in 2002 and 2006, respectively.

From 2006 to 2010, he was a member of the Analog IC Group, Atheros Communications (now Qualcomm Atheros), San Jose, CA, USA, working on mixed-signal and RF circuits for various wireless communication products. He is currently an Associate Professor with the Electrical Engineering Department, University of Southern California (USC), Los Angeles, CA, USA, where he is currently the Colleen and Roberto Padovani Early Career Chair. As a graduate student researcher, he proposed and demonstrated the asynchronous SAR ADC architecture, which has been adopted for low-power high-speed analog-to-digital conversion products in the industry. He leads an analog mixed-signal circuit group, focusing on high-speed low-power data converters, bioinspired/biomedical electronics, radio frequency synthesizers, and DSP-enabled analog circuits and systems. His research group has been exploring new circuit architectures that excel beyond the technology limitation.

Dr. Chen was a recipient of the NSF Faculty Early Career Development (CAREER) Award in 2014, the DARPA Young Faculty Award (YFA) in 2014, the Analog Devices Outstanding Student Award for Recognition in IC design in 2006, and the UC Regents' Fellowship at Berkeley in 2000. He has also achieved an honorable mention in the Asian Pacific Mathematics Olympiad in 1994.