March 11, 2013 5:00:22pm WSPC/123-JCSC 1350039 ISSN: 0218-1266 1<sub>st</sub> Reading

Journal of Circuits, Systems, and Computers
Vol. 22, No. 6 (2013) 1350039 (12 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218126613500394

World Scientific
www.worldscientific.com

DESIGN OF AN AREA-EFFICIENT HIGH-THROUGHPUT SHIFT-BASED LDPC DECODER*

YUN-CHING TANG<sup>†,‡,§</sup>, HONG-REN WANG<sup>†</sup>, HONGCHIN LIN<sup>†,¶</sup> and JUN-ZHE HUANG<sup>†</sup>

<sup>†</sup>Department of Electrical Engineering, National Chung Hsing University, Taichung 402, Taiwan

<sup>‡</sup>Department of Electronic Engineering, Hsiuping University of Science and Technology, Taichung 412, Taiwan

<sup>§</sup>tangyc@hust.edu.tw
<sup>¶</sup>hclin@nchu.edu.tw

Received 4 February 2012
Accepted 31 Ju…
Published
An area-efficient high-throughput shift-based LDPC decoder architecture is proposed. The specially designed (512, 1,024) parity-check matrix is effective for partial parallel decoding by the min-sum algorithm (MSA). To […], two data frames are fed into the decoder to minimize idle time of the check node unit (CNU) and the variable node unit (VNU). Thus, the throughput is increased to almost two-fold. Unlike the conventional architecture, the measurement […] registers instead […] registers. Therefore, hardware costs are reduced. Routing congestion and […] also reduced, which increases energy efficiency. An implementation of
                                                                                                                                                                                                                                                                                                                | l critical path delay are<br>f the proposed decoder                          |
| 29            | using TSMC $0.18 \mu\text{m}$ CMOS process<br>frequency of 56 MHz, a supply volta                                                                               | s achieves a dec<br>age of 1.8 V, an                                    | coding throughput of<br>a core area of 5.13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                | f $1.725$ Gbps, at a clock $8 \text{ mm}^2$ . The normalized                 |
| 31            | area is smaller and the throughput<br>reported using the conventional arc                                                                                       | per normalize<br>hitectures.                                            | d power consumptio                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                | on is higher than those                                                      |
| 33            | <i>Keywords</i> : Low-density parity-chec<br>decoder.                                                                                                           | ek codes; VLSI                                                          | decoder architectu                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                | res; shift-based LDPC                                                        |
#### 1. Introduction

With the advances in information transmission, the error correction code has been used to correct the transmission errors and reduce required transmission energy. The low-density parity-check (LDPC) code first introduced by Gallager in 1962<sup>1</sup> is a binary linear block code, which is capable of approaching the Shannon limit. Over the last decade, advances in VLSI technology have generated interest in the use of LDPC codes.<sup>1-3</sup> Several specifications have adopted LDPC codes as an error correction code to enhance transmission quality, such as DVB-S2, 10GBASE-T, Wi-Fi (802.11n), WiMAX (802.16e) and the fourth-generation mobile communications (4G).

The belief propagation algorithm (BPA),<sup>4</sup> with slight simplification of the theoretical algorithm, provides accurate decoding capability. Blanksby *et al.*<sup>5,6</sup> implemented a 1-Gb/s fully parallel decoder using BPA, but the circuit occupied a large chip area. Complex circuits can be simplified by using the min-sum algorithm (MSA),<sup>7</sup> which provides acceptable accuracy.
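As a rough illustration of why the MSA leads to simpler circuits than the BPA, the following sketch shows a textbook min-sum check-node and variable-node update in Python. It is not taken from this paper: the function names, the floating-point LLRs and the absence of any scaling or offset correction are assumptions made for illustration only.

```python
def check_node_update(v2c):
    """Textbook min-sum check-node update (illustrative, not the authors' circuit).

    v2c: incoming variable-to-check LLRs on all edges of one check node.
    Returns one check-to-variable message per edge, each computed from
    the *other* edges only: product of their signs times their minimum magnitude.
    """
    c2v = []
    for i in range(len(v2c)):
        others = v2c[:i] + v2c[i + 1:]
        sign = 1
        for x in others:
            if x < 0:
                sign = -sign
        c2v.append(sign * min(abs(x) for x in others))
    return c2v


def variable_node_update(channel_llr, c2v):
    """Textbook variable-node update: channel LLR plus all incoming
    check-to-variable messages except the one on the outgoing edge."""
    total = channel_llr + sum(c2v)
    return [total - m for m in c2v]


if __name__ == "__main__":
    # One check node of row weight 6 and one variable node of column weight 3.
    print(check_node_update([2.0, -3.0, 0.5, 4.0, -1.0, 6.0]))
    print(variable_node_update(1.0, [0.5, -0.5, 1.0]))
```

The check-node update needs only comparisons and sign logic, which is the reason a min-sum CNU is much cheaper in hardware than the nonlinear functions required by the full BPA.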

In addition to decoding algorithms, circuit complexity can be further reduced in structured LDPC codes by using a parity-check matrix (*H* matrix) with a regular structure, which is more appropriate for hardware implementation than a randomly generated *H* matrix. Commonly used structured LDPC codes include quasi-cyclic LDPC (QC-LDPC) codes,<sup>8</sup> array LDPC codes<sup>9</sup> and Reed-Solomon-based LDPC (RS-LDPC) codes.<sup>10</sup> The *H* matrices are composed of shifted sub-matrices that must be carefully selected, since the numbers and sizes of the sub-matrices as well as the minimum cycle length (i.e., the girth) affect the decoding performance and circuit complexity.

These LDPC decoders can be implemented using fully parallel,<sup>11</sup> partial parallel,<sup>12</sup> or serial<sup>13</sup> architectures. In the fully parallel architecture, each check or variable node requires a processor. Therefore, its throughput is very high. However, it occupies a huge chip area due to the numerous processors and the complex interconnections caused by a large number of irregular edges. In contrast, the interconnect complexity can be reduced by employing the serial architecture, but its small throughput makes it impractical and limits the applications of serial LDPC decoders. The partial parallel architecture requires significantly fewer node processing units than the fully parallel architecture. Therefore, this trade-off has been widely used in many studies.<sup>14</sup> For example, the early version<sup>12</sup> employs many multiplexers (mux) and de-multiplexers (demux) with long latency. Another flexible architecture<sup>15</sup> results in greater circuit complexity and smaller throughput than the standard architectures do.

For practical applications of LDPC decoders, IC chips should have low cost and low power. In this paper, an LDPC decoder with a specially designed *H* matrix was implemented using the partial parallel LDPC architecture with a special shift-register technique to achieve the best performance, including a figure of merit defined as throughput divided by normalized power.<sup>16,17</sup>

The organization of this paper is as follows. After the background is discussed, the design of the proposed LDPC code is described. Next, the architecture of the partial parallel LDPC decoder is proposed. Then, the experimental results and comparison are presented. The final section is the conclusion.


#### 2. Background

The QC-LDPC codes are structured LDPC codes with good regularity, which makes them appropriate for hardware implementation. Figure 1 illustrates the *H* matrix ($H_{m \times n}$), which is composed of $p \times p$ shifted unitary matrices $S_{i,j}$, where the row and column indices *i* and *j* are integers from 1 to *m/p* and *n/p*, respectively. This QC-LDPC code with block length *n* has column and row weights of *m/p* and *n/p*, which represent the number of 1's in a column and a row, respectively. Each matrix $S_{i,j}$ in an *H* matrix is a $p \times p$ unitary matrix shifted to the right $s_{i,j}$ times, where $0 \leq s_{i,j} \leq p-1$.

Figure 2 shows that the conventional LDPC decoding architecture<sup>12</sup> can be divided into four major modules: variable node units (VNUs), check node units (CNUs), and two message storage units ($\Delta$ and $\Lambda$ registers). The message storage units store the updated CNU and VNU data. The CNUs start to work only after the entire VNU computation is complete, and vice versa. In the QC-LDPC *H* matrix in Fig. 1, the shifted unitary matrices can be calculated column by column in the VNU operation and then compared row by row in the CNU operation. This partial parallel architecture significantly reduces the number of node processors to *p*, rather than the *n* or *m* required in the fully parallel architecture, which also reduces the routing complexity. However, additional mux and demux are required.<sup>12</sup> Registers with mux and demux can be replaced by memories such as register files, which are marked by the dashed lines in Fig. 2. In addition, data "re-order" blocks that align the updated data in the correct positions are needed before the data are stored back in the registers. For specific permutations of the *H* matrix,<sup>12</sup> some overlapped scheduling processes are also used to reduce latency, but the idle time of the CNUs or VNUs is still approximately 50%.

$$H_{m \times n} = \begin{bmatrix} S_{1,1} & S_{1,2} & \cdots & S_{1,(n/p)-1} & S_{1,(n/p)} \\ S_{2,1} & S_{2,2} & \cdots & S_{2,(n/p)-1} & S_{2,(n/p)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ S_{(m/p)-1,1} & S_{(m/p)-1,2} & \cdots & S_{(m/p)-1,(n/p)-1} & S_{(m/p)-1,(n/p)} \\ S_{(m/p),1} & S_{(m/p),2} & \cdots & S_{(m/p),(n/p)-1} & S_{(m/p),(n/p)} \end{bmatrix}$$

Fig. 1. *H* matrix of the QC-LDPC code.

Fig. 2. Architecture of conventional partial parallel LDPC decoder.

To design a good *H* matrix, the girth, which is the length of the shortest cycle in the Tanner graph, significantly determines the decoding performance. The conventional QC-LDPC and RS-LDPC codes were further enhanced to the partition-and-shift LDPC (PS-LDPC) code,<sup>18</sup> which theoretically optimizes the girth. The algorithm counts 2*t* translations between the shifted unitary matrices. A closed loop of length 2*t* exists in the Tanner graph if and only if Eq. (1) holds,<sup>18</sup> where "mod" denotes the modulo operation and $s_{\alpha_1,\beta_1}, s_{\alpha_2,\beta_2}, \ldots, s_{\alpha_{2t},\beta_{2t}}$ represent the shift values of the shifted unitary matrices. The minimum cycle length, 2*t*, is the girth.

$$\mathrm{mod}\left[(-1)^{1}s_{\alpha_{1},\beta_{1}}+(-1)^{2}s_{\alpha_{2},\beta_{2}}+\cdots+(-1)^{2t-1}s_{\alpha_{2t-1},\beta_{2t-1}}+(-1)^{2t}s_{\alpha_{2t},\beta_{2t}},\ p\right]=0. \tag{1}$$

Here, this PS-LDPC code algorithm is adopted to maximize the girth.

#### 3. Design of LDPC Code

Figure 3 shows the proposed *H* matrix using PS-LDPC codes, which can be partitioned into $4 \times 4$ sub-blocks. Each sub-block contains *k* shifted unitary matrices. The column and row weights of each sub-block are 1 and *k*, respectively.
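The loop condition of Eq. (1) can be illustrated with a small sketch. The Python code below evaluates the alternating-sign sum for a given sequence of shift values and uses it to search a shift table for loops of length four (the case *t* = 2). The function names, the table layout (one shift value per position, `None` for a zero block) and the restriction to length-four loops are assumptions made for illustration; this is not the PS-LDPC construction algorithm itself.

```python
from itertools import combinations


def closes_loop(shifts, p):
    """Eq. (1): the alternating-sign sum of the 2t shift values is 0 modulo p."""
    total = sum((-1) ** (i + 1) * s for i, s in enumerate(shifts))
    return total % p == 0


def has_four_cycle(shift_table, p):
    """Search every 2 x 2 selection of blocks for a closed loop of length 4.

    shift_table[i][j] is the shift value s_{i,j} of a p x p shifted matrix,
    or None when that block is a zero matrix (hypothetical layout).
    """
    rows, cols = len(shift_table), len(shift_table[0])
    for a, b in combinations(range(rows), 2):
        for c, d in combinations(range(cols), 2):
            # Walk around the rectangle so that consecutive blocks alternate
            # between sharing a row and sharing a column.
            loop = [shift_table[a][c], shift_table[a][d],
                    shift_table[b][d], shift_table[b][c]]
            if None not in loop and closes_loop(loop, p):
                return True
    return False


if __name__ == "__main__":
    print(has_four_cycle([[0, 0], [0, 0]], 5))  # all-zero shifts close a loop
    print(has_four_cycle([[0, 0], [0, 1]], 5))  # one unit shift breaks it
```

A girth of 8, as targeted by the proposed code, additionally requires that no loop of length six satisfies Eq. (1); the same `closes_loop` test applies to those longer shift sequences.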

In our design, all anti-diagonal sub-blocks ($H_{14}$, $H_{23}$, $H_{32}$ and $H_{41}$) are zero matrices. The advantage is that reading and writing the same memory on the same clock edge is avoided. For example, if all sub-blocks were non-zero matrices, then after the CNUs process the last row ($H_{41}$, $H_{42}$, $H_{43}$ and $H_{44}$), the updated data for $H_{41}$ would have to be stored in the memory shown in Fig. 2 and read out at the same time for the next VNU operation on the first column ($H_{11}$, $H_{21}$, $H_{31}$ and $H_{41}$). Register-file memories cannot be used in this situation. The solution would be to use flip-flop-based registers. However, the mux and demux before and after the registers increase the critical paths and routing complexity. With zero anti-diagonal sub-blocks, the critical paths can be shortened and the routing complexity can be reduced.

To further reduce the number of mux and the power consumption, the sub-blocks $H_{11}$, $H_{22}$, $H_{33}$ and $H_{44}$ are composed of *k* unitary matrices without shift. Figure 4 illustrates the proposed parity-check matrix, in which "I" indicates a $p \times p$ unitary matrix, and each sub-block has *k* shifted or non-shifted unitary matrices. The parameter *k* gives the coding rate of $(k-1)/k$. The coding length is $4 \times k \times p$, where *p* can be used to adjust the coding length. Therefore, it is a very flexible approach to provide

| H<sub>11</sub> | H<sub>12</sub> | H<sub>13</sub> | H<sub>14</sub> |
|----------------|----------------|----------------|----------------|
| H<sub>21</sub> | H<sub>22</sub> | H<sub>23</sub> | H<sub>24</sub> |
| H<sub>31</sub> | H<sub>32</sub> | H<sub>33</sub> | H<sub>34</sub> |
| H<sub>41</sub> | H<sub>42</sub> | H<sub>43</sub> | H<sub>44</sub> |

Fig. 3. Partition of the *H* matrix.
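The parameter relations stated for this construction (coding rate $(k-1)/k$, coding length $4 \times k \times p$) can be checked with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the rate is the design rate (it assumes a full-rank *H* matrix), and the example values *k* = 2 and *p* = 128 are inferred from the 256 VNUs, the 128 CNUs and the (1,024, 3, 6) code mentioned elsewhere in this paper rather than stated explicitly.

```python
def code_parameters(k, p, blocks=4):
    """Derive basic code parameters for a blocks x blocks partition in which
    each sub-block holds k shifted p x p matrices side by side and exactly
    one sub-block per block row and block column is a zero matrix."""
    n = blocks * k * p          # coding length: 4 * k * p
    m = blocks * p              # number of parity-check rows
    nonzero = blocks - 1        # non-zero sub-blocks per block row/column
    return {
        "n": n,
        "m": m,
        "rate": (n - m) / n,    # equals (k - 1) / k if H has full rank
        "column_weight": nonzero,       # each sub-block has column weight 1
        "row_weight": nonzero * k,      # each sub-block has row weight k
    }


if __name__ == "__main__":
    print(code_parameters(k=2, p=128))
```

With these assumed values the sketch gives a length of 1,024 with column weight 3 and row weight 6, which agrees with the (1,024, 3, 6) code used in the performance comparison.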




Fig. 6. Performance comparison of the randomly generated and proposed LDPC codes with different iterations for BPSK modulation.


LDPC code with the same (1,024, 3, 6) matrix properties by using BPSK modulation with 4, 8 and 16 iterations. The performance of the proposed LDPC code with girth = 8 is virtually identical to that of the randomly generated LDPC code with girth = 6. Since the curve of eight iterations is close to that of 16 iterations, eight iterations were adopted in the following implementation.


#### 4. Shift-Based LDPC Decoding Architecture

Figure 7 shows the partial parallel decoding architecture using the min-sum algorithm with the specially designed LDPC code given in Fig. 5. The variable node calculation processes 256 columns of the *H* matrix at a time, so 256 VNUs work in parallel. Similarly, the check node calculation processes 128 rows at a time, so 128 CNUs work in parallel. The variable node and check node operations each require four clock cycles in a decoding iteration.
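To see why feeding two frames into the decoder nearly removes the idle time of the VNUs and CNUs, the following Python sketch simulates an idealized timeline in which every frame alternates between a four-clock VNU phase and a four-clock CNU phase. The function name and the simplifications (no input/output overhead, each later frame simply starting one phase after the previous one) are assumptions for illustration, not a model of the fabricated chip.

```python
def simulate(frames, iterations, phase_clocks=4):
    """Return (total_clocks, vnu_utilization, cnu_utilization).

    Each frame runs `iterations` iterations of a VNU phase followed by a
    CNU phase; frame f starts f * phase_clocks clocks after frame 0, so a
    second frame occupies the VNUs while the first one is in the CNUs.
    """
    vnu_busy, cnu_busy = set(), set()
    for f in range(frames):
        t = f * phase_clocks
        for _ in range(iterations):
            for unit in (vnu_busy, cnu_busy):
                clocks = range(t, t + phase_clocks)
                # A unit can serve only one frame per clock.
                assert unit.isdisjoint(clocks), "resource conflict"
                unit.update(clocks)
                t += phase_clocks
    total = max(max(vnu_busy), max(cnu_busy)) + 1
    return total, len(vnu_busy) / total, len(cnu_busy) / total


if __name__ == "__main__":
    print(simulate(frames=1, iterations=8))  # one frame: units idle half the time
    print(simulate(frames=2, iterations=8))  # two interleaved frames
```

In this idealized model a single frame takes 64 clocks for eight iterations with 50% utilization, whereas two interleaved frames finish in 68 clocks, which is consistent with the "almost two-fold" throughput gain claimed for the two-frame scheme.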

For the optimal use of VNUs and CNUs with minimal idle time, two frames of the received log-likelihood ratios (LLRs) are stored in the input buffers. The first frame is transferred to the VNUs with the updated data from the check-to-variable storage unit (CTVSU), if available, during the first four clocks. Then, the second frame of data is fed to the VNUs in the next four clocks. This process is named "collection." The VNU outputs are stored in the variable-to-check storage unit (VTCSU), which is composed of shift registers. In the following "check" procedure, VTCSU data are sent to the CNUs for updating. The iteration is complete when the data are stored in the CTVSU. Note that the two data frames are computed using VNUs and CNUs alternately to maximize throughput.

Fig. 7. Architecture of LDPC decoder.

The architectures of the VTCSU and CTVSU message storage units differ from those of the conventional approaches given in Fig. 2. Both register-file memories and registers with mux and demux occupy larger chip areas and consume more power than the proposed shift-register-based technique. Figure 8 shows the delay clock cycles for the VTCSU, that is, how the outputs of the VNUs are scheduled to the inputs of the CNUs. In Fig. 8(a), the sub-blocks marked by the bold lines give examples of the delay cycles of three sub-blocks in the VTCSU. Sub-blocks $H_{11}$, $H_{22}$, $H_{33}$ and $H_{44}$ sequentially enter the CNUs four clock cycles after they are output by the VNUs. However, the others require different delay clock cycles to enter the CNUs, as indicated by the underlined numbers in Fig. 8(b), which shows the sub-blocks $H_{ij}$, where *i* and *j* range from 1 to 4. Sub-blocks $H_{14}$, $H_{23}$, $H_{32}$ and $H_{41}$ are zero matrices.

Fig. 8. (Color online) (a) Examples of delay clock cycles in the VTCSU. (b) Delay clock cycles for each sub-block.

Figure 9 shows an implementation of the VTCSU using 12 blocks of shift registers categorized as A1 to A4, B1 to B6 and C1 to C2. Figure 10 plots the corresponding timing diagram. The VNU outputs  $(\Delta_1 + \Delta_2, \Delta_3 + \Delta_4, \Delta_5 + \Delta_6)$  enter the register blocks A4, B6, and C2; the outputs of A1  $(\Delta_1 + \Delta_2)$ , B1  $(\Delta_3 + \Delta_4)$ , and C1  $(\Delta_5 + \Delta_6)$  enter the CNUs. The shift registers A4 to A1 sequentially shift  $H_{11}, H_{22}, H_{33}$  and  $H_{44}$  sequentially without mux due to zero-shift unitary matrices. The shift



Fig. 9. Shift register-based architecture of VTCSU.

Y.-C. Tang et al.



Fig. 10. Timing diagram of VTCSU.

registers B6 to B1 store $H_{13}$, $H_{24}$, $H_{31}$ and $H_{42}$. The delay of $\Delta_3 + \Delta_4$ is either two or six clock cycles, i.e., B6 $\rightarrow$ B1 or B6 $\rightarrow$ B5 $\rightarrow$ B4 $\rightarrow$ B3 $\rightarrow$ B2 $\rightarrow$ B1, respectively. The registers B5 to B2 are also shared with $H_{12}$, $H_{21}$, $H_{34}$ and $H_{43}$, which are stored in the dedicated registers C2 and C1 as well. The delay of $\Delta_5 + \Delta_6$ is either three clock cycles (i.e., C2 $\rightarrow$ B5 $\rightarrow$ C1 or C2 $\rightarrow$ B2 $\rightarrow$ C1) or five clock cycles (i.e., C2 $\rightarrow$ B4 $\rightarrow$ B3 $\rightarrow$ B2 $\rightarrow$ C1 or C2 $\rightarrow$ B5 $\rightarrow$ B4 $\rightarrow$ B3 $\rightarrow$ C1).
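The delay figures quoted for the VTCSU can be checked with a small model. The sketch below is only an illustration, not the authors' design: it assumes that every register block on a path contributes exactly one clock cycle, and that the A-chain runs A4, A3, A2, A1 in order. The names `PATHS`, `path_delay` and `delays_per_output` are hypothetical.

```python
# Illustrative model of the VTCSU delay paths described in the text.
# Assumption (not stated as code in the paper): each register block on a
# path adds exactly one clock cycle, so the delay equals the path length.

PATHS = {
    # Assumed order of the A-chain: A4 -> A3 -> A2 -> A1.
    "delta1+delta2": [["A4", "A3", "A2", "A1"]],
    "delta3+delta4": [
        ["B6", "B1"],
        ["B6", "B5", "B4", "B3", "B2", "B1"],
    ],
    "delta5+delta6": [
        ["C2", "B5", "C1"],
        ["C2", "B2", "C1"],
        ["C2", "B4", "B3", "B2", "C1"],
        ["C2", "B5", "B4", "B3", "C1"],
    ],
}


def path_delay(path):
    """Delay in clock cycles: one cycle per register block on the path."""
    return len(path)


def delays_per_output(paths):
    """Return the sorted set of distinct delays available to each VNU output."""
    return {
        name: sorted({path_delay(route) for route in routes})
        for name, routes in paths.items()
    }


DELAYS = delays_per_output(PATHS)
for name, cycles in DELAYS.items():
    print(name, cycles)
```

Under these assumptions the model gives four cycles for the A-chain, two or six cycles for the B-chain, and three or five cycles for the paths that start at C2, which agrees with the delays stated in the text.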

The proposed VTCSU has several advantages. It not only eliminates the demux but also reduces the number of mux. The mux that re-order the data are inserted before the registers B1 and C1 because $H_{14}$ and $H_{41}$ are zero matrices. Therefore, the minimum number of delay cycles is two and the critical paths are shortened. The routing is distributed among different registers, which reduces the routing complexity.

Table 1 compares the numbers of mux and demux used in the conventional register-based architecture and the proposed shifter-based design for the same LDPC code. The difference lies in the blocks marked by the dashed lines in Fig. 2 and the VTCSU and CTVSU in Fig. 7. The area of a 2:1 mux can be estimated to be one-third of that of a 4:1 mux. The significant difference is that no demux is required in our design, so the gate count and the routing complexity are reduced.

Table 1. Comparison of the mux and demux.

|              | 4:1 mux         | 2:1 mux         | 1:4 demux       |
|--------------|-----------------|-----------------|-----------------|
| Conventional | $6 \times 1024$ | 0               | $6 \times 1024$ |
| Proposed     | $4 \times 1024$ | $6 \times 1024$ | 0               |
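The mux and demux counts of Table 1 can be converted into a common area unit using the one-third estimate for a 2:1 mux. The sketch below is a rough illustration only. It assumes the counts as read from Table 1 (conventional: $6 \times 1024$ 4:1 mux and $6 \times 1024$ 1:4 demux; proposed: $4 \times 1024$ 4:1 mux and $6 \times 1024$ 2:1 mux), and the names `COUNTS`, `MUX21_AREA` and `mux_area` are hypothetical.

```python
from fractions import Fraction

# Area of a 2:1 mux in units of one 4:1 mux (estimate quoted in the text).
MUX21_AREA = Fraction(1, 3)

# Counts as read from Table 1; treat them as an assumption of this sketch.
COUNTS = {
    "conventional": {"mux41": 6 * 1024, "mux21": 0, "demux14": 6 * 1024},
    "proposed": {"mux41": 4 * 1024, "mux21": 6 * 1024, "demux14": 0},
}


def mux_area(counts):
    """Total mux area in 4:1-mux equivalents (demux are not included)."""
    return counts["mux41"] + counts["mux21"] * MUX21_AREA


for name, counts in COUNTS.items():
    print(name, "mux area:", mux_area(counts), "demux count:", counts["demux14"])
```

With these inputs both designs come out at the same mux area in 4:1-mux equivalents, so in this simplified accounting the saving is carried by the $6 \times 1024$ demux that the proposed design no longer needs.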


# 5. Experimental Results and Comparison

The proposed shift-based (1,024, 3, 6) LDPC decoder, using eight iterations with eight clocks per iteration, was designed and implemented in a 0.18 μm CMOS process. Table 2 compares the simulated performance of the conventional and the proposed architectures. The gate count is reduced by 15%, and the shorter critical paths and lower routing complexity increase the maximum operating frequency by 25%. Note that the area reduction is mainly attributed to the VTCSU and CTVSU in Fig. 7. Their counterparts in the conventional LDPC decoder occupy approximately 47% of the chip area. If the area of the VTCSU and CTVSU is reduced by 32%, the net area improvement is about 47% × 32% ≈ 15%.

Figure 11 shows the die microphotograph of the chip. The chip occupies an area of $10.8\,\text{mm}^2$, whereas the core occupies $5.18\,\text{mm}^2$. The chip works at a clock frequency



Fig. 11. (Color online) Die microphotograph of the proposed decoder.


Table 3. Comparison of LDPC decoders with measured data.

|                                           | JSSC'02$^{11}$ | TCASI'06$^{12}$ | JSSC'08$^{15}$ | TVLSI'10$^{16}$ | TVLSI'10$^{18}$ | This work      |
|-------------------------------------------|----------------|-----------------|----------------|-----------------|-----------------|----------------|
| CMOS technology                           | 0.16 μm        | 0.18 μm         | 0.13 μm        | 90 nm           | 0.18 μm         | 0.18 μm        |
| Parallelism                               | Fully          | Partial         | Fully          | Fully           | Partial         | Partial        |
| Spec.                                     | (1,024, 512)   | (1,024, 512)    | (660, 480)     | (1,024, 512)    | –               | (1,024, 512)   |
| Code rate                                 | 0.5            | 0.5             | 0.73           | 0.5             | 2/5, 3/5, 4/5   | 0.5            |
| Iterations                                | 64             | 8               | 15             | 16              | 15              | 8              |
| Supply voltage (V)                        | 1.5            | 1.62            | 1.2            | 1.2             | 1.8             | 1.8            |
| Frequency (MHz)                           | 64             | 200             | 300            | 400             | 125             | 56             |
| Throughput (Mbps)                         | 1,000          | 985             | 3,300          | 13,210          | 104.5           | 1,725          |
| Chip area (mm$^2$)                        | 52.5           | 10.08           | 7.3            | 4.97            | 9.76            | 5.18           |
| Normalized area ($10^{-3}$ mm$^2$)        | 3.91           | 0.59            | 1.36           | 1.17            | –               | 0.305          |
| Power (mW)                                | 690            | –               | 1383           | 577             | 486             | 51.1           |
| Normalized power (mW)                     | 306.67         | –               | 960.42         | 400             | 150             | 15.8           |
| Normalized energy (pJ/bit/iter)           | 4.79           | –               | 19.40          | 1.89            | 95.69           | 1.14           |
| Throughput/normalized power (Mbps/mW)     | 3.26           | –               | 3.44           | 33              | 0.7             | 109            |


of 56 MHz at a supply voltage of 1.8 V. The throughput is 1.725 Gbps because two frames are decoded simultaneously.

Table 3 compares the proposed decoder with measurement results reported in the literature. Because the processes and supply voltages differ across the literature, some papers<sup>16,17</sup> define the following three normalized parameters to evaluate circuit performance.

$$\text{Normalized Area} = \frac{\text{Chip Area}}{(\text{Codeword})^2 (1 - \text{Code Rate}) \times (\text{Technology})^2}, \tag{2}$$

$$\text{Normalized Power} = \frac{\text{Total Power}}{(\text{Core Power Supply})^2}, \tag{3}$$

$$\text{Normalized Energy} = \frac{\text{Normalized Power}}{\text{Throughput} \times \text{Iterations}}. \tag{4}$$

Owing to technology scaling, the chip area is proportional to the square of the channel length. Thus, the normalized area is usually defined as the chip area divided by (Technology)<sup>2</sup>. In addition, the larger the *H* matrix, the larger the chip area. Therefore, the normalized area is further divided by the size of the *H* matrix, which is proportional to (codeword) × [(codeword) × (1 - code rate)], as shown in Eq. (2).

Power consumption is proportional to the square of the supply voltage, so the normalized power is given by Eq. (3). To evaluate the power efficiency, it is reasonable to compare the energy required for each bit per iteration. Therefore, the normalized energy is defined as the normalized power divided by the product of the throughput and the number of iterations, as shown in Eq. (4).
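As a worked check of Eqs. (2) to (4), the sketch below evaluates the three normalized parameters for the "This work" column of Table 3. It is an illustration under stated assumptions: the technology is expressed in μm, the power in mW and the throughput in Mbps, so that mW/Mbps equals nJ/bit. The function and variable names are hypothetical.

```python
def normalized_area(chip_area_mm2, codeword_bits, code_rate, technology_um):
    """Eq. (2): chip area / (codeword^2 * (1 - code rate) * technology^2)."""
    return chip_area_mm2 / (
        codeword_bits ** 2 * (1 - code_rate) * technology_um ** 2
    )


def normalized_power(total_power_mw, supply_v):
    """Eq. (3): total power / (core supply voltage)^2."""
    return total_power_mw / supply_v ** 2


def normalized_energy_pj(norm_power_mw, throughput_mbps, iterations):
    """Eq. (4): normalized power / (throughput * iterations).

    mW / Mbps is nJ per bit, so the result is scaled by 1e3 to give pJ.
    """
    return norm_power_mw / (throughput_mbps * iterations) * 1e3


# Inputs taken from the "This work" column of Table 3.
area = normalized_area(5.18, 1024, 0.5, 0.18)
power = normalized_power(51.1, 1.8)
energy = normalized_energy_pj(power, 1725.0, 8)

print(f"normalized area   = {area * 1e3:.3f} x 1e-3")
print(f"normalized power  = {power:.1f} mW")
print(f"normalized energy = {energy:.2f} pJ/bit/iter")
```

With these inputs the values round to 0.305, 15.8 and 1.14, matching the corresponding entries of Table 3.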


In this work, all three normalized parameters are the smallest among the decoders compared in Table 3. Specifically, for high throughput efficiency, we propose one figure of merit: throughput divided by normalized power. This parameter is higher for the proposed decoder than for any of the other decoders. Based on the above analyses, the proposed architecture is area-efficient and power-efficient. It can decode more data per unit area and per unit energy.
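The figure of merit can be recomputed from the throughput and normalized-power rows of Table 3. The sketch below is illustrative: the dictionary keys are hypothetical labels for the table columns, the TCASI'06 column is omitted because the table lists no power figure for it, and the inputs are the rounded values printed in the table.

```python
# (throughput in Mbps, normalized power in mW), as listed in Table 3.
TABLE3 = {
    "JSSC'02": (1000.0, 306.67),
    "JSSC'08": (3300.0, 960.42),
    "TVLSI'10 (fully parallel)": (13210.0, 400.0),
    "TVLSI'10 (partially parallel)": (104.5, 150.0),
    "This work": (1725.0, 15.8),
}


def figure_of_merit(throughput_mbps, norm_power_mw):
    """Throughput divided by normalized power, in Mbps/mW."""
    return throughput_mbps / norm_power_mw


FOM = {name: figure_of_merit(*values) for name, values in TABLE3.items()}
BEST = max(FOM, key=FOM.get)

# Print the decoders from the highest to the lowest figure of merit.
for name, value in sorted(FOM.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {value:7.2f} Mbps/mW")
```

Using the table values, the proposed decoder has the largest ratio, at about 109 Mbps/mW.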


# 6. Conclusion

A special PSLDPC parity-check matrix equivalent to the randomly generated matrix is proposed. The proposed matrix is suitable for chip implementation with small area and power consumption for high-throughput applications. Unlike the conventional architecture, no demux is required, and the number of mux as well as the critical paths are reduced. In an implementation in a TSMC 0.18 μm CMOS process, the proposed architecture occupies $10.8\,\text{mm}^2$ with a core area of $5.18\,\text{mm}^2$. The measured throughputs are 1.46 Gbps at 1.62 V and higher than 1.7 Gbps at 1.8 V. The small normalized area and the high throughput per normalized power indicate that the proposed LDPC decoder is area-efficient and power-efficient and should be considered for use in future LDPC decoders.

# References

1. S. Haykin, *Communication Systems*, 4th edn. (John Wiley & Sons, Inc).
2. R. Gallager, Low-density parity-check codes, *IRE Trans. Inf. Theory* 7 (1962) 21–28.
3. T. Ohtsuki, LDPC codes in communications and broadcasting, *IEICE Trans. Commun.* E90-B3 (2007).
4. D. J. C. MacKay, Good error-correcting codes based on very sparse matrices, *IEEE Trans. Inf. Theory* 45 (1999) 399–431.
5. X. Y. Hu, E. Eleftheriou, D. M. Arnold and A. Dholakia, Efficient implementations of the sum-product algorithm for decoding LDPC codes, *IEEE Global Telecommun. Conf.* 2 (2001) 1036.
6. K. Kasai, T. Shibuya and K. Sakaniwa, Detailedly represented irregular low-density parity-check codes, *IEICE Trans. Fundamentals* E86-A (2003) 2435–2444.
7. J. Chen and M. C. Fossorier, Decoding low-density parity-check codes with normalized APP-based algorithm, *Proc. IEEE GLOBECOM*, Vol. 2 (Nov. 2001), pp. 1026–1030.
8. M. P. C. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation matrices, *IEEE Trans. Inf. Theory* 50 (2004) 1788–1793.
9. J. L. Fan, Array codes as low-density parity-check codes, *Proc. 2nd Int. Symp. Turbo Codes and Related Topics* (Sep. 2000), pp. 543–546.
10. I. Djurdjevic, J. Xu, K. Abdel-Ghaffar and S. Lin, A class of low-density parity-check codes constructed based on Reed-Solomon codes with two information symbols, *IEEE Comm. Lett.* 7 (2003) 317–319.
11. A. Blanksby and C. Howland, A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder, *IEEE J. Solid-State Circuits* 37 (2002) 404–412.
12. S. H. Kang and I. C. Park, Loosely coupled memory-based decoding architecture for low density parity check codes, *IEEE Trans. Circuits Syst. I* 53 (2006) 1045–1056.
13. E. Yeo, P. Pakzad, B. Nikolić and V. Anantharam, VLSI architectures for iterative decoders in magnetic recording channels, *IEEE Trans. Magn.* 37 (2001) 748–755.
14. T. Zhang and K. K. Parhi, VLSI implementation-oriented (3, k)-regular low-density parity-check codes, *IEEE Workshop on Signal Processing Systems (SiPS)* (Sept. 2001), pp. 25–36.
15. B. Xiang, R. Shen, A. Pan, D. Bao and X. Zeng, An area-efficient and low-power multirate decoder for quasi-cyclic low-density parity-check codes, *IEEE Trans. VLSI Syst.* 18 (2010) 1447–1460.
16. A. Darabiha, C. Carusone and F. R. Kschischang, Power reduction techniques for LDPC decoders, *IEEE J. Solid-State Circuits* 42 (2008) 1825–1845.
17. N. Onizawa, T. Hanyu and V. C. Gaudet, Design of high-throughput fully parallel LDPC decoders based on wire partitioning, *IEEE Trans. VLSI Syst.* 18 (2010) 482–489.
18. J. L. Membe and J. M. F. Moura, Partition-and-shift LDPC codes, *IEEE Trans. Magn.* 41 (2005).