INTRODUCTION
IN various computing and signal processing applications, parallel multiplier has been a basic building block for many algorithms. Many high performance algorithms and architectures have been proposed to accelerate multiplication. Multiplication can be divided into three steps:
generating partial products,
summing up all partial products until only two rows remain,
and adding the remaining two rows of partial products by using a carry propagation adder.
In the first step, two methods are commonly used to generate partial products. The first method generates partialproduct directly by using a 2-input AND gate. The second one uses radix-4 modified Booth encoding (MBE) to generate partial products. Radix-4 MBE has been widely used in parallel multipliers to reduce the number of partial products by a factor of two. In the speed performance of using radix-4 MBE was denied. However, it is found herein that these results depend on the implementation of MBE scheme.
After generating partial products, a partial product reduction tree (PPRT) is used to sum up all the partial products efficiently. The Wallace tree and Carry-save tree were developed to solve this problem. Both approaches employ 3:2 counter, i.e., full adder, as their basic element. Generally, a counter compresses (n-1) rows of partial products into log2 (n)rows of partial products. However, the delay of an n ΓΏ 1 : log2 n(n)counter is still proportional to log2 (n) 1 times of a full adder (FA) as the inputs are assumed to arrive simultaneously. Therefore, using larger counters to build PPRT is not beneficial. The introduction of 4:2 compressor was a departure from the counter-based scheme. As the delay paths are well balanced, the latency for a 4:2 compressor is only three XOR delays, rather than two full adder delays. Note that the difference between the compressor and the traditional balanced delay tree is that the compressor considers the fast path and the slow path of a full adder. To further speed up, a search algorithm, Three-Dimensional- reduction-Method (TDM) , was proposed.
The TDM algorithm finds optimal PPRT by carefully modeling the delay paths of a counter and constructing n:2 column compressor according to inputs arrival time. Owing to the effectiveness of the column compressor, the PPRT constructed by using TDM algorithm outperforms the conventional designs. However, few studies have been done on using TDM with MBE. This paper examines the performance of parallel multiplier constructed with TDM and MBE. According to our results, such a design can be faster and occupy a smaller area than a non-Booth design.To generate the product in 2's complement format, a fast carry-propagation adder is required to add the final two rows of partial products from the PPRT.
The problem of designing a final adder is that the input signals do not arrive simultaneously, unlike the ordinary carry-propagation adder design that assumes all the inputs arrive simultaneously. Several techniques have been developed to eliminate or reduce the final adder delay. The Left-to-Right-Carry-Free algorithm proposed in requires n-level conversions to generate n-bit MSB products. It was improved in by reducing the levels required. However, this approach still cannot fully exploit the unequal delay profile because it applies to the MSB-part only. In a hybrid adder structure, which consists of ripple-carry adder, carry-skip adder, and conditional-sum adder blocks, was proposed. However, their empirical methodology is not general enough and requires many trials to determine the final adder partition boundary for different sizes of multiplier, thus increasing design effort. In this project, a propose design methodology for high-speed Booth encoded parallel multiplier.
IN various computing and signal processing applications, parallel multiplier has been a basic building block for many algorithms. Many high performance algorithms and architectures have been proposed to accelerate multiplication. Multiplication can be divided into three steps:
generating partial products,
summing up all partial products until only two rows remain,
and adding the remaining two rows of partial products by using a carry propagation adder.
In the first step, two methods are commonly used to generate partial products. The first method generates partialproduct directly by using a 2-input AND gate. The second one uses radix-4 modified Booth encoding (MBE) to generate partial products. Radix-4 MBE has been widely used in parallel multipliers to reduce the number of partial products by a factor of two. In the speed performance of using radix-4 MBE was denied. However, it is found herein that these results depend on the implementation of MBE scheme.
After generating partial products, a partial product reduction tree (PPRT) is used to sum up all the partial products efficiently. The Wallace tree and Carry-save tree were developed to solve this problem. Both approaches employ 3:2 counter, i.e., full adder, as their basic element. Generally, a counter compresses (n-1) rows of partial products into log2 (n)rows of partial products. However, the delay of an n ΓΏ 1 : log2 n(n)counter is still proportional to log2 (n) 1 times of a full adder (FA) as the inputs are assumed to arrive simultaneously. Therefore, using larger counters to build PPRT is not beneficial. The introduction of 4:2 compressor was a departure from the counter-based scheme. As the delay paths are well balanced, the latency for a 4:2 compressor is only three XOR delays, rather than two full adder delays. Note that the difference between the compressor and the traditional balanced delay tree is that the compressor considers the fast path and the slow path of a full adder. To further speed up, a search algorithm, Three-Dimensional- reduction-Method (TDM) , was proposed.
The TDM algorithm finds optimal PPRT by carefully modeling the delay paths of a counter and constructing n:2 column compressor according to inputs arrival time. Owing to the effectiveness of the column compressor, the PPRT constructed by using TDM algorithm outperforms the conventional designs. However, few studies have been done on using TDM with MBE. This paper examines the performance of parallel multiplier constructed with TDM and MBE. According to our results, such a design can be faster and occupy a smaller area than a non-Booth design.To generate the product in 2's complement format, a fast carry-propagation adder is required to add the final two rows of partial products from the PPRT.
The problem of designing a final adder is that the input signals do not arrive simultaneously, unlike the ordinary carry-propagation adder design that assumes all the inputs arrive simultaneously. Several techniques have been developed to eliminate or reduce the final adder delay. The Left-to-Right-Carry-Free algorithm proposed in requires n-level conversions to generate n-bit MSB products. It was improved in by reducing the levels required. However, this approach still cannot fully exploit the unequal delay profile because it applies to the MSB-part only. In a hybrid adder structure, which consists of ripple-carry adder, carry-skip adder, and conditional-sum adder blocks, was proposed. However, their empirical methodology is not general enough and requires many trials to determine the final adder partition boundary for different sizes of multiplier, thus increasing design effort. In this project, a propose design methodology for high-speed Booth encoded parallel multiplier.
No comments:
Post a Comment