

Using off-the-shelf hardware for the circuit approach

In subsections 5.3-5.5 we were concerned primarily with custom-produced hardware, in accordance with the focus on throughput cost. In practice, however, we are often concerned with solving only a small number of factorization problems. In this case it may be preferable to use off-the-shelf components (especially if they can be dismantled and reused, or if discretion is desired). Tables 2-4 in Section 5.5 contain the parameters and cost estimates for off-the-shelf hardware, using the following scheme.

FPGA chips are connected in a two-dimensional grid, where each chip holds a block of mesh nodes. The FPGA we consider is the Altera Stratix EP1S25F1020C7, which is expected to cost about $150 in large quantities in mid-2003. It contains 2 Mbit of DRAM and 25660 ``logic elements'' (LEs), each consisting of a single-bit register and some configurable logic. Since on-chip DRAM is scant, we connect each FPGA to several DRAM chips. The FPGA has 706 I/O pins that can provide about 70 Gbit/sec of bandwidth to the DRAM chips (we can fully utilize this bandwidth by ``swapping'' large contiguous chunks into the on-FPGA DRAM; the algorithm allows efficient scheduling). These I/O pins can also be used for communicating with neighbouring FPGAs at an aggregate bandwidth of 280 Gbit/sec.

The parameters given in Table 2 are normalized such that one LE is considered to occupy 1 area unit, and thus $\AA_f=1$. We make the crude assumption that each LE provides the equivalent of 20 logic transistors in our custom design, so $\AA_t=0.05$. Every FPGA chip is considered a ``wafer'' for the purpose of calculation, so $\AA_w=51\,840$. Since DRAM is located outside the FPGA chips, $\AA_d=0$ but $\mathcal{C}_d=4\cdot10^8$, assuming $320 per gigabyte of DRAM. $\mathcal{T}_d$ and $\mathcal{T}_p$ are set according to the available bandwidth. For $\mathcal{T}_l$ we assume that on average an LE switches at 700 MHz. Finally, $\AA_p=0$, but we need to verify that the derived $\mathcal{N}_p$ is at most 706, the number of I/O pins on the chip (fortunately this holds for all our parameter choices).

As can be seen from the tables, the FPGA-based devices are significantly less efficient than both the custom designs and the properly parallelized PC-based implementation. Thus they appear unattractive.
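The normalization and the pin-count check described above can be restated as a small calculation. The following is only an illustrative sketch in Python: the function names and dictionary keys are hypothetical, the numbers are those quoted in the text, and the refill-time estimate additionally assumes 1 Mbit = 10^6 bits and that the full 70 Gbit/sec is sustained.

```python
# Figures quoted in the text for the Altera Stratix EP1S25F1020C7.
IO_PINS = 706                 # I/O pins available per FPGA
ON_CHIP_DRAM_BITS = 2e6       # 2 Mbit (assuming 1 Mbit = 10**6 bits)
DRAM_BANDWIDTH_BPS = 70e9     # about 70 Gbit/sec to external DRAM
TRANSISTORS_PER_LE = 20       # the text's crude assumption


def normalized_areas():
    """Area parameters in units where one logic element has area 1."""
    a_f = 1.0                        # one LE occupies one area unit
    a_t = a_f / TRANSISTORS_PER_LE   # one logic transistor: 1/20 of an LE
    a_d = 0.0                        # DRAM sits outside the FPGA
    a_p = 0.0                        # pins are not charged any area
    return {"A_f": a_f, "A_t": a_t, "A_d": a_d, "A_p": a_p}


def pin_count_is_feasible(n_p):
    """Check that a derived pin count fits on a single FPGA."""
    return n_p <= IO_PINS


def on_chip_refill_seconds():
    """Rough time to swap the whole on-FPGA DRAM at full bandwidth."""
    return ON_CHIP_DRAM_BITS / DRAM_BANDWIDTH_BPS


if __name__ == "__main__":
    print(normalized_areas())
    print(pin_count_is_feasible(706))
    print(on_chip_refill_seconds())
```

Under these assumptions a full refill of the on-chip memory takes on the order of tens of microseconds, which gives a feel for how coarse the ``swapping'' granularity can be.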