

2. Background on the number field sieve

In theory and in practice the two main steps of the NFS are the relation collection step and the matrix step. We review their heuristic asymptotic runtime analysis because it enables us to stress several points that are important for a proper understanding of ``standard-NFS'' and of ``circuit-NFS'' as proposed in [1].

2.1. Smoothness.

An integer is called $ B$-smooth if all its prime factors are at most $ B$. Following [10, 3.16] we denote by $ L_x[r;\alpha]$ any function of $ x$ that equals

$\displaystyle e^{(\alpha+o(1))(\log x)^r(\log\log x)^{1-r}},  $for $\displaystyle x\to\infty,$

where $ \alpha$ and $ r$ are real numbers with $ 0\leq r\leq 1$ and logarithms are natural. Thus, $ L_x[r;\alpha]+L_x[r;\beta]=L_x[r;\max(\alpha,\beta)]$, $ L_x[r;\alpha]L_x[r;\beta]=L_x[r;\alpha+\beta]$, $ L_x[r;\alpha]L_x[s;\beta]=L_x[r;\alpha]$ if $ s<r$, $ L_x[r;\alpha]^k=L_x[r;k\alpha]$ and if $ \alpha>0$ then $ (\log x)^kL_x[r;\alpha]=L_x[r;\alpha]$ for any fixed $ k$, and $ \pi(L_x[r;\alpha])=L_x[r;\alpha]$ where $ \pi(y)$ is the number of primes $ \leq y$. Let $ \alpha>0$, $ \beta>0$, $ r$, and $ s$ be fixed real numbers with $ 0<s<r\leq1$. A random positive integer $ \leq L_x[r;\alpha]$ is $ L_x[s;\beta]$-smooth with probability

$\displaystyle L_x[r-s;-\alpha(r-s)/\beta],  $for $\displaystyle x\to\infty.$

We abbreviate $ L_n$ to $ L$ and $ L[1/3;\alpha]$ to $ L(\alpha)$. Thus, a random integer $ \leq L[2/3;\alpha]$ is $ L(\beta)$-smooth with probability $ L(-\alpha/(3\beta))$. The notation $ L^{1.901\cdots+o(1)}$ in [1] corresponds to $ L(1.901\cdots)$ here. We write `` $\zeta \mbox{\rlap{${}^{ o}$}$=$} x$'' for `` $ \zeta=x+o(1)$ for $ n\to\infty$.''
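These rules hold only up to the $ o(1)$ in the exponent, but once that term is dropped they can be sanity-checked numerically. A minimal Python sketch (the function name `L` and the sample values are ours, for illustration only):

```python
import math

def L(x, r, alpha):
    # e^{alpha (log x)^r (loglog x)^{1-r}}, with the o(1) term dropped
    lx = math.log(x)
    return math.exp(alpha * lx**r * math.log(lx)**(1 - r))

x = 1e40
# Product rule: L_x[r;a] * L_x[r;b] = L_x[r;a+b] (exact once o(1) is dropped)
assert math.isclose(L(x, 1/3, 0.5) * L(x, 1/3, 0.9), L(x, 1/3, 1.4), rel_tol=1e-9)
# Sum rule: L_x[r;a] + L_x[r;b] stays within a factor 2 of L_x[r;max(a,b)]
assert L(x, 1/3, 0.5) + L(x, 1/3, 0.9) < 2 * L(x, 1/3, 0.9)
# For r < s the s-term dominates: L_x[r;a] is negligible next to L_x[s;b]
assert L(x, 1/3, 0.9) < L(x, 2/3, 0.5)
```

The identities involving $ \pi$ and the smoothness probability are genuinely asymptotic and cannot be checked this way.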

2.2. Ordinary NFS.

To factor $ n$ using the NFS, more or less following the approach from [11], one selects a positive integer

$\displaystyle d=\delta\left(\frac{\log n}{\log\log n}\right)^{1/3}$

for a positive value $ \delta$ that is yet to be determined, an integer $ m$ close to $ n^{1/(d+1)}$, a polynomial $ f(X)=\sum_{i=0}^df_iX^i\in{\bf Z}[X]$ such that $ f(m)\equiv0\bmod n$ with each $ f_i$ of the same order of magnitude as $ m$, a rational smoothness bound $ B_r$, and an algebraic smoothness bound $ B_a$. Other properties of these parameters are not relevant for our purposes. A pair $ (a,b)$ of integers is called a relation if $ a$ and $ b$ are coprime, $ b>0$, $ a-bm$ is $ B_r$-smooth, and $ b^df(a/b)$ is $ B_a$-smooth. Each relation corresponds to a sparse $ D$-dimensional bit vector with

$\displaystyle D\approx\pi(B_r)+\#\{(p,r): p\mathrm{\ prime},\ p\leq B_a,\ f(r)\equiv0\bmod p\}$

(cf. [11]). In the relation collection step a set of more than $ D$ relations is sought. Given this set, one or more linear dependencies modulo $ 2$ among the corresponding $ D$-dimensional bit vectors are constructed in the matrix step. Per dependency there is a chance of at least 50% (exactly 50% for RSA moduli) that a factor of $ n$ is found in the final step, the square root step. We discuss some issues of the relation collection and matrix steps that are relevant for [1].
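As a toy illustration of these definitions (the composite $ n$, the bounds, and the trial-division smoothness test are ours; realistic implementations sieve or use ECM, cf. 2.4), one can build $ f$ by writing $ n$ in base $ m$ and scan a small rectangle for relations:

```python
import math
from math import gcd

def is_smooth(x, B):
    # trial-division B-smoothness test (toy; see 2.4 for realistic methods)
    x = abs(x)
    if x == 0:
        return False
    for p in range(2, B + 1):
        while x % p == 0:
            x //= p
    return x == 1

n = 1022117                        # small composite (1009 * 1013), illustration only
d = 3
m = math.ceil(n ** (1 / (d + 1)))  # m close to n^{1/(d+1)}; here m = 32
f, t = [], n
for _ in range(d + 1):             # base-m digits of n give f with f(m) = n
    f.append(t % m)
    t //= m
Br = Ba = 30                       # rational and algebraic smoothness bounds

def is_relation(a, b):
    if b <= 0 or gcd(a, b) != 1:
        return False
    alg = sum(fi * a**i * b**(d - i) for i, fi in enumerate(f))  # b^d f(a/b)
    return is_smooth(a - b * m, Br) and is_smooth(alg, Ba)

relations = [(a, b) for a in range(-50, 51) for b in range(1, 51)
             if is_relation(a, b)]
```

Here $ f(m)=n\equiv0\bmod n$ by construction, and each $ f_i$ is below $ m$, as required.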

2.3. Relation collection.

We restrict the search for relations to the rectangle $ \vert a\vert<L(\alpha)$, $ 0<b<L(\alpha)$ and use $ B_r$ and $ B_a$ that are both $ L(\beta)$ (which does not imply that $ B_r=B_a$), for $ \alpha,\beta>0$ that are yet to be determined. It follows (cf. 2.1) that $ D=L(\beta)$. Furthermore,

$\displaystyle \vert a-bm\vert=L[2/3;1/\delta]\;\;\;\;$and$\displaystyle \;\;\;\;\vert b^df(a/b)\vert=L[2/3;\alpha\delta+1/\delta].$

With 2.1 and under the usual assumption that $ a-bm$ and $ b^df(a/b)$ behave, with respect to smoothness probabilities, independently as random integers of comparable sizes, the probability that both are $ L(\beta)$-smooth is

$\displaystyle L\left(\frac{-1/\delta}{3\beta}\right)\cdot L\left(\frac{-(\alpha\delta+1/\delta)}{3\beta}\right)=L\left(-\frac{\alpha\delta+2/\delta}{3\beta}\right).$

The search space contains $ 2L(\alpha)^2=2L(2\alpha)=L(2\alpha)$ pairs $ (a,b)$ and, due to the $ o(1)$, as many pairs $ (a,b)$ with $ \gcd(a,b)=1$. It follows that $ \alpha$ and $ \beta$ must be chosen such that

$\displaystyle L(2\alpha)\cdot L\left(-\frac{\alpha\delta+2/\delta}{3\beta}\right)=L(\beta)\;\;(=D).$

We find that

$\displaystyle \alpha $$\displaystyle \mbox{\rlap{${}^{ o}$}$=$}$$\displaystyle  \frac{3\beta^2+2/\delta}{6\beta-\delta}.$ (1)
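The step from the balance equation to Relation (1) is pure exponent arithmetic; for completeness, spelled out:

```latex
% Equate the L-exponents in L(2\alpha)\,L\bigl(-\tfrac{\alpha\delta+2/\delta}{3\beta}\bigr)=L(\beta):
2\alpha-\frac{\alpha\delta+2/\delta}{3\beta}=\beta
\;\Longrightarrow\; 6\alpha\beta-\alpha\delta-\frac{2}{\delta}=3\beta^{2}
\;\Longrightarrow\; \alpha(6\beta-\delta)=3\beta^{2}+\frac{2}{\delta}
\;\Longrightarrow\; \alpha=\frac{3\beta^{2}+2/\delta}{6\beta-\delta}.
```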

2.4. Testing for smoothness.

The $ (a,b)$ search space can be processed in $ L(2\alpha)$ operations. If sufficient memory is available this can be done using sieving. Current PC implementations intended for the factorization of relatively small numbers usually have adequate memory for sieving. For much larger numbers and current programs, sieving would become problematic. In that case, the search space can be processed in the ``same'' $ L(2\alpha)$ operations (with an, admittedly, larger $ o(1)$) but at a cost of only $ L(0)$ memory using the Elliptic Curve Method (ECM) embellished in any way one sees fit with trial division, Pollard rho, early aborts, etc., and run on any number $ K$ of processors in parallel to achieve a $ K$-fold speedup. This was observed many times (see for instance [10, 4.15] and [4]). Thus, despite the fact that current implementations of the relation collection require substantial memory, it is well known that asymptotically this step requires negligible memory without incurring, in theory, a runtime penalty; in practice, however, it is substantially slower than sieving. Intermediate solutions that exchange sieving memory for many tightly coupled processors with small memories could prove valuable too; see [6] for an early example of this approach and [1] for various other interesting proposals that may turn out to be practically relevant. For the asymptotic argument, ECM suffices. In the improved NFS from [4] it was necessary to use a ``memory-free'' method when searching for $ B_a$-smooth numbers (cf. 2.2), in order to achieve the speedup. It was suggested in [4] that the ECM may be used for this purpose. Since memory usage was no concern for the analysis in [4], regular ``memory-wasteful'' sieving was suggested to test $ B_r$-smoothness.
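To make the memory contrast concrete, here is a sketch of a smoothness test that keeps only a constant number of integers in memory. It substitutes Pollard's rho for ECM (an assumption of this sketch, chosen for brevity; the asymptotic argument in the text needs ECM's subexponential running time, which rho does not have):

```python
import math, random

def is_probable_prime(n):
    # Miller-Rabin, deterministic for the small inputs used here
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7, 11, 13, 17):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def pollard_rho(n):
    # returns a nontrivial factor of a composite n, using O(1) memory
    if n % 2 == 0:
        return 2
    while True:
        c, x = random.randrange(1, n), random.randrange(2, n)
        y, d = x, 1
        while d == 1:
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:
            return d

def is_smooth_memoryless(x, B):
    # B-smoothness test by factoring x completely, one factor at a time
    x = abs(x)
    if x <= 1:
        return x == 1
    stack = [x]
    while stack:
        v = stack.pop()
        if v == 1:
            continue
        if is_probable_prime(v):
            if v > B:
                return False
        else:
            d = pollard_rho(v)
            stack += [d, v // d]
    return True
```

Unlike a sieve, no table indexed by the search space is ever allocated, which is the point of the ``memory-free'' approach above.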

2.5. The matrix step.

The choices made in 2.3 result in a bit matrix $ A$ consisting of $ D=L(\beta)$ columns such that each column of $ A$ contains only $ L(0)$ nonzero entries. Denoting by $ w(A)$ the total number of nonzero entries of $ A$ (its weight), it follows that $ w(A)=L(\beta)\cdot L(0)=L(\beta)$. Using a variety of techniques [5,13], dependencies can be found after, essentially, $ O(D)$ multiplications of $ A$ times a bit vector. Since one matrix-by-vector multiplication can be done in $ O(w(A))=L(\beta)$ operations, the matrix step can be completed in $ L(\beta)^2=L(2\beta)$ operations. We use ``standard-NFS'' to refer to NFS that uses a matrix step with $ L(2\beta)$ operation count. We will be concerned with a specific method for finding the dependencies in $ A$, namely the block Wiedemann algorithm [5,18], whose outline is as follows. Let $ K$ be the blocking factor, i.e., the amount of parallelism desired. We may assume that either $ K=1$ or $ K>32$. Choose $ 2K$ binary $ D$-dimensional vectors $ \vec{v}_i,\vec{u}_j$ for $ 1 \leq i,j \leq K$. For each $ i$, compute the vectors $ A^k\vec{v}_i$ for $ k$ up to roughly $ 2D/K$, using repeated matrix-by-vector multiplication. For each such vector $ A^k\vec{v}_i$, compute the inner products $ \vec{u}_jA^k\vec{v}_i$, for all $ j$. Only these inner products are saved, to conserve storage. From the inner products, compute certain polynomials $ f_l(x)$, $ l=1,\ldots,K$ of degree about $ D/K$. Then evaluate $ f_l(A)\vec{v}_i$, for all $ l$ and $ i$ (take one $ \vec{v}_i$ at a time and evaluate $ f_l(A)\vec{v}_i$ for all $ l$ simultaneously using repeated matrix-by-vector multiplications). From the result, $ K$ elements from the kernel of $ A$ can be computed. The procedure is probabilistic, but succeeds with high probability for $ K \gg 1$ [17]. For $ K=1$, the cost roughly doubles [18].
For reasonable blocking factors ($ K=1$ or $ 32\leq K \ll \sqrt{D}$), the block Wiedemann algorithm involves about $ 3D$ matrix-by-vector multiplications. These multiplications dominate the cost of the matrix step; accordingly, the circuits of [1], and our variants thereof, aim to reduce their cost. Note that the multiplications are performed in $ 2K$ separate chains where each chain involves repeated left-multiplication by $ A$. The proposed circuits rely on this for their efficiency. Thus, they appear less suitable for other dependency-finding algorithms, such as block Lanczos [13] which requires just $ 2D$ multiplications.
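A toy sketch of the first (sequence) phase over GF(2), with bit vectors stored as Python integers; the matrix, vectors, and sizes are ours, and the polynomial computation and evaluation phases are omitted. It shows why only the $ 2K$ vectors and the inner products need to be kept, and why this phase costs about $ 2D$ multiplications by $ A$:

```python
import random

random.seed(1)
D, K = 64, 2                       # toy dimension and blocking factor
N = 2 * D // K + 2                 # sequence length per chain, roughly 2D/K

# sparse D x D bit matrix over GF(2): row i stored as a D-bit integer
rows = [sum(1 << j for j in random.sample(range(D), 3)) for _ in range(D)]

def matvec(v):
    # one matrix-by-vector product over GF(2); cost proportional to w(A)
    return sum((bin(r & v).count("1") & 1) << i for i, r in enumerate(rows))

u = [random.getrandbits(D) for _ in range(K)]
v = [random.getrandbits(D) for _ in range(K)]

mults = 0
seq = {(j, i): [] for j in range(K) for i in range(K)}
for i in range(K):                 # K independent chains of powers A^k v_i
    w = v[i]
    for _ in range(N):
        for j in range(K):         # save only the bits u_j . A^k v_i
            seq[(j, i)].append(bin(u[j] & w).count("1") & 1)
        w = matvec(w)              # advance the chain by one multiplication
        mults += 1

# about 2D/K multiplications per chain, 2D in total over the K chains; with
# the roughly D multiplications of the evaluation phase this gives the ~3D
assert mults == K * N
```

Each chain advances only by repeated left-multiplication by $ A$, which is the structural property the circuits of [1] exploit.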

2.6. NFS parameter optimization for matrix exponent  $ 2\epsilon >1$.

With the relation collection and matrix steps in $ L(2\alpha)$ and $ L(2\beta)$ operations, respectively, the values for $ \alpha$, $ \beta$, and $ \delta$ that minimize the overall NFS operation count follow using Relation (1). However, we also need the optimal values if the ``cost'' of the matrix step is different from  $ L(\beta)^2$: in [1] ``cost'' is defined using a metric that is not always the same as operation count, so we need to analyse the NFS using alternative cost metrics. This can be done by allowing flexibility in the ``cost'' of the matrix step: we consider how to optimize the NFS parameters for an $ L(\beta)^{2\epsilon}$ matrix step, for some exponent $ \epsilon>1/2$. The corresponding relation collection operation count is fixed at $ L(2\alpha)$ (cf. 2.4). We balance the cost of the relation collection and matrix steps by taking $\alpha \mbox{\rlap{${}^{ o}$}$=$} \epsilon\beta$. With (1) it follows that

\begin{displaymath}3(2\epsilon-1)\beta^2-\epsilon\beta\delta-2/\delta \mbox{\rlap{${}^{ o}$}$=$} 0.\end{displaymath}

Minimizing $ \beta$ given $ \epsilon$ leads to

$\displaystyle \delta $$\displaystyle \mbox{\rlap{${}^{ o}$}$=$}$$\displaystyle  \sqrt[3]{3(2\epsilon-1)/\epsilon^2}$ (2)


$\displaystyle \beta $$\displaystyle \mbox{\rlap{${}^{ o}$}$=$}$$\displaystyle  2\sqrt[3]{\epsilon/(3(2\epsilon-1))^2}.$ (3)

Minimizing the resulting

$\displaystyle \alpha $$\displaystyle \mbox{\rlap{${}^{ o}$}$=$}$$\displaystyle  2\epsilon\sqrt[3]{\epsilon/(3(2\epsilon-1))^2}$ (4)

leads to $ \epsilon=1$ and $\alpha \mbox{\rlap{${}^{ o}$}$=$} 2/3^{2/3}$: even though $ \epsilon<1$ would allow more ``relaxed'' relations (i.e., larger smoothness bounds and thus easier to find), the fact that more of such relations have to be found becomes counterproductive. It follows that an operation count of $ L(4/3^{2/3})$ is optimal for relation collection, but that for $ 2\epsilon>2$ it is better to use suboptimal relation collection because otherwise the matrix step would dominate. We find the following optimal NFS parameters:
$ 1<2\epsilon\leq2$:
$\delta \mbox{\rlap{${}^{ o}$}$=$} 3^{1/3}$, $\alpha \mbox{\rlap{${}^{ o}$}$=$} 2/3^{2/3}$, and $\beta \mbox{\rlap{${}^{ o}$}$=$} 2/3^{2/3}$, with operation counts of relation collection and matrix steps equal to $ L(4/3^{2/3})$ and $ L(4\epsilon/3^{2/3})$, respectively. For $ \epsilon=1$ the operation counts of the two steps are the same (when expressed in $ L$) and the overall operation count is $ L(4/3^{2/3})=L((64/9)^{1/3})=L(1.9229994\cdots)$. This corresponds to the heuristic asymptotic runtime of the NFS as given in [11]. We refer to these parameter choices as the ordinary parameter choices.
$ 2\epsilon>2$:
$ \delta$, $ \alpha$, and $ \beta$ as given by Relations (2), (4), and (3), respectively, with operation count $ L(2\alpha)$ for relation collection and cost $ L(2\epsilon\beta)$ for the matrix step, where $ L(2\alpha)=L(2\epsilon\beta)$. In particular, we find the following values.
$ 2\epsilon=5/2$:
$\delta \mbox{\rlap{${}^{ o}$}$=$} (5/3)^{1/3}(6/5)$, $\alpha \mbox{\rlap{${}^{ o}$}$=$} (5/3)^{1/3}(5/6)$, and $\beta \mbox{\rlap{${}^{ o}$}$=$} (5/3)^{1/3}(2/3)$, for an operation count and cost of $ L((5/3)^{4/3})=L(1.9760518\cdots)$ for the relation collection and matrix steps, respectively. These values are familiar from [1, Section 6: Circuits]. With $(1.9229994\cdots/1.9760518\cdots+o(1))^3 \mbox{\rlap{${}^{ o}$}$=$} 0.9216$ and equating operation count and cost, this suggests that factoring $ 0.9216\cdot512\approx 472$-bit composites using NFS with matrix exponent $ 5/2$ is comparable to factoring 512-bit ones using standard-NFS with ordinary parameter choices (disregarding the effects of the $ o(1)$'s).
$ 2\epsilon=3$:
$\delta \mbox{\rlap{${}^{ o}$}$=$} 2/3^{1/3}$, $\alpha \mbox{\rlap{${}^{ o}$}$=$} 3^{2/3}/2$, and $\beta \mbox{\rlap{${}^{ o}$}$=$} 3^{-1/3}$, for an operation count and cost of $ L(3^{2/3})$ $ =L(2.0800838\cdots)$ for the relation collection and matrix steps, respectively.
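These parameter choices can be checked numerically; a short sketch (the function name is ours) that plugs Relations (2)-(4) into the quoted operation counts:

```python
from math import isclose

def nfs_params(eps):
    # delta, beta, alpha from Relations (2), (3), (4) for matrix exponent 2*eps
    delta = (3 * (2 * eps - 1) / eps**2) ** (1 / 3)
    beta = 2 * (eps / (3 * (2 * eps - 1)) ** 2) ** (1 / 3)
    alpha = eps * beta
    return delta, beta, alpha

# 2*eps = 2: ordinary parameter choices, overall exponent (64/9)^(1/3)
d, b, a = nfs_params(1)
assert isclose(d, 3 ** (1 / 3)) and isclose(b, 2 / 3 ** (2 / 3))
assert isclose(2 * a, (64 / 9) ** (1 / 3)) and isclose(2 * a, 1.9229994, abs_tol=1e-6)

# 2*eps = 5/2
d, b, a = nfs_params(5 / 4)
assert isclose(2 * a, (5 / 3) ** (4 / 3)) and isclose(2 * a, 1.9760518, abs_tol=1e-6)

# 2*eps = 3
d, b, a = nfs_params(3 / 2)
assert isclose(2 * a, 3 ** (2 / 3)) and isclose(2 * a, 2.0800838, abs_tol=1e-6)

# the 472-bit comparison: (1.9229994.../1.9760518...)^3 is about 0.9216
assert isclose(((64 / 9) ** (1 / 3) / (5 / 3) ** (4 / 3)) ** 3, 0.9216, abs_tol=1e-4)
```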

2.7. Improved NFS.

It was shown in [4] that ordinary NFS from [11], and as used in 2.2, can be improved by using more than a single polynomial $ f$. Let $ \alpha$ and $ \delta$ be as in 2.3 and 2.2, respectively, let $ \beta$ indicate the rational smoothness bound $ B_r$ (i.e., $ B_r=L(\beta)$), and let $ \gamma$ indicate the algebraic smoothness bound $ B_a$ (i.e., $ B_a=L(\gamma)$). Let $ G$ be a set of $ B_r/B_a=L(\beta-\gamma)$ different polynomials, each of degree $ d$ and with common root $ m$ modulo $ n$ (as in 2.2). A pair $ (a,b)$ of integers is called a relation if $ a$ and $ b$ are coprime, $ b>0$, $ a-bm$ is $ B_r$-smooth, and $ b^dg(a/b)$ is $ B_a$-smooth for at least one $ g\in G$. Let $ \epsilon$ be the matrix exponent. Balancing the cost of the relation collection and matrix steps it follows that $\alpha \mbox{\rlap{${}^{ o}$}$=$} \epsilon\beta$. Optimization leads to

\begin{displaymath}\gamma \mbox{\rlap{${}^{ o}$}$=$} \left(\frac{\epsilon^2+5\epsilon+2+(\epsilon+1)\sqrt{\epsilon^2+8\epsilon+4}}{9(2\epsilon+1)}\right)^{1/3}\end{displaymath}

and for this $ \gamma$ to

\begin{displaymath}\alpha \mbox{\rlap{${}^{ o}$}$=$} \frac{9\gamma^3+1+\sqrt{18\gamma^3(2\epsilon+1)+1}}{18\gamma^2},\;\;\;\;\; \beta \mbox{\rlap{${}^{ o}$}$=$} \alpha/\epsilon,\end{displaymath}


\begin{displaymath}\delta \mbox{\rlap{${}^{ o}$}$=$} \frac{3\gamma(-4\epsilon-1+\sqrt{18\gamma^3(2\epsilon+1)+1})}{9\gamma^3-4\epsilon}.\end{displaymath}

It follows that for $ 2\epsilon=2$ the method from [4] gives an improvement over the ordinary method, namely $ L(1.9018836\cdots)$. The condition $ \beta\geq\gamma$ leads to $ 2\epsilon\leq7/3$, so that for $ 2\epsilon>7/3$ (as in circuit-NFS, cf. 3.1) usage of the method from [4] no longer leads to an improvement over the ordinary method. This explains why in [1] the method from [4] is used to select parameters for standard-NFS and why the ordinary method is used for circuit-NFS. With 2.1 it follows that the sum of the (rational) sieving and ECM-based (algebraic) smoothness times from [4] (cf. last paragraph of 2.4) is minimized if $ \beta=\gamma+1/(3\beta\delta)$. The above formulas then lead to $ 2\epsilon=(3+\sqrt{17})/4=1.7807764\cdots$. Therefore, unlike the ordinary parameter selection method, optimal relation collection for the improved method from [4] occurs for an $ \epsilon$ with $ 2\epsilon<2$: with $ \epsilon=0.8903882\cdots$ the operation count for relation collection becomes $ L(1.8689328\cdots)$. Thus, in principle, and depending on the cost function one is using, the improved method would be able to take advantage of a matrix step with exponent  $ 2\epsilon<2$. If we disregard the matrix step and minimize the operation count of relation collection, this method yields a cost of $ L(1.8689328\cdots)$.
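A numeric consistency check of this optimization (the closed forms are restated explicitly below, so the sketch stands on its own; assuming them, it reproduces the constants quoted above, including the boundary $ \beta=\gamma$ at $ 2\epsilon=7/3$):

```python
from math import sqrt, isclose

def improved_params(eps):
    # gamma, alpha, beta, delta for the improved NFS with matrix exponent 2*eps
    s = sqrt(eps**2 + 8 * eps + 4)
    gamma = ((eps**2 + 5 * eps + 2 + (eps + 1) * s) / (9 * (2 * eps + 1))) ** (1 / 3)
    r = sqrt(18 * gamma**3 * (2 * eps + 1) + 1)
    alpha = (9 * gamma**3 + 1 + r) / (18 * gamma**2)
    beta = alpha / eps
    delta = 3 * gamma * (-4 * eps - 1 + r) / (9 * gamma**3 - 4 * eps)
    return gamma, alpha, beta, delta

# 2*eps = 2 reproduces Coppersmith's exponent L(1.9018836...)
g, a, b, d = improved_params(1)
assert isclose(2 * a, ((92 + 26 * sqrt(13)) / 27) ** (1 / 3))
assert isclose(2 * a, 1.9018836, abs_tol=2e-6)

# at 2*eps = 7/3 the constraint beta >= gamma becomes tight: beta = gamma
g, a, b, d = improved_params(7 / 6)
assert isclose(b, g)

# 2*eps = (3+sqrt(17))/4 balances the two smoothness tests:
# beta = gamma + 1/(3*beta*delta)
g, a, b, d = improved_params((3 + sqrt(17)) / 8)
assert isclose(b, g + 1 / (3 * b * d), abs_tol=1e-6)
assert isclose(2 * a, 1.8689328, abs_tol=2e-6)
```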