CGPE: Code Generation for Polynomial Evaluation

Online documentation

Return to CGPE's homepage

Installation

In order to compile CGPE, you will need the following packages installed: You will also need Gappa (>= 0.16.1) installed.

To compile, use the makefile by invoking it using the following commands:

$ tar xvfz cgpe-x.x.x.tar.gz
$ cd cgpe-x.x.x
$ ./configure
$ make

See ./configure --help for details. You can then use CGPE by typing

$ src/cgpe

or install it with:

$ make install

Command-line interface

CGPE is a command-line tool. Initially it aimed at generating fast and certified codes for evaluating univariate or bivariate polynomials of given degrees:

a(x,y) = a[0,0] + a[1,0] × x + ... + a[0,1] × y + a[1,1] × x × y + a[2,1] × x2 × y + ... + a[dx,dy] × xdx × y dy

in fixed-point arithmetic. So far it has been extended to handle sum and dot-product expressions of given size. You may even give your own expression as input to CGPE: in that case, no best parenthesizations (according a given criteria) is computed, CGPE is only used for backend purpose.

The general syntax is

$ cgpe <options>

where <options> is a list among the following.

First simple example

For example, the following command generates all the parenthesizations of lowest latency for evaluating a given bivariate polynomial a(x,y) of degree [1,1]

a(x,y) = a[0,0] + a[1,0]*x + a[0,1]*y + a[1,1] × x × y,

assuming the variable y is known 2 cycles later than x, with an addition having a latency of 2 cycles and a multiplication having a latency of 3 cycles. Consider for this example that the addition by a[0,0] costs only 1 cycle and the multiplication by a[1,1] costs only 1 cycle. Finally, all the parenthesizations found are printed. The command-line and the results are shown below:
  $ cgpe --degree="[1,1]" --latency=lowest --delay=2 --add=2 --mul=3 --operators="[1222:3331]" --print
  ----------------------------------------------------------------------------------------------------
   CGPE (Code Generation for Polynomial Evaluation)
  ----------------------------------------------------------------------------------------------------
     -> polynomial degree : 2 [1,1]
     -> coefficients      : a0 a1 a2 a3 
     -> variables         : x y 
     -> expression        : a0 + a1*x + a2*y + a3*x*y
     -> lowest latency
     -> algorithm         : bivariate
  ----------------------------------------------------------------------------------------------------
   Start    : Mon Jun  3 12:12:34 2013
  ----------------------------------------------------------------------------------------------------
   Computation of pows of x and y
     -> duration:   82us
     -> number of trees: 1
   Determination of target latency 
     -> target latency: 6
     -> duration:        12us
   Generation of evaluation tree(s) 
     -> first target latency: 6
     -> new target latency: 7
     -> new target latency: 8
   Schemes computed
     --------------------
     ((a0+(x*a1))+(y*(a2+(x*a3))))	 -latency=8 (4,6)  (3 mul)
     --------------------
     -> minimal latency: 8
     -> total number of computed evaluation tree(s): 1
  ----------------------------------------------------------------------------------------------------
   End      : Wed Jun  5 01:59:41 2013
   Total duration :    1ms 150us
  ----------------------------------------------------------------------------------------------------
  
Here, CGPE find one parenthesization only, having a latency of 8 cycles and using only 3 multiplications. The left part of the parenthesization has a latency of 4 cycles, while the right part has a latency of 6 cycles, and the splitting is done before the coefficient of degree [0,1]. Finally, the parenthesization uses only 3 different multiplications.

XML input file format

The input coefficients and variables are defined in an external XML file, and passed to CGPE through using the --xml-input=<file> option. The surrounded tag of this file must be either <polynomial>, <power>, <sum>, <dotproduct>, or <expression>. In this documentation, we consider that this surrounded tag is <polynomial>, but the same holds for the others. These tags may contains the following attributes: In CGPE, coefficients and variables are represented by interval in fixed-point arithmetic. (By convention, coefficients are point intervals.) These must appear in the XML file in the same order as they appear in the expression to be evaluated: for example, for the case of polynomial, coefficients appear in the file in the increasing degree order. In addition to this information, the file may contain the <error> tag indicating the error bound allowed on the evaluation of the problem in fixed-point arithmetic, with the following attribute: Remark that the error values are defined in the XML file using the format XbP, where X and P are two integers such that the error equals X * 2P.
  <polynomial> 
    <coefficient name="a0" value="0x00000020" sign="0" integer_width="2" fraction_="30"/>
    <!-- a0 = 0x20 * 2^(-30) -->
    ...
    <variable name="x" inf="0x00000000" sup="0xfffffe00" sign="0" integer_width="0" fraction_width="32"/>
    <!-- x \in { X * 2^(-32), X \in [0x0,0xfffffe00] } -->

    <error value="844352520372336487460155217459680136667560679525219409191017b-225" 
           type="absolute" strict="false"/>
 </polynomial>
  

Second example

Let us now consider the degree-6 univariate polynomial a(x) that approximates the function log2(1+x) over the interval [0.5,1-2-23] with an approximation error less than θ ≈ 2-27.72. To ensure correct rounding, we need to evaluate this polynomial with an evaluation error strictly less than η ≈ 2-25.23. The content of the XML input file is shown below:
  <polynomial>
    <coefficient value="0x0017e4d8" sign="0" integer_width="1" fraction_width="31"/>
    <coefficient value="0xb7a94e71" sign="0" integer_width="1" fraction_width="31"/>
    <coefficient value="0x5799ac78" sign="1" integer_width="1" fraction_width="31"/>
    <coefficient value="0x3097d4d9" sign="0" integer_width="1" fraction_width="31"/>
    <coefficient value="0x16f06e1d" sign="1" integer_width="1" fraction_width="31"/>
    <coefficient value="0x074d13b3" sign="0" integer_width="1" fraction_width="31"/>
    <coefficient value="0x011c0129" sign="1" integer_width="1" fraction_width="31"/>
    <variable inf="0x00000000" sup="0xfffffe00" sign="0" integer_width="0" fraction_width="32"/>
    <error value="2726472240709203592273463737322727060915725686814820528338649b-226" strict="true"/>
  </polynomial>
  
Finally, CGPE generates in about 1 second a C code evaluating the polynomial a(x) and Gappa certificate ensuring that the evaluation of the C code entails an evaluation error strictly less than the required bound &eta ≈ 2-25.23.
  $ cgpe --degree="[6,0]" --latency=lowest --register=32 --precision=24 --output 
         --optimized-search="[l,2]" --support=5 --max-kept=50 --schedule="[4,2]" 
         --xml-input=log2.xml --gappa-certificate --mpfi-error-bound  --mul-name=mul 
	 --name="[a,T,S]" --without-cst
  ----------------------------------------------------------------------------------------------------
   CGPE (Code Generation for Polynomial Evaluation)
  ----------------------------------------------------------------------------------------------------
     -> polynomial degree : 6 [6,0]
     -> coefficients      : a0 a1 a2 a3 a4 a5 a6 
     -> variables         : T 
     -> expression        : a0 + a1*T + a2*T^2 + a3*T^3 + a4*T^4 + a5*T^5 + a6*T^6
     -> lowest latency
     -> recursion level   : 2
     -> keep a maximum of 50 schemes at each step
     -> maximum support   : 5
     -> algorithm         : univariate
  ----------------------------------------------------------------------------------------------------
   Start    : Wed Jun  5 01:53:42 2013
  ----------------------------------------------------------------------------------------------------
   Computation of pows of T
     -> duration:  185us
     -> number of trees: 8
   Determination of target latency 
     -> target latency: 10
     -> duration:        16us
   Generation of evaluation tree(s) 
     -> first target latency: 10
     -> new target latency: 11
     -> duration:   11ms 432us
   Heuristic filters
     -> duration of numerical filter:    6ms 654us
        50 tree(s) have been tested
        2 tree(s) passed this filter
     -> duration of scheduling filter:    1ms 137us
        2 tree(s) have been tested
        1 tree(s) passed this filter
     -> duration of compiler pass: 
        1 tree(s) have been tested
        1 tree(s) passed this filter
     -> duration of Gappa filter:  465ms 475us
        1 tree(s) have been tested
        1 tree(s) passed this filter
   Code synthesis
     -> type of synthesized codes: unsigned C, Gappa
   Note: 1 schemes have been found ...
         ... and 2 files have been written in Wed_Jun__5_01h53m42_2013/
  ----------------------------------------------------------------------------------------------------
   End      : Wed Jun  5 01:53:42 2013
   Total duration :  489ms 698us
  ----------------------------------------------------------------------------------------------------
  
Valid HTML 4.01 Transitional Valid CSS 2.1
Guillaume Revy
Last updated: Fri 12 Jul 2013 16:18:14 CEST.