CTIDH: Faster constant-time CSIDH

Prerequisites: Intel or AMD CPU with adcx/adox: i.e., Broadwell, Skylake, or newer. Linux with standard development tools plus clang plus valgrind.

To download and unpack the latest version:

    wget -m https://ctidh.isogeny.org/high-ctidh-latest-version.txt
    version=$(cat ctidh.isogeny.org/high-ctidh-latest-version.txt)
    wget -m https://ctidh.isogeny.org/high-ctidh-$version.tar.gz
    tar -xzf ctidh.isogeny.org/high-ctidh-$version.tar.gz
    cd high-ctidh-$version

To compile, test for functionality, tune for multiplications, and tune for cycles, for all selected sizes (511, 512, 1024, 2048):

    make

This takes a while because of all the testing and tuning. Any test failure will stop the build process. You can separately run

    make generic # size-independent tests, 8.6 minutes
    make 511 # size 511, 1.5 minutes
    make 512 # size 512, 1.7 minutes
    make 1024 # size 1024, 25 minutes
    make 2048 # size 2048, 549 minutes

where the timings shown here are on a 3GHz Skylake core.

(Tuning for multiplications is machine-independent and can be precomputed. Tuning for cycles can be precomputed per microarchitecture. One can carry out both precomputations more efficiently by starting with measurements of tree1, multiprod2, multiprod2_selfreciprocal, multieval_precompute, and multieval_postcompute; the Python scripts include a preliminary implementation of this for the multiplication tuning, currently used only as a double-check.)

The functionality testing included in "make" does not include a constant-time test. To run a constant-time test for all selected sizes:

    make timecop # 25 minutes

For benchmarks regarding, e.g., size-511 code tuned for multiplications:

    ./bench511mults 16383 > bench511mults.out.16383

This runs a million experiments: more precisely, 16383 experiments for each of 65 keys. This takes hours, and generates hundreds of megabytes of data. Each measurement includes, for validation and separately for the action, a "mulsq" count that includes both multiplications and squarings, a "sq" count that includes only squarings, an "addsub" count that includes additions and subtractions, and a cycle count (which for multiplication-tuned code isn't far behind cycle-tuned code). The action also shows "stattried" counts showing the number of times each batch occurred publicly in an atomic block.

To analyze average costs and standard deviations:

    ./analyze-costs < bench511mults.out.16383 \
    > bench511mults.out.16383.analyze-costs

Statistics are printed for each of the 65 keys separately, and ("total") for the all of the experiments together.

To analyze whether the "stattried" counts are as expected:

    ./analyze-pr < bench511mults.out.16383 \
    > bench511mults.out.16383.analyze-pr

This prints, for each batch, 1−1/p times the number of times the batch was tried divided by the batch bound, where p is the smallest prime in the batch.

For various size-511 microbenchmarks:

    ./umults511
    ./ubench511

To select other CSIDH sizes and other CTIDH parameters (subject to various undocumented restrictions), edit the table at the top of autogen and run "./autogen; make".

Changelog (reverse chronological order)

Version 20210523: browse high-ctidh-20210523.tar.gz

Add support for the same 2048-bit prime as in SQALE, with 2^220 keys as in SQALE. Use CTIDH batch sizes (9,10,8,8,7,10,12,11,10,15,10,9,8,6,10,13,10,9,12,13,10,10,10,1) with batch bounds (1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,0,2,0). Calculated average number of multiplications is 366213, compared to 646000 reported by the SQALE software.

For CSIDH-1024 with 2^256 keys, switch CTIDH parameters from batch sizes (5,8,7,8,8,7,8,7,7,11,9,10,7,9,6,12,1) and batch bounds (3,6,7,7,7,7,7,7,7,7,7,7,6,7,5,7,0) to batch sizes (2,3,5,4,6,6,6,6,6,7,7,7,6,7,7,5,6,5,10,3,10,5,1) and batch bounds (2,4,5,5,6,6,6,6,6,6,6,6,6,6,6,5,5,3,6,2,6,2,0). Improves calculated average number of multiplications from 385424 to 375673.

Support batches with bound 0 in analyze-pr.

Rescale acceptance probabilities for sparse batches to speed up private-key generation while preserving uniformity. Add frequency tests to testrandom for many more choices of batch sizes and batch bounds.

Add measurements of private-key generation to bench.

Add fp_sq1_rep, and use it to compress exponentiation code.

Version 20210504: browse high-ctidh-20210504.tar.gz

Original release.


Version: This is version 2021.05.26 of the "Software" web page.