Prerequisites: Intel or AMD CPU with `adcx`

/`adox`

: i.e., Broadwell,
Skylake, or newer. Linux with standard development tools plus `clang`

plus `valgrind`

.

To download and unpack the latest version:

```
wget -m https://ctidh.isogeny.org/high-ctidh-latest-version.txt
version=$(cat ctidh.isogeny.org/high-ctidh-latest-version.txt)
wget -m https://ctidh.isogeny.org/high-ctidh-$version.tar.gz
tar -xzf ctidh.isogeny.org/high-ctidh-$version.tar.gz
cd high-ctidh-$version
```

To compile, test for functionality, tune for multiplications, and tune for cycles, for all selected sizes (511, 512, 1024, 2048):

```
make
```

This takes a while because of all the testing and tuning. Any test failure will stop the build process. You can separately run

```
make generic # size-independent tests, 8.6 minutes
make 511 # size 511, 1.5 minutes
make 512 # size 512, 1.7 minutes
make 1024 # size 1024, 25 minutes
make 2048 # size 2048, 549 minutes
```

where the timings shown here are on a 3GHz Skylake core.

(Tuning for multiplications is machine-independent and can be
precomputed. Tuning for cycles can be precomputed per microarchitecture.
One can carry out both precomputations more efficiently by starting with
measurements of `tree1`

, `multiprod2`

, `multiprod2_selfreciprocal`

,
`multieval_precompute`

, and `multieval_postcompute`

; the Python scripts
include a preliminary implementation of this for the multiplication
tuning, currently used only as a double-check.)

The functionality testing included in "`make`

" does not include a
constant-time test. To run a constant-time test for all selected sizes:

```
make timecop # 25 minutes
```

For benchmarks regarding, e.g., size-511 code tuned for multiplications:

```
./bench511mults 16383 > bench511mults.out.16383
```

This runs a million experiments: more precisely, 16383 experiments for
each of 65 keys. This takes hours, and generates hundreds of megabytes
of data. Each measurement includes, for validation and separately for
the action, a "`mulsq`

" count that includes both multiplications and
squarings, a "`sq`

" count that includes only squarings, an "`addsub`

"
count that includes additions and subtractions, and a cycle count (which
for multiplication-tuned code isn't far behind cycle-tuned code). The
action also shows "`stattried`

" counts showing the number of times each
batch occurred publicly in an atomic block.

To analyze average costs and standard deviations:

```
./analyze-costs < bench511mults.out.16383 \
> bench511mults.out.16383.analyze-costs
```

Statistics are printed for each of the 65 keys separately, and
("`total`

") for the all of the experiments together.

To analyze whether the "`stattried`

" counts are as expected:

```
./analyze-pr < bench511mults.out.16383 \
> bench511mults.out.16383.analyze-pr
```

This prints, for each batch, 1−1/p times the number of times the batch was tried divided by the batch bound, where p is the smallest prime in the batch.

For various size-511 microbenchmarks:

```
./umults511
./ubench511
```

To select other CSIDH sizes and other CTIDH parameters (subject to
various undocumented restrictions), edit the table at the top of
`autogen`

and run "`./autogen; make`

".

## Changelog (reverse chronological order)

### Version 20210523: browse `high-ctidh-20210523.tar.gz`

Add support for the same 2048-bit prime as in SQALE, with 2^220 keys as in SQALE. Use CTIDH batch sizes (9,10,8,8,7,10,12,11,10,15,10,9,8,6,10,13,10,9,12,13,10,10,10,1) with batch bounds (1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,0,2,0). Calculated average number of multiplications is 366213, compared to 646000 reported by the SQALE software.

For CSIDH-1024 with 2^256 keys, switch CTIDH parameters from batch sizes (5,8,7,8,8,7,8,7,7,11,9,10,7,9,6,12,1) and batch bounds (3,6,7,7,7,7,7,7,7,7,7,7,6,7,5,7,0) to batch sizes (2,3,5,4,6,6,6,6,6,7,7,7,6,7,7,5,6,5,10,3,10,5,1) and batch bounds (2,4,5,5,6,6,6,6,6,6,6,6,6,6,6,5,5,3,6,2,6,2,0). Improves calculated average number of multiplications from 385424 to 375673.

Support batches with bound 0 in `analyze-pr`

.

Rescale acceptance probabilities for sparse batches to speed up private-key generation
while preserving uniformity.
Add frequency tests to `testrandom`

for many more choices of batch sizes and batch bounds.

Add measurements of private-key generation to `bench`

.

Add `fp_sq1_rep`

, and use it to compress exponentiation code.

### Version 20210504: browse `high-ctidh-20210504.tar.gz`

Original release.

**Version:**This is version 2021.05.26 of the "Software" web page.