Ring-LWE based homomorphic encryption computations in large depth use a combination of two techniques: 1) decomposition of big numbers into small limbs/digits, and 2) efficient cyclotomic multiplications modulo XN+1. It was long believed that the two mechanisms had to be strongly related, like in the full-RNS setting that uses a CRT decomposition of big numbers over an NTT-friendly family of prime numbers, and NTT over the same primes for multiplications. However, in this setting NTT was the bottleneck of all large-depth FHE computations. A breakthrough result from Crypto'2023 by Kim et al. managed to overcome this limitation by introducing a second gadget decomposition and by showing that it indeed shifts the bottleneck and renders the cost of NTT computations negligible compared to the rest of the computation. In this paper, we extend this result (far) beyond the Full-RNS settings and show that we can completely decouple the big number decomposition from the cyclotomic arithmetic aspects. As a result, we get modulus switching/rescaling for free, and the memory footprint for storing relinearization keys across different levels is considerably lower compared to the CRT-based counterparts, by typically a factor ℓ/3 where ℓ is the deepest level of multiplication depth supported. We verify both in theory and in practice that the performance of key-switching, external and internal products and automorphisms using our representation are similar or faster than the one achieved by Kim et al. Crypto'2023 paper, and we discuss the high impact of these results for people who work on low-level or hardware optimizations as well as the benefits of the new parametrizations for people currently working on compilers for FHE.
We even manage to lower the running time of the gate bootstrapping of TFHE by eliminating 12.5% of its FFTs.