Oblivious Message Detection / Retrieval for Taiga Notes – Discussion

This forum thread is to consolidate and continue our discussion on the Oblivious Message Detection / Retrieval issue.

Some of the previous places this discussion took place:

In short, the primary topic of the discussion is whether the OMD/R model presented in the paper is performant enough to be considered for integration with Taiga.

I did performance testing of the original model codebase (the detection part) some time ago in single-threaded mode and got the following results. With ell set to 4, the original C/C++ implementation reaches a speed of 12 ms/message using SEAL+HEXL on an Intel CPU (M6i instance in AWS) and 13 ms/message on an M1 Max using SEAL without HEXL. This part could be done by the server side, so the client side only needs to run a very limited computation locally. For 500k messages, this adds up to ~100 minutes (single-threaded).

J tried to reimplement the code in Rust (GitHub - Janmajayamall/ObliviousMessageRetrieval) and managed to achieve impressive performance after several iterations, but it is still slower than the C++ implementation. For consistency, I will additionally benchmark the latest code by J in the same environments and post the results here.

To have a baseline, I also implemented a small benchmark of the trial decryption approach using the OpenSSL implementation of ECC Diffie-Hellman key exchange. I wrote it in plain C, but it should be easy to port to Rust.

We set up an environment with N users (1000 peers + 1 recipient in the benchmark), each with a pair of EC keys. When peer A sends a note to peer B, peer A uses DH key exchange to generate a shared secret from A’s private key and B’s public key. The shared secret is then used to generate AES-128 parameters, which are used to encrypt a clue, which is 16 bytes of 0x01. An encrypted clue also takes 16 bytes.
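
A minimal sketch of the sender side, assuming the OpenSSL 3.x EVP API and AES-128-CBC with padding disabled (the actual benchmark may use a different cipher mode); derive_key_iv() is a hypothetical helper that expands the shared secret into a key and IV:

```c
#include <openssl/evp.h>
#include <string.h>

/* hypothetical helper: expand the ECDH shared secret into an AES-128 key and IV */
int derive_key_iv(const unsigned char *secret, size_t secret_len,
                  unsigned char key[16], unsigned char iv[16]);

/* a_priv: sender A's private key, b_pub: recipient B's public key,
   clue_ct: receives the 16-byte encrypted clue; error handling omitted */
static int make_clue(EVP_PKEY *a_priv, EVP_PKEY *b_pub, unsigned char clue_ct[16])
{
    unsigned char secret[128];
    size_t secret_len = sizeof(secret);
    unsigned char key[16], iv[16], clue_pt[16];
    int outl = 0, tmp = 0;

    /* ECDH: shared secret from A's private key and B's public key */
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(a_priv, NULL);
    EVP_PKEY_derive_init(ctx);
    EVP_PKEY_derive_set_peer(ctx, b_pub);
    EVP_PKEY_derive(ctx, secret, &secret_len);
    EVP_PKEY_CTX_free(ctx);

    derive_key_iv(secret, secret_len, key, iv);

    /* the plaintext clue is 16 bytes of 0x01; padding is disabled so the
       ciphertext stays at exactly 16 bytes */
    memset(clue_pt, 0x01, sizeof(clue_pt));
    EVP_CIPHER_CTX *c = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(c, EVP_aes_128_cbc(), NULL, key, iv);
    EVP_CIPHER_CTX_set_padding(c, 0);
    EVP_EncryptUpdate(c, clue_ct, &outl, clue_pt, sizeof(clue_pt));
    EVP_EncryptFinal_ex(c, clue_ct + outl, &tmp);
    EVP_CIPHER_CTX_free(c);
    return 1;
}
```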

The recipient then scans the entire set of clues (in the benchmark, 500k clues). For every clue, the recipient takes the sender’s public key, uses it to recover the shared secret, instantiates an AES decryptor and attempts to decrypt the clue. The recipient then checks whether the decrypted clue is 16 bytes of 0x01. For a pertinent message this is always the case; for a non-pertinent message it may happen by chance, which gives a false positive (and that is ok).
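
The per-clue check then boils down to something like the sketch below (same assumptions and the same hypothetical derive_key_iv() helper as in the sender sketch above; the key and IV are recovered from the recipient’s private key and the sender public key attached to the clue):

```c
#include <openssl/evp.h>
#include <string.h>

/* key/iv: recovered via ECDH + derive_key_iv() exactly as on the sender side;
   returns 1 if the clue decrypts to 16 bytes of 0x01 (pertinent or false positive) */
static int is_pertinent(const unsigned char key[16], const unsigned char iv[16],
                        const unsigned char clue_ct[16])
{
    unsigned char clue_pt[16], expected[16];
    int outl = 0, tmp = 0;

    memset(expected, 0x01, sizeof(expected));
    EVP_CIPHER_CTX *c = EVP_CIPHER_CTX_new();
    EVP_DecryptInit_ex(c, EVP_aes_128_cbc(), NULL, key, iv);
    EVP_CIPHER_CTX_set_padding(c, 0);
    EVP_DecryptUpdate(c, clue_pt, &outl, clue_ct, 16);
    EVP_DecryptFinal_ex(c, clue_pt + outl, &tmp);
    EVP_CIPHER_CTX_free(c);

    return memcmp(clue_pt, expected, sizeof(expected)) == 0;
}
```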

For 1001 peers and 500k messages, out of which 100 are pertinent, these are the benchmark results for some of the shortlisted curves (the ones that were faster than the others). On average, for 500k messages we see ~6 false positives.

    CURVE   | PubKey Length | Total B/clue | Total Bytes | Detection time | Detection/message
 secp224r1  |   29 bytes    |   45 bytes   |  22.5 MiB   |   30'654 ms    |    0.0613 ms
 sect113r1  |   16 bytes    |   32 bytes   |  16.0 MiB   |   43'876 ms    |    0.0878 ms
 sect113r2  |   16 bytes    |   32 bytes   |  16.0 MiB   |   43'759 ms    |    0.0875 ms
 prime256v1 |   33 bytes    |   49 bytes   |  24.5 MiB   |   25'742 ms    |    0.0515 ms

This is measured with a C implementation using the OpenSSL library, and we see 0.05–0.09 ms/clue, which is down from 3–4 ms/clue in my first crude benchmark calling the OpenSSL binary from a bash script :slight_smile: In this context “per clue” means the same as “per message”. The numbers are measured on an Intel CPU (3rd Gen Intel Xeon Ice Lake with AVX-512) in single-threaded mode. This should parallelize linearly. The whole computation takes place on a local device (laptop, phone, etc.); there is no work done by any service, e.g. a validator.
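
As a rough illustration of why it parallelizes linearly: the scan is just a loop over independent clues, e.g. with OpenMP. check_clue() here is a hypothetical stand-in for the per-clue work described above (ECDH with the attached sender public key, key/IV derivation, trial decryption, all-0x01 check):

```c
#include <stddef.h>
#include <openssl/evp.h>

/* hypothetical per-clue check: ECDH + key/IV derivation + trial decryption */
int check_clue(EVP_PKEY *recipient_priv,
               const unsigned char *sender_pub, size_t pub_len,
               const unsigned char clue[16]);

/* every clue is independent, so the scan parallelizes trivially;
   compile with -fopenmp to enable the pragma */
void scan_clues(EVP_PKEY *recipient_priv, size_t n,
                const unsigned char *const *sender_pubs, const size_t *pub_lens,
                const unsigned char (*clues)[16], int *pertinent)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        pertinent[i] = check_clue(recipient_priv, sender_pubs[i],
                                  pub_lens[i], clues[i]);
}
```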

Here are the raw results: openssl-omd/results.txt at b9871a51d113aa9a84f648f31d4f057ae24fc240 · bazzilic/openssl-omd · GitHub
Here’s the code: openssl-omd/ecdh-openssl.c at b9871a51d113aa9a84f648f31d4f057ae24fc240 · bazzilic/openssl-omd · GitHub

I am re-working this benchmark a little bit and will post new results later on.

Here are the results from a slightly reworked benchmark for AES-based trial decryption message detection. Each curve was tested over 150 iterations; the A ± B timing format in the table indicates mean (A) and standard deviation (B). MB in this table means 10⁶ bytes. MiB in the top post of this thread should actually also be MB.

                                                  |              INTEL              |                M1               |
    CURVE   | PubKey Lgth | Total B/clu | Total B | Detection time  | Detection/msg | Detection time  | Detection/msg |
 secp224r1  |  29 bytes   |  45 bytes   | 22.5 MB | 30'562 ± 24  ms |   0.0611 ms   | 25'064 ± 316 ms |   0.0501 ms   |
 sect113r1  |  16 bytes   |  32 bytes   | 16.0 MB | 43'011 ± 287 ms |   0.086  ms   | 34'639 ± 436 ms |   0.0692 ms   |
 sect113r2  |  16 bytes   |  32 bytes   | 16.0 MB | 43'051 ± 312 ms |   0.0861 ms   | 34'579 ± 422 ms |   0.0691 ms   |
 prime256v1 |  33 bytes   |  49 bytes   | 24.5 MB | 25'708 ± 17  ms |   0.0514 ms   | 20'715 ± 351 ms |   0.0414 ms   |

AES is used slightly incorrectly here, as the shared secret is used to generate both the AES key and the IV. To fix that, either the IV should be generated independently of the shared secret and stored together with the clue (which would increase the payload by 16 more bytes) or, better, the sender should use a fresh ephemeral EC key pair for every message they send: that way the shared secret, and hence the key and IV derived from it, are unique per message, so the IV can be safely derived from the shared secret since it is only used once.
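
For illustration, a minimal sketch of such a derivation with the OpenSSL 3.x HKDF API, assuming the ephemeral-key fix above so the shared secret is unique per message (the choice of HKDF-SHA256 and the "omd-clue" info label are my assumptions, not what the benchmark currently does):

```c
#include <openssl/evp.h>
#include <openssl/kdf.h>
#include <string.h>

/* Expand a per-message ECDH shared secret into a 16-byte AES-128 key and a
   16-byte IV via HKDF-SHA256; safe only if the shared secret is fresh per
   message (i.e. the sender used an ephemeral key pair). Error handling omitted. */
static int derive_key_iv(const unsigned char *secret, size_t secret_len,
                         unsigned char key[16], unsigned char iv[16])
{
    unsigned char okm[32];
    size_t okm_len = sizeof(okm);

    EVP_PKEY_CTX *kctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    EVP_PKEY_derive_init(kctx);
    EVP_PKEY_CTX_set_hkdf_md(kctx, EVP_sha256());
    EVP_PKEY_CTX_set1_hkdf_key(kctx, secret, secret_len);
    EVP_PKEY_CTX_add1_hkdf_info(kctx, (const unsigned char *)"omd-clue", 8);
    EVP_PKEY_derive(kctx, okm, &okm_len);
    EVP_PKEY_CTX_free(kctx);

    memcpy(key, okm, 16);        /* first half:  AES-128 key */
    memcpy(iv, okm + 16, 16);    /* second half: IV          */
    return 1;
}
```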

To summarize, detecting all pertinent messages among 500k messages in total takes (depending on the curve) 25–43 s on a single core of a modern Intel CPU, or 20–34 s on a single core of an M1 Max, and requires the user to download 16–24.5 MB of data. The method yields an average of 7 false positives for 500k messages.

For reference, Ethereum processes ~1M txns per day; ZCash is at 4–5k txns/day; NEAR: ~450k txns/day; Solana: ~150k txns/day.

Code used for this test: openssl-omd/ecdh-openssl.c at 4b49ec7bce85a4e3e18a80f40ff1b3ec7e311054 · bazzilic/openssl-omd · GitHub
Detailed results: openssl-omd/results.md at ddaa96000ff06aa75cd97c9908eb06d8c2aa78a8 · bazzilic/openssl-omd · GitHub


Updated benchmark:

                                                  |              INTEL              |                M1               |
    CURVE   | PubKey Lgth | Total B/clu | Total B | Detection time  | Detection/msg | Detection time  | Detection/msg |
 secp224r1  |  29 bytes   |  45 bytes   | 22.5 MB | 30'571 ± 31  ms |   0.0611 ms   | 24'381 ± 71  ms |   0.0487 ms   |
 sect113r1  |  16 bytes   |  32 bytes   | 16.0 MB | 42'921 ± 125 ms |   0.0858 ms   | 33'564 ± 126 ms |   0.0671 ms   |
 sect113r2  |  16 bytes   |  32 bytes   | 16.0 MB | 42'931 ± 163 ms |   0.0858 ms   | 33'607 ± 102 ms |   0.0672 ms   |
 prime256v1 |  33 bytes   |  49 bytes   | 24.5 MB | 25'705 ± 17  ms |   0.0514 ms   | 20'125 ± 55  ms |   0.0402 ms   |

Code: openssl-omd/ecdh-openssl.c at a260ff6d1ee3233576c30d363b01a360d37cf2e5 · bazzilic/openssl-omd · GitHub
Log: openssl-omd/results.md at a260ff6d1ee3233576c30d363b01a360d37cf2e5 · bazzilic/openssl-omd · GitHub

For future reference, modern Intel CPUs (Ice Lake, Tiger Lake, Rocket Lake, Sapphire Rapids and newer) and AMD Zen 4 CPUs have a vectorized (SIMD) AES instruction set (VAES) that operates on AVX-512 registers. It can be used to speed up AES-128 to roughly 4x the throughput we see in this benchmark. Some references:

However, encryption itself should not take long compared to, for example, the key exchange. The next thing I plan to do is benchmark the detection stages individually (ECDH key exchange, key derivation, decryption). It might also be interesting to benchmark a non-EC key exchange: the public key size will be larger (128 bit?) but supposedly the key exchange itself is faster.
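
Returning to the VAES point above, here is a minimal sketch of encrypting four independent 16-byte blocks per instruction with VAES on AVX-512 registers. It assumes the AES-128 round keys are already expanded and is compiled with e.g. -mavx512f -mvaes; the detection path would use the aesdec counterparts analogously:

```c
#include <immintrin.h>
#include <stdint.h>

/* Encrypt 4 independent 16-byte blocks in one pass using VAES + AVX-512.
   blocks: 64 bytes in/out; rk: the 11 expanded AES-128 round keys. */
static void aes128_encrypt4_vaes(uint8_t blocks[64], const __m128i rk[11])
{
    __m512i state = _mm512_loadu_si512((const void *)blocks);

    state = _mm512_xor_si512(state, _mm512_broadcast_i32x4(rk[0]));     /* whitening */
    for (int r = 1; r < 10; r++)                                        /* 9 main rounds */
        state = _mm512_aesenc_epi128(state, _mm512_broadcast_i32x4(rk[r]));
    state = _mm512_aesenclast_epi128(state, _mm512_broadcast_i32x4(rk[10]));

    _mm512_storeu_si512((void *)blocks, state);
}
```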