In this post, I will explain why supporting 64-bit integers requires special attention, what Cheerp used to do in the past, why the current situation is better, and what we will be able to do in the future to further improve support.
This may not be very efficient, but it makes it possible to compile programs that use types like int64_t (long long) and uint64_t (unsigned long long).
Clang usually compiles these C++ types into the LLVM i64 type. But since we cannot represent i64 values natively in JavaScript, Cheerp used to express operations on int64_t/uint64_t in terms of 32-bit instructions on the i32 type. This decomposition was done directly in Clang when generating LLVM IR code. The reasoning behind this was to perform the decomposition as early as possible to enable more optimizations, and to never have i64 values in our IR to avoid possible issues.
For example, for the following C code:
    int64_t i = …;
    int64_t j = …;
    j = j & i;
Clang normally emits LLVM IR like this:
    %i.addr = alloca i64, align 8
    %j.addr = alloca i64, align 8
    […]
    %i = load i64, i64* %i.addr, align 8
    %j = load i64, i64* %j.addr, align 8
    %and = and i64 %i, %j
    store i64 %and, i64* %j.addr, align 8
While in Cheerp it used to emit something like this:
    %i.addr = alloca [2 x i32], align 8
    %j.addr = alloca [2 x i32], align 8
    […]
    %i.gep.high = getelementptr inbounds [2 x i32], [2 x i32]* %i.addr, i32 0, i32 1
    %i.high = load i32, i32* %i.gep.high, align 4
    %i.gep.low = getelementptr inbounds [2 x i32], [2 x i32]* %i.addr, i32 0, i32 0
    %i.low = load i32, i32* %i.gep.low, align 8
    %j.gep.high = getelementptr inbounds [2 x i32], [2 x i32]* %j.addr, i32 0, i32 1
    %j.high = load i32, i32* %j.gep.high, align 4
    %j.gep.low = getelementptr inbounds [2 x i32], [2 x i32]* %j.addr, i32 0, i32 0
    %j.low = load i32, i32* %j.gep.low, align 8
    %and.high = and i32 %i.high, %j.high
    %and.low = and i32 %i.low, %j.low
    store i32 %and.high, i32* %j.gep.high, align 8
    store i32 %and.low, i32* %j.gep.low, align 4
This design decision proved to be a hindrance to supporting real 64-bit integers in the WebAssembly output.
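To illustrate what emulating a 64-bit operation on i32 halves involves, here is a minimal C++ sketch. The U64Pair struct and the and_pair, add_pair and to_u64 helpers are hypothetical names chosen for this example and are not Cheerp's actual implementation. A bitwise AND splits cleanly into two independent 32-bit operations, as in the IR above, while an addition also has to propagate a carry from the low half to the high half.

```cpp
#include <cstdint>

// Hypothetical representation of a 64-bit integer as two 32-bit halves,
// mirroring the [2 x i32] layout shown in the IR above.
struct U64Pair {
    uint32_t low;   // bits 0..31
    uint32_t high;  // bits 32..63
};

// Bitwise AND: each half can be processed independently.
inline U64Pair and_pair(U64Pair a, U64Pair b) {
    return U64Pair{a.low & b.low, a.high & b.high};
}

// Addition: the low halves are added first, and a carry is propagated
// into the high half when the 32-bit addition wraps around.
inline U64Pair add_pair(U64Pair a, U64Pair b) {
    uint32_t low = a.low + b.low;              // wraps modulo 2^32
    uint32_t carry = (low < a.low) ? 1u : 0u;  // wrap-around means carry
    uint32_t high = a.high + b.high + carry;
    return U64Pair{low, high};
}

// Helper for comparing results against native 64-bit arithmetic.
inline uint64_t to_u64(U64Pair p) {
    return (static_cast<uint64_t>(p.high) << 32) | p.low;
}
```

Even in this small sketch the addition needs extra instructions compared to a single native 64-bit add, which gives an idea of why the emulated form is less efficient.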
Supporting 64-bit integers in WebAssembly
WebAssembly natively supports i64 values, so we don’t need to manually decompose them into i32 and emulate all the instructions.
Moreover, since we were never actually generating i64 values from Clang, we were faced with the daunting task of “reconstructing” i64 values and instructions from their lowered i32 version, or alternatively of modifying Clang even further to sometimes compile int64_t to a pair of i32, and sometimes to i64.
We can use JavaScript features such as BigInt to solve the issue of representing types like int64_t and uint64_t.
The only problem is that not all modern browsers support BigInt yet: Safari will support it from version 14, and old browsers like Internet Explorer will never support it at all.
Integration between BigInt and Wasm i64, which would be needed for interoperability, is even less supported currently (see the current state here).
So while this is a very useful feature, and Cheerp will definitely support optionally using BigInt, it is not a definitive solution to the problem.
Lowering i64 in an LLVM pass
Instead of modifying Clang to compile int64_t values and instructions directly to i32, we can leave the normal code generation as it is, and later run a custom LLVM pass to remove all the i64 values and convert them to i32.
This has the benefit of making it very easy to run the pass conditionally, and of simplifying the custom logic in Clang. We can still run this pass very early in the optimization process, to get more efficient code.
There is still one issue left: interoperability. We want to be able to access a 64-bit value both from code that uses native i64 and from code that uses the lowered i32 pairs.
Our solution to this problem is to always represent i64 values as [2 x i32] in memory: when performing a load, we load two i32 values and use them to build the i64:
    %1 = getelementptr inbounds [2 x i32], [2 x i32]* %0, i32 0, i32 1
    %2 = load i32, i32* %1, align 4
    %3 = getelementptr inbounds [2 x i32], [2 x i32]* %0, i32 0, i32 0
    %4 = load i32, i32* %3, align 4
    %5 = zext i32 %2 to i64
    %6 = zext i32 %4 to i64
    %7 = shl i64 %5, 32
    %i = or i64 %7, %6
When performing a store, we first split the i64 into two i32 values and then store them sequentially:
    %8 = lshr i64 %i, 32
    %9 = trunc i64 %8 to i32
    %10 = trunc i64 %i to i32
    %11 = getelementptr inbounds [2 x i32], [2 x i32]* %i.addr, i32 0, i32 1
    %12 = getelementptr inbounds [2 x i32], [2 x i32]* %i.addr, i32 0, i32 0
    store i32 %9, i32* %11, align 1
    store i32 %10, i32* %12, align 1
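The same load and store sequences can be written in C++ to make the bit manipulation easier to follow. This is only an illustrative sketch: load_i64 and store_i64 are hypothetical helper names, and the two-element array stands in for the pair of i32 values in memory, with the low half at index 0 and the high half at index 1, as in the IR above.

```cpp
#include <cstdint>

// Build a 64-bit value from two 32-bit words stored as {low, high},
// following the same zext/shl/or sequence as the load IR above.
inline uint64_t load_i64(const uint32_t mem[2]) {
    uint64_t high = mem[1];     // zext i32 -> i64
    uint64_t low = mem[0];      // zext i32 -> i64
    return (high << 32) | low;  // shl + or
}

// Split a 64-bit value into two 32-bit words and store them,
// following the same lshr/trunc sequence as the store IR above.
inline void store_i64(uint32_t mem[2], uint64_t value) {
    mem[1] = static_cast<uint32_t>(value >> 32);  // lshr + trunc
    mem[0] = static_cast<uint32_t>(value);        // trunc
}
```

Storing a value with store_i64 and reading it back with load_i64 yields the original value, which is the property the lowering relies on.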
For code compiled to Wasm, Cheerp actually bitcasts the pointer to the data before the load/store, and simply loads or stores an i64 directly at the address of the lower i32 value (WebAssembly is little-endian). This way we keep the full efficiency of native i64 loads and stores.
Example LLVM IR of a load (for code compiled to Wasm):
    %i.addr = alloca [2 x i32], align 8
    %1 = bitcast [2 x i32]* %0 to i64*
    %i = load i64, i64* %1, align 4
Example LLVM IR of a store (for code compiled to Wasm):
    %2 = bitcast [2 x i32]* %i.addr to i64*
    store i64 %i, i64* %2, align 8
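The reason the direct access works is that, on a little-endian target, an i64 in memory has exactly the same byte layout as its low i32 followed by its high i32. The following C++ sketch checks this on the host machine. The helper names are hypothetical, and std::memcpy is used here as the portable C++ stand-in for the pointer bitcast; the comparison is only expected to hold on little-endian hosts.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
inline bool is_little_endian() {
    const uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Read the two 32-bit words as one 64-bit value in a single access.
inline uint64_t load_direct(const uint32_t mem[2]) {
    uint64_t value;
    std::memcpy(&value, mem, sizeof(value));
    return value;
}

// Reference result built explicitly from the {low, high} halves.
inline uint64_t load_from_halves(const uint32_t mem[2]) {
    return (static_cast<uint64_t>(mem[1]) << 32) | mem[0];
}
```

On a little-endian host, load_direct and load_from_halves return the same value for any contents of the array, so both representations can share the same memory.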
Conclusion and future development
Do you have any questions about this feature or anything Cheerp-related? Drop me a question on Twitter.