As noted in a recent article, WebAssembly adds a powerful new suite of tools to the Web development workshop. In particular, WebAssembly makes it possible to run software written in many languages securely within a Web browser, at near-native speed. Unfortunately, taking advantage of this new potential is non-trivial, because it requires expertise in both Web technologies and traditional build systems.
This article, the first in a series, shows how to compile a real-world C codebase to WebAssembly. Specific problems and general solutions are highlighted. In the end, the entire process will reveal itself to be much simpler than it might appear on the surface.
This first installment will show how to create an HTML page and associated assets that rather boringly runs the
The WebAssembly standard describes a binary instruction format that has been implemented by all major browsers since 2017. The aim of this standard is to provide a browser-embeddable, universal compile target that executes at native speed.
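To make "binary instruction format" concrete: every binary WebAssembly module begins with the same eight bytes, the magic number `\0asm` followed by a 32-bit little-endian version field whose current value is 1. The sketch below checks a buffer for that header using only standard C. The function name `is_wasm_v1_header` is hypothetical; it is a quick sanity check, not a validator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with a WebAssembly binary
 * header (magic "\0asm" followed by little-endian version 1), else 0. */
int is_wasm_v1_header(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = {0x00, 0x61, 0x73, 0x6D}; /* "\0asm" */

    assert(buf != NULL || len == 0);
    if (len < 8) {
        return 0; /* too short to hold magic + version */
    }
    if (memcmp(buf, magic, sizeof magic) != 0) {
        return 0;
    }
    /* Decode the version byte by byte so host endianness does not matter. */
    uint32_t version = (uint32_t)buf[4]
                     | ((uint32_t)buf[5] << 8)
                     | ((uint32_t)buf[6] << 16)
                     | ((uint32_t)buf[7] << 24);
    return version == 1u;
}
```

For example, reading the first eight bytes of the `.wasm` file that Emscripten produces and passing them to this function should return 1.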
The current iteration of the WebAssembly specification is considered a minimum viable product (MVP) by its creators. As such, many possible features and improvements were deliberately left out so that a small, well-specified core could ship first. With increased traction and stability of the base system, these features and improvements can be expected to appear in WebAssembly.
Emscripten transforms LLVM compiler output to both asm.js and WebAssembly. LLVM enjoys broad support as a compile target, meaning that any language that can be compiled to LLVM intermediate representation (IR) can, in principle, be further compiled to WebAssembly. A few years ago, I showed how to compile the InChI chemical identifier software to asm.js using Emscripten. Now that all major browsers support WebAssembly, it's time to update that work with an exhaustive description of how to compile InChI to WebAssembly.
I approached this project with four main goals:
- Don't modify the InChI source files in any way. Leaving the InChI source pristine means that this dependency can be readily swapped out to maintain compatibility with new releases.
- Build a wrapper API written in C. This wrapper must get everything it needs from the InChI sources and must introduce no external dependencies.
- Compile the wrapper and InChI dependencies into a single WebAssembly file (
*.js) and one wasm (
This article focuses on Goal (1), compilation of unmodified InChI source. Subsequent articles will focus on the remaining three goals.
The following tutorial assumes you've installed and activated the latest version of the Emscripten toolchain in your current environment. It also assumes you're running macOS, although other Unix-like systems might work as well.
Compile the Native Executable
Before attempting to compile to WebAssembly, we need to first understand how to compile a native binary. Emscripten replaces your existing compile toolchain with one capable of producing WebAssembly output. As such, it's very helpful to first develop a method to compile natively before attempting to use Emscripten.
```
git clone https://github.com/metamolecular/inchi
cd inchi
ls
INCHI_API  INCHI_BASE  INCHI_EXE  LICENCE  README.md  readme.txt
```
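Two directories contain the source files we'll need:
- INCHI_BASE/src, which holds the shared header and implementation files
- INCHI_EXE/inchi-1/src, which holds the command-line program, including ichimain.c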
Although the InChI repository contains makefiles, they're too tightly coupled to be useful for cross-compilation. Instead, we'll need to devise a simple way to get the C compiler to build InChI's
In the interest of keeping the InChI repository itself clean, create a build subdirectory and change into it:

```
mkdir build
cd build
```
Error-driven development is a powerful way to learn new techniques. The idea is to begin with the simplest possible approach, no matter how unlikely it is to work. Then use the resulting error messages to figure out how to get to the next error. Continue until you run out of either errors or paths forward.
The simplest idea would be to invoke the C compiler directly on the file containing InChI's main function. It can be found at INCHI_EXE/inchi-1/src/ichimain.c. Attempting to compile ichimain.c directly gave:

```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c
```

```
../INCHI_EXE/inchi-1/src/ichimain.c:50:10: fatal error: 'conio.h' file not found
#include <conio.h>
         ^~~~~~~~~
1 error generated.
```
conio.h is an MS-DOS header. Given that I'm compiling on a macOS system, the error is to be expected. Searching the source for the text "conio.h" offers a clue. Starting on line 48 of the file INCHI_BASE/src/ichiparm.c, for example, we find:
This suggests that we can move past the error by defining the COMPILE_ANSI_ONLY preprocessor macro. The compiler's -D option does this from the command line, so we update the build command:

```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  -DCOMPILE_ANSI_ONLY
```
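Passing `-DCOMPILE_ANSI_ONLY` has the same effect as writing `#define COMPILE_ANSI_ONLY` before the first line of every file being compiled. The sketch below is not InChI's code from ichiparm.c; it only illustrates the kind of conditional-compilation guard that lets such a flag switch off a platform-specific header. The `#define` stands in for the command-line option so the file compiles on its own, and the helper name `built_ansi_only` is hypothetical.

```c
#include <assert.h>

/* Stand-in for passing -DCOMPILE_ANSI_ONLY on the compiler command line. */
#define COMPILE_ANSI_ONLY

#ifndef COMPILE_ANSI_ONLY
#include <conio.h>  /* MS-DOS console header: only pulled in for non-ANSI builds */
#endif

/* Hypothetical helper reporting which branch the preprocessor selected. */
int built_ansi_only(void) {
#ifdef COMPILE_ANSI_ONLY
    return 1;
#else
    return 0;
#endif
}
```

Deleting the `#define` line reproduces the original failure on any system without conio.h, because the preprocessor then tries to include it.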
Doing so generates the new error:
```
../INCHI_EXE/inchi-1/src/ichimain.c:56:10: fatal error: '../../INCHI_BASE/src/mode.h' file not found
#include "../../INCHI_BASE/src/mode.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
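Now is a good time to point the compiler to its required headers. Browsing through the INCHI_BASE/src directory reveals many header files. Adding that directory to the include search path with -I also gives the compiler somewhere to resolve the relative path in the failing #include against: from the build directory, ../INCHI_BASE/src/../../INCHI_BASE/src/mode.h lands on the real file.

```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  -I../INCHI_BASE/src \
  -DCOMPILE_ANSI_ONLY
```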
Again, a new error. This time we're notified that:

```
In file included from ../INCHI_EXE/inchi-1/src/ichimain.c:56:
../INCHI_BASE/src/../../INCHI_BASE/src/mode.h:77:6: error: No build target #defined, pls check compiler options... (TARGET_EXE_STANDALONE|TARGET_API_LIB|TARGET_EXE_USING_API|TARGET_LIB_FOR_WINCHI)
```
For this build we're targeting an executable. We can let the compiler know about this choice with the following:
```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  -I../INCHI_BASE/src \
  -DCOMPILE_ANSI_ONLY \
  -DTARGET_EXE_STANDALONE
```
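The "No build target" message appears to come from a preprocessor check in mode.h that refuses to continue unless one of the four target macros is defined. The sketch below shows that general pattern; it is not copied from mode.h, and the helper `build_target_name` is hypothetical. The `#define` line stands in for `-DTARGET_EXE_STANDALONE`; removing it makes the `#error` fire at compile time.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for passing -DTARGET_EXE_STANDALONE on the command line. */
#define TARGET_EXE_STANDALONE

/* Refuse to compile unless exactly the kind of choice InChI asks for was made. */
#if !defined(TARGET_EXE_STANDALONE) && !defined(TARGET_API_LIB) && \
    !defined(TARGET_EXE_USING_API) && !defined(TARGET_LIB_FOR_WINCHI)
#error "No build target defined: pass one of the TARGET_* macros with -D"
#endif

/* Hypothetical helper naming the target the preprocessor selected. */
const char *build_target_name(void) {
#if defined(TARGET_EXE_STANDALONE)
    return "TARGET_EXE_STANDALONE";
#elif defined(TARGET_API_LIB)
    return "TARGET_API_LIB";
#elif defined(TARGET_EXE_USING_API)
    return "TARGET_EXE_USING_API";
#else
    return "TARGET_LIB_FOR_WINCHI";
#endif
}
```

With this configuration, `build_target_name()` returns `"TARGET_EXE_STANDALONE"`.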
Running the updated command yields another error, but this time more cryptic than before:
```
[...]
3 warnings generated.
Undefined symbols for architecture x86_64:
  "_FreeAllINChIArrays", referenced from:
      _ProcessSingleInputFile in ichimain-787c41.o
[...]
```
"Undefined symbols for architecture" means that the linker could not find definitions for functions or data that the compiled code refers to. We've already pointed the compiler at the header files in INCHI_BASE/src, which declare those functions, but we haven't compiled any of the implementation files that define them. We can fix that with:

```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  ../INCHI_BASE/src/*.c \
  -I../INCHI_BASE/src \
  -DCOMPILE_ANSI_ONLY \
  -DTARGET_EXE_STANDALONE
```
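The distinction behind the undefined-symbol error is between a declaration, which is all the compiler needs and which the headers found via -I supply, and a definition, which the linker needs and which only compiled .c files supply. In the single-file sketch below, the prototype plays the role of a header. Both function names are hypothetical stand-ins, not InChI functions. Deleting the definition at the bottom still compiles, but linking then fails with an error much like the one for `_FreeAllINChIArrays`; the leading underscore there is how macOS (Mach-O) object files name C symbols.

```c
#include <assert.h>

/* Declaration: the kind of prototype a header in INCHI_BASE/src provides.
 * It lets the compiler type-check calls, but generates no code. */
int free_all_arrays(int count);

/* Caller compiled against the declaration only. */
int process_single_input(int count) {
    return free_all_arrays(count);
}

/* Definition: what the linker must find in some compiled .c file.
 * Without it, linking fails with "Undefined symbols" on macOS
 * (GNU ld reports "undefined reference" instead).
 * This stand-in just reports how many arrays it would free. */
int free_all_arrays(int count) {
    return count < 0 ? 0 : count;
}
```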
No errors! Listing the contents of the working directory yields a new file, a.out. This is the binary executable. For now, delete it.
We're almost there. All that remains is to generate a binary with a more descriptive name than a.out. This can be accomplished with the following change:

```
cc \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  ../INCHI_BASE/src/*.c \
  -I../INCHI_BASE/src \
  -DCOMPILE_ANSI_ONLY \
  -DTARGET_EXE_STANDALONE \
  -o inchi
```
This produces an executable called inchi in the working directory, which can be run with the command ./inchi. Verify that it produces the expected output by saving a molfile called example.mol to the working directory. Then execute the command:

```
./inchi example.mol
InChI version 1, Software v. 1.05 (inchi-1 executable)
Linux Build of May 13 2019 09:38:13
[... more output]
```
For an example.mol file encoding benzene, the program wrote the following output to example.mol.txt:

```
cat example.mol.txt
* Input_File: "example.mol"
Structure: 1
InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H
AuxInfo=1/0/N:1,2,6,3,5,4/E:(1,2,3,4,5,6)/rA:6nCCCCCC/rB:d1;s2;d3;s4;s1d5;/rC:60.6483,-42.3537,0;69.3086,-47.3537,0;77.9689,-42.3537,0;77.9689,-32.3537,0;69.3086,-27.3537,0;60.6483,-32.3537,0;
```
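When rebuilding repeatedly, it helps to compare only the `InChI=` line against the known benzene result rather than the whole output file. The sketch below is a hypothetical helper, not part of InChI: given the contents of a file such as example.mol.txt as a string, it copies out the first line that starts with `InChI=`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first line of `text` that starts with
 * "InChI=" into `out` (NUL-terminated, without the line ending).
 * Returns 1 on success, 0 if no such line exists or it does not fit. */
int extract_inchi_line(const char *text, char *out, size_t out_size) {
    static const char prefix[] = "InChI=";
    const size_t prefix_len = sizeof prefix - 1;
    const char *line = text;

    assert(text != NULL && out != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Tolerate Windows-style "\r\n" line endings. */
        if (len > 0 && line[len - 1] == '\r') {
            len--;
        }
        if (len >= prefix_len && strncmp(line, prefix, prefix_len) == 0) {
            if (len + 1 > out_size) {
                return 0; /* caller's buffer is too small */
            }
            memcpy(out, line, len);
            out[len] = '\0';
            return 1;
        }
        if (end == NULL) {
            break; /* last line had no trailing newline */
        }
        line = end + 1;
    }
    return 0;
}
```

A test script could then check the extracted line against `InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H` with `strcmp`.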
Compile InChI to WebAssembly
A working command for compiling InChI with the native compiler puts us in an excellent position to cross-compile to WebAssembly. Be sure the Emscripten toolchain is installed and activated in your current shell. Then simply change the compiler (from cc to emcc) and the output file (from inchi to inchi.html). Given an .html output name, emcc writes the HTML page together with a JavaScript loader (inchi.js) and the WebAssembly module (inchi.wasm):

```
emcc \
  -I../INCHI_BASE/src \
  ../INCHI_BASE/src/*.c \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  -DCOMPILE_ANSI_ONLY \
  -DTARGET_EXE_STANDALONE \
  -o inchi.html
```
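Because emcc stands in for cc, the same source file can tell at compile time which toolchain is building it: emcc predefines the `__EMSCRIPTEN__` macro and a native compiler does not. That becomes handy once you write your own wrapper code. In the sketch below, only `__EMSCRIPTEN__` comes from Emscripten; the helper name `build_flavor` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports which toolchain compiled this file.
 * emcc defines __EMSCRIPTEN__; a native cc does not. */
const char *build_flavor(void) {
#ifdef __EMSCRIPTEN__
    return "webassembly";
#else
    return "native";
#endif
}
```

Compiled with cc, `build_flavor()` returns `"native"`; compiled with emcc, it returns `"webassembly"`.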
On macOS you're likely to see the following error:

```
../INCHI_BASE/src/util.c:1562:33: error: implicit declaration of function '__isascii' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
for ( i = 0; i < len && __isascii( p[i] ) && isspace( p[i] ); i++ )
```
__isascii is yet another nonstandard, Windows-oriented API used by InChI. It's far from clear why the IUPAC team included multiple platform-specific dependencies by default in its build. Fortunately, it also provided workarounds.
Searching the source tree for the text __isascii reveals a hint in INCHI_BASE/src/util.c. Starting on line 47, we find:

```
/* For build under OSX, advice from Burt Leland */
```
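The workaround in util.c appears to be guarded by the __APPLE__ macro. Apple's native compiler predefines __APPLE__, which is likely why the native build on macOS succeeded, but emcc does not. Setting the __APPLE__ flag ourselves gives the following command:

```
emcc \
  -I../INCHI_BASE/src \
  ../INCHI_BASE/src/*.c \
  ../INCHI_EXE/inchi-1/src/ichimain.c \
  -DCOMPILE_ANSI_ONLY \
  -DTARGET_EXE_STANDALONE \
  -D__APPLE__ \
  -o inchi.html
```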
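To make clear what the failing line in util.c computes, here is a stand-alone version of that loop built on a portable ASCII test. It is a sketch under stated assumptions, not InChI's own workaround, and using it in InChI would mean editing util.c, which this project avoids. The names `portable_isascii` and `count_leading_space` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical portable stand-in for the nonstandard __isascii().
 * Converting to unsigned maps negative char values to large numbers,
 * so bytes 0x80-0xFF are rejected whether plain char is signed or not. */
static int portable_isascii(int c) {
    return (unsigned int)c <= 0x7Fu;
}

/* Mirrors the loop from util.c: counts leading ASCII whitespace in p[0..len).
 * Testing for ASCII first also keeps isspace() away from negative char
 * values, for which its behavior is undefined. */
size_t count_leading_space(const char *p, size_t len) {
    size_t i;

    assert(p != NULL || len == 0);
    for (i = 0; i < len && portable_isascii(p[i]) && isspace(p[i]); i++) {
        /* keep scanning */
    }
    return i;
}
```

For instance, `count_leading_space("  \tabc", 6)` returns 3, and a string starting with a UTF-8 byte such as `0xC3` returns 0.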
Testing in a Browser
At this point we're ready to test the generated WebAssembly. If you open the inchi.html file in your browser directly, you're likely to be greeted with a hung progress spinner and an error like the following:

```
both async and sync fetching of the wasm failed
```
The problem is that browser security constraints prevent pages loaded through the file:/// protocol from fetching resource files such as the .wasm module. We can work around this limitation by running a local web server. A convenient lightweight server available on most systems is Python's SimpleHTTPServer (a Python 2 module; under Python 3 the equivalent is http.server). Execute it in your working directory with:

```
python -m SimpleHTTPServer
# or, with Python 3:
python3 -m http.server
```
This server allows us to test the WebAssembly output by opening the inchi.html file. For example, if your server runs on localhost:8000, point your browser to http://localhost:8000/inchi.html. Doing so should yield the following window:
There it is: InChI running in the browser!
But we're not done yet. Although the
This article presents a step-by-step procedure for cross-compiling a representative C codebase to WebAssembly using Emscripten. Although there are a few small complications to pay attention to, the procedure for generating a native binary looks almost identical to the one for generating WebAssembly. As such, the instructions here should serve as a model for cross-compiling other C codebases.