How to run Tesseract from web browsers with the help of emscripten

Tesseract is a fairly accurate Optical Character Recognition (OCR) engine available as free and open-source software. It’s written in C and C++ and usually runs from the command line or behind a GUI. But I work on a web application where thousands of users need to run OCR tasks in their web browsers without relying on server-side processing. So we asked our great Capgemini developer team to compile Tesseract so that it could be called from JavaScript code. This is where emscripten is useful: it compiles programs written in C and C++ into JavaScript. The compilation was tricky, but they made it work.

Here is their HOWTO documentation.

I share it under the Creative Commons CC-BY-SA 3.0 license, so feel free to improve it as you like.