OpenCV in the Browser? Let's give it a try

d_fens · November 29, 2023, 10:42am

Hi guys,

I’ve been working on my software and decided to take on the challenge of testing OpenCV running in the browser. The goal was simple:
create a standalone device where Python code controls the motor and camera, handles exposure bracketing, and saves the data on a fast SSD. The GUI is a web interface (specifically Vue) that interacts with a Flask API. So you only need a web browser on any device within the same network, be it a tablet or an old smartphone, to control the process.

Why the hassle?

The question arose of how to process the accumulated images. Initially, the idea was to use a powerful PC for Mertens merge, cropping, and aligning the images using the sprocket holes. However, setting up a new codebase on the PC (or Mac) and installing Python and its dependencies didn’t feel right. So why not use the Raspberry Pi only to host the JavaScript code while the powerful PC handles the processing through the browser? This would mean no additional software installations, just pure browser-based processing.

I was skeptical about the performance with JavaScript and even the feasibility, but it was an interesting challenge I couldn’t resist.

Starting up

I started with OpenCV.js, a JavaScript binding for OpenCV that runs in the browser via WebAssembly (WASM). WASM is a low-level, binary instruction format that provides near-native performance by allowing web applications to run compiled code in the browser. It’s more efficient than JavaScript, especially for CPU-intensive tasks like image processing, as it’s closer to machine code.

After downloading the initial version provided by OpenCV and experimenting a bit, I successfully merged two images. However, trying with two full-size HQ camera images led to an ‘out of memory’ error.
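A back-of-the-envelope estimate shows why full-size frames strain a small WASM heap. The sketch below is only a rough lower bound under stated assumptions: it assumes the HQ camera's full 4056 x 3040 resolution (the post does not state the image size), 3-channel 8-bit inputs, one float32 working copy per image, and one float32 weight map per image. The helper names are hypothetical, and the real allocation pattern inside OpenCV's Mertens merge (pyramids, temporaries) will be higher.

```python
# Assumed full-resolution HQ camera frame; the post does not state the size.
WIDTH, HEIGHT = 4056, 3040


def image_bytes(width, height, channels=3, bytes_per_sample=1):
    """Size in bytes of one uncompressed image buffer."""
    return width * height * channels * bytes_per_sample


def merge_working_set_bytes(n_images, width=WIDTH, height=HEIGHT):
    """Rough lower bound on memory held while merging n_images exposures."""
    inputs = n_images * image_bytes(width, height, 3, 1)        # 8-bit inputs
    float_copies = n_images * image_bytes(width, height, 3, 4)  # float32 copies
    weights = n_images * image_bytes(width, height, 1, 4)       # float32 weight maps
    return inputs + float_copies + weights


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


for n in (2, 5):
    print(f"{n} images: at least {to_mib(merge_working_set_bytes(n)):.0f} MiB")
```

Even this optimistic count lands in the hundreds of MiB for two frames and above one GiB for five, which fits the observation that two full-size images already failed with the stock build.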

“I never said ‘640K should be enough for anybody’”

I discovered that the memory limit came from Emscripten (the toolchain that compiles the OpenCV C++ code to JS and WASM) and the flags set by OpenCV during its build process. It was clear I needed to create my own build.

So, I installed Emscripten, set it up, cloned the OpenCV Git repository, and executed my first build with --build_wasm and a flag for a 4GB memory max. The build was successful, but I faced async loading issues in HTML and a bug (emscripten-core/emscripten issue #20801, “wasm build with: EXPORT_NAME not used in shell”), which I managed to fix. Despite the initial success, I still encountered the ‘Out of Memory’ issue, because the --build_flags did not override the defaults in the CMakeLists.txt.

Direct adjustments in CMakeLists.txt were necessary. Finally, it worked! I managed to merge five full-size images, but what about the performance?

Performance Analysis: The First Runs

The first run of Mertens Merge took around 6010.7ms, with subsequent runs averaging 5686.28ms. The slower first run comes from the initial memory allocation: with ALLOW_MEMORY_GROWTH=1, the WASM heap is grown on demand the first time the memory is needed.
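To keep warm-up cost and steady-state performance apart in measurements like these, one option is to time the first call on its own and average the rest. The sketch below does that in Python with a hypothetical helper, time_runs. The post does not say how the browser timings were taken, so this is an assumption about method, not the code behind the numbers above.

```python
import time
from statistics import mean


def time_runs(func, runs=5):
    """Call func() `runs` times; return (first_run_ms, mean_of_later_runs_ms)."""
    if runs < 2:
        raise ValueError("need at least two runs to separate warm-up from the rest")
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    # The first run carries one-time costs such as heap growth or cache warm-up.
    return durations_ms[0], mean(durations_ms[1:])
```

With a merge wrapped in a zero-argument function, time_runs(do_merge, runs=5) would return the two figures in the same form as quoted above (do_merge is a placeholder name).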

Next Level: Introducing SIMD

Next, I explored Single Instruction, Multiple Data (SIMD) capabilities within WASM. SIMD allows the execution of a single operation on multiple data points simultaneously, significantly boosting processing efficiency for tasks like image merging.
This was just a rebuild of OpenCV.js with the additional --simd parameter, and the performance improved significantly: the first run took 4448.38ms, and subsequent runs were around 4112.38ms. That’s about 28% less run time (roughly 1.38x faster), nice!

Pushing Further: Implementing Web Workers and Threading

I aimed higher and added --threads for the next build. This gave the biggest performance boost: multithreading via Web Workers in WASM, which parallelizes tasks across multiple CPU cores. Although I initially faced a JavaScript error (cv undefined), fixing another bug (emscripten-core/emscripten issue #20800, “opencv_js.worker.js with MODULARIZE generates invalid code”) led to a successful implementation.
With four worker threads and SIMD, the first run came down to 2254.54ms, with subsequent runs around 1896.6ms, a further 54% reduction in run time.
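One practical point for anyone repeating the threaded build: WASM threads rely on SharedArrayBuffer, which current browsers only expose on cross-origin isolated pages, and they generally also expect a secure context (HTTPS or localhost). The post does not say how the files were served, so the following is only a minimal sketch using Python's standard library; the names ISOLATION_HEADERS, IsolatedHandler, and make_server, as well as the default host and port, are placeholders. In a Flask setup like the one described earlier, the same two headers would need to be added to the responses there.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Response headers that make a page cross-origin isolated, which browsers
# require before they expose SharedArrayBuffer to threaded WASM code.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the isolation headers to every response."""

    def end_headers(self):
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def make_server(directory=".", host="127.0.0.1", port=8000):
    """Build (but do not start) a static file server; defaults are placeholders."""
    handler = functools.partial(IsolatedHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling make_server("dist").serve_forever() would then serve a build folder (here the hypothetical "dist") with both headers set.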

Performance Comparison: The Python OpenCV Version

To benchmark my progress, I created a simple PyQt application performing the same Mertens Merge:


import sys
import cv2
import numpy as np
import time
from PyQt5.QtWidgets import QApplication, QWidget, QPushButton, QVBoxLayout, QFileDialog

class MertensMergeApp(QWidget):
    def __init__(self):
        super().__init__()

        self.initUI()
    
    def initUI(self):
        self.setWindowTitle("Mertens Merge Demo")
        self.setGeometry(100, 100, 200, 100)

        layout = QVBoxLayout()
        self.setLayout(layout)

        loadButton = QPushButton("Load Images", self)
        loadButton.clicked.connect(self.loadImages)
        layout.addWidget(loadButton)

        mergeButton = QPushButton("Perform Mertens Merge", self)
        mergeButton.clicked.connect(self.performMerge)
        layout.addWidget(mergeButton)

    def loadImages(self):
        options = QFileDialog.Options()
        files, _ = QFileDialog.getOpenFileNames(self, "Select Images", "", "Image Files (*.png *.jpg *.jpeg)", options=options)
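        if files:
            # cv2.imread returns None for unreadable files; keep only valid images
            loaded = [cv2.imread(file) for file in files]
            self.images = [img for img in loaded if img is not None]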

    def performMerge(self):
        if hasattr(self, 'images'):
            print("Performing Mertens Merge...")

           
            # Mertens Merge
            merge_mertens = cv2.createMergeMertens()
            start_time = time.time()

            merged_image = merge_mertens.process(self.images)

            # Convert scale of merged image
            merged_image = np.clip(merged_image * 255, 0, 255).astype('uint8')

            end_time = time.time()
            print(f"Mertens Merge completed in {end_time - start_time:.2f} seconds")

            # Save or display the result...
            cv2.imwrite("merged_result.jpg", merged_image)
            print("Merged image saved as merged_result.jpg")

if __name__ == '__main__':
    app = QApplication(sys.argv)
    ex = MertensMergeApp()
    ex.show()
    sys.exit(app.exec_())

To my surprise, the PyQt application completed the merge in 2.76 seconds!
This means the WASM implementation with SIMD and multithreading needed about 31% less time than the PyQt version (roughly 1.9 s versus 2.76 s). The cause isn’t clear: Python’s cv2 also calls native OpenCV code, so the gap probably comes from differences in build options, SIMD use, and threading rather than from Python itself. Either way, I think this result demonstrates that modern web technologies can handle intensive image processing tasks efficiently.
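Since "percent faster" can mean either less run time or a higher speed, it helps to compute both from the steady-state timings quoted in this post. The sketch below uses only those reported numbers; the helper names reduction_pct and speedup and the dictionary labels are made up for illustration.

```python
def reduction_pct(before_ms, after_ms):
    """Percentage of run time saved when going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100.0


def speedup(before_ms, after_ms):
    """How many times faster the new run is (before / after)."""
    return before_ms / after_ms


# Steady-state timings reported in the post, in milliseconds.
timings_ms = {
    "wasm": 5686.28,
    "wasm+simd": 4112.38,
    "wasm+simd+threads": 1896.6,
    "python": 2760.0,  # 2.76 seconds
}

comparisons = [
    ("wasm", "wasm+simd"),
    ("wasm+simd", "wasm+simd+threads"),
    ("python", "wasm+simd+threads"),
]

for before, after in comparisons:
    saved = reduction_pct(timings_ms[before], timings_ms[after])
    factor = speedup(timings_ms[before], timings_ms[after])
    print(f"{before} -> {after}: {saved:.1f}% less time, {factor:.2f}x faster")
```

The two measures diverge quickly: a run that takes about half the time is about twice as fast, so it is worth stating which one a percentage refers to.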

Looking Ahead: Embracing OpenCV in the Browser

The journey has been long and full of learning, but the final outcome is impressive to me. The next step is to develop the code that handles the processing of accumulated images and possibly leverages multiple clients in the same network. I’m really satisfied with the results and hope you found this as interesting as I did!

cpixip · November 29, 2023, 11:35am

@d_fens - what an amazing amount of work! Congratulations!

One small comment: what resolution did your test images have? Have you tried to merge a more realistic number of images, i.e., 3 to 5 images? How do processing times scale here?

d_fens · November 29, 2023, 11:40am

yep, it was a journey …

sorry for not making that clear enough: all timings are for 5 full-size HQ camera images