OpenCV imwrite vs. simplejpeg performance

Hi,

Running this benchmark on a Raspberry Pi 4, I expected simplejpeg to win the comparison by far, but it actually lags behind. As far as I can see, my OpenCV version doesn’t use libjpeg-turbo, so what could be the reason for this?

opencv config:


General configuration for OpenCV 4.6.0 =====================================
Version control: unknown

Platform:
Timestamp: 2022-06-08T21:56:33Z
Host: Linux 5.15.32-v7l+ armv7l
CMake: 3.22.4
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/gmake
Configuration: Release

CPU/HW features:
Baseline:
requested: DETECT

C/C++:
Built as dynamic libs?: NO
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 10.2.1)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: /usr/lib/arm-linux-gnueabihf/liblapack.so /usr/lib/arm-linux-gnueabihf/libcblas.so /usr/lib/arm-linux-gnueabihf/libatlas.so /usr/lib/arm-linux-gnueabihf/libjpeg.so /usr/lib/arm-linux-gnueabihf/libwebp.so /usr/lib/arm-linux-gnueabihf/libpng.so /usr/lib/arm-linux-gnueabihf/libtiff.so openjp2 /usr/lib/arm-linux-gnueabihf/libz.so dl m pthread rt
3rdparty dependencies: libprotobuf ade ittnotify IlmImf quirc

OpenCV modules:
To be built: calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching video videoio
Disabled: world
Disabled by dependency: -
Unavailable: java python2 ts
Applications: -
Documentation: NO
Non-free algorithms: NO

GUI: GTK3
GTK+: YES (ver 3.24.24)
GThread : YES (ver 2.66.8)
GtkGlExt: NO
VTK support: NO

Media I/O:
ZLib: /usr/lib/arm-linux-gnueabihf/libz.so (ver 1.2.11)
JPEG: /usr/lib/arm-linux-gnueabihf/libjpeg.so (ver 62)
WEBP: /usr/lib/arm-linux-gnueabihf/libwebp.so (ver encoder: 0x020e)
PNG: /usr/lib/arm-linux-gnueabihf/libpng.so (ver 1.6.37)
TIFF: /usr/lib/arm-linux-gnueabihf/libtiff.so (ver 42 / 4.2.0)
JPEG 2000: OpenJPEG (ver 2.4.0)
OpenEXR: build (ver 2.3.0)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES

Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (58.91.100)
avformat: YES (58.45.100)
avutil: YES (56.51.100)
swscale: YES (5.7.100)
avresample: NO
GStreamer: NO
v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
Lapack: YES (/usr/lib/arm-linux-gnueabihf/liblapack.so /usr/lib/arm-linux-gnueabihf/libcblas.so /usr/lib/arm-linux-gnueabihf/libatlas.so)
Eigen: YES (ver 3.3.9)
Custom HAL: NO
Protobuf: build (3.19.1)

OpenCL: YES (no extra features)
Include path: /tmp/pip-wheel-u79916uk/opencv-python_ea2489746b3a43bfb3f2b5331b7ab47a/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load

Python 3:
Interpreter: /usr/bin/python3 (ver 3.9.2)
Libraries: /usr/lib/arm-linux-gnueabihf/libpython3.9.so (ver 3.9.2)
numpy: /usr/local/lib/python3.9/dist-packages/numpy/core/include (ver 1.22.3)
install path: python/cv2/python-3

Python (for build): /usr/bin/python3

Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO

Install to: /tmp/pip-wheel-u79916uk/opencv-python_ea2489746b3a43bfb3f2b5331b7ab47a/_skbuild/linux-armv7l-3.9/cmake-install
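
For a dump like the one above, the relevant entry can be pulled out programmatically. The following is a minimal sketch; `media_io_entry` is a hypothetical helper name, and the sample text is copied from the configuration above. Note that a system path plus "ver 62" only shows that a library with the libjpeg v6b API is linked. libjpeg-turbo exposes that same API version by default, so this line alone may not settle whether libjpeg-turbo is in use.

```python
import re


def media_io_entry(build_info, key="JPEG"):
    """Return the value of a 'KEY: value' line from OpenCV build information, or None."""
    pattern = rf"^[ \t]*{re.escape(key)}:[ \t]*(.+)$"
    match = re.search(pattern, build_info, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Sample lines copied from the configuration dump in this post
sample = """
Media I/O:
ZLib: /usr/lib/arm-linux-gnueabihf/libz.so (ver 1.2.11)
JPEG: /usr/lib/arm-linux-gnueabihf/libjpeg.so (ver 62)
JPEG 2000: OpenJPEG (ver 2.4.0)
"""

print(media_io_entry(sample))
```

On a live system the same helper could be fed with `cv2.getBuildInformation()` instead of the sample string.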

import cv2
import numpy as np
from simplejpeg import encode_jpeg
import time
import threading

# Create 10 random 2K images
num_images = 10
image_list = [np.random.randint(0, 256, (1080, 2048, 3), dtype=np.uint8) for _ in range(num_images)]

def benchmark_opencv(images, results):
    durations = []
    for idx, img in enumerate(images):
        start_time = time.time()
        cv2.imwrite(f'temp_opencv_{idx}.jpg', img)
        durations.append(time.time() - start_time)
        print(f"OpenCV: Image {idx + 1} saved.")
    results['opencv'] = durations

def benchmark_simplejpeg(images, results):
    durations = []
    for idx, img in enumerate(images):
        start_time = time.time()
        jpeg_data = encode_jpeg(img)
        with open(f'temp_simplejpeg_{idx}.jpg', 'wb') as f:
            f.write(jpeg_data)
        durations.append(time.time() - start_time)
        print(f"simplejpeg: Image {idx + 1} encoded and saved.")
    results['simplejpeg'] = durations

results = {}

# Create threads
opencv_thread = threading.Thread(target=benchmark_opencv, args=(image_list, results))
simplejpeg_thread = threading.Thread(target=benchmark_simplejpeg, args=(image_list, results))

# Start threads
opencv_thread.start()
simplejpeg_thread.start()

# Wait for both threads to finish
opencv_thread.join()
simplejpeg_thread.join()

# Analyze the timings
print(f"\nOpenCV Average: {np.mean(results['opencv']):.5f} seconds")
print(f"simplejpeg Average: {np.mean(results['simplejpeg']):.5f} seconds")

print(f"OpenCV Total: {sum(results['opencv']):.5f} seconds")
print(f"simplejpeg Total: {sum(results['simplejpeg']):.5f} seconds")
OpenCV Average: 0.17318 seconds
simplejpeg Average: 0.25848 seconds
OpenCV Total: 1.73184 seconds
simplejpeg Total: 2.58479 seconds

Well, I think you could improve your test a little bit:

  1. JPEG encoding is not really tuned to images created via np.random. This creates images with a lot of high-frequency content, which challenges the encoder. Test images that resemble real imagery more closely might yield different results.
  2. You did not specify the encoding quality in your calls. The jpgs created might differ substantially if you only use each encoder’s defaults.
  3. Some encoders are geared toward specific input formats. You tested only with RGB data at 8 bits per channel.
  4. One should probably time “encoding” separately from “saving the image to disk”. The latter depends on many things, including the file size of the encoded image. OpenCV allows you to encode the image separately as well, without actually saving it.
  5. Testing both ways of creating jpgs in a single program will not give you reliable results, as both tests influence each other. Test the different approaches in different programs. Run them one after the other and make sure nothing else is stealing CPU cycles.
  6. Do not operate the encoders at their default settings. These might not be optimal for your use case.
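
Point 4 in particular can be sketched as a small timing helper that records encoding and disk writes separately. This is only an illustration under stated assumptions: `time_encode_and_write` is a hypothetical helper, and the encoder is a stand-in (`zlib.compress` on random bytes) so that the sketch runs without OpenCV or simplejpeg installed.

```python
import os
import statistics
import tempfile
import time
import zlib


def time_encode_and_write(encode, payloads, out_dir, warmup=1):
    """Time encoding and disk writes separately; return (encode_s, write_s) lists."""
    for payload in payloads[:warmup]:
        encode(payload)  # warm-up run, not recorded
    encode_s, write_s = [], []
    for i, payload in enumerate(payloads):
        t0 = time.perf_counter()
        blob = encode(payload)
        t1 = time.perf_counter()
        with open(os.path.join(out_dir, f"frame_{i}.bin"), "wb") as f:
            f.write(blob)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the storage device
        t2 = time.perf_counter()
        encode_s.append(t1 - t0)
        write_s.append(t2 - t1)
    return encode_s, write_s


# Stand-in data and encoder (assumption: replace both with real images and a real JPEG encoder)
payloads = [os.urandom(256 * 1024) for _ in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    enc, wr = time_encode_and_write(lambda b: zlib.compress(b, 6), payloads, tmp)
print(f"encode median: {statistics.median(enc):.5f} s")
print(f"write  median: {statistics.median(wr):.5f} s")
```

For the comparison in this thread, the `encode` argument could be something like `lambda img: cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 90])[1].tobytes()` in one run and `lambda img: encode_jpeg(img, quality=90, colorspace='BGR')` in a separate run of the script, so that the two encoders do not compete for the CPU (point 5).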

Thank you, @cpixip, for your valuable insights as always. After refining the test, which included upgrading to Bookworm to access OpenCV 4.8 (which reintroduces support for setting the subsampling parameter), I’ve observed that simplejpeg does seem to encode more quickly. However, the write speed to the SD card is so inconsistent that it’s challenging to optimize further. I’ll consider retesting with a faster SSD once it arrives.

import os
import cv2
import numpy as np
from simplejpeg import encode_jpeg
import time
import requests

# Ensure the 'data' directory exists
os.makedirs('data', exist_ok=True)

# Define the desired sizes
sizes = [(800, 600), (1024, 768), (1280, 960), (1600, 1200),(4000, 3000)]

# Download 10 images in 5 different sizes
image_list = []
print("Fetching images: ", end='', flush=True)
for size in sizes:
    width, height = size
    for i in range(2):  # 2 images per size
        filename = f"data/image_{width}x{height}_{i}.jpg"
        if not os.path.exists(filename):
            response = requests.get(f'https://picsum.photos/{width}/{height}')
            with open(filename, 'wb') as f:
                f.write(response.content)
            print('.', end='', flush=True)
        image = cv2.imread(filename)
        image_list.append(image)

print("done")

QUALITY = 90  # Example quality

FACTOR_444 = int(cv2.IMWRITE_JPEG_SAMPLING_FACTOR_444)
FACTOR_440 = int(cv2.IMWRITE_JPEG_SAMPLING_FACTOR_440)
FACTOR_422 = int(cv2.IMWRITE_JPEG_SAMPLING_FACTOR_422)
FACTOR_420 = int(cv2.IMWRITE_JPEG_SAMPLING_FACTOR_420)

def test_opencv_encode_and_save(images, sampling_factor, sampling_name):
    encode_durations = []
    save_durations = []
    for img in images:
        # Encoding
        start_time = time.time()
        # cv2.imread returns BGR data, which is what cv2.imencode expects, so no color conversion is needed
        _, encoded_img = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), QUALITY, int(cv2.IMWRITE_JPEG_SAMPLING_FACTOR), sampling_factor])
        encode_durations.append(time.time() - start_time)

        # Saving
        start_time = time.time()
        with open(f'opencv_encoded_temp_{sampling_name}.jpg', 'wb') as f:
            f.write(encoded_img)
        save_durations.append(time.time() - start_time)
    return encode_durations, save_durations

def test_simplejpeg_encode_and_save(images, colorsubsampling):
    encode_durations = []
    save_durations = []
    for img in images:
        # Encoding
        start_time = time.time()
        # Images loaded with cv2.imread are in BGR channel order
        encoded_img = encode_jpeg(img, quality=QUALITY, colorspace='BGR', colorsubsampling=colorsubsampling)
        encode_durations.append(time.time() - start_time)

        # Saving
        start_time = time.time()
        with open(f'simplejpeg_encoded_temp_{colorsubsampling}.jpg', 'wb') as f:
            f.write(encoded_img)
        save_durations.append(time.time() - start_time)
    return encode_durations, save_durations

# Test OpenCV and simplejpeg encoding and saving with 4:4:4 subsampling
print("Testing with 4:4:4 subsampling...")
simplejpeg_444_encode, simplejpeg_444_save = test_simplejpeg_encode_and_save(image_list, '444')
time.sleep(1)
opencv_444_encode, opencv_444_save = test_opencv_encode_and_save(image_list, FACTOR_444, '444')
time.sleep(1)
print(f"OpenCV 4:4:4 Average Encoding: {np.mean(opencv_444_encode):.5f} seconds")
print(f"OpenCV 4:4:4 Average Saving: {np.mean(opencv_444_save):.5f} seconds")
print(f"simplejpeg 4:4:4 Average Encoding: {np.mean(simplejpeg_444_encode):.5f} seconds")
print(f"simplejpeg 4:4:4 Average Saving: {np.mean(simplejpeg_444_save):.5f} seconds")

# Test OpenCV and simplejpeg encoding and saving with 4:2:0 subsampling
print("\nTesting with 4:2:0 subsampling...")
simplejpeg_420_encode, simplejpeg_420_save = test_simplejpeg_encode_and_save(image_list, '420')
time.sleep(1)
opencv_420_encode, opencv_420_save = test_opencv_encode_and_save(image_list, FACTOR_420, '420')
print(f"OpenCV 4:2:0 Average Encoding: {np.mean(opencv_420_encode):.5f} seconds")
print(f"OpenCV 4:2:0 Average Saving: {np.mean(opencv_420_save):.5f} seconds")
print(f"simplejpeg 4:2:0 Average Encoding: {np.mean(simplejpeg_420_encode):.5f} seconds")
print(f"simplejpeg 4:2:0 Average Saving: {np.mean(simplejpeg_420_save):.5f} seconds")

Interesting… - how did you install this opencv-version? In a fresh Bookworm-install, I did my usual

sudo apt install  libopencv-dev python3-opencv

and only ended up with:

>>> print(cv2.__version__)
4.6.0

– which obviously does not have the subsampling parameter settings. So you probably did a different install…
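
A quick way to see which situation one is in is to test for the constant directly instead of relying on the version number. This is a small sketch; `jpeg_subsampling_supported` is a hypothetical helper name, and it simply reports `(None, False)` if the module cannot be imported at all.

```python
import importlib


def jpeg_subsampling_supported(module_name="cv2"):
    """Return (version, supported) for the named module, or (None, False) if it cannot be imported."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None, False
    version = getattr(mod, "__version__", None)
    # The flag used in the benchmark above; absent in builds that lack the feature
    return version, hasattr(mod, "IMWRITE_JPEG_SAMPLING_FACTOR")


print(jpeg_subsampling_supported())
```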

Yep, writing large amounts of data to the SD card is usually going to screw things up. A lot of filmscanner software instead transfers the encoded jpg to client software running on a more powerful computer. In any case, comparing encoding times for various encoders is one thing; comparing write speeds of differently sized files is quite another. As I am planning to switch from jpgs to raws, this is going to be an area of testing for me as well. With a Samsung SSD connected to one of the USB3 ports on an RP4, I did get sufficient write speed to support the basic speed of my scanner, about 1 frame per second. I have not yet tried the client-server approach, nor have I looked into ways to optimize this further (besides saving each frame in separate threads).
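
Moving the disk writes off the capture loop can be sketched roughly as follows. This is one possible variant (a single background writer fed through a bounded queue) and not necessarily how any of the scanner software mentioned here does it; `writer_loop`, `start_writer` and `stop_writer` are hypothetical names, and the "encoded" data is a stand-in for real JPEG bytes.

```python
import os
import queue
import tempfile
import threading


def writer_loop(jobs):
    """Write (path, data) jobs to disk until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        path, data = job
        with open(path, "wb") as f:
            f.write(data)


def start_writer(maxsize=8):
    """Start the background writer; the bounded queue caps memory use if the disk falls behind."""
    jobs = queue.Queue(maxsize=maxsize)
    thread = threading.Thread(target=writer_loop, args=(jobs,), daemon=True)
    thread.start()
    return jobs, thread


def stop_writer(jobs, thread):
    """Signal the writer to finish and wait until all queued frames are on disk."""
    jobs.put(None)
    thread.join()


out_dir = tempfile.mkdtemp()
jobs, thread = start_writer()
for i in range(3):
    encoded = bytes([i]) * 1024  # stand-in for an encoded frame
    jobs.put((os.path.join(out_dir, f"frame_{i:05d}.jpg"), encoded))
stop_writer(jobs, thread)
print(sorted(os.listdir(out_dir)))
```

Because `jobs.put` blocks once the queue is full, a slow disk throttles the producer instead of letting frames pile up in memory.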

As far as I know, picamera2 also uses simplejpeg internally; I simply assumed in my own code that this was a decent choice and used it too.

One IO package I am using occasionally (when writing software which should be able to read DPX files and the like) is OpenImageIO; here’s a short segment of how I use it for reading DPX images:

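import os
import sys

import cv2
import numpy as np

try:
    import OpenImageIO as oiio
except ImportError:
    # OpenImageIO is optional; the OpenCV fallback below is used if it is missing
    pass

# Fragment from a class method: self.fileName holds the path of the image to load
if 'OpenImageIO' in sys.modules and os.path.splitext(self.fileName)[1] == '.dpx':
    io = oiio.ImageInput.open(self.fileName)
    if io:
        self.inputImage = io.read_image()
        io.close()  # only close the handle if opening succeeded
else:
    # Let OpenCV keep the file's bit depth and color format
    self.inputImage = cv2.imdecode(np.fromfile(self.fileName, dtype=np.uint8), cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)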

and this would be some code for writing (copied directly from the documentation):

import OpenImageIO as oiio
import numpy as np

def simple_write() :
    filename = "simple.tif"
    xres = 320
    yres = 240
    channels = 3  # RGB
    pixels = np.zeros((yres, xres, channels), dtype=np.uint8)

    out = oiio.ImageOutput.create (filename)
    if out:
        spec = oiio.ImageSpec(xres, yres, channels, 'uint8')
        out.open (filename, spec)
        out.write_image (pixels)
        out.close ()

As OpenImageIO is a rather high-level interface, I would expect the performance to be somewhat lower than simplejpeg or OpenCV - but who knows.

Yes, it’s a bit complicated: with Bookworm, installing pip packages is now somewhat restricted. For example, installing opencv-python gives

pi@raspberrypi:~/ScanSloth/backend $ pip install opencv-python
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

This is to avoid mixing system-installed packages with pip-installed packages. So now a Python virtual environment comes to the rescue; it’s just a matter of

python -m venv venv
source venv/bin/activate

Now you can install OpenCV via pip (and not via apt / system packages):

(venv) pi@raspberrypi:~/ScanSloth/backend $ pip install opencv-python

which gives 4.8, but wait - that would be too easy :slight_smile:
While this works fine now, picamera2 is missing in the venv, and here things get murky. The picamera2 devs do promote an installation via apt, but then there is the problem described in this picamera2 issue: libcamera is not available inside the venv.

So the venv has to be redone (deactivate, delete the directory) and recreated with

python -m venv venv --system-site-packages
source venv/bin/activate 

which maps the system packages (picamera2) into the venv, and finally it works, but only if the system and venv Python versions are the same. This wasn’t possible in Bullseye, as an OpenCV dependency wasn’t available in the required version, so I had to abandon Bullseye and update to Bookworm.
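
To confirm from inside Python that the interpreter really runs in a venv created with `--system-site-packages`, something like the following sketch can help. `venv_status` is a hypothetical helper; it relies on the `pyvenv.cfg` file that `python -m venv` writes into the environment directory and on its `include-system-site-packages` key.

```python
import os
import sys


def venv_status(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return (in_venv, system_site_packages); the second value is None if it cannot be determined."""
    in_venv = prefix != base_prefix
    system_site = None
    cfg = os.path.join(prefix, "pyvenv.cfg")
    if in_venv and os.path.exists(cfg):
        with open(cfg) as f:
            for line in f:
                key, _, value = line.partition("=")
                if key.strip() == "include-system-site-packages":
                    system_site = value.strip().lower() == "true"
    return in_venv, system_site


print(venv_status())
```

With the setup described above, the expected result inside the activated venv would be `(True, True)`; outside of any venv the first value is `False`.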

Regarding the handling of image saving / processing after the scan: my goal is to have a standalone system, with maybe an old smartphone running the GUI, so I can just fill the SSD with data and then either do the merge processing when the scan has finished and the system is idle, or plug the drive with the collected data into a more powerful PC where the processing is done afterwards.
Tests with the full software show that simplejpeg is multiple times faster than OpenCV, so for me this is the way to go.


Wow… - this is what I would call crazy. Thanks for writing this guide - there might be people out there who are searching exactly for this information!

That looks like a nice setup. Especially setting the focus should be rather easy with a portable smartphone or tablet. It is no fun if your camera image is displayed on a desktop-PC some meters away from your scanner…

So finally, simplejpeg has the crown again. :+1:

Here is the obligatory xkcd.

That sort of rundown in @d_fens guide has been my experience every single time I’ve ever tried to use Python for anything non-trivial… which is one of the reasons I don’t often use Python! :sweat_smile:
