well, for some basic math, there exists a trick in opencv to do computations on the graphics card. If you write a line like `img = cv2.UMat(img)`, the image will be transferred to the GPU for further processing. However, before you get too excited:
- you are creating overhead by transferring data from CPU to GPU and back. Do not underestimate this; it can take longer to transfer the data than to use optimized CPU code.
- some stuff does not work, like `img.shape`, which needs to be recoded as `img.get().shape`. This incidentally grabs the image stored on the GPU, transfers it back and recodes it as a CPU image, which does have a `shape` attribute (see the sketch after this list).
- most importantly: many advanced image processing algorithms have not been reimplemented to run on the GPU - I bet `mergeMertens` is one of them.
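To make that `UMat` round trip concrete, here is a minimal sketch; the file name and the Gaussian blur step are just placeholder assumptions, the point is only the `cv2.UMat(...)` upload and the `.get()` download:

```python
import cv2

# load a frame on the CPU (file name is just an example)
img = cv2.imread("frame_0001.png")

# hand the image over to OpenCV's transparent GPU/OpenCL API
uimg = cv2.UMat(img)

# operations on a UMat may run on the GPU, e.g. a simple blur
blurred = cv2.GaussianBlur(uimg, (5, 5), 0)

# a UMat has no .shape - you first have to download it again with .get()
print(blurred.get().shape)
```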
To round up this comment: well, it might be possible to utilize the computing power of the GPU within the opencv context, but it is probably of no help in our `mergeMertens` case and not worth the effort (feel free to start here for some details on how this `UMat` thing works).
The thing is that for each single final frame, a lot of incoming image data would need to be transferred to the GPU and back again. However, the computations done in the `mergeMertens` code itself are not that challenging, so any speed gain of the GPU over the CPU might be eaten up by the supporting/transporting code. Also, if you are programming in Python, you are too far away from the original code base to even judge what is going on - you are totally dependent on the specific implementation (library) you are calling from your Python code.
Well, if you are working with small format films, there is another easy way to achieve some speed-up: just work with the smallest resolution you need to use for the quality you are aiming at. Here are some timings for the exposure fusion algorithm and varying image size (I’ll give just the horizontal size, the vertical size scales accordingly; this is for a 5-image-stack):
- 960 px: 530 ms
- 1200 px: 660 ms
- 1600 px: 1000 ms
- 1920 px: 1300 ms
- 2880 px: 2800 ms
Clearly, the processing time scales badly with image size.
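If you want to reproduce such timings on your own stack, a rough sketch could look like this; the file names, the list of test widths and the use of `time.perf_counter` are my assumptions, the relevant calls are `cv2.resize` and `cv2.createMergeMertens`:

```python
import time
import cv2

# a 5-image exposure stack (file names are just placeholders)
stack = [cv2.imread(f"exposure_{i}.png") for i in range(5)]

merger = cv2.createMergeMertens()

for width in (960, 1200, 1600, 1920, 2880):
    # scale every frame of the stack to the test width, keeping the aspect ratio
    h, w = stack[0].shape[:2]
    scaled = [cv2.resize(img, (width, int(h * width / w))) for img in stack]

    t0 = time.perf_counter()
    fused = merger.process(scaled)   # exposure fusion on the scaled stack
    t1 = time.perf_counter()
    print(f"{width} px: {(t1 - t0) * 1000:.0f} ms")
```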
The trick to speed up processing: for every processing step along your pipeline choose a resolution which just works, maybe with a little safety margin. That is especially valid when working with small movie formats, as their native resolution is not that great anyway.
Scale down material as early as you can, not at the end of your processing pipeline. You have to try out what is possible. Your mileage will vary, depending on the software you are using and the release format you are choosing.
For example, I scan with 2880 px width (but: overscan), do exposure fusion with 1800 px (still overscanned), cut-out/stabilize/color-grade and other stuff with 1440 px (full frame) and generally aim at only 960 px output resolution for Super-8 material. At least for my material, anything above that 960x720 px output resolution is wasted.
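Here is a minimal sketch of that "scale down early" idea, with the resolutions from my own pipeline as stand-in values and `cv2.resize` doing the downscaling; everything else (file names, the later pipeline steps) is assumed:

```python
import cv2

def scale_to_width(img, width):
    """Resize an image to a given width, keeping the aspect ratio."""
    h, w = img.shape[:2]
    return cv2.resize(img, (width, int(h * width / w)), interpolation=cv2.INTER_AREA)

# 5 scanned exposures at full scanner resolution (file names are placeholders)
scan = [cv2.imread(f"scan_{i}.png") for i in range(5)]    # e.g. 2880 px wide

# scale down *before* the expensive step, not after it
stack = [scale_to_width(img, 1800) for img in scan]       # still overscanned

fused = cv2.createMergeMertens().process(stack)           # exposure fusion at 1800 px

# later steps (crop, stabilize, grade, ...) can run at 1440 px;
# the final output is written out at only 960 px
out = scale_to_width((fused * 255).clip(0, 255).astype("uint8"), 960)
cv2.imwrite("frame_out.png", out)
```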