OpenCV has debayering built in and is likely as fast as you will find. It’s worth figuring out where your slowdown is though. I would be surprised if it’s in the debayer itself, and not in something else like I/O, especially if it’s high resolution. The actual act of debayering is not especially intensive, so it should be fast. If you can, build timers into your code at the beginning and before/after each individual operation. Then you can display the duration for each task and see where things are getting bogged down.
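As a rough sketch of the timer idea, something like the following would work in Python. The `timed` and `report` helpers are names I made up for illustration, and the three stages are stand-ins that just sleep, so swap in your own read, debayer and write calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Add the wall-clock duration of the enclosed block to store[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = store.get(label, 0.0) + (time.perf_counter() - start)


def report(store):
    """Print the stages slowest first and return the labels in that order."""
    ordered = sorted(store, key=store.get, reverse=True)
    for label in ordered:
        print(f"{label:<10}{store[label] * 1000:10.2f} ms")
    return ordered


# Stand-in stages (hypothetical): replace each sleep with the real call,
# e.g. reading the raw frame, cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)
# for the debayer (use the Bayer code that matches your sensor), and the write.
timings = {}
with timed("read", timings):
    time.sleep(0.05)    # pretend this is the disk read
with timed("debayer", timings):
    time.sleep(0.005)   # pretend this is the debayer
with timed("write", timings):
    time.sleep(0.01)    # pretend this is the output write

slowest_first = report(timings)
```

Because the durations accumulate per label, you can leave the timers in place inside your per-frame loop and print the report once at the end of a run.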
This would also be doable in OpenCV. One method is to create black-and-white template images of just the perforations for each type of film you’re scanning, then have OpenCV’s template matching locate the perfs in your image. Once you know where the perf sits in the frame, you simply shift the whole image on the X and Y axes to bring it to a predefined position. It should be very fast. We did some tests on 14k images a while back and the time it took to do the pattern match was measured in microseconds. One thing with OpenCV: you generally want to run the detection on a monochrome copy of your full color image so that processing is faster, then apply whatever transform you want to the full color image before outputting.
OpenCV has really good Python integration, and most of the documentation and example videos you’ll find will include both C++ and Python code.