What could be happening here is the following: the OpenCV implementation of the Mertens algorithm converts your 8-bit-per-channel images into a fused floating-point image. However, this fused image usually ends up with brightness values below zero and above one (zero to one being the normalized brightness range for floating-point images). If you simply write this image out as a .png, for example, the dark and bright image areas get clipped. That is an error I have seen in a lot of implementations. Be sure to rescale the output of the Mertens algorithm with something like:
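# img is the floating-point result of the Mertens fusion
minimum = img.min()
maximum = img.max()
# the small epsilon avoids a division by zero for a completely flat image
scaler = 1.0/(maximum-minimum+1e-6)
# after this line, img covers the range 0..1 and can be saved without clipping
img = scaler*(img-minimum)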
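To make the clipping problem concrete, here is a minimal sketch. It assumes `fused` stands in for the floating-point image you get back from OpenCV's Mertens merge (for example from `cv2.createMergeMertens().process(images)`); the sample values are made up, and the helper names `to_uint8_clipped` and `to_uint8_rescaled` are my own, not part of OpenCV.

```python
import numpy as np

def to_uint8_clipped(img):
    """Naive conversion: everything outside 0..1 is simply cut off."""
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)

def to_uint8_rescaled(img):
    """Stretch the full value range of img to 0..1 first, then convert."""
    minimum = img.min()
    maximum = img.max()
    scaler = 1.0 / (maximum - minimum + 1e-6)
    return (scaler * (img - minimum) * 255.0).round().astype(np.uint8)

# Made-up stand-in for a fused result with undershoot and overshoot.
fused = np.array([-0.10, -0.05, 0.0, 0.5, 1.0, 1.10, 1.20], dtype=np.float32)

clipped = to_uint8_clipped(fused)    # shadows and highlights collapse
rescaled = to_uint8_rescaled(fused)  # every input level stays distinct

print(clipped)
print(rescaled)
```

In the clipped version, the three darkest samples all land on 0 and the three brightest all land on 255, which is exactly the loss of shadow and highlight detail described above. The rescaled version keeps them apart, at the price of slightly lower overall contrast.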
Another point to check: you have taken the 1ms/10ms/50ms images manually. That does not guarantee that they match the images you are feeding into the Mertens algorithm, as those are presumably taken during automatic capture. It could be that the camera has not really settled on the exposure time you are requesting. That issue is common to most cameras (you have to wait 3-50 frames, depending on the camera) and is the reason why, in my scanning approach, I vary the power of the light source for the different exposures, not the exposure time.
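If you do stay with varying the exposure time, one way to deal with the settling issue is to throw away a number of frames after every exposure change. The following is only a sketch of that idea: the camera interface (`set_exposure`, `grab`) is hypothetical and will look different for your camera library, and `FakeCamera` is a toy stand-in that merely imitates a camera which applies a new exposure a few frames late.

```python
def capture_settled(camera, exposure_ms, settle_frames=10):
    """Request an exposure, discard frames while the camera settles,
    then return the next frame."""
    camera.set_exposure(exposure_ms)
    for _ in range(settle_frames):
        camera.grab()  # discard: may still use the previous exposure
    return camera.grab()


class FakeCamera:
    """Toy camera (hypothetical) that applies a new exposure only after
    lag_frames frames have been grabbed."""

    def __init__(self, lag_frames=3):
        self.lag_frames = lag_frames
        self.active_ms = 1.0
        self.requested_ms = 1.0
        self.frames_since_request = 0

    def set_exposure(self, exposure_ms):
        self.requested_ms = exposure_ms
        self.frames_since_request = 0

    def grab(self):
        if self.frames_since_request >= self.lag_frames:
            self.active_ms = self.requested_ms
        self.frames_since_request += 1
        # A real camera would return pixel data; here only the exposure used.
        return {"exposure_ms": self.active_ms}


cam = FakeCamera(lag_frames=3)
cam.set_exposure(50.0)
too_early = cam.grab()  # grabbed right after the request
settled = capture_settled(cam, 50.0, settle_frames=5)

print(too_early["exposure_ms"], settled["exposure_ms"])
```

The right value for `settle_frames` has to be found by experiment for your camera; the 3-50 frames mentioned above give the range to expect.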
The three input pictures you have posted should give you a better result than the one you obtained. In your result, both highlights and shadows are poorly defined.
Generally, I would advise storing the separate exposures of each frame scan for archival purposes. As HDR techniques (especially display technology) progress, you might want to set up a real HDR workflow, which is different from what the Mertens algorithm is aiming at.
While you can do a Mertens with only two appropriately chosen images, you will need to cover a range of at least 5 stops to capture all the details of a standard Kodak color reversal film (other film stock is less demanding). For most camera sensors, this means you should not go below three different exposures. If you sample more exposures of a given frame, the Mertens algorithm automatically reduces the noise of the camera sensor. Also, you are better off with more exposures if, at a later stage, you want to produce real HDR footage (that would then be “Debevec”, but without tone-mapping).
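As a quick check on the numbers, the spread of an exposure bracket in stops is just the base-2 logarithm of the ratio of the exposure times. The little sketch below does that arithmetic for the 1ms/10ms/50ms bracket from this thread. The function name `stops_between` is my own, and the calculation assumes the camera really delivers the requested exposure times; it only measures how far apart the exposures are, not what the sensor captures within a single exposure.

```python
import math

def stops_between(short_ms, long_ms):
    """Exposure difference in stops: one stop is a factor of two."""
    return math.log2(long_ms / short_ms)

exposures_ms = [1.0, 10.0, 50.0]  # the bracket mentioned in this thread

# Spacing between neighbouring exposures and total spread of the bracket.
steps = [stops_between(a, b) for a, b in zip(exposures_ms, exposures_ms[1:])]
total = stops_between(min(exposures_ms), max(exposures_ms))

print([round(s, 2) for s in steps])  # [3.32, 2.32]
print(round(total, 2))               # 5.64
```

So the three exposure times are a bit more than five and a half stops apart in total, with a noticeably larger gap between the first two than between the last two.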