The Missing Link? -- Sound Capture with Hardware

Okay guys. We’ve been avoiding it. But what if we consider this as an option? Realistically, what does it take? I’m pretty sure you can’t just buy the sound capture hardware as a ‘module’ and slap it on.

The questions in my mind are:

  • what is the cost
  • are the parts easily purchased
  • how delicate is the calibration and setup process (will it stand up to stress and use without constant maintenance)



Are you talking about optical soundtracks?

For variable-width (variable-area) optical soundtracks you need a ‘slit’ light source, a photocell and an amp.
The circuitry is simple, but careful calibration is essential.
The soundtrack varies within a fixed width and the light source is a very thin line, so the photocell receives an amount of light proportional to the width of that section of the soundtrack. It is very simple really.
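That readout can be modelled in a few lines. This is a purely illustrative toy, assuming an invented 1 kHz test tone, modulation depth and sample rate (not anyone’s actual hardware): the photocell output tracks the clear width of the track under the slit, and subtracting the DC level leaves the audio.

```python
import math

# Toy model of variable-area optical sound readout: the audio waveform
# *is* the width of the clear part of the track under the slit.

def track_width(t, track_max=1.0):
    # Hypothetical track: a 1 kHz tone modulating the clear width by 80%.
    return 0.5 * track_max * (1.0 + 0.8 * math.sin(2 * math.pi * 1000 * t))

def photocell_sample(t, lamp_intensity=1.0):
    # Light reaching the cell ~ lamp intensity x clear width under the slit.
    return lamp_intensity * track_width(t)

# Capture 10 ms at 48 kHz, then remove the DC component (the cell always
# sees some light) to recover the audio signal.
rate = 48000
samples = [photocell_sample(n / rate) for n in range(rate // 100)]
dc = sum(samples) / len(samples)
audio = [s - dc for s in samples]
```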

The difficulty is in the calibration: the speed has to be constant, and the slit must be perfectly focused and aligned to the soundtrack, i.e. the line has to fall horizontally across the soundtrack; even a slight angle will degrade the sound.
If you can’t lock the speed to great enough precision, then you could potentially create a ‘click track’ by detecting the sprocket holes, and use a digital algorithm to adjust the sound later.
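One way that correction could work (my own illustration of the idea, not an existing tool): if each sprocket tick gives the recorded sample index where a frame actually started, the audio can be warped back to the nominal frame rate by linear interpolation.

```python
# Sketch of speed correction from a sprocket-hole 'click track'.
# tick_samples: recorded sample index of each sprocket tick (one per frame).
# samples_per_frame: nominal number of samples between ticks at correct speed.

def correct_speed(audio, tick_samples, samples_per_frame):
    out = []
    for f in range(len(tick_samples) - 1):
        start, end = tick_samples[f], tick_samples[f + 1]
        for k in range(samples_per_frame):
            # Map the k-th nominal sample into this (stretched or
            # squeezed) recorded frame, then interpolate linearly.
            pos = start + (end - start) * k / samples_per_frame
            i = int(pos)
            frac = pos - i
            j = min(i + 1, len(audio) - 1)
            out.append(audio[i] * (1 - frac) + audio[j] * frac)
    return out
```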

For 35mm you certainly can just buy modules, but for small gauge film, probably not.

@Peter that’s very helpful. I actually have a slit somewhere (I think) I could test with. I’ll add it to the (long) list.

Do you have a link to the purchasable modules?


One other thing to keep in mind: obviously you can’t use a 50 Hz or 60 Hz driven light source, as the flicker would be picked up as a 100 Hz or 120 Hz ‘tone’, so you need either a ‘constant’ light source, or one that flickers at superaudible frequencies. Also, of course, the sound is offset by a fixed number of frames from the images on the film, depending on the film gauge, so that offset needs to be taken into account if you wish to capture the picture and sound at the same time and have them synced.
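For the offset, a rough calculation might look like the sketch below. The advance figures are the commonly cited standard values (35mm optical track about 21 frames ahead of its picture, 16mm about 26 frames ahead); treat them as assumptions and verify against your own prints.

```python
# Commonly cited sound advances: how many frames the optical track leads
# its picture. These numbers are assumptions to verify, not measured here.
SOUND_ADVANCE_FRAMES = {"35mm": 21, "16mm": 26}

def sync_delay_samples(gauge, fps=24, sample_rate=48000):
    """Samples to delay captured audio so it lines up with the picture
    frame it belongs to."""
    return round(SOUND_ADVANCE_FRAMES[gauge] * sample_rate / fps)

# 16mm at 24 fps / 48 kHz: 26 frames x 2000 samples/frame = 52000 samples
```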

There are a ton of modules available, mostly as replacements for 35mm projectors, they vary from projector model to projector model.
Most now are LED and sensor drop in replacements for the older bulb/photocell arrangements on older projectors, like the JaxLight among others.

Audio pickup is a problem in the telecine world as well. Most Spirits were delivered without audio, as they were used for capture of original negative film. Because of this, various systems have been developed to retrofit them for sound. The “inexpensive” one is from Point 360.

This is an entire kit with modified audio preamps and power supply customized for telecine capture. It is based on the Component Engineering parts…

Could something like this be used? Yes, probably, and parts are sometimes available on eBay. But as Peter said, it depends on an extremely stable transport to move the film at the correct, constant speed.


Also, what happened to the idea of extracting audio from the picture area? I know the samples Matt had online were not great, but the idea is being used/tested in several places. The software I am familiar with is AEO-Light.
I have tried this with a sample from the Spirit and it worked ok, but not great. But I now have noticed that Blackmagic is using this approach with their scanner.
To Quote: “Audio- Extraction from scanned image.”
There are various other systems, I know Sondor had a kit available that mounted in front of the gate where the keycode reader usually lives. I think MWA built something as well.
All of this is stupidly expensive, but worth a look for technical reasons…

There are modules for detecting position consisting of an LED and a photo-transistor. In a 16mm projector there was just a lamp, a slit and a lens to project a narrow line of light across the film, and a phototube (or photodiode/transistor) to pick up the variance of the total light passing through.

Recent 35mm prints have a cyan track, which works best with red light, so you could probably use a red laser, like the one in a laser pointer.

I once got a laser level at the hardware store which has a diffraction grating to spread the laser light out into a line. That sort of hardware could probably be used as the light source, which would have to be powered from DC. (Some of the old projectors used an ultrasonic AC source, but only because that let them filter out the AC hum by using a small transformer.)

A photo-transistor could easily detect the beam and give audio out. BUT the trick, as mentioned, is that the film has to run smoothly at the correct speed.

For that matter, one could just use the position detector to key off a sprocket on a working projector and pick the audio out of the amplifier. Send both the audio and the sprocket track to a digital recorder, and use the sprocket track to keep the sound in sync with the pictures.
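Turning a recorded sprocket track into per-frame markers could be as simple as an upward threshold-crossing detector. This is a generic sketch of that step, not any particular product’s method:

```python
# Find the sample index of each sprocket pulse in a recorded tick channel
# by detecting upward crossings of a threshold (one crossing per hole).

def find_ticks(channel, threshold=0.5):
    ticks = []
    above = False
    for i, v in enumerate(channel):
        if v >= threshold and not above:
            ticks.append(i)   # rising edge: a new sprocket hole
            above = True
        elif v < threshold:
            above = False     # re-arm once the pulse has passed
    return ticks

# e.g. a pulse train [0,0,1,1,0,0,1,0] has ticks at samples 2 and 6
```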


The reason AEO-Light isn’t used is that the sound quality isn’t great, and you really need to be scanning at 4K or better with great optics for the sound processing to work well, so it is out of the realm for some people.
As mentioned before and again by cmacd, if you either record the sprocket holes as ‘ticks’ on a third channel, or use the ‘ticks’ to control a recording device, you can compensate somewhat for variance in speed.

Yes, the newer cyan track works with a red laser diode, and they are readily available.

One more idea, and this is being used somewhere, though I don’t know what software actually makes it work. Several systems have been created that use a separate video camera, probably line-scan, to capture a file of some sort that can be used to recreate the audio. I have seen it on a sound follower system at Chace Audio, now part of Deluxe, as well as on the experimental Walde scanner. I am sure it would take some work to get the right camera, the right optics and the right backlight, and of course it would depend on the speed being accurate and constant, which is not trivial. Actually, I just remembered that Darren Walde told me he was using a special version of AEO-Light with this system, one that could process a continuous track, as opposed to stitching frames together. It seems a camera system could be chosen with enough resolution to make this viable.

Certainly could be done, but if the speed needs to be constant anyway, then a photocell and light source would be much cheaper and less complex to achieve the same end.

I am currently writing my master thesis about restoration of optical soundtracks.
The development of an extraction and restoration software is part of this.
At the moment I have implemented plenty of functions, but without a GUI.
So far I am able to extract audio signals from images with higher quality than AEO-Light does.
I want to improve my software, but my material sources are limited.
This means that I only have three uncompressed videos from the AEO-Light home page to work with.
So I am searching for all sorts of digitised film material with different types of optical audio, including damage and dirt on the soundtrack.
Can someone at this forum help me?
Short clips of 35mm, 16mm and 8mm film would do (uncompressed if possible).

Here is a first comparison of AEO-Light and my software at the current state:


Andreas, great job! I’ll keep an eye on your topic.

Hello Andreas! At your request, here are my scan samples.
I really hope for a great result.

Thank you for your samples!
@dan74: Whew, your sample is really a worst case scenario.
There are several technical aspects, which make it hard to get a “great” result here.
First of all, there is virtually no significant overlap between neighbouring frames, so there are gaps in the audio. AEO-Light and also my software (at this point) are not able to fill gaps in the audio signal.
This causes the stuttering artefacts in the audio of the sample.
That is a new point that should be considered in the future: an algorithm to interpolate gaps in the audio.
The next problem is the low resolution of the images: they are only 480p.
So (without overlap) there are at most 480 audio samples per frame, which results in a sampling rate of 11520 Hz (at 24 fps); according to the sampling theorem, the audio signal can then only have a bandwidth of 0–5760 Hz.
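That arithmetic is easy to generalise: lines of track per frame times frames per second gives the effective audio sampling rate, and Nyquist halves it for the usable bandwidth.

```python
# Effective audio sampling rate of a frame-by-frame soundtrack scan:
# each image row under the track contributes one audio sample.

def track_sampling(lines_per_frame, fps=24):
    rate = lines_per_frame * fps
    return rate, rate / 2   # (sampling rate in Hz, usable bandwidth in Hz)

# 480-line frames at 24 fps -> 11520 Hz sampling, 5760 Hz bandwidth
rate, bandwidth = track_sampling(480)
```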
The next problem is heavy JPEG compression. The audio signal is stored in the transition between the brighter and darker areas of the soundtrack. JPEG compression is DCT based, so the image is transformed to the frequency domain and quantised to reduce data.
This also affects the data at transitions and in the flat areas, in the form of ringing and blocking. So while some noise in the areas may be reduced, new artefacts are introduced into the audio signal.
Then, there seem to be line artefacts and some kind of ghosting artefacts, maybe caused by a suboptimal CCD sensor.
Uneven background illumination and so on…
Some of these artefacts could be reduced, but it would be much better to avoid them.
But generally, worst case scenarios are absolutely welcome to push the limits of the software.

That brings me to some theoretical aspects, which should be considered during the digitisation of optical soundtracks, to make a better result.


An optimal usage of image area would be the green frame. This is the best compromise between audio overlap and movie picture resolution.
Since my software uses the image information to generate an ideal overlap, it helps to have some characteristic details, like sprocket holes, in the scan.
This is superior to stitching after audio analysis, because it avoids misinterpreting periodic audio signals. The spacing between the two overlap positions can also be interpreted as information about shrinkage of the film.
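A minimal sketch of that kind of image-based overlap search (my own illustration of the idea, not Andreas’s implementation): cross-correlate a brightness profile from the end of one frame with the start of the next, and take the best-matching shift.

```python
# Find the overlap between two scanned frames by cross-correlating 1-D
# brightness profiles (e.g. row sums near the frame edges).

def best_overlap(tail, head, max_shift):
    """tail: profile at the bottom of frame N; head: profile at the top
    of frame N+1. Returns the shift with the highest correlation."""
    def corr(shift):
        n = min(len(tail) - shift, len(head))
        return sum(tail[shift + i] * head[i] for i in range(n))
    return max(range(max_shift + 1), key=corr)
```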
In fact, a correct scan with 2K resolution would be absolutely enough to cover the full bandwidth of the audio signal. However 4K would be a better compromise to the movie picture resolution.
Due to the area needed for the audio overlap, the movie picture resolution would fall under HD resolution if a 2K sensor is used.
It would be best to use a monochrome image sensor: without a Bayer pattern it has a better fill factor and better anti-aliasing behaviour.
For variable-density soundtracks a higher bit depth is needed, because the audio information is stored in the luminance; a smaller bit depth means a lower-precision quantisation of the audio signal.
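The bit-depth point follows the standard quantisation rule of thumb for an ideal converter (about 6.02 dB of signal-to-noise per bit for a full-scale sine), applied here to the luminance channel:

```python
# Ideal quantisation SNR for a B-bit full-scale sine: 6.02*B + 1.76 dB.
# For a variable-density track the scan's bit depth caps the audio SNR.

def quantisation_snr_db(bits):
    return 6.02 * bits + 1.76

# An 8-bit scan tops out near 50 dB; 12 bits reaches about 74 dB.
```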
An uneven background illumination causes hum in the audio signal, so it should be as even as possible.
The sensor used for digitising should not have fixed-pattern noise, and the optics should be clean and free of dust (even dust causes an audible periodic noise).
Some of the resulting artefacts can be reduced or completely removed, but it is still better to avoid them.


Hello Andreas.
Thank you very much for your detailed response.

Andreas, will we get to see your software here?

@dan74
I will try to enhance your clip.
I plan to make the software open source once it is in a better state of development.
But that will only happen after I have finished my thesis in August.