HDMI/SDI output

Sometimes, what you want from a video mixer isn’t a stream, just another output that goes to e.g. a projector—or you might want the live stream, but also a monitor output on a separate display. You could of course play the stream on another PC, but for many uses, the end-to-end latency is too high, and you might not want to involve a full extra PC just for this anyway.

Thus, since 1.5.0, Nageru supports using a spare output card for HDMI/SDI output, turning it into a simple, reasonably low-latency audio/video switcher.

Setting up HDMI/SDI output

Turning on HDMI/SDI output is simple; just right-click on the live view and select the output card. (Equivalently, you can access the same functionality from the Video menu in the regular menu bar, or you can give the --output-card= parameter on the command line.) Currently, this is supported for DeckLink cards only (PCI/Thunderbolt), as the precise output protocol for the Intensity Shuttle cards is still unknown. The stream and recording will keep running just as before.

A video mode will automatically be picked for you, favoring 59.94 fps if possible, but you can change the mode on-the-fly to something else if you’d like, as long as the resolution matches with what you’ve set up at program start. Note that whenever HDMI/SDI output is active, the output card will be the master clock; you cannot change it to any of the input cards. This also means that the frame rate you choose here will determine the frame rate for the stream.

A note on latency

For a regular stream, a few seconds of latency is usually acceptable (clients will typically buffer at least a few seconds), and thus, working hard to shave off single frames’ worth of latency is not worth it. However, for a live output, every millisecond of latency counts; if you have a stage with a projector behind it, 500 ms latency on the projector looks distinctly out of sync with what’s happening on stage. Thus, HDMI/SDI output typically has much stricter latency demands than usual streaming.

Nageru is capable of low-latency operation, but not extremely low latency; for instance, it waits for an entire frame to arrive before processing it. (This is a complexity and flexibility tradeoff; anything else would make e.g. scaling nearly impossible.) Well-designed hardware switcher setups can do cut-through switching to get latency down to as little as one frame [1] or less, i.e., 16.7 ms at 60 fps; Nageru can get down to 2–3 frames (50 ms) given the right hardware, and in general, 100 ms isn’t hard.
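The frame counts above translate to milliseconds as follows. This is only a small arithmetic sketch; the helper name frames_to_ms is made up for illustration and is not part of Nageru.

```python
from fractions import Fraction

def frames_to_ms(frames, fps):
    """Convert a latency given in frames to milliseconds at the given frame rate."""
    return float(Fraction(frames) * 1000 / Fraction(fps))

# "59.94 fps" is really 60000/1001 frames per second.
NTSC_60 = Fraction(60000, 1001)

for frames in (1, 2, 3, 6):
    print(f"{frames} frame(s): "
          f"{frames_to_ms(frames, 60):.1f} ms at 60 fps, "
          f"{frames_to_ms(frames, NTSC_60):.1f} ms at 59.94 fps")
```

The same number of frames costs more milliseconds at lower frame rates, which is why latency is usually quoted in frames.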

[1] Since almost all latency in a realtime video setup is caused by processing of various forms and not by length of the cable, most forms of latency will be proportional to the length of a frame. Thus, one will often see latency calculated in terms of number of frames, not milliseconds, and video at higher frame rates will often see less delay. Networking (and by extension streaming) is different; there, jitter and delay are more often caused by propagation and administrative delays, and latency is more often independent of the frame rate.

Typical sources of latency

This section aims to illuminate some of the sources of latency and how to deal with them. Note that often, latency is at odds with throughput, and so, tradeoffs must be made. The most important sources of latency are:

  • Frame transmission latency: Unlike computer networks, HDMI and SDI transmit their frames pretty much in real time, i.e., sending one frame takes one frame of time. For cut-through switching (which includes HDMI → SDI conversion and the other way around), this doesn’t really matter, but Nageru has to receive the entire frame before it can start processing it (and by extension, send the result frame out). Thus, you will typically get one frame of latency just by having Nageru, or really any switcher/mixer with digital effects, in the chain at all.
  • Jitter and queuing latency: Unless you are lucky enough to have an all-SDI setup where everything runs off of a shared reference clock, frames on different devices, as well as on the output, will be at random offsets from each other (and also drifting slowly, even if they are at the same frame rate). Thus, some sort of input queue is needed for each input card, and the time a frame spends in the queue before being picked out for processing is by definition extra latency. (Note that this means that latency is not a single number for the chain as a whole, but can vary by input.)
  • Processing latency: By definition, processing of each frame has to take less than one frame’s worth of time, or else the system can’t keep up. But if you have a fast GPU and/or do little processing, you can spend significantly less. Thus, if you’re after the lowest possible latency, a faster GPU might help you shave off a fraction of a frame here.
  • Output latency: Finally, cards have their own output queue, and some will expect there to be multiple frames in it before outputting anything. This is outside Nageru’s control, unfortunately, but can easily add 2–3 frames of latency. If you want to avoid this, look for Blackmagic’s “4K” series of cards, which are of a newer, lower-latency design than the previous cards. The 4K series in this context includes everything that has “4K” in its name, plus the Mini Recorder, Duo 2 and Quad 2 devices.
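To get a feel for how the four sources add up, here is a rough sketch that sums them per input. The class name and all the numbers are hypothetical, chosen only for illustration; they are not measurements from Nageru or any particular card.

```python
from dataclasses import dataclass

@dataclass
class LatencyEstimate:
    """Rough per-input latency breakdown, all values in frames."""
    transmission: float  # about one full frame, since the whole frame must arrive
    queue: float         # time spent waiting in the input queue (varies by input)
    processing: float    # must stay below 1.0, or the system cannot keep up
    output: float        # frames held in the output card's own queue

    def total_frames(self):
        return self.transmission + self.queue + self.processing + self.output

    def total_ms(self, fps):
        return self.total_frames() * 1000.0 / fps

# Illustrative values only: two inputs that differ in queue time.
camera = LatencyEstimate(transmission=1.0, queue=0.5, processing=0.25, output=2.0)
laptop = LatencyEstimate(transmission=1.0, queue=1.5, processing=0.25, output=2.0)

for name, est in (("camera", camera), ("laptop", laptop)):
    print(f"{name}: {est.total_frames():.2f} frames = {est.total_ms(60):.1f} ms at 60 fps")
```

The two inputs end up with different totals even though they share the same mixer and output card, which is the point made above about latency varying by input.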

Controlling latency

Of the different sources of latency outlined in the previous section, the only one that is really under your control (short of buying faster or better hardware) is the input queue latency. By default, Nageru attempts to strike a balance between reducing latency and having to drop frames due to jitter; by looking at each queue’s input length history, it attempts to find a “safe queue limit”, above which it can drop frames without risking underrun (which requires duplicating frames). However, if latency is more important to you than 100% smooth motion, you can override this by using the --max-input-queue-frames= flag; this is a hard limit on the number of frames that can be kept in the queue, on top of Nageru’s own heuristics. It cannot be set lower than 1, or else all incoming frames would immediately get dropped on arrival.
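The effect of a hard queue limit can be sketched as below. This is not Nageru’s actual implementation: the class is hypothetical, it ignores the adaptive “safe queue limit” heuristic entirely, and it assumes that the oldest frames are the ones dropped when the limit is exceeded.

```python
from collections import deque

class InputQueue:
    """Toy input queue with a hard upper limit on queued frames."""

    def __init__(self, max_frames):
        if max_frames < 1:
            # With a limit of 0, every frame would be dropped on arrival.
            raise ValueError("limit must be at least 1")
        self.max_frames = max_frames
        self.frames = deque()
        self.dropped = 0

    def push(self, frame):
        """Add a newly arrived frame, dropping the oldest ones over the limit."""
        self.frames.append(frame)
        while len(self.frames) > self.max_frames:
            self.frames.popleft()
            self.dropped += 1

    def pop(self):
        """Pick out the oldest queued frame for processing, or None if empty."""
        return self.frames.popleft() if self.frames else None

queue = InputQueue(max_frames=2)
for frame_number in range(5):   # five frames arrive before the mixer picks any
    queue.push(frame_number)
print(list(queue.frames), "dropped:", queue.dropped)
```

A lower limit bounds how stale a frame can be when it is picked out, at the cost of more drops when delivery is uneven.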

However, even though the other factors are largely outside your control, you still have to account for them. Nageru needs to know when to begin processing a frame, and it cannot do this adaptively; you need to give Nageru a latency budget for processing and output queueing, which tells it when to start processing a frame (by picking out the input frames available at that time). If a frame isn’t processed in time for the output card to pick it up, it will be dropped, which means its effort was wasted. (Nageru will tell you on the terminal if this happens.) The latency budget is set by --output-buffer-frames=, where the default is a pretty generous 6.0, or 100 ms at 60 fps; if you want lower latency, you probably want to adjust this value down to the point where Nageru starts complaining about dropped or late frames, and then a bit up again to get some margin. (But see the section on audio latency below.) Note that the value can be fractional.

As an exception to the above, Nageru also allows slop; if the frame is late but only a little (i.e., less than the slop), it will pass it on to the output card nevertheless and hope for forgiveness, which may or may not cause it to be displayed. The slop is set with --output-slop-frames=, where the default is 0.5 frames.
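The decision described above can be written as a small sketch. The function name and its return strings are invented for illustration, and the real logic inside Nageru may differ in its details; only the default slop of 0.5 frames comes from the text.

```python
def classify_frame(lateness_frames, slop_frames=0.5):
    """Decide what happens to a finished frame.

    lateness_frames is how long after its deadline the frame was done,
    measured in frames; zero or negative means it was done in time.
    """
    if lateness_frames <= 0:
        return "on time"
    if lateness_frames < slop_frames:
        # Late, but within the slop: handed to the card anyway,
        # which may or may not display it.
        return "late, sent anyway"
    return "dropped"

for lateness in (-0.2, 0.3, 0.8):
    print(f"{lateness:+.1f} frames: {classify_frame(lateness)}")
```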

Audio latency

Since Nageru does not require synchronized audio sources, neither to video nor to each other (which would require a common, locked reference clock for all capture and sound cards), it needs to resample incoming audio to match the rate of the master video clock. To avoid buffer underruns caused by uneven delivery of input audio, each card needs an audio input queue, just like the video input queue; by default, this is set to 100 ms, which then acts as a lower bound on your latency.

If you want to reduce video latency, you will probably want to reduce audio latency correspondingly, or audio will arrive too late to be heard. You can adjust the audio latency with the --audio-queue-length-ms= flag, but notice that this value is in milliseconds, not in frames.

Audio and video queue lengths do not need to match exactly; the two streams (audio and video) will be synchronized at playback, both for network streaming and for HDMI/SDI output.
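Since the audio queue is given in milliseconds and the video budget in frames, it can help to put both in the same unit before tuning. The sketch below does only that conversion; the helper names and the example budget of 3.0 frames are assumptions made for illustration, while the 100 ms audio default comes from the text.

```python
def ms_to_frames(ms, fps):
    """Express a duration in milliseconds as a number of frames."""
    return ms * fps / 1000.0

def frames_to_ms(frames, fps):
    """Express a number of frames as a duration in milliseconds."""
    return frames * 1000.0 / fps

fps = 60
audio_queue_ms = 100          # default audio queue length
video_budget_frames = 3.0     # hypothetical lowered video latency budget

print(f"audio queue: {audio_queue_ms} ms = "
      f"{ms_to_frames(audio_queue_ms, fps):.1f} frames at {fps} fps")
print(f"video budget: {video_budget_frames} frames = "
      f"{frames_to_ms(video_budget_frames, fps):.1f} ms at {fps} fps")
```

In this example the audio queue is twice as long as the video budget, so it would be the next thing to look at when lowering latency further.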

Measuring latency

In order to optimize latency, it can be useful to measure it, but for most people, it’s hard to measure delays precisely enough to distinguish reliably between e.g. 70 and 80 milliseconds by eye alone. Nageru gives you some simple tools that will help.

The most direct is the flag --print-video-latency. This samples, for every 100th frame, the latency of that frame through Nageru. More precisely, it measures the wall clock time from the point where the frame is received from the input card driver (and put into the input queue) to up to four different points:

  • Mixer latency: The frame is done processing on the GPU.
  • Quick Sync latency: The frame is through VA-API H.264 encoding and ready to be muxed to disk. (Note that the mux might still be waiting for audio before actually outputting the frame.)
  • x264 latency: The frame is through x264 encoding and ready to be muxed to disk and/or the network. (Same caveat about the mux as the previous point.)
  • DeckLink output latency: The HDMI/SDI output card reports that it has shown the frame.

As every output frame can depend on multiple input frames, each with different input queue latencies, latencies will be measured for each of them, and the lowest and highest will be printed. Do note that the measurement is still done over a single output frame; it is not a measurement over the last 100 output frames, even though the statistics are only printed every 100th.
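The lowest/highest reporting can be illustrated like this. It is a sketch of the idea only: the function names and timestamps are made up, and it says nothing about the actual format Nageru prints.

```python
SAMPLE_INTERVAL = 100  # every 100th output frame is sampled, per the text above

def should_sample(frame_number):
    """Return True for the output frames whose latency gets printed."""
    return frame_number % SAMPLE_INTERVAL == 0

def latency_range_ms(input_received_ms, reached_point_ms):
    """Lowest and highest latency for one output frame.

    input_received_ms holds the time each contributing input frame was
    received from its card; reached_point_ms is when the output frame
    reached the measurement point (e.g. done on the GPU).
    """
    latencies = [reached_point_ms - t for t in input_received_ms]
    return min(latencies), max(latencies)

# One output frame built from three input frames received at different times.
lowest, highest = latency_range_ms([1000, 1012, 1030], reached_point_ms=1050)
print(f"lowest {lowest} ms, highest {highest} ms")
```

The input frame that waited longest in its queue determines the highest figure, and the most recently received one the lowest.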

For more precise measurements, you can use Prometheus metrics to get percentiles for all of these points, which will measure over all frames (over a one-minute window). This yields more precise information than sampling every 100 frames, but setting up Prometheus and a graphing tool is a bit more work, and usually not worth it for simple measurement. For more information, see Monitoring.

Another trick that can be useful in some situations is looping your signal, i.e., connecting your output back into your input. This allows you to measure delays that don’t happen within Nageru itself, like any external converters, delays in the input driver, etc. (It can also act as a sanity check to make sure your A/V chain passes the signal through without quality degradation, if you first set up a static picture as a signal and then switch to the loop input to verify that the signal stays stable without e.g. color shifts [2]. See the section on the frame analyzer for other ways of debugging signal integrity.)

For this, the timecode output is useful; you can turn it on from the Video menu, or through the command-line flag --timecode-stream. (You can also output it to standard output with the flag --timecode-stdout.) It contains some information about frame numbers and current time of day; if you activate it, switch to the loop input and then deactivate it while still holding the loop input active, the timecode will start repeating with roughly the same length as your latency. (It can’t be an exact measurement, as delay is frequently fractional, and a loop length cannot be.) The easiest way to find the actual length is to look at the recorded video file by e.g. dumping each frame to an image file and looking at the sequence.
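Once you have read the frame numbers off the dumped images, finding the loop length is a matter of finding the period of the sequence. The sketch below assumes you have already transcribed the numbers by hand into a list; the function name and the sample values are made up for illustration.

```python
def loop_length(timecodes):
    """Return the smallest period at which the sequence repeats, or None.

    At least two full repetitions must be present in the list for a
    period to be accepted.
    """
    n = len(timecodes)
    for period in range(1, n // 2 + 1):
        if all(timecodes[i] == timecodes[i + period] for i in range(n - period)):
            return period
    return None

# Frame numbers read from consecutive dumped frames (illustrative values).
observed = [101, 102, 103, 104, 101, 102, 103, 104, 101, 102]
period = loop_length(observed)
print(f"loop length: {period} frames, roughly {period * 1000.0 / 60:.1f} ms at 60 fps")
```

As noted above, the result is a whole number of frames, so it only approximates the true (often fractional) delay.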

In general, using Nageru’s own latency measurement is both the simplest and the most precise. However, the timecode is a useful supplement, since it can also test external factors, such as network stream latency.

[2] If you actually try this with Nageru, you will see some dark “specks” slowly appear in the image. This is a consequence of small roundoff errors accumulating over time, combined with Nageru’s static dither pattern that causes rounding to happen in the same direction each time. The dithering used by Nageru is a tradeoff between many factors, and overall helps image quality much more than it hurts, but in the specific case of an ever-looping signal, it will cause such artifacts.