Futatabi: Real-time instant replay with slow motion

Futatabi (after the Japanese word futatabi, 再び, meaning “again” or “for the second time”) is a system for instant replay. Even though Futatabi is meant to be used with Nageru, shares some code with it, and is built from the same source distribution, it is a separate application. Futatabi is meant for slow-motion replay in e.g. sports broadcasts, but it can also be used as a more generic multitrack recorder for later editing.

Futatabi supports interpolated slow motion, meaning you do not need to buy a high-speed 120 fps camera to get smooth 60 fps output after slowdown—Futatabi will automatically synthesize in-between frames for you during playback. Good interpolation, especially in realtime, is a difficult problem, and not all content will do equally well. Most content should do quite acceptably, especially considering that half of the frames will actually be originals, but you will need to see in practice what actually works for you. Interpolation can also be used for frame rate conversion (e.g. 50 to 59.94 fps).
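To make the claim about half of the frames being originals concrete, consider a 0.5x slowdown of a 59.94 fps stream played back at 59.94 fps. The following back-of-the-envelope sketch (plain Python for illustration, not Futatabi code) shows which output frames coincide with input frames:

```python
# At 0.5x speed, every other 59.94 fps output frame lands exactly on an
# input frame; the frames in between must be synthesized by interpolation.
# (Illustration only, not Futatabi code.)
input_fps = 60000 / 1001    # 59.94 fps source
output_fps = 60000 / 1001   # 59.94 fps output
speed = 0.5                 # 0.5x slow motion

for n in range(6):                    # first few output frames
    t_out = n / output_fps            # wall-clock time of this output frame
    src = t_out * speed * input_fps   # position in the source, in frames
    kind = "original" if abs(src - round(src)) < 1e-9 else "interpolated"
    print(f"output frame {n}: source frame {src:.1f} ({kind})")
```

The same arithmetic covers frame rate conversion: with a 50 fps source and speed 1.0, the source positions essentially stop landing on integers, so nearly every output frame is interpolated.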

Futatabi currently uses a GPU reimplementation of Fast Optical Flow using Dense Inverse Search (DIS) by Kroeger et al., together with about half of the algorithm from Occlusion Reasoning for Temporal Interpolation Using Optical Flow (to do the actual interpolation based on the estimated optical flow), although this may change in the future.
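For a rough feel for how flow-based interpolation works, here is a naive CPU-only sketch using OpenCV's own DIS implementation (the opencv-python dependency and the simple bidirectional warp-and-blend are assumptions for illustration; Futatabi's GPU implementation with real occlusion reasoning is considerably more involved):

```python
# Naive flow-based midpoint interpolation, for illustration only.
import cv2
import numpy as np

def interpolate_midpoint(frame0, frame1):
    gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_MEDIUM)
    flow01 = dis.calc(gray0, gray1, None)  # motion from frame0 to frame1
    flow10 = dis.calc(gray1, gray0, None)  # motion from frame1 to frame0

    h, w = gray0.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))

    # Backward-warp each input frame halfway along its flow field.
    half0 = cv2.remap(frame0, xs - 0.5 * flow01[..., 0],
                      ys - 0.5 * flow01[..., 1], cv2.INTER_LINEAR)
    half1 = cv2.remap(frame1, xs - 0.5 * flow10[..., 0],
                      ys - 0.5 * flow10[..., 1], cv2.INTER_LINEAR)

    # Blend the two candidates. This is where the occlusion-reasoning
    # half of the real algorithm would decide, per pixel, which side
    # (or which mix) to trust; a plain average shows ghosting instead.
    return cv2.addWeighted(half0, 0.5, half1, 0.5, 0.0)
```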

Since Futatabi is part of the Nageru source distribution, its version number mirrors Nageru's. Thus, the first version of Futatabi is 1.8.0, which was the Nageru version at the time Futatabi was first introduced.

System requirements

It is strongly recommended to run Futatabi on a separate machine from Nageru, not least because you probably want a different person to operate the replay while the producer is operating Nageru.

Like Nageru, Futatabi uses your GPU for nearly all image processing. However, unlike Nageru, Futatabi requires a powerful GPU for interpolation; a GTX 1080 or similar is recommended for interpolated 720p60. (Futatabi was initially developed on a GTX 950, which is passable, but has little performance margin.) If you have a slower GPU and are happy with worse quality, or just wish to test, you can use a faster preset, or turn off interpolation entirely. Futatabi requires OpenGL 4.5 or newer.

For other required libraries, see Compiling; when you build Nageru, you also build Futatabi.

Getting started

Sample multicamera data

Good multicamera sample video is hard to come by, so it can be hard to test or train before an actual event. To alleviate this, I’ve uploaded some real-world video from the very first event where an early version of Futatabi was tested. (There are some issues with the JPEG quality, but it should largely be unproblematic.) You are free to use these for training or demonstration purposes. Do note that they will not be displayed entirely correctly in most video players (see Video format specification), although they will certainly be watchable.

There are two files:

  • Trøndisk 2018, finals only (MJPEG, 77 GB): The final match, in MJPEG format (about 30 minutes). This can be downloaded and then fed directly to Nageru as if it were a real camera stream (remember the --slow-down-input option).
  • Trøndisk 2018, entire tournament (H.264, 74 GB): The entire tournament, with no cuts (about 12 hours). However, due to space and bandwidth constraints, it has been transcoded to H.264 (with some associated quality loss), and needs to be transcoded to MJPEG before Nageru can use it (see the sketch below).

Both files are mixed-resolution, with some cameras at 1080p59.94 and some at 720p59.94 (one even switches between matches, as the camera was replaced). They contain four different camera angles (overview camera on a crane, detail camera on a tripod, two fixed endzone overhead cameras) with differing quality depending on the camera operators. In short, they should be realistic input material to practice with.
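The exact transcode settings are up to you; a minimal sketch, driving FFmpeg from Python, might look as follows (the filenames and the -q:v quality level are illustrative assumptions, not recommendations from the Futatabi documentation):

```python
# Re-encode the H.264 tournament archive to MJPEG so that Nageru can
# use it as input. Filenames and quality level are assumptions.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "trondisk2018-all.mkv",  # hypothetical input filename
    "-c:v", "mjpeg",               # intraframe MJPEG video
    "-q:v", "4",                   # JPEG quality scale; lower is better
    "-c:a", "copy",                # pass the audio through untouched
    "trondisk2018-all-mjpeg.mkv",
], check=True)
```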

Transferring data to and from Nageru

Video format specification

Futatabi expects to get data in MJPEG format only; though MJPEG is old, it yields fairly good quality per bit for an intraframe format, supports 4:2:2 without too many issues, and has hardware support through VA-API for both decode (since Ivy Bridge) and encode (since Skylake). The latter is especially important for Futatabi, since there are so many high-resolution streams; encoding or decoding several 1080p60 streams at the same time in software is fairly taxing on the CPU. This means we can easily send 4:2:2 camera streams back and forth between Nageru and Futatabi without having to scale or do other lossy processing (except of course the compression itself).
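A quick back-of-the-envelope calculation (plain arithmetic, not from the Futatabi documentation) shows the raw data rates the codecs have to handle:

```python
# Uncompressed 8-bit 4:2:2 video averages 2 bytes per pixel (one luma
# sample per pixel, chroma samples shared between pairs of pixels).
width, height, fps = 1920, 1080, 59.94
bytes_per_pixel = 2   # 8-bit 4:2:2
streams = 4           # e.g. four camera angles

per_stream = width * height * bytes_per_pixel * fps
print(f"{per_stream / 1e6:.0f} MB/s per uncompressed stream")    # ~249 MB/s
print(f"{streams * per_stream / 1e9:.1f} GB/s for all streams")  # ~1.0 GB/s
```

With roughly a gigabyte per second passing through the codecs, offloading the work to fixed-function hardware is a clear win.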

However, JPEG as such does not have any way of specifying things like color spaces and chroma placement. JFIF, the de facto JPEG standard container, specifies conventions that are widely followed, but they do not match what comes out of a capture card. Nageru’s multicam export _does_ set the appropriate fields in the output Matroska mux (which is pretty much the only mux that can hold such information), but there are few if any programs that read them and give them priority over JFIF’s defaults. Thus, if you want to use the multicam stream for something other than Futatabi, or feed Futatabi with data not from Nageru, there are a few subtle issues to keep in mind.

In particular:

  • Capture cards typically send limited-range Y’CbCr (luma between 16..235 and chroma between 16..240); JFIF is traditionally full-range (0..255 for both). (See also Synthetic tests and common problems.) Note that there is a special private JPEG comment added to signal this, which FFmpeg understands.
  • JFIF, like MPEG, assumes center chroma placement; capture cards and most modern video standards assume left.
  • JFIF assumes Rec. 601 Y’CbCr coefficients, while all modern HD processing uses Rec. 709 Y’CbCr coefficients. (Futatabi does not care much about the actual RGB color space; Nageru assumes it is Rec. 709, like for capture cards, but the differences between 601 and 709 here are small. sRGB gamma is assumed throughout, like in JFIF.) A sketch of the range and coefficient differences follows this list.
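The first and third points can be made concrete with a small Python sketch (the standard range-expansion formulas and luma coefficients; illustrative, not code from Futatabi):

```python
# If a JFIF-default decoder is fed capture-card data, it will treat
# limited-range Rec. 709 Y'CbCr as if it were full-range Rec. 601.
# Below: the standard limited-to-full range expansion, for comparison.

def expand_limited_range(y, cb, cr):
    """Map limited-range Y'CbCr (16..235 luma, 16..240 chroma) to full range."""
    y_full  = (y  -  16.0) * 255.0 / 219.0
    cb_full = (cb - 128.0) * 255.0 / 224.0 + 128.0
    cr_full = (cr - 128.0) * 255.0 / 224.0 + 128.0
    return y_full, cb_full, cr_full

# The luma coefficients also differ between the two standards:
KR_601, KB_601 = 0.299, 0.114      # Rec. 601 (what JFIF assumes)
KR_709, KB_709 = 0.2126, 0.0722    # Rec. 709 (modern HD material)

# Reference white from a capture card is Y' = 235, not 255:
print(expand_limited_range(235.0, 128.0, 128.0))  # (255.0, 128.0, 128.0)
```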

Many players may also be confused by the fact that the resolution can change from frame to frame; this is because for original (uninterpolated) frames, Futatabi simply passes the received JPEG frame through to the output stream, and that frame can have a different resolution from the interpolated frames.

Finally, the subtitle track with status information (see Tally and status talkback) is not marked as metadata due to FFmpeg limitations, and as such will show up raw in subtitle-enabled players.

Monitoring

Tally and status talkback

Prometheus metrics