Futatabi: Real-time instant replay with slow motion

Futatabi (after the Japanese word futatabi, 再び, meaning “again” or “for the second time”) is a system for instant replay. Even though Futatabi is meant to be used with Nageru, shares some code with it, and is built from the same source distribution, it is a separate application. Futatabi is meant for slow motion in e.g. sports broadcasts, but can also be used as a more generic multitrack recorder for later editing.

Futatabi supports interpolated slow motion, meaning you do not need to buy a high-speed 120 fps camera to get smooth 60 fps output after slowdown—Futatabi will automatically synthesize in-between frames for you during playback. Good interpolation, especially in real time, is a difficult problem, and not all content will do equally well. Most content should do quite acceptably, especially considering that half of the frames will actually be originals, but you will need to see in practice what actually works for you. Interpolation can also be used for frame rate conversion (e.g. 50 to 59.94 fps).

Futatabi currently uses a GPU reimplementation of Fast Optical Flow using Dense Inverse Search (DIS) by Kroeger et al., together with about half of the algorithm from Occlusion Reasoning for Temporal Interpolation Using Optical Flow (to do the actual interpolation based on the estimated optical flow), although this may change in the future.

Since Futatabi is part of the Nageru source distribution, its version number mirrors Nageru.

System requirements

It is strongly recommended to run Futatabi on a separate machine from Nageru, not least because you probably want a different person to operate the replay while the producer is operating Nageru. (See Working with your producer.)

Like Nageru, Futatabi uses your GPU for nearly all image processing. However, unlike Nageru, Futatabi requires a powerful GPU for interpolation; a GTX 1080 or similar is recommended for interpolated 720p60. (Futatabi was initially developed on a GTX 950, which is passable, but has little performance margin.) If you have a slower GPU and are happy with worse quality, or just wish to test, you can use a faster preset, or turn off interpolation entirely. Futatabi requires OpenGL 4.5 or newer.

For other required libraries, see Compiling; when you build Nageru, you also build Futatabi.

Getting started

Futatabi always pulls data from Nageru over the network; it doesn’t support SDI input or output. Assuming you have a recent version of Nageru (typically one that comes with Futatabi), it is capable of sending all of its cameras as one video stream (see Internal video format details), so you can start Futatabi simply by pointing it at the URL of that stream.

If you do not have a running Nageru installation, see Sample multicamera data.

Once you are up and connected, Futatabi will start recording all of the frames to disk. This process happens continuously for as long as you have disk space, even if you are playing something back or editing clips, and the streams are displayed in real time in mini-displays as they come in. You make replays in the form of clips (the top list is the clip list), which are then queued into the playlist (the bottom list). Your end result is a live HTTP stream that can be fed back into Nageru as a video input; by default, Futatabi listens on port 9096.

Futatabi has the concept of a workspace, which defaults to your current directory (you can change it with the -d option). This holds all of the recorded frames, all metadata about clips, and preferences set from the menus (interpolation quality and cue point padding). If you quit Futatabi and restart it (or it goes down as the result of a power failure or the like), it will remember all the state and frames for you. As a special case, if you give /dev/null in place of an input URL or file, you can keep using your workspace without getting any new frames added.

Basic UI operations

Futatabi can be operated using either the keyboard and mouse, or using a MIDI controller with some help from the mouse. In this section, we will be discussing the keyboard and mouse only; see Using MIDI controllers for details on using MIDI controllers.

A clip in the clip list consists simply of an in and an out point; it represents an interval of physical time (although timed to the video clock). A clip in the playlist contains the same information plus some playback parameters, in particular which camera (stream) to play back.

Clicking the “Cue in” button, or pressing the A key, will start a new clip in the clip list that begins at the current time. Similarly, clicking the “Cue out” button, or pressing the S key, will set the end point for said clip. (If you click a button twice, it will overwrite the previous result.) The clip is now in a state where you can queue it to the playlist for playing (mark the camera you want to use and click the “Queue” button, or Q on the keyboard—you can queue a clip multiple times with different cameras), or you can preview it using the “Preview” button (W on the keyboard). Previewing can be done either from the clip list or from the playlist; previews will not be interpolated or faded, but will be played back at the right speed.

You can edit cue points, both in the clip list and the playlist, in two ways: Either use the scroll wheel on the mouse, or hold down the left mouse button and scrub left or right. (You can hold down Shift to change ten times as fast, or Alt to change at one-tenth the speed.) You’ll see the new cue points as you change them in the preview window. You can also give clips names; these don’t mean anything to Futatabi, but can be good references to locate a specific kind of clip for later use. Once you’re happy with a playlist, and your producer is ready to cut to your channel, click on the first clip you want to play back and click the “Play” button (space on the keyboard); the result will both be visible in the top screen and go out live over the network to Nageru.

Controlling the playback speed

Most slow motion is at 0.5x speed (or equivalently, “2x slow motion”), and when you queue a clip from the clip list to the playlist, this is what it gets by default. However, you are free to change it to whatever you wish, like 0.333x, 0.25x (both are fairly common “super slow” standards) or even 1.0x. As long as you are reasonably close to a ratio of small integers (e.g. 1/2, 2/3, etc.), Futatabi will do its best to reuse as many original frames as possible, keeping quality and performance at their highest.
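
The original-frame reuse can be sketched with a small calculation (illustrative only, not Futatabi’s actual scheduler; the function name is made up):

```python
from fractions import Fraction

def original_frame_ratio(speed: Fraction, num_output_frames: int = 60) -> Fraction:
    """Count how many output frames coincide exactly with input frames.

    Output frame i corresponds to input position i * speed (in frames);
    whenever that position is an integer, the original frame can be
    reused instead of interpolating a new one.
    """
    originals = sum(1 for i in range(num_output_frames)
                    if (Fraction(i) * speed).denominator == 1)
    return Fraction(originals, num_output_frames)

# At 0.5x, every other output frame is an original:
print(original_frame_ratio(Fraction(1, 2)))   # 1/2
# At 0.333x (i.e., 1/3), every third output frame is:
print(original_frame_ratio(Fraction(1, 3)))   # 1/3
```

A speed like 0.473x, by contrast, lands on almost no original frames, so nearly every output frame must be interpolated.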

In addition to the per-clip speed, it is often interesting to change the speed during playback of a clip. For instance, you could keep normal slow motion (0.5x) speed in the run-up to a shot, ramp down to 0.1x to get a good look at the actual shot, and then ramp back once it’s done. When done right and not overused, this can create a dramatic effect that’s hard to replicate using constant slowdown.

To this effect, Futatabi supports a master speed control. It is found at the bottom of the window (or you can control it using a MIDI device); note that by default, it is locked at 100% until you click the lock button to unlock it. (This is particularly important when using a MIDI device, where it is very easy to touch a slider inadvertently, and very hard to set it back exactly at 100%.) The master speed control is multiplied in on top of all other speed factors, so if you have e.g. a clip at 0.5x and the master speed is set to 70%, the clip will effectively play back at 0.35x. The master speed can be set between 10% and 200%, inclusive.

Note that the master speed control governs the speed of the output clock, unlike any other speed control in Futatabi. In particular, this means that unlike the regular clip speeds, it affects fade times; if the fade time is 0.5 seconds and the master speed is set to 70%, the fade will take approximately 0.714 seconds (0.5 divided by 0.7). It also means that the “remaining time” displays will be wrong if the master speed is not at 100%. This is because the master speed is by nature unpredictable (the user can change it at any time); one cannot e.g. delay fades when the master speed is reduced, since turning it back up would mean the start of the fade had simply been missed. Similarly, it is impossible to give a proper estimate of the time remaining that takes the master speed into account; it would overestimate the time significantly, given that the operator is likely to turn it back up to 100% again soon.
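
The arithmetic above is straightforward; a small sketch (the function names are illustrative, not Futatabi internals):

```python
def effective_clip_speed(clip_speed: float, master_speed: float) -> float:
    """The master speed is multiplied in on top of the per-clip speed."""
    return clip_speed * master_speed

def effective_fade_time(fade_time_sec: float, master_speed: float) -> float:
    """The master speed governs the output clock, so fades stretch with it."""
    return fade_time_sec / master_speed

print(effective_clip_speed(0.5, 0.70))  # 0.35, i.e., effectively 0.35x playback
print(effective_fade_time(0.5, 0.70))   # ~0.714 seconds
```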

Finally, note that when changing the master speed, the speed is usually no longer a simple rational, so most frames will be interpolated frames. If your GPU is not fast enough to interpolate every frame (i.e., it is reliant on Futatabi’s usual reuse of original frames), it will drop output frames. Normal behavior will resume from the next clip, when the clocks will again go in lockstep (assuming the master speed is at 100% at that point). If you’re not ramping, or if you’re done ramping, it’s recommended to keep the speed lock on to avoid inadvertently changing the speed.

Working with your producer

Generally, we recommend that the producer (Nageru operator) and slow motion operator (Futatabi operator) sit close together so that they can communicate verbally. Good cooperation between the two is essential to get a good final product; especially the switches to and from the replays can be a critical point.

The general rule for working together is fairly obvious: The producer should switch to a replay when there’s something to show, and switch away when there’s nothing more to show (or, less ideally, when something live takes priority). Generally, when you have a playlist ready, inform your producer; they will count you in (three, two, one, go). At one, start playing so that you have some margin. If the Nageru theme is set up correctly (see Tally and status talkback), they will know how much is queued up so that they can switch back before the replay runs out, but it doesn’t hurt to give a warning up-front. The producer might also be able to request replays of specific events, or ask you to save something for later if they can’t show it right now (e.g. a foul situation that wasn’t called).

Audio support

Futatabi has limited audio support. Audio is recorded and saved for all inputs, and played back when showing a replay, but only when the replay speed is at 100%, or very close to it. (At other speeds, you will get silence.) Furthermore, there is no local audio output; the Futatabi operator will not hear any audio unless they play the Futatabi stream locally in a video player (with the associated delay). All of this may change in the future.

White balance

Futatabi integrates with Nageru for white balance; the white balance set in Nageru will be recorded, and properly applied on playback (including fades). Note that this assumes you are using the built-in white balance adjustment, not adding WhiteBalanceEffect manually to the scene; see White balance for an example.

Replay workflows

On top of the basics outlined in Basic UI operations, there are many possible workflows; we’ll discuss only two. Try out a few and see which ones fit your style and type of event.

Repeated cue-in

In many sports, you don’t necessarily know that a replay-worthy event is coming until it has already happened. However, you may reasonably know when something is not happening, and thus when it would be a good time to start a clip in case something does happen immediately afterwards. At these points, you can make repeated cue-ins, i.e., start a clip without finishing it. As long as you keep making cue-ins, the previous one will be discarded. Once you see that something is happening, you can wait until it’s done, and then do a cue-out, which gives you a good clip immediately.

For instance, in a basketball game, you could be seeing a series of uninteresting passes, clicking cue-in on each of them. However, once it looks like there’s an opportunity for a score, you can hold off and see what happens; if the shot happens, you can click cue-out, and if not, you can go back to clicking cue-in.

Before playing the clip, you can make adjustments to the in and out points as detailed above. This will help you trim away any uninteresting lead-ups, or add more margins for fades. If you consistently find that you have too little margin, you can use the cue point padding feature (either from the command line using --cue-in-point-padding and --cue-out-point-padding, or set from the menu). If you set cue in point padding to e.g. two seconds, the cue-in point will automatically be set to two seconds before the time you click cue-in, and similarly, if you set cue out point padding, the cue-out point will be set to two seconds after the time you click cue-out.
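
Assuming padding works as described, its effect can be sketched as follows (illustrative names, not Futatabi internals):

```python
def padded_cue_in(now_sec: float, cue_in_padding_sec: float) -> float:
    """Cue-in lands the configured amount of time before the click."""
    return max(0.0, now_sec - cue_in_padding_sec)

def padded_cue_out(now_sec: float, cue_out_padding_sec: float) -> float:
    """Cue-out lands the configured amount of time after the click."""
    return now_sec + cue_out_padding_sec

# With two seconds of padding, clicking cue-in at t=100 s starts the clip
# at t=98 s, and clicking cue-out at t=130 s ends it at t=132 s.
print(padded_cue_in(100.0, 2.0), padded_cue_out(130.0, 2.0))  # 98.0 132.0
```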

Instant clips

While the previous section explained how you can often know the start of an interesting event (at least after discarding most of the candidates), you can be even more certain about its end. Thus, you can wait until something interesting has happened, and then click cue-in immediately followed by cue-out. This will give you a clip of near zero length, ending at the right point. Then, edit this clip to set the starting point as needed, and it’s ready to play.

Again, you can use the cue point padding feature to your advantage; if so, your clips will not be of zero length, but rather of some predefined length given by your chosen cue point padding.

Sample multicamera data

Good multicamera sample video is hard to come by, so it can be hard to test or train before an actual event. To alleviate this, I’ve uploaded some real-world video from the very first event where an early version of Futatabi was tested. (There are some issues with the JPEG quality, but it should largely be unproblematic.) You are free to use these for training or demonstration purposes. Do note that they will not be displayed entirely correctly in most video players (see Internal video format details), although they will certainly be watchable.

There are two files:

  • Trøndisk 2018, final only (MJPEG, 126 GB): The final match, in MJPEG format (about 73 minutes). This can be downloaded and then fed directly to Futatabi as if it were a real camera stream (remember the --slow-down-input option).

  • Trøndisk 2018, entire tournament (H.264, 74 GB): The entire first part of the tournament, with no cuts (about 12 hours). However, due to space and bandwidth constraints, it has been transcoded to H.264 (with some associated quality loss), and needs to be transcoded to MJPEG before Futatabi can use it.

Both files are mixed-resolution, with some cameras at 1080p59.94 and some at 720p59.94 (one even switches between matches, as the camera was replaced). They contain four different camera angles (overview camera on crane, detail camera on tripod, two fixed endzone overhead cameras) with differing quality depending on the camera operators. In short, they should be realistic input material to practice with.

Internal video format details

Futatabi expects to get data in MJPEG format only; though MJPEG is old, it yields fairly good quality per bit for an intraframe format, supports 4:2:2 without too many issues, and has hardware support through VA-API for both decode (since Ivy Bridge) and encode (since Skylake). The latter is especially important for Futatabi, since there are so many high-resolution streams; encoding or decoding several 1080p60 streams at the same time is fairly taxing on the CPU if done in software. This means we can easily send 4:2:2 camera streams back and forth between Nageru and Futatabi without having to scale or do other lossy processing (except, of course, the compression itself).

However, JPEG as such does not have any way of specifying things like color spaces and chroma placement. JFIF, the de facto JPEG standard container, specifies conventions that are widely followed, but they do not match what comes out of a capture card. Nageru’s multicam export _does_ set the appropriate fields in the output Matroska mux (which is pretty much the only mux that can hold such information), but there are few if any programs that read them and give them priority over JFIF’s defaults. Thus, if you want to use the multicam stream for something other than Futatabi, or feed Futatabi with data not from Nageru, there are a few subtle issues to keep in mind.

In particular:

  • Capture cards typically send limited-range Y’CbCr (luma between 16..235 and chroma between 16..240); JFIF is traditionally full-range (0..255 for both). (See also Synthetic tests and common problems.) Note that there is a special private JPEG comment added to signal this, which FFmpeg understands.

  • JFIF, like MPEG, assumes center chroma placement; capture cards and most modern video standards assume left.

  • JFIF assumes Rec. 601 Y’CbCr coefficients, while all modern HD processing uses Rec. 709 Y’CbCr coefficients. (Futatabi does not care much about the actual RGB color space; Nageru assumes it is Rec. 709, like for capture cards, but the differences between 601 and 709 here are small. sRGB gamma is assumed throughout, like in JFIF.)

  • The white balance (gray point) is stored in a minimal EXIF header, and echoed back for original and interpolated frames. (During fades, Futatabi applies white balance itself, and does not require gray point adjustment from the client.)
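
As a sketch of the limited-to-full-range mapping from the first point above (the exact rounding and clamping details may differ from what Futatabi or FFmpeg actually do):

```python
def limited_to_full_luma(y: int) -> int:
    """Map limited-range luma (16..235) to full range (0..255)."""
    return min(255, max(0, round((y - 16) * 255 / 219)))

def limited_to_full_chroma(c: int) -> int:
    """Map limited-range chroma (16..240) to full range (0..255),
    keeping the neutral point at 128."""
    return min(255, max(0, round((c - 128) * 255 / 224 + 128)))

print(limited_to_full_luma(16), limited_to_full_luma(235))  # 0 255
print(limited_to_full_chroma(128))                          # 128
```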

Many players may also be confused by the fact that the resolution can change from frame to frame; this is because for original (uninterpolated) frames, Futatabi will output the received JPEG frame directly to the output stream, which can be a different resolution from the interpolated frames.

Also, even though Futatabi exists to make a fixed-framerate stream out of something that’s not, the output stream can be variable-framerate (VFR) due to pauses and frame drops. In particular, when the stream is paused, frames are only output about every 100 milliseconds.

Finally, the subtitle track with status information (see Tally and status talkback) is not marked as metadata due to FFmpeg limitations, and as such will show up raw in subtitle-enabled players.

Using MIDI controllers

This section assumes you have already read the section about MIDI controllers in Nageru. MIDI controllers in Futatabi are fairly similar, but there are also some important differences, since they control replay and not audio:

  • There is no concept of a bus (there is only one video output channel). Thus, the concept of guessing the bus is also obsolete.

  • Since there are no buses, there are usually plenty of buttons and controls to go around, rendering the bank concept less useful. It is supported, but activity highlights (to show which bank is active) are not.

  • Finally, outputs (controller lights and button lights) frequently have more than one state depending on the velocity sent, e.g. 1 for on and 2 for blinking. Thus, the Futatabi MIDI mapping editor allows you to change the note velocities from the default 1.

Futatabi has been tested with the Behringer CMD PL-1; it is not originally designed for slow motion (it is a DJ controller), but provides everything you need (a jog wheel, a slider that works as a T bar for master speed, and plenty of buttons) at a fraction of the price of a “real” slow motion remote. A sample MIDI mapping is included with Futatabi.

Futatabi currently does not support classic RS-422 controllers, only MIDI controllers.

Monitoring

Tally and status talkback

In addition to the verbal communication between the two operators (see Working with your producer), it is also useful to have automatic communication between Nageru and Futatabi. This goes both ways.

First, Futatabi can consume Nageru’s tally data like any camera can; give the correct channel URL to --tally-url on the Futatabi command line, and the color around the live view will show up when you are ready or playing (e.g. --tally-url http://nageru-server.example.org:9096/channels/2).

Second, Futatabi can export its current status in two ways. The simplest is through a REST call; the HTTP server that exposes the stream also exposes the endpoint /queue_status (e.g. http://futatabi-server.example.org:9096/queue_status). This contains the same text as is shown below the live window, i.e., how much time is queued or left.

The same status is also exported programmatically in the video output from Futatabi, as a subtitle track. This allows the Nageru theme not only to display it if desired, but even to act automatically, e.g. to switch to a different channel when the playlist is nearing its end. (See Ingesting subtitles for information on how to consume such information from the Nageru side.)

Each subtitle entry is sent slightly ahead of the frame it belongs to, and has the following format:

Futatabi 1.8.2;PLAYING;6.995;0:06.995 left

The semicolon-separated columns are as follows:

  • The first column is the identifier for the Futatabi version in use; if the format should ever diverge between versions, it can serve as a way to distinguish between them if needed. (The version may also change without the format changing.) For now, however, you can ignore it.

  • The second column is either PLAYING or PAUSED, depending on the current status.

  • The third column is how many seconds are queued up (PAUSED) or remaining (PLAYING). The number is always in the C locale, no matter what the user has set (i.e., the decimal point will always be a period). Note that this does not take the master speed into account (see Controlling the playback speed).

  • Finally, the fourth column is a human-readable string, the same as is exposed on the /queue_status endpoint above.
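
A consumer of this status line could parse it along these lines; a minimal illustrative parser in Python (not part of Futatabi or Nageru):

```python
from typing import NamedTuple

class FutatabiStatus(NamedTuple):
    version: str          # e.g. "Futatabi 1.8.2"; can be ignored for now
    playing: bool         # True for PLAYING, False for PAUSED
    seconds: float        # queued (paused) or remaining (playing) time
    human_readable: str   # same text as the /queue_status endpoint

def parse_status(line: str) -> FutatabiStatus:
    """Split a status subtitle entry into its four semicolon-separated columns.

    The split limit of 3 keeps any semicolons in the human-readable part intact.
    float() is locale-independent, matching the C-locale decimal point.
    """
    version, state, seconds, human = line.split(';', 3)
    return FutatabiStatus(version, state == 'PLAYING', float(seconds), human)

status = parse_status('Futatabi 1.8.2;PLAYING;6.995;0:06.995 left')
print(status.playing, status.seconds)  # True 6.995
```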

Prometheus metrics

Like Nageru, Futatabi supports a series of Prometheus metrics for monitoring; see Monitoring for general information. Futatabi provides entirely different metrics, though, mostly related to performance. There is no predefined Grafana dashboard available at the current time.