Video stream analysis in Jupyter Notebook

Create interactive plots of frame level metrics for encoded stream analysis, e.g. frame types and sizes, PSNR, SSIM, VMAF, and motion vectors.

Jina Jiayang Liu
5 min read · Jun 15, 2020

In my previous post (Encode videos from your browser with Jupyter Notebook), I explained the basics of deploying FFmpeg in a Jupyter Notebook for video compression. This time I’m going to walk you through another notebook for video stream analysis, showcasing which FFmpeg commands are useful for generating metrics of interest, how to convert the console output into data types ready for visualization, and how to create interactive plots.

Launch the notebook

Starting from my GitHub repository (https://github.com/jina-jyl/jupyter), click the “launch binder” icon to create a running Jupyter environment for the repository, and select the notebook “video-stream-analysis-demo.ipynb” to launch.

Alternatively, you can download the repository and run a local Jupyter Notebook server. If you are not running Ubuntu, you may need to modify “util/load-ffmpeg.ipynb” to point to a static FFmpeg build compatible with your host.

Type and size of each video frame

The first piece of information I want to extract is the I/P/B type of each frame and how many bytes each frame takes in the stream relative to the others. This is done in cell 3 of the notebook, which first calls the following ffprobe command to get per-frame metadata:

ffprobe -v error -select_streams v:0 -show_frames input.mp4

Sample output for one frame looks like this:

[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=495000
pkt_pts_time=5.500000
pkt_dts=495000
pkt_dts_time=5.500000
best_effort_timestamp=495000
best_effort_timestamp_time=5.500000
pkt_duration=3000
pkt_duration_time=0.033333
pkt_pos=379090
pkt_size=629
width=560
height=320
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=165
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=tv
color_space=bt709
color_primaries=bt709
color_transfer=bt709
chroma_location=left
[/FRAME]

Then “parseOutput” parses the text output, extracts the values of “pkt_size” and “pict_type”, and converts them into a Pandas DataFrame, ready to be drawn with Plotly.
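A minimal sketch of what a parser like this can do. The function names below (parse_frames, frames_to_df) are illustrative, not the notebook’s actual API; the code only assumes the plain [FRAME]…[/FRAME] key=value format shown above:

```python
import pandas as pd

def parse_frames(ffprobe_text):
    """Split ffprobe -show_frames text into one dict per [FRAME] block."""
    frames, current = [], None
    for line in ffprobe_text.splitlines():
        line = line.strip()
        if line == "[FRAME]":
            current = {}
        elif line == "[/FRAME]":
            frames.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return frames

def frames_to_df(frames):
    """Keep pict_type and pkt_size, with pkt_size converted to an integer."""
    df = pd.DataFrame(frames)[["pict_type", "pkt_size"]]
    df["pkt_size"] = df["pkt_size"].astype(int)
    return df
```

From there, something like `px.bar(df, y="pkt_size", color="pict_type")` with Plotly Express produces the kind of per-frame size plot described below.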

The plot is interactive: zoom, pan, hover to show data points, download as PNG, etc.

It gives me a quick overview of the structure of the video stream: the GOP size, the number of I frames, the number of consecutive B frames, the dynamic range of the bitrate over time, the estimated scene change points, the complexity of the content in different parts of the video, etc.
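Some of those structural numbers can also be computed directly from the pict_type sequence instead of read off the plot. A small sketch, assuming the frame types are already a list of "I"/"P"/"B" strings in decode order:

```python
def gop_stats(pict_types):
    """Summarize stream structure from a sequence of 'I'/'P'/'B' frame types."""
    i_positions = [i for i, t in enumerate(pict_types) if t == "I"]
    # GOP sizes = distances between consecutive I frames
    gop_sizes = [b - a for a, b in zip(i_positions, i_positions[1:])]
    # Longest run of consecutive B frames
    longest_b, run = 0, 0
    for t in pict_types:
        run = run + 1 if t == "B" else 0
        longest_b = max(longest_b, run)
    return {
        "num_i_frames": len(i_positions),
        "gop_sizes": gop_sizes,
        "max_consecutive_b": longest_b,
    }
```

For example, `gop_stats(list("IPBBPIBP"))` reports two I frames, a GOP size of 5, and at most two consecutive B frames.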

Video compression and frame level quality metrics

Moving on to cells 4 and 5, we compress input.mp4 with libx264, calculate the quality metrics of the output video relative to the input, and plot the per-frame quality metrics, including PSNR, SSIM, and VMAF.

The FFmpeg commands are like this:

# Video compression with constant quality rate control:
ffmpeg -v error -i {input_file} -crf 26 -y {output_file}
# Use the libvmaf filter to write frame level metrics to a JSON file:
ffmpeg -hide_banner -i output.mp4 -i {input_file} -filter_complex \
  libvmaf=psnr=1:ssim=1:log_fmt=json:log_path=vmaf.json -f null -

Cell 5 loads the JSON output and plots the metrics, again using Plotly.

Frame level video quality metrics

The frame level visualization and interactive features make it very easy to spot quality ups and downs and find anomalies, e.g. the VMAF jump at frame 78.

Putting different metrics side by side also gives me a convenient way to find where the metrics don’t agree. That’s probably where I should pay more attention.

Frame extraction and visual comparison

Since the objective metrics don’t always tell the whole story, we often need to zoom into a particular frame for subjective evaluation. That’s easy to do in Jupyter Notebook too.

Cell 6 calls FFmpeg to extract a given frame from the source video and the compressed video and displays the images for visual comparison.

Subjective quality comparison at a given frame

A couple of notes about Cell 6:

  • FFmpeg “select” filter expects frame index to start from 1, while all the other data structures and visualizations in this notebook count from 0. That’s why I put “{frame_index+1}” in the select filter to keep indexing consistent.
  • The browser usually caches the image files and may not show any update even if the files on the server have changed. To work around that, I added a “foo={random}” query to the image URL so the latest version is always fetched from the server on every run, for repeated experiments.
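Putting the two notes together, the extraction step might be built along these lines. This is only a sketch of the idea, not the actual cell 6 code; the function names are illustrative, and the +1 follows the notebook’s convention for the select filter described above:

```python
import random

def build_extract_cmd(video, frame_index, out_png):
    """Build an ffmpeg command that extracts a single frame as an image.

    frame_index counts from 0 like the rest of the notebook; +1 adapts it
    to the select filter's convention noted above.
    """
    return (
        f"ffmpeg -v error -i {video} "
        f"-vf \"select=eq(n\\,{frame_index + 1})\" "
        f"-vsync vfr -y {out_png}"
    )

def cache_busted_url(image_path):
    """Append a random query so the browser refetches the image each run."""
    return f"{image_path}?foo={random.randint(0, 10**9)}"
```

Running the same command on both the source and the compressed video, then displaying the two cache-busted URLs side by side, gives the subjective comparison shown above.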

Visualize motion vectors

Another cool feature of FFmpeg is the ability to visualize motion vectors and quantization parameters (MPEG codecs only, e.g. H.264/HEVC).

The command to overlay the motion vectors onto a video is like this:

ffmpeg -v error -flags2 +export_mvs -i {output_file} -vf codecview=mv=pf+bf+bb -y mv.mp4

In FFmpeg 3.x.x, one could also visualize block-level quantization parameters, which would be very useful for debugging encoding problems. The functionality is said to have been moved to the “codecview” filter behind the “qp=true” flag, but I couldn’t get it to work in the latest stable version, and more investigation is needed. A Stack Overflow thread suggested https://github.com/slhck/ffmpeg-debug-qp to dump QP values. Maybe I’ll try it out next time.

Besides being a standalone analysis tool, the notebook is perfect for sharing and collaboration. The Jupyter service model removes the hassle of setting up a local environment and fiddling with numerous command line options, so running the same notebook always gives consistent results and makes sure others on the project see exactly what you see.

About the author

I’m a full stack software engineer and entrepreneur with 10+ years of development and leadership experience in WebRTC, video encoding, mobile and Web technologies, and media centric machine learning algorithms. Previously I served as the CTO of Visionular and spent years at Google and Microsoft developing multiple consumer facing applications and large scale systems that serve billions of users. When I’m not coding, I like writing, reading, and learning random new skills.
