A practical guide for VMAF
By Jina Liu, Visionular
VMAF has become a hugely popular video quality metric since Netflix released it in 2016 (see the Netflix blog post). Today you can hardly attend a video technology gathering without hearing “VMAF” mentioned a dozen times in every talk. It provides a powerful, freely available tool for objectively evaluating perceptual video quality, a long-standing problem for researchers, video engineers, and everyone who cares about delivering a high-quality video experience to their customers.
In this blog post, I’m going to share some practical tips to help you start using VMAF more easily and efficiently.
Why VMAF? Are PSNR/SSIM out of date? What about deep video quality metrics?
Before I dive into the “how to” part, you may still wonder:
Why should I care about yet another video quality metric?
Isn’t it good enough to stick to old school PSNR, SSIM, or some variation of SSIM?
Or if you keep an eye on computer vision or signal processing research fields, what about the new and shiny deep neural network based image/video quality metrics that come up every month?
I don’t intend to write a lengthy literature review of video quality metrics, since there are many great resources on the Internet for interested readers to explore (search “video quality metrics review” on Google). But here are some key factors behind VMAF’s wide adoption, and why you should consider it too:
- There is no single winner of the video quality metric war; the “best” metric is a trade-off between accuracy and complexity. PSNR doesn’t align with human perception in many cases, but it is dead simple to understand and implement. SSIM was the most successful and widely acknowledged perceptual metric before VMAF. Many variants of SSIM add various improvements, as well as complexity, to the original, but none has been received as well as the original SSIM. VMAF is more accurate because it combines several traditional perceptual metrics in a data-driven fashion (which is a double-edged sword), yet it is more practical than most deep-neural-network-based algorithms.
- Netflix leads the open source project and actually uses VMAF to optimize its production video encoding. The reliable support, high-quality codebase, and well-built tools and libraries level the playing field for small players to enter and quickly realize the benefits.
- One is nice, two is nice, three is better……When it comes to video quality assessment, no single metric is perfect. It’s often beneficial to look at multiple metrics at the same time.
- Since everyone else is using it, how could it be wrong?……just kidding.
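To make the simplicity claim about PSNR concrete, here is a minimal sketch of computing it between two frames in plain Python. Frames are flattened to 1-D pixel lists for brevity, and this is only an illustration; real tools such as FFmpeg compute PSNR for you:

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equal-length pixel sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)

# Two tiny synthetic "frames": the distorted copy is off by 1 everywhere.
ref = [128] * 16
dist = [129] * 16
print(round(psnr(ref, dist), 2))  # MSE = 1 -> ~48.13 dB
```

That the whole metric fits in a few lines is exactly the appeal of PSNR, and exactly why it cannot capture much about human perception.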
How to get started?
Now you cannot wait to get your hands dirty.
If you start with the VMAF GitHub project page, you may feel confused by the different ways of running VMAF. My favorite is FFmpeg. It’s fast, simple, flexible with input formats, and can convert resolution and frame rate in one command line.
Unfortunately, the official FFmpeg build doesn’t include the VMAF filter, and if you, like many people, have never built FFmpeg from source, it may look intimidating.
But no worries! Here is a step-by-step guide to start testing VMAF with FFmpeg on Linux and macOS. For Windows, I recommend using Windows Subsystem for Linux, or a Linux container inside Docker; then the same steps apply too.
Step 1: Install VMAF enabled FFmpeg
- Install Homebrew. If using Docker, you can instead run the official Linuxbrew image directly:
docker run -it linuxbrew/linuxbrew
- Add the homebrew-ffmpeg tap:
brew tap homebrew-ffmpeg/ffmpeg
- Install VMAF enabled FFmpeg. This may take a while to finish as it compiles FFmpeg from scratch.
brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-libvmaf
- Verify your FFmpeg installation by running:
ffmpeg -version
You should find --enable-libvmaf in the output message.
Step 2: Download the model files
Download these two model files from the VMAF GitHub repository and save them to /usr/local/share/model:
mkdir -p /usr/local/share/model
curl https://raw.githubusercontent.com/Netflix/vmaf/master/model/vmaf_v0.6.1.pkl.model -o /usr/local/share/model/vmaf_v0.6.1.pkl.model
curl https://raw.githubusercontent.com/Netflix/vmaf/master/model/vmaf_v0.6.1.pkl -o /usr/local/share/model/vmaf_v0.6.1.pkl
Now you are ready to go!
Since VMAF is a full-reference video quality metric, it needs two inputs: an original (reference) video (e.g. original.mp4) and a distorted video (e.g. distorted.mp4). The score represents the quality of the distorted video relative to the original.
Most basic command
In the most basic use case, the command to calculate VMAF is like this:
ffmpeg -i distorted.mp4 -i original.mp4 -filter_complex libvmaf -f null -
Tip: The order of inputs matters! The distorted video must be the first input and the original video the second, otherwise the resulting score won’t be accurate.
The VMAF score will be printed at the end like this:
[libvmaf @ 0x1b5b700] VMAF score: 99.055347
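If you want to capture that final score programmatically, one option is to pull it out of FFmpeg’s log output. Here is a minimal Python sketch; the regex matches the output line shown above, but treat the exact wording as an assumption, since it can vary across FFmpeg and libvmaf versions:

```python
import re

def parse_vmaf_score(ffmpeg_log):
    """Extract the final 'VMAF score: X' value printed by the libvmaf filter.

    Returns the score as a float, or None if no score line is found.
    """
    match = re.search(r"VMAF score: ([0-9.]+)", ffmpeg_log)
    return float(match.group(1)) if match else None

sample = "[libvmaf @ 0x1b5b700] VMAF score: 99.055347"
print(parse_vmaf_score(sample))  # 99.055347
```

In a real script you would capture FFmpeg’s stderr (where this line is printed) and feed it to the parser; for anything beyond the single summary score, the log_path option discussed below is a more robust choice.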
When resolution or frame rate change is involved
If the distorted video has a different frame resolution than the original video, running the command above will fail with an error like this:
[Parsed_libvmaf_0 @ 0x2192ac0] Width and height of input videos must be same.
The solution is to scale the distorted video to match the original resolution, say 1920 x 1080:
ffmpeg -i distorted.mp4 -i original.mp4 -filter_complex "[0:v]scale=1920:1080[distorted];[distorted][1:v]libvmaf" -f null -
Note the quotes around the filter_complex option value. They are necessary to avoid parsing errors.
A more subtle mistake happens when the distorted video has a different frame rate than the original. FFmpeg will not complain about it, but the resulting score will be off.
This is because VMAF first calculates the frame-by-frame difference and then aggregates the per-frame scores. It doesn’t look at timestamps, so it will match the wrong pairs of frames if the input videos are not in sync.
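To see why index-based pairing goes wrong, consider a 60 fps distorted video scored against a 30 fps original: frame i of one is captured at i/60 seconds, frame i of the other at i/30 seconds, so the paired frames drift further apart in time with every frame. A small sketch of that drift (a hypothetical illustration of the pairing problem, not how libvmaf is implemented internally):

```python
def pairing_drift(frame_index, distorted_fps, reference_fps):
    """Timestamp gap (in seconds) when frame i of each video is naively paired."""
    return abs(frame_index / distorted_fps - frame_index / reference_fps)

# Pairing frame i of a 60 fps distorted video with frame i of a 30 fps original:
for i in (0, 30, 60):
    print(i, pairing_drift(i, 60, 30))
# By frame 60, the distorted frame at t = 1 s is compared against the
# original frame at t = 2 s -- a full second out of sync.
```

The drift grows without bound, which is why the scores are silently wrong rather than failing with an error.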
The solution is to “normalize” the frame rate with the “framerate” filter:
ffmpeg -i distorted.mp4 -i original.mp4 -filter_complex "[0:v]framerate=30[distorted];[1:v]framerate=30[ref];[distorted][ref]libvmaf" -f null -
Similar care needs to be taken for other kinds of mismatch between the distorted and the original video, e.g. rotation, time shift, watermarks, etc. Each case may require a different FFmpeg filter in the chain to “counter” the mismatch. Yes, FFmpeg has a filter for almost every video editing need you may have or imagine. That’s why I recommend starting with FFmpeg for VMAF experimentation in this post.
More tips
FFmpeg’s VMAF filter supports several options, which are detailed in the FFmpeg documentation. The ones I find particularly useful are:
n_subsample: Set interval for frame subsampling used when computing VMAF.
You have probably already noticed that calculating VMAF is pretty slow. For me it runs at ~10 fps for 1080p video on a fairly powerful workstation.
Setting n_subsample to N>1 yields a significant speedup, as the number of frames processed is reduced to 1/N. It may hurt accuracy to a varying extent, but in most cases it’s probably worth the time saving.
log_path: Set the file path to be used to store logs.
This option conveniently saves the per frame VMAF score to the log file, which is very useful for investigating quality issues.
ssim/psnr: Enables computing PSNR/SSIM along with VMAF.
When used along with log_path, you can further plot the PSNR/SSIM/VMAF time series and see when they agree and disagree.
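As an example of putting log_path to work, here is a sketch that reads per-frame VMAF scores out of a JSON log (produced with the filter’s log_fmt=json option) and reports the mean and the worst frame. The exact log schema varies between libvmaf versions, so the field names used below (“frames”, “frameNum”, “metrics”, “vmaf”) are assumptions you may need to adapt to your log file:

```python
import json

def per_frame_vmaf(log_text):
    """Return a list of (frame_number, vmaf_score) pairs from a libvmaf JSON log.

    Assumes a log shaped like {"frames": [{"frameNum": ..., "metrics": {"vmaf": ...}}]};
    adjust the keys for your libvmaf version.
    """
    log = json.loads(log_text)
    return [(f["frameNum"], f["metrics"]["vmaf"]) for f in log["frames"]]

# A tiny synthetic log for illustration:
sample_log = """
{"frames": [
  {"frameNum": 0, "metrics": {"vmaf": 99.1}},
  {"frameNum": 1, "metrics": {"vmaf": 72.4}},
  {"frameNum": 2, "metrics": {"vmaf": 95.0}}
]}
"""
scores = per_frame_vmaf(sample_log)
mean = sum(s for _, s in scores) / len(scores)
worst = min(scores, key=lambda fs: fs[1])
print(round(mean, 2), worst)  # 88.83 (1, 72.4)
```

The worst-frame view is often more telling than the mean: a short dip like frame 1 above is exactly the kind of quality issue a single aggregate score hides.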
Thanks for reading! Now you have the basic gear to play with VMAF. In a future post, we’ll discuss how VMAF matches (and mismatches) human perception in real-life use cases. Stay tuned!