This article was written in March 2023. At the time of writing, the encoders we used were still in active development. As codec standards and encoders evolve, some of the information may become outdated.
A Turning Point for Codecs
For the longest time, MPEG’s Advanced Video Coding (AVC), or H.264, has been everyone’s go-to video codec. Thanks to Cisco’s OpenH264, software projects no longer have to worry about patents, as Cisco foots the bill for royalties. However, AVC is showing its age after 20 years: it struggles to handle resolutions beyond 1080p while maintaining low bitrates.
AVC’s official successor – High Efficiency Video Coding (HEVC), or H.265 – is not enjoying the same level of popularity. Video platforms like YouTube refuse to adopt HEVC due to licensing fees. On the other hand, open-source software projects, such as Mozilla Firefox, deliberately choose not to implement HEVC to avoid potential legal trouble. Although HEVC was published almost a decade ago, AVC still dominates the codec market to this day.
In 2015, an industry consortium led by Google – the Alliance for Open Media (AOMedia) – set out to develop an open, royalty-free alternative to HEVC – AOMedia Video 1 (AV1). AV1 was initially planned for release in 2017, but the world of codecs moves slowly, and mainstream mobile platforms still lack hardware decoding support as of this writing.
With adoption finally within grasp, the hype for AV1 has never been higher. Some expect AV1 to follow AVC’s path, but its fate may not be as clear-cut. Back in 2015, HEVC was AV1’s competitor, but HEVC’s successor – Versatile Video Coding (VVC), or H.266 – was released in 2020. Meanwhile, MPEG is developing two other codecs in parallel: Essential Video Coding (EVC) and Low Complexity Enhancement Video Coding (LCEVC). EVC’s Baseline profile only uses tools whose patents have expired or are otherwise freely licensed, and is thus royalty-free. LCEVC, on the other hand, is not a standalone codec but an innovative enhancement layer: it enhances a base codec by compressing lost details and compensating for compression artifacts. LCEVC has the potential to revitalize older codecs, such as AVC, and bring their relevance back into the 4K arena.
We’re at a turning point for codecs, and it begs the question: born in between MPEG’s two generations of codecs, how does AV1 perform? This is a huge topic, so let’s narrow it down a bit. There is already plenty of good writing on how codecs work, and I’m not really qualified to comment on the legal side without an LL.B. Therefore, we’re exclusively looking at the technical performance of HEVC, AV1 and VVC, namely compression efficiency and computational complexity. Both EVC and LCEVC are still in heavy development, and real-world applications are difficult to find, so we’ll exclude them from today’s discussion. I hope you are not already lost from too many acronyms (TMA), and I promise you it’s not getting much better from here on.
An Opinionated Methodology
Codec testing is a minefield. Each codec has many encoders, and each encoder has dozens of parameters that trade off speed against quality. It would certainly be ideal to cover the common configurations, but testing combinations of too many variables quickly becomes convoluted.
On the other hand, visual quality is perceptual and even up to individual taste. For example, some may consider movies without film grain clearer and thus higher quality, while others argue that staying true to film – one hallmark of visual quality – means retaining the grain. Therefore, the best way to assess visual quality is to conduct in-person subjective assessments where real humans judge by their perception. However, we can’t afford such a massive project, and algorithmic metrics come with their own set of pitfalls.
Therefore, the testing methodology is (highly) opinionated, as we have made a number of decisions based on what we think is best. Every time I read an article on codec testing, I almost always get frustrated by questionable choices left and right, and leave with dozens of questions left unanswered. Let me save you from the same frustration by walking through the decisions we’ve made. You may not agree with our methodology (especially if you hang out on Doom9), and you may have good reasons for doing so, but at least you will have an explanation of why we’ve done what we’ve done.
For HEVC, we use the most popular open-source implementation, x265 3.5+95. For AV1, we have picked SVT-AV1 1.4.1 as our encoder, as it strikes a balance between quality and encoding speed, but VideoLAN’s dav1d 1.1.0 is the decoder of our choice, as it’s a faster implementation. For VVC, we use Fraunhofer’s Versatile Video Encoder, VVenC 1.7.0, an extensively optimized encoder based on the reference design VTM, and its sibling decoder, VVdeC.
We also use FFmpeg to assist with our testing. FFmpeg does not implement VVC at the moment, so we apply the VVC v6 patch to provide such support. Both encoding and decoding are run on an 8-core, 16-thread KVM virtual machine pinned to the second CCD of an AMD Ryzen 9 7950X.
We use Blender’s third “open movie,” Sintel, as our source material. The specific copy we used is an AVC-encoded 4K version available for download here. As mentioned, visual quality is ultimately up to individual perception. Sintel is licensed under the Creative Commons Attribution 3.0 license, which allows us to share the encoded results for you to judge for yourself.
Software encoding can be painfully slow, and we have only tested this one clip, as it already took weeks to process. To make matters worse, no comparison can represent all use cases. Please keep this limitation in mind: if your source material is drastically different from a typical film such as Sintel, you might get different results from ours.
We have decided to test the aforementioned codecs for archival purposes. As no mainstream system can software encode any of the three codecs on the fly, it is difficult to test them for streaming purposes. Perhaps we can come up with a follow-up article to dive into common GPU hardware encoder implementations. Let us know if you’re interested.
Video archiving leads to two major assumptions: one, archival generally aims for the “best” subjective quality at the smallest file size; two, encoding time doesn’t matter. These two assumptions will later help us decide our encoding parameters.
For x265 and SVT-AV1, we use Constant Rate Factor (CRF) for rate control. In short, CRF is optimal for archival as it ensures consistent quality for a given file. We start our testing at the default CRF, then adjust it to keep the average VMAF score between 91 and 96.
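The search loop we just described can be sketched in a few lines. This is a minimal sketch only: `encode_and_score` is a hypothetical callback standing in for actually running the encoder and measuring the average VMAF of the result.

```python
# Sketch of the CRF search: step the CRF up or down until the average
# VMAF lands inside the target window. encode_and_score is a hypothetical
# callback that encodes at a given CRF and returns the mean VMAF.
def find_crf(encode_and_score, start_crf, lo=91.0, hi=96.0, step=1):
    crf = start_crf
    score = encode_and_score(crf)
    # A higher CRF compresses harder, so the score drops as CRF rises.
    direction = step if score > hi else -step
    while not (lo <= score <= hi):
        crf += direction
        score = encode_and_score(crf)
    return crf, score
```

With a real `encode_and_score`, each probe costs a full encode, so in practice you would widen the step or bisect instead of stepping by one.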
VVenC is still in heavy development and doesn’t currently support CRF. Therefore, we use the next best option – two-pass variable bitrate (VBR) – for VVC. In this case, we start encoding at 1000 kbps and increase the bitrate by 500 kbps until we reach an average VMAF of 96. During the first pass, the encoder analyzes scene complexity; it then uses the collected statistics to dynamically allocate bits during the second pass.
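The bitrate ladder is simpler still. Sketched below under the same assumption: `encode_two_pass_and_score` is a hypothetical callback wrapping both encoding passes and the VMAF measurement.

```python
# Sketch of the VBR search: start at 1000 kbps and add 500 kbps until the
# average VMAF reaches the target. encode_two_pass_and_score is a
# hypothetical callback wrapping both encoding passes and the VMAF run.
def find_bitrate(encode_two_pass_and_score, start_kbps=1000,
                 step_kbps=500, target_vmaf=96.0):
    bitrate = start_kbps
    score = encode_two_pass_and_score(bitrate)
    while score < target_vmaf:
        bitrate += step_kbps
        score = encode_two_pass_and_score(bitrate)
    return bitrate, score
```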
Each encoder has dozens of parameters, and different source materials can benefit from different tunings to achieve optimal compression efficiency and quality. However, since we’re comparing codecs rather than encoder settings, we assume the encoder presets are sane and use them as-is. Since encoding time doesn’t matter, we use the slowest preset for each encoder to achieve the best compression efficiency.
For SVT-AV1, the slowest preset is 0; for VVenC, the slowest is “slower.” The slowest preset for x265 is “placebo,” but we use “veryslow” instead. As the name suggests, the improvement from “veryslow” to “placebo” is negligible, but the encoding time is much longer, as “placebo” is past the point of diminishing returns.
Beyond presets, we choose 10-bit colour outputs as it generally improves compression efficiency and reduces artifacts at the expense of encoding time.
x265 disables AVX-512 support by default due to clock speed regressions on Intel Skylake-X systems. Since we’re using an AMD system, we manually enable AVX-512 to speed up encoding.
You can find the FFmpeg commands as follows:
ffmpeg -i reference.mkv -map 0:v -c:v libx265 -crf $CRF -preset veryslow -pix_fmt yuv420p10le -x265-params asm=avx512 -map 0:a -c:a copy hevc-$CRF.mkv
ffmpeg -i reference.mkv -map 0:v -c:v libsvtav1 -crf $CRF -preset 0 -pix_fmt yuv420p10le -map 0:a -c:a copy av1-$CRF.mkv
ffmpeg -i reference.mkv -map 0:v -c:v libvvenc -b:v $bitrate -preset 4 -vvenc-params passes=2:pass=1:rcstatsfile=vvc-$bitrate-stats.json -f null /dev/null
ffmpeg -i reference.mkv -map 0:v -c:v libvvenc -b:v $bitrate -preset 4 -vvenc-params passes=2:pass=2:rcstatsfile=vvc-$bitrate-stats.json -map 0:a -c:a copy vvc-$bitrate.mkv
We use Netflix’s Emmy-winning Video Multimethod Assessment Fusion (VMAF) to assess visual quality, as it’s the best available tool at the moment. VMAF uses machine learning to predict subjective video quality, and we use the vmaf_4k_v0.6.1 model specifically trained for 4K. As discussed, visual quality is subjective, but for the sake of quantitative analysis, we target an average VMAF score of 95 for our final encodes as the “best” quality.
However, VMAF is not perfect, and solely relying on algorithms is problematic, as machines ultimately perceive visuals differently from humans. Therefore, we will also take a closer look at the final encodes, and compare our findings with VMAF scores.
The VMAF scores show a clear advantage for VVC, which delivers the highest quality at any given bitrate. HEVC and AV1 traded blows, with AV1 performing better at lower quality between VMAF 91 and 94, and HEVC performing better at higher quality between VMAF 94 and 96. At the target of VMAF 95, HEVC delivered an extra 5.6% in bitrate savings over AV1.
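For the curious, an equal-quality bitrate comparison like this comes from interpolating each codec’s rate-quality curve at the target score and comparing the resulting bitrates. Here is a minimal sketch of that calculation; the sample points are purely illustrative, not our measurements.

```python
def bitrate_at_vmaf(points, target):
    """Linearly interpolate the bitrate at a target VMAF.
    points: (bitrate_kbps, vmaf) pairs sorted by ascending VMAF."""
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if v0 <= target <= v1:
            t = (target - v0) / (v1 - v0)
            return b0 + t * (b1 - b0)
    raise ValueError("target VMAF outside the measured range")

def savings_percent(points_a, points_b, target):
    """How much bitrate codec A saves over codec B at equal VMAF."""
    a = bitrate_at_vmaf(points_a, target)
    b = bitrate_at_vmaf(points_b, target)
    return 100.0 * (b - a) / b

# Illustrative curves only -- not data from our tests.
codec_a = [(1000, 90.0), (2000, 96.0)]
codec_b = [(1500, 90.0), (3000, 96.0)]
```

A more rigorous comparison would use BD-rate, which integrates the savings over the overlapping quality range instead of sampling a single point.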
| Codec | Encoding Time | Decoding Time |
| --- | --- | --- |
| HEVC | 2 Hours 34 Minutes | 1 Minute 14 Seconds |
| AV1 | 18 Hours 54 Minutes | 1 Minute 42 Seconds |
| VVC | 2 Days 10 Hours 35 Minutes* | 2 Minutes 42 Seconds |
Here, as expected, we see a geometric increase in computational complexity with newer codecs. Although we said encoding time doesn’t matter, AV1 is 7.5x slower than HEVC, and VVC took literal days to encode a 14-minute clip using the slowest preset. This large discrepancy will ultimately affect decisions on codec selection and encoder tuning in real-life applications.
We put an asterisk behind VVC’s encoding time because we found that VVenC struggles to scale beyond 8 threads despite running on a 16-thread machine. I’m confident that future optimizations can drastically improve encoding time on high core-count machines.
The decoding speed is reasonable for all three codecs, but keep in mind the tests were run on an 8-core desktop system. In the age of mobile device proliferation, hardware-accelerated decoding is critical for a codec’s adoption. Hardware acceleration not only helps mobile devices conserve battery life, but also ensures good playback performance on TVs and set-top boxes using similar SoCs.
Visual Quality at VMAF 95
According to VMAF, newer codecs produce less consistent quality across frames. The most consistent quality came from x265, with a 1% low score of 84.67. Meanwhile, the 1% low for AV1 is 81.28, and 79.26 for VVC.
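For clarity, a “1% low” is a percentile statistic over per-frame scores. One common way to compute it – the exact definition varies between tools, so treat this as an assumption rather than the method any specific tool uses – is to average the worst 1% of frames:

```python
# Average the worst 1% of per-frame VMAF scores. This is one common
# definition of a "1% low"; tools differ in the exact method.
def one_percent_low(scores):
    count = max(1, len(scores) // 100)   # at least one frame
    worst = sorted(scores)[:count]
    return sum(worst) / len(worst)
```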
A Closer Look: Adaptive Quantization
VMAF suggests that newer codecs have better compression efficiency but less consistent visual quality across frames. However, we were using CRF and two-pass VBR for rate control, which should give us consistent quality throughout a video. What’s happening here?
If you dig through the manuals, you’ll see all three encoders have “benchmark modes,” which disable psycho-visual optimizations in exchange for higher scores on algorithmic metrics, including VMAF. We deliberately chose not to use benchmark modes because we aim for the “best” subjective quality, rather than replicating exact copies of every frame. This also gives all three encoders an equal starting point, as we’re not interested in which encoder is best optimized for benchmarking.
But what are these psycho-visual optimizations? Why do encoders and VMAF interpret visual quality so differently? Let’s take a closer look at a specific frame. According to VMAF, frame 11506 is VVC’s worst frame, which also happens to be HEVC’s 9th worst frame. How does it look?
We see drastic differences between the three codecs in frame 11506. HEVC retained most details, while VVC is outright blurry in some areas. The screenshots do corroborate VMAF’s frame-by-frame analysis, but they raise another question. Videos are motion pictures. This frame comes from a fast, panning transition scene, and it’s only shown for 1/24 of a second. What does the scene look like when it’s playing?
With the fast camera movement, the difference in visual quality is arguably negligible across all codecs despite the degradations in individual frames. I want to point out that we – including you, my dear readers – are staring at and scrutinizing the tree, but the average viewer likely isn’t.
Earlier, we mentioned that it is problematic to solely rely on VMAF, which penalizes psycho-visual optimizations. One of these optimizations is Adaptive Quantization (AQ). When watching motion pictures, the human eye is more sensitive to certain regions of a frame. For example, we’re more drawn to flat, low-texture regions – usually the foreground – than to high-texture background regions. We’re also more drawn to high-motion areas than low-motion areas. In Sintel’s frame 11506, the tree is not only part of the high-texture background, but also moves quickly to give way to the protagonist in the foreground. AQ redistributes bits to where the human eye is drawn, allowing encoders to vary compression within a frame to improve subjective visual quality at the expense of background details.
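To make the idea concrete, here is a toy sketch of variance-based AQ. Block variance is a crude proxy for texture: flat blocks get a negative quantizer offset (more bits, since artifacts there are the most visible), and busy blocks get a positive one. This is loosely inspired by variance-based AQ schemes like x265’s, but it is not any real encoder’s formula.

```python
import math

# Toy adaptive quantization: offset each block's quantizer based on how
# its variance compares with the frame average. A negative offset means a
# lower QP, i.e. more bits for flat, artifact-prone blocks. Only the
# redistribution principle is real; the formula is illustrative.
def aq_offsets(block_variances, strength=1.0):
    avg = sum(block_variances) / len(block_variances)
    return [strength * (math.log2(v + 1.0) - math.log2(avg + 1.0))
            for v in block_variances]
```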
This is why VMAF shows less consistent quality in newer codecs. Although VMAF is a state-of-the-art machine learning algorithm, it ultimately looks at motion pictures differently from humans. We watch videos, but VMAF watches frames.
Result 2: Electric Boogaloo
Earlier, we compared VMAF results between HEVC, AV1 and VVC. Although VMAF is a useful metric, we now know it doesn’t accurately represent perceptual visual quality. If we were to correct for VMAF’s biases, then at the same average score, newer codecs would produce higher perceptual quality overall due to improvements in adaptive quantization. Therefore, AV1 would generally be on par with, and sometimes surpass, HEVC in terms of compression efficiency. VVC would still be the best codec among the three.
There are also some real-life implications from our results. If you want to retain as much detail as possible, then you might want to use an older codec, such as AVC, and turn off psycho-visual optimizations. For home labs, HEVC is probably the best codec for the time being, since royalties are rarely relevant for personal use. Although its compression efficiency isn’t the best, HEVC enjoys significantly faster encoding speeds and widespread hardware decoding support. VVC theoretically has the best compression efficiency, but at the time of writing, we could not find any video player that supports it out of the box, let alone any hardware acceleration. VVC is still in heavy development, but its future looks promising.
Born in between MPEG’s two generations of codecs, AV1’s biggest advantage is that it’s royalty-free, which makes it attractive for commercial usage, but everything is almost always more complicated than it seems. Sisvel, a patent licensing outfit, has already started a patent pool for AV1. Although AOMedia published a statement establishing a legal defence program against patent claims, no one can predict what the future holds.
As previously mentioned, I don’t feel qualified to make legal commentary as I don’t have a law degree, but I’m allowed to complain. Although HEVC is on par with AV1 in terms of compression efficiency, AV1 is much more computationally complex, and thus energy inefficient, because AOMedia actively avoids patents that they don’t possess. VVC faces the same patent problem as HEVC, and even AV1’s royalty-free status needs an asterisk at the end.
It’s truly a shame. In the field of multimedia, almost all important techniques and formats are covered by broad and trivial patents. If HEVC weren’t encumbered by patents and royalties, we’d be in the HEVC era by now, with significant bandwidth and storage savings. Valuable engineering resources could be put into improving existing codecs for everyone, instead of painstakingly creating new ones that might ultimately also be embroiled in patent disputes. As a fan of free, libre software, I feel strongly that patents shouldn’t be exploited to impede progress and innovation in the pursuit of profit. The future looks bright when we look at the technology itself, but the future is also dishearteningly complicated.
Oh, before I forget, here are the promised downloads: HEVC | AV1 | VVC | VMAF results.
If you like our articles and journalism, and you want to support us in our endeavors, then consider heading over to our Patreon or our PayPal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the people behind the scenes, then consider joining our Discord.
15 thoughts on “Codecs for the 4K Era: HEVC, AV1, VVC and Beyond”
Can you add an expand/zoom action to figures 6/7/8? They are way too small to see any differences with the slider. I can right-click and fetch the full resolution BMP files but then I lose the slider.
If you use svt-av1’s -preset 0, you cannot then complain about how slow it is.
From my own analysis, LCEVC is the real surprise as an “enhancement codec” that only encodes the missing data when you upscale from a base resolution.
VVC is awesome, but compute intensive. However it’s well worth it.
I don’t think AV1 is out of the legal frying pan, especially with the EU doing an anti-trust investigation over it along with Sisvel. And no, Sisvel isn’t a “patent troll” like others would have you believe.
MPEG-5 Part 1 EVC – Baseline
Now that’s the really interesting one, since it’s all expired patents that have entered the public domain.
That can do a lot for many companies / startups who don’t want to worry about patent litigation. It’s still way better than anything H.264 has, and the compute resources needed to encode EVC-Baseline are very reasonable.
For many companies who don’t have that kind of crazy $$$ for licensing or patents, I think EVC-Baseline is the safest option on a “Legal standing” along with being light on computing resources.
I’ve wanted to test LCEVC. Unfortunately, it’s a proprietary codec developed by V-NOVA, and no free encoders are available. The licensing terms cap annual fees at $3.7 million, which is quite low compared to other codecs. If the quality is good, I can foresee why some large broadcasters might use it, especially given that it can leverage existing codecs like AVC.
I did play with EVC a little. It was obvious to me that it’s not as mature as other codecs here, so I didn’t include it in our discussion. The baseline profile performs better than AVC and sometimes comparable to HEVC. It’s expected since it’s designed to be royalty-free, and we’ll probably see future improvements as encoders mature.
Looking at the more distant future, Enhanced Compression Model reportedly has 30% BD rate savings over VVC, and Neural Network Coding looks interesting too. AOM is also working on their next generation. The future certainly doesn’t lack for choices.
If you look at Streaming Media’s test results:
EVC – Baseline is basically one step underneath stock h.265 HEVC
But what really piqued my interest is how fast EVC – Baseline could be encoded with a regular old CPU encode.
Imagine how efficient / low energy a true ASIC encoder/decoder for EVC – Baseline would be.
And for normal folks who don’t have great CPUs or GPUs, EVC – Baseline should be easy to get going on many people’s computers.
Don’t let the low compute difficulty fool you. EVC – Baseline could be the next great codec for many folks, especially on the mobile side or in lots of simple cheapo cameras / action cameras that want a low-power ASIC encoder/decoder that is cheap, royalty-free, and does better than H.264, but isn’t quite up to H.265.
AV1 is still mired in a legal quagmire. And if that’s not enough, its encoding complexity is slightly worse than stock H.265.
While the best File Compression comes from VVC.
Stack LCEVC + VVC, you have a file size savings winner = reduced bit-rate necessary for high resolution files = less bandwidth spent.
Less Bandwidth spent = Huge win for companies who are worried about their bandwidth bills.
And after looking at the test results for EVC – Baseline encoding, the “slow” setting is good enough; it gets you most of the way there on VMAF w/o climbing up the exponential hockey-stick graph of encode times.
Not sure if I’m doing something wrong but I’m getting a vmaf of 93.622597 for hevc. av1 is 94.970557 as expected.
Hi, I re-ran the test and was able to reproduce our result. We used the “vmaf_4k_v0.6.1” model tuned for 4K content, instead of the default “vmaf_v0.6.1” model, which is tuned for 1080p. As humans inevitably make mistakes, we always appreciate someone checking our work though!
I’m definitely using the 4k model, the av1 output matches av1-47.
Here are the commands I used:
ffmpeg -i Sintel.2010.4k.mkv -c:v rawvideo -pix_fmt yuv420p10le reference.yuv
ffmpeg -i hevc.mp4 -c:v rawvideo -pix_fmt yuv420p10le hevc.yuv
vmaf -r reference.yuv -d hevc.yuv -w 4096 -h 1744 -b 10 -p 420 -m path=vmaf_4k_v0.6.1.json -o hevc.xml --threads 8
Also tried ffmpeg filter which gives the same result:
ffmpeg -i hevc.mp4 -i Sintel.2010.4k.mkv -lavfi "[0:v]settb=AVTB,setpts=PTS-STARTPTS[main];[1:v]settb=AVTB,setpts=PTS-STARTPTS[ref];[main][ref]libvmaf=model=path=vmaf_4k_v0.6.1.json:log_path=output.xml" -f null -
Hi, hav. I double-checked our results and reproduced our score of 94.918230 yet again, which corresponds to the “hevc-34” file in our result bundle. I used FFmpeg 5.1.3 compiled with VMAF 2.3.1, with the command "ffmpeg -i hevc.mp4 -i Sintel.2010.4k.mkv -lavfi libvmaf=model=version=vmaf_4k_v0.6.1 -f null -". I’m puzzled by your results. If possible, could you please upload your results to a pastebin so I can look into it further? Thanks!
Ah, 5.1.3 is the answer; I managed to reproduce the score with that. I was using a self-compiled build with VMAF 2.3.1, though testing 6.0 and a nightly build both give 93.622597 too. Seems like 5.1.3 and 6.0 give different scores for some reason.
5.1.3 hevc (94.918230): hevc-34
5.1.3 av1 (96.079884): http://0x0.st/H8lz.3.xml
6.0 hevc (93.622597): http://0x0.st/H8li.0.xml
6.0 av1 (94.970557): av1-47
Tested using these builds: https://github.com/BtbN/FFmpeg-Builds/releases/tag/autobuild-2023-04-19-14-14
VMAF version is the latest commit instead of 2.3.1 but that doesn’t seem to make a difference.
I believe we’re looking at the effect of decoder differences between versions. To incorporate the VVC patch, we did not use the vanilla ffmpeg 5.1.3, but rather a version in between 5.1.3 and 6.0 to avoid any unintended consequences of rebasing. As we disclaimed at the beginning of the article, the information is only accurate as of March 2023, and decoders still have room for future improvements. We released the encoded videos for the sake of transparency, and for any interested readers like you to make your own subjective assessments.
If you’re a Linux user, you can find a statically linked build of the ffmpeg we used here: https://chipsandcheese.com/wp-content/uploads/2023/04/ffmpeg_n-109934-g891ed24f77_x86_64-linux-gnu.zip
I don’t think AQ is the only (or even the most important) reason for “less consistent quality across frames”. CRF itself gives less importance to fast-moving scenes and scene transitions. CRF (and AQ) are the default, suggested settings for “final” movies viewed in a continuous manner. If video has to be compressed (for storage and/or bandwidth reasons) but is used in a selective manner, e.g. extracting frames (say, in security or research), viewing details of frames (capturing/printing a still frame) and/or further editing, constant QP (-rc constqp) is probably better. Consider that in security/research we probably ARE interested in the face/number plate/other details that were visible for just a split second. Constant QP should also probably score better with VMAF, with much more “consistent quality across frames”.
As such I can see two possibly valuable (and, likely, very fast to process) additions to the above great analysis:
1) showing on the charts (“Compression Efficiency” and “Visual Quality at VMAF 95”) effect of compressing the video with constant QP, and I guess just HEVC would be enough to prove/disprove the idea;
2) adding to the charts HEVC/CRF compressed with a good hardware encoder (say, 2×00 or newer nVidia card), as I think that with widely available hardware encoders and decoders, even most professional users are not thrilled with, say, “5.6% bandwidth savings” if it means 5-10x encoding time.
Anyway, thank you for the great job with the site, both this piece and all (micro)architectural ones.
You’re right. AQ is not the only reason for “produc[ing] less consistent quality across frames,” and I didn’t write the transition exceptionally well between the two sections. I wanted to showcase why VMAF is a flawed metric (but also a necessary evil, so to speak), and I’ve always found AQ to be such a cool concept so it became my example. This is something I will keep in mind for my future writings.
As mentioned in the article, the methodology is (highly) opinionated, and I’m sure many readers would disagree with me. For a technically accurate comparison, ideally we want the same parameters for all encoders, such as keyframe intervals. Unfortunately, once we go down this rabbit hole, not only will we face murky questions and questionable answers (e.g., which AQ mode in x265 is comparable to which in SVT-AV1?), but we will also end up with a funky combination of parameters that no one will actually use in real life. However, what I wanted was to make a comparison with practical implications. When an average user encodes a video for archival, they usually use one of the presets, sometimes with modifications. Similarly, I expect an average user to want the best subjective visual quality, and thus CRF mode is used.
It’s certainly a cool idea to make a more technically correct comparison with CQP. There is already an article on hardware encoders (https://chipsandcheese.com/2022/03/30/gpu-hardware-video-encoders-how-good-are-they/), and I’m sure at a certain point we will release an update to it, given how fast the hardware world is moving. Thanks for the comment!
I found the bug, https://github.com/Netflix/vmaf/issues/1161
libvmaf is broken on ffmpeg 5.1.3 for 10-bit videos.
The av1 scores seem to be using a fixed build. I couldn’t check vvc, the download seems broken?
Hi hav, thanks for this. I’ll do a retest for the chart. I didn’t save all the encoded files, so re-encoding will take considerable time. For VVC, FFmpeg doesn’t officially support it yet. You can use our provided binary in the previous comment, or build a patched version yourself with VVdeC. Here are Fraunhofer’s instructions: https://github.com/fraunhoferhhi/vvenc/wiki/FFmpeg-Integration