
A little over a year ago, I began experimenting with video restoration and AI upscaling for my Deep Space Nine Upscale Project. Today, I'd like to talk about the benchmark I've built as part of those efforts and what sort of interesting things it can tell us about ultra-high-end workstation performance. Such discussions aren't much fun without practical hardware to play with, so we'll also be examining how performance in our new test scales between an AMD Ryzen Threadripper 3990X with 64 cores and four RAM channels, and a Ryzen Threadripper Pro 3995WX-equipped Lenovo ThinkStation P620 workstation with the same 64 cores and eight RAM channels.

The Lenovo P620, exterior view. There's a handle on the front for easy carrying.

Spoiler Alert: One of the reasons I've written this article is to demonstrate just how much firepower a modern top-end x86 system can bring to media transcoding workloads in the first place. The overall quality of AI upscaling continues to improve, and fans of my Deep Space Nine Upscale Project should know I'll have more to say about it in the near future.

In the past, I've relied on Handbrake to capture transcoding performance, but there are more flexible tools available with a wider range of features. I experimented with using Handbrake as a processing step in my research over the last 15 months before deciding other tools were a better fit for what I wanted to do. TRACBench's design (the first four letters stand for TRanscoding, Ai, and Conversion) reflects what I've learned about scaling these workloads across a large array of cores.

TRACBench 0.1 uses SD-quality interlaced footage as an initial source. While AI scaling applications like Topaz are capable of upscaling 720p or 1080p footage, 360p and 480p footage are more easily processed in a reasonable period of time.

Transcoding: This step uses StaxRip as a front-end for AviSynth and deinterlaces the footage using QTGMC. TRACBench 0.1 uses the same settings published here and is built around StaxRip 2.1.3.0 with AviSynth+ 3.6.1. StaxRip is run in parallel using multiple instances of the same application. Each instance is configured to allow up to eight parallel processes, and Prefetch(8) was used in every AviSynth script. We test up to 16 simultaneous encodes to load all 128 threads of the Ryzen Threadripper 3990X and Threadripper Pro 3995WX. The Ryzen 9 5950X cannot sustain that many parallel encodes and tops out at a much lower maximum.
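The thread arithmetic behind that 16-encode ceiling can be sketched in a few lines of Python. This is only an illustration: it assumes each encode keeps roughly eight threads busy (matching Prefetch(8)), and the function name is hypothetical rather than anything TRACBench ships.

```python
import math


def instances_to_fill(hardware_threads: int, threads_per_job: int = 8) -> int:
    """Return how many parallel encodes are needed to cover every hardware thread.

    Assumes each encode occupies about `threads_per_job` threads, which is a
    simplification: real load depends on the filter chain and the encoder.
    """
    if hardware_threads <= 0 or threads_per_job <= 0:
        raise ValueError("thread counts must be positive")
    return math.ceil(hardware_threads / threads_per_job)


# 64-core / 128-thread Threadripper parts
print(instances_to_fill(128))  # 16
# 16-core / 32-thread Ryzen 9 5950X
print(instances_to_fill(32))   # 4
```

Under that assumption, a 32-thread CPU is already fully subscribed at four simultaneous encodes, which is one plausible reason a consumer chip tops out far earlier than the 64-core parts.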

AI Upscaling: In version 0.1, this step is handled by Topaz 1.5.3. This is an older version of the application that doesn't support RTX 3000 or RDNA2 GPUs. That's not a problem for us today, because the Quadro RTX 6000 cards inside the Lenovo ThinkStation P620 are Turing-based. Future versions of the test will update to the latest version of Topaz. Multi-GPU testing on the ThinkStation P620 was handled by running one application instance on each GPU.

Conversion: The final step converts the upscaled frames and the original audio back into a finished video. Outputting frames and then recombining them using a tool like FFmpeg yields superior quality to simply outputting an MP4 file via Topaz. TRACBench 0.1 uses FFmpeg git-2020-08-28-ccc7120 and libx264 for H.264 encoding. Future versions will include testing in H.265.
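For readers who haven't done this kind of recombination before, here is a minimal Python sketch that assembles an FFmpeg command line for it. The file names, frame rate, and CRF value are placeholders I've assumed for illustration; they are not TRACBench's actual settings, and the helper function is hypothetical.

```python
import shlex


def build_ffmpeg_command(frame_pattern, audio_source, output_path,
                         fps="30000/1001", crf=18):
    """Build an FFmpeg argument list that muxes numbered frames with audio.

    All default values here are illustrative assumptions, not benchmark settings.
    """
    return [
        "ffmpeg",
        "-framerate", fps,      # input rate for the image sequence
        "-i", frame_pattern,    # e.g. frames/000001.png, frames/000002.png, ...
        "-i", audio_source,     # original file that still holds the audio track
        "-map", "0:v:0",        # video from the first input (the frames)
        "-map", "1:a:0",        # audio from the second input
        "-c:v", "libx264",      # H.264 encode via libx264
        "-crf", str(crf),       # constant-quality setting (assumed value)
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        "-c:a", "copy",         # pass the audio through untouched
        output_path,
    ]


cmd = build_ffmpeg_command("frames/%06d.png", "source.mkv", "output.mkv")
print(shlex.join(cmd))
```

The sketch only prints the command rather than running it, so the paths and quality settings can be adjusted before handing the list to something like `subprocess.run`.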

We could continue to use Handbrake for simple testing, but Handbrake isn't as useful for front-end video processing as AviSynth. AviSynth is a script-driven video processing tool that offers a wide range of filters for transforming and modifying video in various ways. StaxRip serves as a front-end for it.

The Lenovo ThinkStation P620 was an ideal testbed for building this benchmark. The 3995WX inside the system is AMD's top-end Ryzen Threadripper Pro CPU. It has slightly lower clocks than the 3990X, but it offers twice the maximum memory bandwidth. The 3990X has just one memory channel per 16 cores, while the 3995WX has two.
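To put rough numbers on that doubling, here is a small Python sketch of theoretical peak DDR4 bandwidth. It relies on the standard 64-bit (8-byte) DDR4 channel width; the DDR4-3200 transfer rate is an assumption chosen purely for illustration and is not the speed either test system necessarily ran at.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for a DDR4 configuration.

    transfer_rate_mts: mega-transfers per second (3200 for DDR4-3200).
    bytes_per_transfer: 8, because each DDR4 channel is 64 bits wide.
    """
    return transfer_rate_mts * bytes_per_transfer * channels / 1000.0


ASSUMED_MTS = 3200  # illustrative assumption, not a measured configuration
CORES = 64

quad = peak_bandwidth_gbs(ASSUMED_MTS, 4)
octa = peak_bandwidth_gbs(ASSUMED_MTS, 8)

print(f"4 channels: {quad:.1f} GB/s peak, {CORES // 4} cores per channel")
print(f"8 channels: {octa:.1f} GB/s peak, {CORES // 8} cores per channel")
print(f"8-channel advantage at equal memory speed: {octa / quad:.1f}x")
```

These are ceiling figures; sustained bandwidth in real workloads is lower, and any gap in memory speed between the two platforms shifts the ratio away from a clean 2x.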

There's a tradeoff between the Ryzen Threadripper Pro 3995WX and the Threadripper 3990X, with the latter offering very slightly more clock speed but dramatically less memory bandwidth. We'll see if the difference is enough to matter in our tests, and we've got several more results comparing the two systems outside of this test as well.

Rather than attempting to make these three systems as alike as possible, I've deliberately allowed their configurations to differ. We're looking at three different efforts to build a high-end workstation, essentially. The Ryzen 9 5950X balances a new 16-core CPU against an older GPU from 2018. The Ryzen Threadripper 3990X keeps the same GPU but increases the number of cores and overall memory bandwidth dramatically. Both of these systems opt for less expensive, larger M.2 SSDs with 2TB of capacity, compared with the faster 1TB Samsung PM981 Polaris drive in the P620. Finally, the Lenovo ThinkStation P620 doubles memory bandwidth again and adds a second GPU. Each one of these systems could fairly be called a workstation-class system, but they make different tradeoffs. We'll see how those tradeoffs impact performance.

Incidentally, the 3990X is running DDR4-2666 because my CPU, which once ran at DDR4-3600 with no problem, now refuses to clock above 2666 at all. Repeatedly resocketing both the RAM and CPU had no effect on this limitation, and relaxing RAM timings to a ridiculous degree didn't help the system POST at a higher RAM clock.

The Lenovo ThinkStation P620 Workstation

The Lenovo ThinkStation P620 is a genuinely nice piece of kit with a few odd habits. It has a very long boot time (~81 seconds) and it emits two long beeps followed by three short beeps just before the monitor comes on. This may be related to some aspect of the dual Nvidia Quadro RTX 6000 configuration, because the display doesn't initialize until Windows 10 is pulling up the desktop. System stability was excellent at all times.

The case panel is hinged and lifts straight off the system. The ThinkStation P620's internal layout is well designed, though removing the second GPU may be difficult depending on how large one's hand is. The front panel modules are designed to be adaptable to various types of devices, depending on what you need to connect.

I'm going to borrow a photo from our sister site PCMag's review of the ThinkStation P620 because it shows the inside of the chassis without graphics cards installed:

Here's a tighter angle of our ThinkStation P620, with its graphics cards installed.

The power supply is remarkable. It's easily the smallest 1kW power supply I've ever seen, and it's rated 80 Plus Platinum. It plugs directly into the motherboard using an edge connector, seen below:

I'm torn on this aspect of the ThinkStation P620's design. The power supply is a well-built unit and it hooks directly to the motherboard without the need for a clunky 24-pin ATX cable. Secondary PCIe power cables mounted on the edge of the motherboard run from the board to the GPUs. It's objectively a better system for power delivery, but if your power supply dies you'll be talking to Lenovo about a replacement.

Active cooling for the RAM slots. Probably not the worst idea, given how tightly packed things are.

The cooling system is a bit unusual, but it keeps the system stable even under sustained full load. We stress-tested the system by running 16 transcoding workloads and two AI upscaling workloads simultaneously. Power consumption at the wall hit 800W, but the system remained stable through an eight-hour load test. Fan noise from both GPUs and the CPU running flat out together was significant (I wouldn't want to run the tower all-out if it sat next to my head), but not enough to be bothersome if the machine sat under a desk.

Test Notes

The Lenovo ThinkStation P620's dual RTX 6000 GPUs guarantee that it will win the AI upscaling test. The point of this comparison is to show the potential performance gain when stepping from an upper-end consumer card from 2018 to a pair of higher-end workstation cards. The entire point of TRACBench is that it can scale from ordinary consumer hardware to high-end workstations, so it makes sense to capture a range of data points (and price tags).

Results are currently presented only for AMD systems. TRACBench 0.1 was designed on AMD hardware, and I lack access to the kind of dual-socket Xeon systems that compete with the Lenovo P620 on core count. Future iterations of the benchmark will also include information on Intel platform scaling across Rocket Lake and Cascade Lake, as well as lower-core-count AMD systems.

TRACBench Results

The transcoding, AI, and conversion steps each show different performance patterns, so we'll discuss them individually.

Transcoding is a huge win for the ThinkStation P620 and shows the benefits of eight memory channels versus four. At just one instance, the Ryzen 9 5950X is actually faster than either Threadripper, and AMD's Zen 3 architecture keeps pace with the P620 and 3990X at the 2x level as well. At 4x, the Threadrippers pull decisively away. The small gain between 2x and 4x for the 5950X shows that 4x is the practical limit for the consumer CPU. StaxRip crashes when configured with eight threads per instance if you run more than four instances on the 5950X. The Threadrippers are not affected by this issue.

From 4x to 8x, the 3990X picks up just 1.25x performance, while the Lenovo ThinkStation P620 gains 1.51x. Eight memory channels allow the 3995WX to continue scaling when even the mighty 3990X runs out of gas. I want to note that the Ryzen Threadripper 3990X actually maintains higher clocks in this test than the Threadripper Pro 3995WX in the Lenovo ThinkStation P620. It's not clock speed making the difference; it's memory bandwidth.
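One way to read those two figures is as a fraction of perfect scaling, where doubling the instance count would double throughput. The short Python sketch below does that arithmetic using only the 1.25x and 1.51x gains quoted above; the function name and the "ideal equals 2x" framing are my own simplification.

```python
def scaling_efficiency(measured_gain, instance_ratio):
    """Fraction of ideal scaling achieved when the instance count rises.

    measured_gain: observed throughput multiplier (e.g. 1.25 for 1.25x).
    instance_ratio: how much the instance count grew (8 / 4 = 2.0 here).
    """
    return measured_gain / instance_ratio


# Gains from 4x to 8x simultaneous encodes, as quoted in the text
gains = {
    "Threadripper 3990X (4 channels)": 1.25,
    "Threadripper Pro 3995WX (8 channels)": 1.51,
}

for name, gain in gains.items():
    efficiency = scaling_efficiency(gain, 8 / 4)
    print(f"{name}: {efficiency:.1%} of ideal scaling")
```

By this measure the quad-channel chip captures about 62.5 percent of the ideal gain from the extra instances, while the eight-channel chip captures about 75.5 percent.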

The AI test is measured in frames per minute. We expected performance to be entirely determined by GPU choice, so imagine our surprise when the Ryzen 9 5950X outperformed the Threadripper 3990X when both were equipped with an RTX 2080. Topaz has been updated several times since we began developing this test, and TRACBench 0.2 will use an updated app version, but this was an interesting and unexpected development. The Lenovo ThinkStation P620, as expected, easily wins this test.

Finally, the FFmpeg conversion test merges frames and audio back into a single video file. The P620 outperforms both the Threadripper 3990X and the 5950X at the single-instance mark and keeps that lead thereafter. Unlike in transcoding, the falloff between the 5950X and the other AMD CPUs is immediate.

Scaling between the two Threadrippers is identical at every measured point. At eight encodes, both 64-core CPUs report ~95 percent load, and the lack of improvement between 6x and 8x instances indicates there's not much headroom left to scrape out. The fact that the two systems scale identically, however, indicates that memory bandwidth isn't a limiting factor here. It's interesting to see that the Ryzen 9 5950X still scales upwards, even if not by very much. Moving from 4x to 8x improves its performance by 7 percent.

The ThinkStation P620 is a giant when it comes to transcoding, where it's no less than 1.84x faster than the 3990X and 3.37x faster than the Ryzen 9 5950X. It maintains a 2.6x lead in AI upscaling over the 5950X, courtesy of the pair of Quadro RTX 6000 cards it carries. FFmpeg performance showed the smallest advantage for the Ryzen Threadripper Pro 3995WX.

In addition to TRACBench, we've also compared the two systems in SPECworkstation 3.1.0.

SPECworkstation is designed to measure performance in workstation applications, including GPU tests. This accounts for some of the gaps between the Threadripper 3990X and Threadripper Pro 3995WX in the graph above, but not all of them.

The huge performance gap in Life Sciences can't be explained solely by the 3995WX's additional memory channels. There may have been a subtlety in our 3990X's configuration, or a peculiarity of running a four-channel Threadripper, that resulted in the 3995WX testing much, much better than the 3990X in the lammps subtests, where the 3995WX was no less than 6.5x faster than the 3990X. The gaps in the other categories are generally explained by the Lenovo ThinkStation P620 fielding faster storage, GPUs, or an additional four memory channels, but the Life Sciences category gap dwarfs all of them.

If we remove the disparate impact of this subtest and examine the 3990X versus the 3995WX subtest by subtest, the 3995WX turns in scores that range from 0.92x to 2.15x those of the 3990X. While it narrowly loses a few tests due to the 3990X's faster clock, it wins far more than it loses thanks to the additional memory bandwidth.

When we look at the storage tests and remove the nammd storage results for being skewed in a similar fashion to the CPU test, the Samsung PM981 SSD in the Lenovo P620 is 1.28x faster, in aggregate, than the Mushkin Pilot-E we used for our Threadripper 3990X comparison. With the nammd results included, the P620 is 1.37x faster. Both systems are using PCIe 3.0 drives, so we're seeing the impact of the SSD controller, not the additional bandwidth available via PCIe 4.0.

The Lenovo ThinkStation P620 Hits the Pinnacle of Workstation Performance

The Ryzen Threadripper 3990X is still one of the most fun CPUs I've ever reviewed, partly for the absurd pleasure of pushing it to an all-core 4.3GHz outside during the polar vortex, and partly because watching 64 cores rip through rendering workloads in minutes that would take an hour or more on an eight-core chip is fun.

If watching the Ryzen Threadripper 3990X is fun, watching the Lenovo ThinkStation P620 and the Ryzen Threadripper Pro 3995WX is an absolute party. The 3995WX isn't always faster than the 3990X (there are a handful of places where it's 4-6 percent slower), but you trade that handful of small slowdowns for 1.4x to 2x performance improvements in specific applications. The results we've shown here illustrate the importance of knowing your workload. Under the right circumstances, the Ryzen Threadripper Pro 3995WX is capable of nearly doubling the Ryzen Threadripper 3990X's performance. Under the wrong ones, the 3990X is 5-6 percent faster than its more expensive sibling.

As for TRACBench, expect to see it pop up again the next time we have CPUs to review. The ThinkStation P620's performance in TRACBench's transcoding workload was amazing. The Ryzen Threadripper Pro 3995WX eats transcode workloads for breakfast, far beyond anything even the Ryzen Threadripper 3990X is capable of.

I think we're going to see real-time AI upscaling at or above the quality Topaz Video Enhance AI (TVEI) currently offers within the next five years. Currently, two Turing GPUs combined produce ~5.5fps, but one can imagine Ampere doubling that baseline and hitting 5.5fps with one card. At that point, we need a further 5x performance improvement (I'm rounding up to put some padding on the margin). Given how rapidly AI performance has improved, that's just not a crazy idea. The ThinkStation P620 isn't showing off a future we'll never get to see; it's just accelerating its arrival a bit.
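The back-of-the-envelope math above is easy to check. The Python sketch below starts from the hypothetical 5.5fps single-card figure and computes the speedup needed to reach real-time playback; the two target frame rates (24fps film and 30000/1001fps NTSC video) are assumptions I've picked for illustration.

```python
def required_speedup(current_fps, target_fps):
    """How many times faster upscaling must get to reach the target frame rate."""
    if current_fps <= 0:
        raise ValueError("current_fps must be positive")
    return target_fps / current_fps


ONE_CARD_FPS = 5.5  # hypothetical single-card throughput from the text

# Assumed real-time targets, not figures from the benchmark
targets = {
    "24 fps film": 24.0,
    "NTSC video (30000/1001 fps)": 30000 / 1001,
}

for label, fps in targets.items():
    print(f"{label}: {required_speedup(ONE_CARD_FPS, fps):.2f}x needed")

print(f"A 5x improvement would deliver {ONE_CARD_FPS * 5} fps")
```

A 5x gain lands at 27.5fps, which comfortably clears 24fps material and sits just below the NTSC video rate, so the estimate is in the right neighborhood either way.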

The Lenovo ThinkStation P620 is one of the most powerful air-cooled workstations money can buy, and it offers a fascinating glimpse into the future of content restoration and upscaling. If you've looked at the Ryzen Threadripper 3990X but were concerned its quad-channel design limited the chip, the Ryzen Threadripper Pro 3995WX may be exactly what you're looking for.
