NVIDIA RTX 3080: Performance Test
September 17th 2020
In the early hours of September 2nd, the release of the RTX 30 series graphics cards was blockbuster news for many technology enthusiasts. After several postponements and an official countdown, the 30 series finally arrived. I believe this launch stunned users all over the world. On the one hand there is the performance: the legend of the 10 series, which doubled the performance of its predecessor, has been repeated in the 30 series. On the other hand there is the price: twice the hardware with no price increase, which is enough to make anyone celebrate.
21 days and 21 years: NVIDIA did not make us wait through those 21 days of countdown in vain, and those 21 years have let us witness NVIDIA's brilliant achievements in computer graphics.
In fact, as early as two months before the release, a stream of rumors, some true and some false, kept leaking out: first the naming change ("the 3090 arrives this year and replaces the previous TITAN"), then specific parameters ("the 3090 has 5248 CUDA cores"), and then "the power connector changes to a single 12-pin". It was hard to tell truth from fiction.
The main improvements of this 30 series graphics card
At the September 2 press conference, Jensen Huang stressed more than once that "this is the greatest performance improvement ever." Judging from what was shown, the RTX 30 series really does offer twice the hardware with no price increase. The most direct change brought by the second-generation RTX architecture, Ampere, is the leap in performance, which also makes clear how much of the pre-launch rumor mill was misdirection. Here is our first review of the NVIDIA GeForce RTX 3080.
NVIDIA GeForce RTX 3080 appearance
Let's look at the appearance of the NVIDIA RTX 3080. The outer packaging keeps NVIDIA's usual minimalist style: a square cardboard box, mainly black with rose gold accents. Unusually, NVIDIA has not used green this time, and the overall look is somewhat reminiscent of the Tesla V100.
Outer packaging and graphics card
With the card in hand, the first impression is of a very premium feel; it could be called a model of industrial design. At the press conference we already saw that the RTX 30 series has changed greatly in appearance, with a large part of the card body covered by heat sink fins.
On closer inspection, all the cooling fins have a matte coating, which makes them feel warmer to the touch. The shroud is largely wrapped in metal with a frosted surface finish.
The heat sink fins all have a matte coating
Holding NVIDIA's RTX 3080, my first impression is that it is close to perfect; it is practically a work of art. We have admired the exquisite workmanship of reference (Founders Edition) cards in past reviews, but this time the large metal surfaces are merged so cleverly that the card combines rigidity with softness. A great deal of effort must have gone into the design from the start, because done badly, this much metal would simply look like a "lump of iron".
GeForce RTX 3080 appearance display
The appearance of the RTX 30 series had to change this much because of its radically new cooling design. It adopts a dual axial flow-through design, and the RTX 3080 has two active fans, one on the front of the card and one on the back. According to official data, airflow is 55% higher than with the previous design, cooling efficiency is 30% higher, and the card is up to 3 times quieter.
Cooling system schematic
The working principle is shown in the figure above. This is also the first time that an NVIDIA graphics card's cooling system has been designed to work together with the overall airflow of the chassis.
How the cooling system works
In the new cooling system, one fan draws in cool outside air, passes it over the GPU, and exhausts the hot air directly out of the back of the chassis. The other fan, mounted in a pull configuration on the back of the card, also draws in cool air, but passes it through the heat sink fins on the heat pipes, after which the chassis's own airflow carries it to the rear exhaust.
PCB version comparison
NVIDIA has also made major changes to the PCB inside the card. To fit the new cooling system, it uses an ultra-high-density PCB with a "V"-shaped cutout at the end, and the board is 50% smaller than before.
The figure shows how densely the components are packed: the RTX 3080 GPU sits in the middle, surrounded by 10 memory chips and two unpopulated memory positions.
GeForce RTX 3080 PCB big picture
The 18-phase power delivery is arranged in rows to the left and right of the chip, with tantalum capacitors at the corners. The power connector sits at the upper right of the board, where there is only room for a single connector. The PCB has almost no spare space anywhere.
Power supply adapter cable included
Because this reference card uses a single 12-pin power connector, an adapter cable is included in the package so that players can keep using their existing power supply. The adapter converts two 8-pin connectors into the single 12-pin, but because of the orientation of the connector, the cable ends up blocking the "GeForce RTX" logo that fans are so fond of, which is a small flaw.
Changes brought by the NVIDIA Ampere architecture
Let's see what NVIDIA Ampere, billed as the "greatest performance increase ever", changes compared with the first-generation RTX architecture, Turing.
First-generation RTX architecture: Turing
Second-generation RTX architecture Ampere
First, let's briefly review the slides from the September 2 press conference. Compared with the original Turing RTX architecture, NVIDIA Ampere doubles shader throughput: it performs 2 shader operations per clock where Turing performs 1, and shader performance reaches 30 TFLOPS of single-precision compute against Turing's 11 TFLOPS.
Ampere also doubles the throughput of ray-triangle intersection. The RT Core reaches 58 RT TFLOPS, against 34 RT TFLOPS for Turing.
In addition, the new Tensor Core can automatically identify and remove less important DNN weights. It processes sparse networks at twice the rate of Turing, with up to 238 Tensor TFLOPS against Turing's 89 Tensor TFLOPS.
The new NVIDIA Ampere GPU has 28 billion transistors and a die area of 628 square millimeters. It is built on a Samsung 8nm process customized for NVIDIA and paired with GDDR6X memory from Micron. As noted above, all three kinds of processing core (shader, RT Core, and Tensor Core) have about twice the throughput of their Turing counterparts, which together makes Ampere NVIDIA's most powerful architecture so far.
Ampere's performance was not achieved overnight. The Turing architecture used in the 20 series laid indispensable groundwork. Let's take a look at the complete GA102 core.
Complete GA102 core
The complete GA102 GPU consists of 7 GPCs (graphics processing clusters), 42 TPCs (texture processing clusters), and 84 SMs (streaming multiprocessors). The GPC is the dominant high-level block and holds all the key graphics processing units; each GPC contains a dedicated raster engine. In the new Ampere architecture, each GPC also contains two ROP partitions, and each partition contains 8 ROP units. Let's look at what has changed in each SM.
Each SM has four large processing partitions with a total of 128 CUDA cores, four third-generation Tensor Cores, one second-generation RT Core, a 256 KB register file, and 128 KB of L1 cache. This L1 cache can be configured to suit different workloads, which maximizes efficiency.
Everyone knows that the RTX 3080's CUDA core count has soared to 8704 and the RTX 3090's to an astonishing 10,496. Yet the GA100 core of the professional Tesla A100 compute card, with its larger die and more transistors, tops out at 8192 CUDA cores in theory. How does the RTX 3080 manage this?
The answer is that the Ampere SM doubles the number of FP32 arithmetic units per SM compared with Turing.
Complete GeForce RTX 3080 core
The CUDA core count of a graphics card is conventionally not the sum of all units in the SM, but only the number of FP32 units. So the answer is clear: the ratio of FP32 to INT32 units in the SM has changed from 1:1 to 2:1. The RTX 3080, for example, has only 4352 INT32 units, but because the number of FP32 units has doubled, it reaches the striking figure of 8704 CUDA cores.
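The counting rule can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this article (128 CUDA cores per SM, 84 SMs and 7 GPCs in the full GA102, two ROP partitions of 8 units per GPC); the number of enabled SMs per card is derived from those figures rather than quoted, and the function name is just for illustration.

```python
# Figures quoted in the article.
FP32_PER_SM = 128            # CUDA (FP32) cores per Ampere SM
FULL_GA102_SMS = 84          # SMs in the complete GA102
GPCS = 7
ROP_PARTITIONS_PER_GPC = 2
ROPS_PER_PARTITION = 8


def enabled_sms(cuda_cores: int, fp32_per_sm: int = FP32_PER_SM) -> int:
    """Derive how many SMs are enabled from the advertised CUDA core count."""
    sms, remainder = divmod(cuda_cores, fp32_per_sm)
    if remainder:
        raise ValueError("core count is not a whole number of SMs")
    return sms


rtx3080_sms = enabled_sms(8704)     # derived, not quoted in the article
rtx3090_sms = enabled_sms(10496)    # derived, not quoted in the article
full_ga102_cores = FULL_GA102_SMS * FP32_PER_SM
full_ga102_rops = GPCS * ROP_PARTITIONS_PER_GPC * ROPS_PER_PARTITION

# The article's INT32 figure: half as many INT32 units as FP32 units (2:1).
rtx3080_int32_units = 8704 // 2

print(f"RTX 3080: {rtx3080_sms} of {FULL_GA102_SMS} SMs enabled")
print(f"RTX 3090: {rtx3090_sms} of {FULL_GA102_SMS} SMs enabled")
print(f"Full GA102: {full_ga102_cores} CUDA cores, {full_ga102_rops} ROPs")
```

On these numbers, both cards use a cut-down GA102: neither reaches the core count of the complete chip.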
But does this count as an inflated spec? In current games, floating-point operations are far more common than integer calculations, so the doubled FP32 capacity can really double performance in floating-point-heavy work.
Schematic diagram of the ray tracing working principle
With the Ampere architecture, NVIDIA officially announced the second-generation RT Core. How does it differ from the first generation? The first thing to know is how an RT Core works: the shader issues a ray tracing request and hands it to the RT Core, which performs two kinds of test, box intersection testing and triangle intersection testing. Working through the BVH structure, if the ray hits a box, the RT Core narrows the search to that box's contents and keeps testing; if it hits a triangle, it returns the result for rendering.
The most time-consuming part of ray tracing is the intersection calculation, so improving ray tracing performance mainly means accelerating the two kinds of intersection test (BVH box intersection and triangle intersection).
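As a rough software illustration of the two tests, the sketch below walks a tiny BVH on the CPU. It is not how the RT Core is implemented in hardware; the slab method for boxes and the Möller-Trumbore method for triangles are common textbook choices assumed here, and the `Node` class and the sample scene are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Triangle = Tuple[Vec, Vec, Vec]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_box(origin: Vec, direction: Vec, lo: Vec, hi: Vec) -> bool:
    """Box intersection test (slab method) against an axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if abs(d) < 1e-12:               # ray parallel to this pair of planes
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_triangle(origin: Vec, direction: Vec,
                      v0: Vec, v1: Vec, v2: Vec) -> Optional[float]:
    """Triangle intersection test (Möller-Trumbore); returns hit distance."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < 1e-12:                 # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > 1e-9 else None


@dataclass
class Node:
    """Hypothetical BVH node: a bounding box with child nodes and/or triangles."""
    lo: Vec
    hi: Vec
    children: List["Node"] = field(default_factory=list)
    triangles: List[Triangle] = field(default_factory=list)


def closest_hit(node: Node, origin: Vec, direction: Vec) -> Optional[float]:
    """Descend only into boxes the ray hits; test triangles in those boxes."""
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return None
    hits = [ray_hits_triangle(origin, direction, *tri) for tri in node.triangles]
    hits += [closest_hit(child, origin, direction) for child in node.children]
    found = [t for t in hits if t is not None]
    return min(found) if found else None


# Hypothetical scene: two leaf boxes, each holding one triangle at z = 5.
near_leaf = Node((-1, -1, 5), (1, 1, 5),
                 triangles=[((-1, -1, 5), (1, -1, 5), (0, 1, 5))])
far_leaf = Node((9, -1, 5), (11, 1, 5),
                triangles=[((9, -1, 5), (11, -1, 5), (10, 1, 5))])
root = Node((-1, -1, 5), (11, 1, 5), children=[near_leaf, far_leaf])

hit_distance = closest_hit(root, (0, 0, 0), (0, 0, 1))
miss = closest_hit(root, (0, 0, 0), (1, 0, 0))
```

The ray along +z passes the box test for the root and the near leaf, is rejected by the far leaf's box without any triangle test, and then hits the near triangle. That early rejection is the work a BVH saves.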
Changes in RT Core
In Turing's RT Core, 5 BVH traversals, 4 BVH intersections, and one triangle intersection can be completed every cycle. In the second-generation RT Core, NVIDIA has added a new triangle position interpolation module and an additional triangle intersection module, with the aim of improving ray tracing performance for effects such as motion blur.
Motion blur rendering principle
The second-generation RT Core allows ray tracing and shading to run at the same time, so the more ray tracing a scene uses, the larger the speedup. It doubles ray intersection throughput, and according to NVIDIA's own measurements, rendering images with motion blur is 8 times faster than on Turing.
Sparse deep learning
Besides the ray tracing improvements, the Tensor Core of the Ampere architecture has also been greatly enhanced. In the third-generation Tensor Core, NVIDIA has introduced sparsity acceleration, which can automatically identify and remove less important DNN (deep neural network) weights while still maintaining good accuracy.
The dense network is trained first, the less important weights are then pruned away to leave a sparse matrix, and the sparse network is trained again. This sparse optimization is what improves Tensor Core performance.
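The pruning step can be sketched in a few lines. This assumes the 2:4 structured pattern that NVIDIA describes for Ampere, where at most two of every four consecutive weights are kept; the article itself does not name the pattern. The function name and sample weights are made up, and the retraining step that follows pruning is not shown.

```python
from typing import List


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude weights in every group of four.

    Assumption: 2:4 structured sparsity. The retraining that normally
    follows pruning is outside the scope of this sketch.
    """
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


# Hypothetical row of dense weights.
dense_row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
sparse_row = prune_2_of_4(dense_row)
print(sparse_row)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Because exactly half of each group is zero, the hardware can skip those multiplications, which is where the claimed doubling of sparse throughput would come from.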
The processing power of the third generation Tensor Core is greatly improved
The end result is that the Tensor Core processes sparse networks at twice the rate of Turing, with up to 238 Tensor TFLOPS against Turing's 89 Tensor TFLOPS.
At the press conference, Jensen Huang also introduced a new technology, RTX IO. Many games now need dozens of gigabytes, or even 100 GB, of installation space. Leaving aside the burden on storage, when the graphics card wants to read data stored on disk, the CPU first has to read the compressed data from the disk, decompress it, and then send it to video memory.
Traditional data exchange
This process ties up multiple CPU cores, sharply increases CPU load, and uses more system memory, while the GPU sits idle. RTX IO skips the step in which the CPU decompresses the data before passing it on: the GPU reads the compressed data from the disk directly over the PCIe bus and decompresses it itself, which reduces CPU usage and improves performance.
RTX IO can greatly liberate the CPU burden
Because this technology works at the level of the operating system, it depends on Microsoft's DirectStorage. For games of today's size the benefit of RTX IO is limited, but if games of several hundred GB become the norm over time, it will play a huge role.
The RTX 3080 uses GDDR6X memory on a 320-bit bus at a data rate of 19 Gbps. Compared with the GDDR6 used by Turing, that is a speed increase of about 40%, and GDDR6X can transmit twice as much data per signal cycle as GDDR6. This matters especially for workloads that move large amounts of data, such as ray-traced games, AI training, and 8K video rendering.
With the newly added HDMI 2.1 port, the card can also drive 8K video over a single cable. The previous-generation HDMI 2.0 only supports up to 4K 98Hz output, so connecting an 8K TV used to require multiple cables.
3DMark theoretical performance test
First, the test platform. To make sure the RTX 3080 can perform at its best in this review, the motherboard and CPU are current desktop flagship parts, as listed below.
Test platform software and hardware configuration
|Component|Brand|Model|
|---|---|---|
|Motherboard|Asus|EXTREME ROG RAMPAGE 12|
|Graphics card|NVIDIA|GeForce RTX 3080 Founders Edition|
|RAM|Galaxy|HOF EXTREME 8GB DDR4 4266MHz*2|
|Power supply|CORSAIR|RM Series™ RM750|
|Operating system|Microsoft|Windows 10|
|Motherboard driver||Intel chipset driver|
|Frame rate monitoring||FrameView, benchmark|
For results, the synthetic tests use 3DMark, while the game tests use each game's built-in benchmark or FrameView, averaging the frame rate over the same scene.
RM Series™ RM750
The RM Series™ RM750 is a 750 W, 80 PLUS® Gold power supply built for high-end gaming platforms. Its 750 W rating can meet the power demands of such platforms, the 80 PLUS Gold efficiency brings better energy savings, and the fully modular cabling keeps the inside of the case tidy, which makes it a good match for the 30 series graphics cards.
First, the GPU-Z readout: the RTX 3080 uses the GA102 core on Samsung's 8nm process with a die area of 628 square millimeters. It has 8704 CUDA cores, a clock of 1440-1710MHz, and 10GB of GDDR6X memory on a 320-bit bus, for a memory bandwidth of 760.3GB/s. It has 96 raster units (ROPs) and 272 texture units.
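These readings line up with the headline figures quoted earlier. The sketch below recomputes peak FP32 throughput and memory bandwidth from numbers in this article (8704 CUDA cores, 1710MHz boost clock, 320-bit bus, 19 Gbps). It assumes the usual convention of counting one fused multiply-add per core per clock as two floating-point operations; the function names are illustrative only.

```python
def fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting one FMA per core per clock as 2 ops."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth: bus width times per-pin data rate, converted to bytes."""
    return bus_width_bits * data_rate_gbps / 8


peak_tflops = fp32_tflops(8704, 1710)        # at the boost clock from GPU-Z
bandwidth = memory_bandwidth_gb_s(320, 19)   # 320-bit bus at 19 Gbps

print(f"Peak FP32: {peak_tflops:.2f} TFLOPS")   # about 29.77
print(f"Memory bandwidth: {bandwidth:.1f} GB/s")  # 760.0
```

The first result rounds to the 30 TFLOPS quoted from the press conference, and the second sits just under GPU-Z's 760.3GB/s reading.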
Next is the 3DMark Fire Strike (FS) suite, which measures theoretical DX11 performance. FS, FSE, and FSU correspond to performance at 1080P, 2K, and 4K respectively. The measured scores are as follows:
3DMark FS suite test
In the 3DMark FS suite, which tests DX11 performance, the RTX 3080 scored 54% higher than the RTX 2080 in FS, 58% higher in FSE, and 67% higher in FSU. Clearly, the higher the resolution, the larger the gap. The same holds with ray tracing and DLSS enabled, where the gap grows further; we cover this in detail below.
3DMark TS suite test
In the Time Spy and Time Spy Extreme tests of DX12 performance, the RTX 3080's TS score is 65% higher than the RTX 2080's, and its TSE score is 76% higher. The RTX 3080 does particularly well in a DX12 environment.
3DMark ray tracing test
Port Royal is the 3DMark test dedicated to ray tracing performance. Here the RTX 3080 scores 79% higher than the RTX 2080.
Theoretical scores are an important measure of a graphics card, but what players care about most is probably the frame rate in actual games. Let's look at the game tests.
Game performance test
For the game performance test we selected "Control", "Shadow of the Tomb Raider", "DOOM Eternal", "Wolfenstein: Youngblood", "Far Cry 5", and "Assassin's Creed Odyssey", along with the benchmark software for the Chinese game "Boundary" and for "Bright Memory: Infinite". "Control" and "DOOM Eternal" have no built-in benchmark, so we used FrameView to average the frame rate over the same in-game scene, although this is certainly less accurate than a proper benchmark.
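The averaging and the percentage comparisons used in the following sections can be sketched as below. The frame times are invented purely for illustration and are not measurements from this review; the sketch assumes the capture tool logs one frame time in milliseconds per rendered frame.

```python
from typing import Sequence


def average_fps(frame_times_ms: Sequence[float]) -> float:
    """Average FPS over a capture: frames rendered divided by elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def uplift_percent(new_fps: float, old_fps: float) -> float:
    """How much higher new_fps is than old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


# Hypothetical captures of the same scene (NOT data from this review).
new_card_ms = [10.0, 10.0, 10.0, 10.0]   # 100 FPS
old_card_ms = [16.0, 16.0, 16.0, 16.0]   # 62.5 FPS

new_fps = average_fps(new_card_ms)
old_fps = average_fps(old_card_ms)
print(f"{new_fps:.1f} vs {old_fps:.1f} FPS, "
      f"{uplift_percent(new_fps, old_fps):.0f}% higher")
```

Dividing the frame count by total elapsed time, rather than averaging per-frame FPS values, keeps a handful of very short frames from inflating the result.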
"Control" game test
First up is the blockbuster "Control", currently available on Steam as "Control Ultimate Edition". The game has excellent physical destruction and lighting effects, and because it offers several settings choices, our test is split into 2 groups with 6 tests in total. The first group uses the highest quality preset with RTX OFF/DLSS OFF; the second uses the highest quality preset with RTX High/DLSS ON. The detailed results are shown in the picture above.
Across the two groups, the RTX 3080 scores 58% and 55% higher than the RTX 2080 at 1080P, 73% and 68% higher at 2K, and 71% and 84% higher at 4K. The higher the resolution and the heavier the ray tracing load, the larger the RTX 3080's lead.
"Shadow of the Tomb Raider" game test
In "Shadow of the Tomb Raider", which also supports ray tracing and DLSS, we again ran 2 groups with 6 tests in total. The first group uses the highest quality preset with RTX OFF/DLSS OFF; the second uses the highest quality preset with RTX Ultra/DLSS ON. The RTX 3080 is 30% and 45% higher than the RTX 2080 at 1080P, 57% and 55% higher at 2K, and 70% and 65% higher at 4K, so at 2K and above the improvement is between roughly 55% and 70%.
"DOOM Eternal" game test
"DOOM Eternal" is the latest entry in the Doom series. Its hardware requirements are relatively low, and it is all about fast, satisfying action. The RTX 3080 is 47% higher than the RTX 2080 at 1080P, 59% higher at 2K, and 74% higher at 4K. However, because "DOOM Eternal" has no built-in benchmark either, we could only average over a scene, and that scene contains a lot of smoke effects, so the frame rates are not precise and are for reference only.
"Assassin's Creed Odyssey" game test
Next is "Assassin's Creed Odyssey", the game jokingly said to make all graphics cards equal. Despite that nickname, the picture shows that there finally is a card that can hold above 60 frames at 4K. The RTX 3080 scores 38% higher than the RTX 2080 at 1080P, 42% higher at 2K, and 54% higher at 4K.
"Far Cry 5" game test
"Far Cry 5" is a comparatively well-optimized AAA title. The RTX 3080 scores 20% higher than the RTX 2080 at 1080P, 61% higher at 2K, and 92% higher at 4K.
Temperature and power consumption test
For the temperature and power consumption tests, at a room temperature of 24°C, we used an open test bench rather than a fully enclosed chassis. This minimizes external factors such as case airflow, so the results reflect the card's own cooling.
Power consumption test
In the power consumption test we used FurMark for stress testing, and only the power drawn by the graphics card itself is counted. The new Ampere graphics card is indeed power hungry. The two test programs differ slightly at peak, but the overall average is between 310W and 315W.
As for temperature, the RTX 3080 still stays at about 75°C. The die area of the RTX 2080 is 545 square millimeters and that of the RTX 3080 is 628 square millimeters, a full 15% larger, yet the temperature remains well controlled. NVIDIA has clearly put real work into the RTX 3080's cooling design.
To Sum Up
On price, the 30 series is very fairly positioned overall. Compared with the RTX 2080, the RTX 3080 delivers up to nearly twice the performance in our tests (92% higher in "Far Cry 5" at 4K), while the price stays the same. And don't forget where the RTX 3080 sits: it is the current flagship product, with more performance than most players need.
Some players will ask whether the RTX 20 series, with such a "short life", counts as a failed generation. I think not. Turing opened up a new world of ray tracing and AI for us, set the direction for future GPU development, and marked the real shift from simply stacking up performance to a change in kind. Ampere stands on the shoulders of giants, making the path laid by the previous generation wider and more solid.