That ISR/DPC graph is crazy. Spending 1/3 of a CPU-second in ISRs for every 1 second of wall clock time running a single threaded DX11 application is insane. Even if they moved that into a DPC, that's still going to be disruptive to other latency-sensitive applications on the system. I'd be curious to know how evenly distributed they are across CPUs and whether it's a lot of short ISRs or a few long ones that add up.
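For what it's worth, if the per-interrupt records can be exported from the trace as (CPU, duration) pairs, a rough Python sketch like the one below would answer both questions. The record format and the sample numbers are made up for illustration; they are not taken from the article's trace.

```python
from collections import defaultdict


def summarize_isrs(records):
    """Group ISR records by CPU and summarize them.

    `records` is an iterable of (cpu, duration_us) pairs, one per ISR
    invocation. This format is hypothetical, chosen for the example.
    """
    per_cpu = defaultdict(list)
    for cpu, duration_us in records:
        per_cpu[cpu].append(duration_us)

    summary = {}
    for cpu, durations in sorted(per_cpu.items()):
        total = sum(durations)
        summary[cpu] = {
            "count": len(durations),           # many short ISRs show up here
            "total_us": total,                 # how the load is spread across CPUs
            "mean_us": total / len(durations),
            "max_us": max(durations),          # a few long ISRs show up here
        }
    return summary


if __name__ == "__main__":
    # Made-up sample data: CPU 0 has several short ISRs, CPU 1 has one long one.
    sample = [(0, 5.0), (0, 7.0), (0, 6.0), (1, 250.0), (1, 4.0)]
    for cpu, stats in summarize_isrs(sample).items():
        print(cpu, stats)
```

A high count with a small mean points to lots of short ISRs, while a low count with a large max points to a few long ones.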
It's unlikely that it's contributing significantly to the poor 3DMark API Overhead test results at these kinds of frame rates, but it's certainly interesting that Intel can't seem to do Independent Flip on Vulkan swapchains. Having to context switch to DWM to present each frame isn't free and is almost certainly costing them significant performance in very high frame rate Vulkan applications. Nvidia's driver can also directly present Vulkan swapchains, like you saw in the AMD test.
Very well written! Hopefully, these insights are allowed past the marketing department, managers, and other bureaucratic mechanisms that "protect" the developers. Hopefully, the driver team has enough bandwidth to address this, compared to working on new features, new hardware, or fixing other bugs.
Some of the issues you can see in the graphs indicate architectural issues, such as batching strategy and DMA chunk sizes. Other things could be hardware choices, such as implementing less of the DMA mastering on the GPU side (i.e., using Windows paging to transfer data to the GPU instead of sending a physical address to the GPU DMA engine).
We can always speculate on the thoughts behind the implementation, but the results clearly show room for improvement. In the meantime, great job on creating something that is close to the ideal bug report: specific, data-driven, repeatable, and fully described.
Well, thanks for this. I was trying to figure this out 2 days ago.
I found that Intel defaults to 3 pre-rendered frames vs. 0 pre-rendered frames on AMD.
There is a setting in regedit called "HwQueuedRenderPacketGroupLimitPerNode" set to 3.
If we're talking DX, it would be equivalent to the flip buffer size or flip queue.
Changing this setting does nothing at the moment. I wonder if disabling Intel's HW flip and enabling legacy software flip set to 1-2 would reduce the overhead.
Wonderful write-up. Much more informative than your YouTube brethren's attempts.
It is sort of surprising that it works so badly in Vulkan. D3D <= 11 doesn't really matter, since there are alternatives like DXVK.