In August, Gabe Knuth published an article on BrianMadden.com describing Teradici’s new product announcements. An important aspect of Teradici’s product offering is their PC-over-IP protocol (PCoIP), an alternative to other remoting protocols. But how good is PCoIP and what are the ideal use cases? To answer these questions, we compared the performance of Microsoft Remote Desktop Protocol (RDP) and PCoIP in June and July 2017. This project was sponsored by Microsoft with input from Teradici.

Our primary focus was on benchmarking the performance of graphics workloads in Hyper-V virtual machines accelerated by NVIDIA M60 GPUs attached through Discrete Device Assignment (DDA). DDA allows you to pass through a physical GPU that is plugged into the PCI Express bus of the Hyper-V host machine straight to a guest VM.

Test Environment

For this project we used an on-premises lab. Because the same setup can be deployed either on premises, using a physical server with an M60 card, or in Azure, using an NV-series VM, the test results are relevant for both on-premises and cloud environments.

We used a Dell R730 server with an NVIDIA Tesla M60 GPU accelerator, 128GB of RAM and an 800GB SSD. On this physical host server, we installed Windows Server 2016 and activated the Hyper-V role. We created two Windows 10 guest VMs and assigned an M60 GPU to each one using the DDA installation procedures described in our DDA article. To get the full benefit of the GPU’s capabilities, once the guest VM had exclusive access to the GPU, we installed the graphics driver provided by the GPU hardware vendor. Then we enforced an additional DDA-related setting via Group Policy:

Computer Configuration | Administrative Templates | Windows Components | Remote Desktop Services | Remote Desktop Session Host | Remote Session Environment | Use the hardware default graphics adapter for all Remote Desktop Services sessions.

We used one of the DDA-enabled guest VMs to test the performance of RDP and we used the other to test PCoIP. The operating system on both guest VMs was Windows 10 v1607 with 2 virtual CPUs and 4GB of memory.

Only the Microsoft Remote Desktop Protocol version delivered with Windows 10 or Windows Server 2016 includes all the features required to take advantage of GPU pass-through with DDA. We refer to this protocol version as RDP 10. In the RDP 10 guest VM we enabled AVC444 mode in non-RemoteFX scenarios through two Group Policy settings:

Computer Configuration | Administrative Templates | Windows Components | Remote Desktop Services | Remote Desktop Session Host | Remote Session Environment | Prioritize H.264/AVC 444 Graphics mode for Remote Desktop connections and, in the same folder, Configure H.264/AVC hardware encoding for Remote Desktop connections.
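For machines that are not managed through Group Policy, the same two settings are commonly applied as registry values. The Python sketch below only renders the text of a .reg file; it does not touch the registry. The key path and the value names AVC444ModePreferred and AVCHardwareEncodePreferred are assumptions that are not taken from our test setup, so check them against current Microsoft documentation before importing anything. The function name build_reg_file is hypothetical.

```python
# Assumed registry location and value names backing the two policies;
# verify them against current Microsoft documentation before use.
POLICY_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft"
    r"\Windows NT\Terminal Services"
)
AVC_POLICIES = {
    "AVC444ModePreferred": 1,         # Prioritize H.264/AVC 444 graphics mode
    "AVCHardwareEncodePreferred": 1,  # Configure H.264/AVC hardware encoding
}


def build_reg_file(key: str, dwords: dict) -> str:
    """Render DWORD values as the text of a .reg file (nothing is written)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(POLICY_KEY, AVC_POLICIES))
```

Generating the text first makes it easy to review the values before they are imported on a guest VM.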

In the PCoIP guest VM we installed the Teradici PCoIP Graphics Agent for Windows version 2.8, which is part of the Teradici Cloud Access+ product. The PCoIP Graphics Agent is compatible with any GPU, but is optimized for NVIDIA GRID-compatible GPUs. It is important to note that the PCoIP Graphics Agent is single-user only, even though it can be installed on Windows 10, Windows Server 2016 or Linux. PCoIP is also not compatible with RD Connection Broker. The primary focus of PCoIP is on color accuracy and on high-end CAD/CAM design scenarios.

No other protocol-specific settings were adjusted in the guest VMs. This means that the default maximum frame rate may differ between the two remoting protocols. We did not measure the frame rate at the client in our test runs, so differences in frame rate may be directly reflected in the bandwidth figures, especially for the high-quality use cases.

The client machine used in our test environment was a Shuttle Barebone PC running Windows 10 v1607 with an Intel i5-2500 CPU, 16GB of RAM, a 500GB HDD and an AMD FirePro v5800 GPU. We used the built-in Remote Desktop Connection client software to establish connections to the RDP 10 guest VM. We installed the PCoIP Software Client for Windows and used it for connecting to the desktop of the PCoIP guest VM. Again, we did not modify the default configuration settings for either RDP 10 or PCoIP.

Test Methodology

In our test environment, we used the REX Analytics framework to benchmark remote end-user experience (REX) by simulating a range of user interaction workloads. The REX Analytics framework includes fully automated (synthetic) test sequences, control services, management consoles, agents, screen and telemetry data recorders, analysis tools and a unique visualization component. The framework works on-premises and in cloud environments.

The underlying test methodology is based on tracking perceived user experience and related telemetry data of an interactive user. Both fully automated and manual test sequences are executed in a closely controlled way and recorded as screen videos. We correlate the videos to relevant feature and performance counters.
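The idea of correlating recorded video with performance counters can be illustrated with a small Python sketch that pairs each video frame time with the nearest telemetry sample. This is not how REX Analytics is implemented; the function names nearest_sample and align, and the assumption that both data streams share one clock measured in seconds, are ours and purely illustrative.

```python
from bisect import bisect_left


def nearest_sample(timestamps, values, t):
    """Return the value whose timestamp is closest to t.

    timestamps must be sorted ascending and match values in length.
    """
    if not timestamps:
        raise ValueError("no telemetry samples")
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    # On a tie, prefer the earlier sample.
    return values[i - 1] if t - before <= after - t else values[i]


def align(frame_times, sample_times, sample_values):
    """Pair every video frame time with its nearest telemetry value."""
    return [
        (t, nearest_sample(sample_times, sample_values, t))
        for t in frame_times
    ]
```

With counters sampled once per second, a frame captured at 0.6 seconds would be paired with the sample taken at 1.0 seconds rather than the one at 0.0 seconds.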

We covered four different test scenarios, defined by the network settings provided by an Apposite Mini2 WAN emulator in the on-premises environment. Network settings during the different test runs were as follows, both for RDP 10 and PCoIP:

  • LAN – 100Mbps bandwidth, 0ms latency, 0% packet loss
  • WAN – 8Mbps bandwidth, 50ms latency, 0.001% packet loss
  • WAN – 2Mbps bandwidth, 200ms latency, 0.001% packet loss
  • WAN – 2Mbps bandwidth, 0ms latency, 10% packet loss
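To put these four profiles into perspective, the following Python sketch computes the average number of bytes each link allows per frame. The 30 fps target is an assumption chosen for illustration, not a value we measured or configured, and the profile names and the helper frame_budget_bytes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkProfile:
    name: str
    bandwidth_mbps: float  # emulated link capacity
    latency_ms: float      # emulated latency
    loss_percent: float    # emulated packet loss


# The four scenarios configured on the WAN emulator.
PROFILES = [
    NetworkProfile("LAN", 100, 0, 0.0),
    NetworkProfile("WAN 8Mbps", 8, 50, 0.001),
    NetworkProfile("WAN 2Mbps, high latency", 2, 200, 0.001),
    NetworkProfile("WAN 2Mbps, high loss", 2, 0, 10.0),
]


def frame_budget_bytes(profile: NetworkProfile, fps: float = 30.0) -> float:
    """Average bytes available per frame if the link is fully used.

    fps is an assumed target frame rate, not a measured value.
    """
    bytes_per_second = profile.bandwidth_mbps * 1_000_000 / 8
    return bytes_per_second / fps


if __name__ == "__main__":
    for p in PROFILES:
        print(f"{p.name}: {frame_budget_bytes(p) / 1000:.1f} kB per frame")
```

Under this assumption, a 2Mbps link leaves a little over 8 kB per frame, which is why the choice of compression becomes decisive in the constrained scenarios.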

Figure 1: Test environment.

In each of the test scenarios we ran and recorded 21 standard REX Analytics workload sequences for each protocol, covering simulated GDI, video, animation, Web app, DirectX and OpenGL use cases. In addition, we ran some high-end graphics sequences based on Unigine Heaven, Unigine Superposition and Geeks 3D GPU Test.

Test Results

We used REX Analyzer, a REX Analytics component, to visualize the test results. It presents recorded screen videos alongside collected performance data in a 4-up split screen arrangement, which is easy to understand and interpret.

First, we compared the RDP 10 and PCoIP LAN sequences (100Mbps maximum bandwidth, 0ms latency (round-trip time – RTT) and 0% packet loss). Sequence 11 (Chrome-HTML5) shows slightly smoother animation with PCoIP, but with 20-30% higher network bandwidth requirements. In sequence 16 (WMV video in 1080p resolution), PCoIP shows better color accuracy and video quality, but at the price of much higher network bandwidth requirements. In sequence 32 (OpenGL, Redway3D), REX Analyzer shows slightly better image quality in the PCoIP session with lower network bandwidth demand.

Figure 2: Sequence 32, OpenGL Redway3D on LAN.

Visual comparison of sequence 50 (Unigine Heaven, OpenGL on LAN) shows minor advantages for PCoIP. CPU usage is slightly lower and GPU usage is higher with RDP 10. PCoIP requires significantly higher network bandwidth. Observations in sequence 51 (Unigine Heaven, DirectX on LAN) are almost identical to sequence 50 (OpenGL). The only difference is that it takes the PCoIP session much longer to load the textures into video memory and start the animation – a behavior that can be adjusted by setting the bandwidth floor. In sequence 53 (Unigine Superposition, DirectX on LAN), it also takes the PCoIP session much longer to load graphics resources and start the animation. Required network bandwidth is again significantly lower with RDP 10 than it is with PCoIP.

Figure 3: Sequence 51, Unigine Heaven, DirectX on LAN.

Next we analyzed the captured videos and telemetry data for some selected WAN scenarios. We started with 8Mbps bandwidth, 50ms latency and 0.001% packet loss. In sequence 04 (MP4 video at 720p resolution), PCoIP shows better color accuracy than RDP 10, but at the cost of higher bandwidth consumption, more stuttering and artifacts in individual frames. In sequence 16 (WMV video at 1080p resolution), PCoIP shows a blurrier image and more stuttering due to network constraints. We made similar observations in sequence 32 (OpenGL, Redway3D), where PCoIP shows a lower frame rate and more stuttering than RDP 10.

Figure 4: Sequence 16, WMV video at 1080p resolution, 8Mbps bandwidth and 50ms latency.

In even more constrained networks, PCoIP falls further behind RDP 10. For example, sequence 09 (Chrome, JavaScript, Ken-Burns Effect) at 2Mbps bandwidth, 200ms latency and 0.001% packet loss shows a higher CPU load but a much lower frame rate in the PCoIP session. In sequence 20 (GoogleEarth, DirectX9) at 2Mbps bandwidth, 0ms latency and 10% packet loss, PCoIP shows artifacts due to packet loss. Under the same network conditions, sequence 32 (Redway3D, OpenGL) shows a lower frame rate with PCoIP resulting from the constrained bandwidth, as well as artifacts due to packet loss. In the same sequences, RDP 10 provides significantly better image quality and end-user experience.

Figure 5: Sequence 20, GoogleEarth, DirectX9, 2Mbps bandwidth, 0ms latency and 10% packet loss.

Summary

Under LAN conditions, PCoIP shows great color quality in videos and CAD scenarios, but at the price of higher network consumption. PCoIP bandwidth requirements are sometimes only moderately higher than with RDP 10, but there are also sequences where network consumption is up to 8 times higher. The PCoIP documentation specifies that packet loss should be below 0.1%, although higher loss is tolerated. In many sequences, PCoIP under LAN conditions can be described as visually lossless, which is beneficial for high-end video and CAD/CAM use cases.
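A bandwidth factor such as "8 times higher" can be derived per sequence by dividing the mean bandwidth of one session by the mean of the other. The Python sketch below shows that arithmetic only; the function name bandwidth_ratio is hypothetical, and the numbers in the usage example are placeholders, not our measurements.

```python
from statistics import mean


def bandwidth_ratio(pcoip_kbps, rdp_kbps):
    """Ratio of mean PCoIP bandwidth to mean RDP 10 bandwidth.

    Both arguments are bandwidth samples (kbps) from the same sequence.
    """
    if not pcoip_kbps or not rdp_kbps:
        raise ValueError("both sample lists must be non-empty")
    rdp_mean = mean(rdp_kbps)
    if rdp_mean == 0:
        raise ValueError("RDP 10 mean bandwidth is zero")
    return mean(pcoip_kbps) / rdp_mean


if __name__ == "__main__":
    # Placeholder samples for illustration only, not recorded data.
    print(bandwidth_ratio([8000, 8000], [1000, 1000]))
```

Comparing means over a whole sequence smooths out short bursts; comparing peak values instead would answer a different question, namely how much headroom the link needs.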

RDP 10 uses more aggressive compression algorithms, which cause a slight degradation of color quality under LAN conditions. Under the different WAN conditions, RDP 10 shows better quality in terms of frame rate and stuttering. This means that RDP 10 is better suited for network-constrained connections than PCoIP.

It is important to note that one protocol is not generally “better” than the other. Determining which protocol is better suited really depends on the use case.

Future Benchmarking

In a future testing phase, we want to find out whether PCoIP performance can be improved by tuning its settings. Examples include specifying the maximum bandwidth under constrained conditions, which forces the agent to transmit at a lower rate, and configuring the bandwidth floor to accommodate the high packet loss scenario.

PCoIP performance is also highly dependent on client specifications. While RDP 10 used a hardware video decoder at the client in the tests above, PCoIP used only a software decoder. Using the hardware decoder in a PCoIP zero client would most certainly benefit PCoIP performance, which makes it a great benchmarking test for the future.