Last year, Nvidia introduced the GRID K1 and GRID K2 GPUs for GPU virtualization. Multiple VMs can share the same graphics card to render complex applications. The approach is similar in spirit to SR-IOV, though implemented differently. VMware uses this GPU virtualization in its products, bringing GPU performance inside a VM close to that of a bare-metal machine. Even without vGPU, VMware's Fusion still reaches about 90% of bare-metal performance.
I published a post about hardware virtualization and SR-IOV last year. In that post, I mentioned that GPU virtualization is supported on Intel CPUs with Iris graphics. Here, I would like to talk more about GPU virtualization.
Nvidia vGPU – GRID
Nvidia announced the GRID K1 and K2 last year, which support OpenGL 4.4, DirectX 11, and CUDA 6.0. It is a very interesting design with hardware virtualization support. Although the performance is not as high as a dedicated GPU at the same price, it is sufficient for many use scenarios like desktop applications. According to the published results, performance scales well from a single VM to 4 VMs without a significant drop. It is well worth trying out in your server environment to provide multiple desktop environments with close-to-full GPU performance for rendering graphics, such as 3D CAD.
VMware Horizon 
Now, let's talk about the architecture of VMware's solution combined with Nvidia's vGPU. Nvidia provides an API for using vGPU under virtualization. The guest OS accesses the GPU directly through Nvidia's driver without knowing that other VMs exist. In reality, however, the resources are limited: a GPU has a limited number of compute units and a limited amount of memory, so one can only run about 4 VMs on a single card. Like SR-IOV, vGPU exposes a limited number of virtual PCIe functions for binding to VMs.
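To make the resource constraint concrete, here is a minimal sketch (in Python) of how a fixed framebuffer gets partitioned into per-VM slices. The class, profile size, and memory numbers are all hypothetical for illustration; the real GRID profiles and their framebuffer sizes are defined by Nvidia's vGPU manager.

```python
# Toy model of vGPU framebuffer partitioning (hypothetical numbers).
# A physical GRID card exposes a fixed framebuffer that the vGPU
# manager slices into fixed-size profiles, one per VM.

class GridCard:
    def __init__(self, framebuffer_mb):
        self.framebuffer_mb = framebuffer_mb
        self.assigned = []  # list of (vm_name, profile_mb)

    def free_mb(self):
        return self.framebuffer_mb - sum(mb for _, mb in self.assigned)

    def attach(self, vm_name, profile_mb):
        # Like SR-IOV virtual functions, slots run out once the
        # physical resource is fully partitioned.
        if profile_mb > self.free_mb():
            return False
        self.assigned.append((vm_name, profile_mb))
        return True

card = GridCard(framebuffer_mb=4096)        # one 4 GB GPU, for example
for i in range(5):
    ok = card.attach(f"vm{i}", 1024)        # a 1 GB profile per VM
    print(f"vm{i}:", "attached" if ok else "rejected (card full)")
```

With these made-up sizes, only the first 4 VMs attach and the fifth is rejected, which mirrors the roughly 4-VMs-per-card limit mentioned above.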
NOTE: The following paragraphs in italics are from VMware's blog.
The goal of NVIDIA GRID vGPU with VMware Horizon is to enable the scalable rollout of graphics-rich 2D and 3D desktops across an enterprise. This is accomplished by combining the high performance and scalability of GRID vGPU with flexibility, security and manageability of VMware Horizon. Let’s see how that works.
First, the NVIDIA GRID vGPU Manager is the key component installed on each GRID enabled server and works with vSphere 6 to enable the selection and management of GPU profiles assigned to each user based on the available physical GRID graphics adapter.
Second, vSphere 6 manages the placement of GRID enabled virtual machines on an individual GRID enabled server. vSphere 6 will maximize the performance of each virtual machine by distributing virtual machines across different GPUs when possible.
VMware Fusion
On the other hand, VMware Fusion takes a different approach for the typical desktop/laptop environment. By removing redundant layers of the software stack between the guest OS's use of the GPU and the host hardware, VMware's Fusion reaches around 90% of bare-metal performance on the Mac. It is a breakthrough! Fusion and Parallels had about the same performance in 2014. The secret sauce is redirecting DirectX API calls to the host machine, shortening the software stack for rendering API calls from the guest OS. The original path was roughly: call DirectX API (guest) -> render on virtual GPU (guest) -> redirect to render engine (X11, etc.) (host) -> render again on physical GPU (host). I don't know the detailed implementation of Fusion, but I believe they redirect DirectX's API, or its low-level rendering API, to the host's rendering API. It would be something like this: call DirectX API (guest) -> render on physical GPU (host).
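The idea behind this shortcut is often called API remoting. A minimal sketch of the two call paths, with entirely hypothetical function names (this is not Fusion's actual code, just the shape of the optimization):

```python
# Toy illustration of API remoting (hypothetical names, not Fusion's
# implementation). Each layer wraps the call with extra work; forwarding
# guest API calls straight to the host renderer removes the middle hops.

def host_gpu_render(call):
    # Final stop: the host's physical GPU executes the request.
    return f"rendered[{call}]"

def legacy_path(call):
    # guest DirectX -> guest virtual GPU -> host render engine -> host GPU
    virtual_gpu = f"vgpu({call})"
    render_engine = f"engine({virtual_gpu})"
    return host_gpu_render(render_engine)

def remoted_path(call):
    # guest DirectX call forwarded directly to the host's rendering API
    return host_gpu_render(call)

print(legacy_path("DrawIndexed"))   # the call passes through two extra layers
print(remoted_path("DrawIndexed"))  # the call goes straight to the GPU
```

The point of the sketch is only the number of hops: the remoted path reaches the host GPU in one step, which is where the claimed ~90% of bare-metal performance would come from.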
I'm still waiting for Parallels' counterattack. This is a tremendous breakthrough! I can even play games on the Mac without Boot Camp.
All in all, GPU virtualization is quite mature. Even when the hardware does not support virtualization, one can still get acceptable performance for 3D CAD and gaming.