Dynamic Host-Wide Performance Tuning in VMware vSphere 6.0
By default, the vSphere networking stack is tuned to balance the tradeoff between CPU cost and latency, so that it performs well across a broad variety of applications. There are, however, a few cases where a tunable offers better performance. One example is a Web-farm workload, or any scenario where a high consolidation ratio (many VMs on a single ESXi host) is preferred over extremely low end-to-end latency.
VMware vSphere 6.0 introduces the Dynamic Host-Wide Performance Tuning feature (also known as dense mode), which provides a single configuration option to dynamically optimize individual ESXi hosts for high consolidation scenarios under certain use cases. Those use cases are defined later in this post. First, we look at how dense mode works internally.
Mitigating Virtualization Inefficiency under High Consolidation Scenarios
Figure 1 shows an example of the thread contexts in a high consolidation environment. In addition to the virtual CPUs (each labeled VCPU) of the VMs, there are per-VM vmkernel threads (the device-emulation threads, labeled “Dev Emu” in the figure) and several vmkernel threads for each physical NIC (PNIC) that run physical device virtualization code and virtual switching code. One major source of virtualization inefficiency is the frequent context switching among all these threads. While context switches occur for a variety of reasons, the predominant networking-related factor is virtual NIC (VNIC) interrupt coalescing, namely, how often the vmkernel interrupts the guest for newly received packets (or vice versa for transmitted packets). More frequent interrupts are likely to yield lower per-packet latency while increasing virtualization overhead. At high consolidation ratios, the overhead from the increased interrupts hurts performance.
Dense mode uses two techniques to reduce the number of context switches:
- The VNIC coalescing scheme is changed to a less aggressive scheme called static coalescing. With static coalescing, a fixed number of requests is delivered in each batch of communication between the Virtual Machine Monitor (VMM) and the vmkernel. This generally reduces the frequency of communication and therefore the number of context switches, resulting in better virtualization efficiency.
- The wakeup opportunities of the device-emulation vmkernel threads are greatly reduced.
The device-emulation threads now run only periodically, on a longer timer, or when the corresponding VCPUs are halted. This optimization greatly reduces how often the device-emulation threads are woken up, so the frequency of context switches is reduced as well.
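The effect of batching on notification frequency can be illustrated with a toy model. The sketch below is not vSphere code. It only counts how many VMM-to-vmkernel notifications are needed to deliver a given number of requests when a fixed number of requests is delivered per batch. The function name and the batch sizes are hypothetical, chosen purely for illustration.

```python
import math


def notifications_needed(num_requests: int, batch_size: int) -> int:
    """Count notifications when a fixed number of requests is delivered per batch.

    Toy model only: each batch costs one notification (and, in this model,
    one context switch). batch_size is a hypothetical parameter, not a
    vSphere setting.
    """
    if num_requests < 0 or batch_size < 1:
        raise ValueError("num_requests must be >= 0 and batch_size >= 1")
    return math.ceil(num_requests / batch_size)


if __name__ == "__main__":
    requests = 10_000
    # Larger batches mean fewer notifications for the same number of requests.
    for batch in (1, 4, 16, 64):
        count = notifications_needed(requests, batch)
        print(f"batch={batch:>2}: {count} notifications")
```

In this model the number of notifications falls roughly in proportion to the batch size, at the price of each request potentially waiting longer for its batch to be delivered.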
Figure 1. An Example of High Consolidation
Enabling Dense Mode
Dense mode is disabled by default in vSphere 6.0. To enable it, change Net.NetTuneHostMode in the ESXi host’s Advanced System Settings (shown in Figure 2) to dense.
Figure 2. Enabling Dynamic Host-Wide Performance Tuning
“default” is disabled; “dense” is enabled
Once dense mode is enabled, the system periodically checks the load of the ESXi host (every 60 seconds by default) against the following three thresholds:
- Number of VMs ≥ number of PCPUs
- Number of VCPUs ≥ 2 * number of PCPUs
- Total PCPU utilization ≥ 50%
When the system load meets or exceeds all of the above thresholds, these optimizations take effect for all regular VMs that carry default settings. When the system load drops below any of the thresholds, the optimizations are automatically removed from all affected VMs, so that the ESXi host performs identically to when dense mode is disabled.
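The threshold check can be sketched as a simple predicate. This is a minimal illustration, assuming that all three conditions must hold at the same time (which is how "drops below any of the thresholds" reads). The HostLoad type, its field names, and the sample numbers are hypothetical and do not correspond to any vSphere API.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    """Snapshot of host load; field names are illustrative, not a vSphere API."""
    num_vms: int
    num_vcpus: int
    num_pcpus: int
    pcpu_utilization: float  # total PCPU utilization as a fraction, 0.0 to 1.0


def dense_optimizations_apply(load: HostLoad) -> bool:
    """Return True only when all three load thresholds are met."""
    return (
        load.num_vms >= load.num_pcpus
        and load.num_vcpus >= 2 * load.num_pcpus
        and load.pcpu_utilization >= 0.50
    )


if __name__ == "__main__":
    # Hypothetical samples: a heavily consolidated host and a lightly loaded one.
    busy = HostLoad(num_vms=400, num_vcpus=400, num_pcpus=40, pcpu_utilization=0.85)
    idle = HostLoad(num_vms=400, num_vcpus=400, num_pcpus=40, pcpu_utilization=0.30)
    print(dense_optimizations_apply(busy))
    print(dense_optimizations_apply(idle))
```

In the feature as described, such a check would be re-evaluated at every sampling interval (60 seconds by default), so the optimizations can switch on and off as the load changes.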
Enabling dense mode can significantly hurt the performance of some applications. Before enabling it, carefully profile your applications to determine whether the workload will benefit from this feature. Generally, the feature improves the VM consolidation ratio on an ESXi host running applications that have medium network throughput, some latency tolerance, and are CPU bound. A good use case is a Web-farm workload, which needs CPU to process Web requests while generating only a medium level of network traffic and tolerating a few milliseconds of end-to-end latency. If the bottleneck is not the CPU, enabling this feature only hurts network latency, because of the less frequent context switching. For example, the following workloads are NOT good use cases for the feature:
- Throughput-intensive workload: Because the network is the bottleneck, reducing the CPU cost would not necessarily improve network throughput.
- Little or no network traffic: With very little network traffic, the dense mode optimizations would hardly have any effect.
- Latency-sensitive workload: Latency-sensitive workloads need a different set of optimizations.
To evaluate this feature, we use a lightweight Web benchmark with two lightweight clients and a large number of lightweight Web server VMs. The clients send HTTP requests to all Web servers at a given request rate, wait for the responses, and report the response time. Each request is for static content that includes multiple text and JPEG files totaling around 100KB in size. The Web server has memory caching enabled and therefore serves all content from memory. Two different request rates are used in the evaluation:
- Medium request rate: 25 requests per second per server
- High request rate: 50 requests per second per server
In both cases, the total packet rate on the ESXi host is around 400 kilo-packets per second (KPPS) to 700 KPPS in each direction, with the receive packet rate slightly higher than the transmit packet rate.
System Setup and Configuration
We configured the systems as follows:
- One ESXi host (running Web server VMs)
  - Machine: HP DL580 G7 server running vSphere 6.0
  - CPU: Four 10-core Intel® Xeon® E7-4870 @ 2.4 GHz
  - Memory: 512 GB
  - Physical NIC: Two dual-port Intel X520 with a total of three active 10GbE ports
  - Virtual Switching: One vSphere Distributed Switch (vDS) with three 10GbE uplinks using the default teaming policy
  - VM: Red Hat Enterprise Linux Server 6.3 with one VCPU, 1GB memory, and one VMXNET3 VNIC
- Two clients (generating Web requests)
  - Machine: HP DL585 G7 server running Red Hat Enterprise Linux Server 6.3
  - CPU: Four 8-core AMD Opteron™ 6212 @ 2.6 GHz
  - Memory: 128 GB
  - Physical NIC: One dual-port Intel X520 with one active 10GbE port on each client
Medium Request Rate
We first present the evaluation results for the medium request rate workload. Figures 3 and 4 show the 95th-percentile response time and the total host CPU utilization, respectively, as the number of VMs increases. For the 95th-percentile response time, we consider 100ms to be the preferred latency tolerance.
Figure 3 shows that, at 100ms, default mode consolidates only about 470 Web server VMs, whereas dense mode consolidates more than 510 VMs, which is over a 10% improvement. For CPU utilization, we consider 90% to be the desired maximum utilization.
Figure 3. Medium Request Rate 95th-Percentile Response Time
(Latency Tolerance 100ms)
Figure 4 shows that, at 90% utilization, default mode consolidates about 465 Web server VMs, whereas dense mode consolidates about 495 Web server VMs, which is still around a 10% improvement in consolidation ratio. We also notice that dense mode, in fact, also reduces response time. This is because the large reduction in context switching improves virtualization efficiency, which compensates for the increase in latency caused by more aggressive batching.
Figure 4. Medium Request Rate Host Utilization
(Desired Maximum Utilization 90%)
High Request Rate
Figures 5 and 6 show the 95th-percentile response time and the total host CPU utilization, respectively, for the high request rate as the number of VMs increases. Because the request rate is doubled, we reduce the number of Web server VMs consolidated on the ESXi host. Figure 5 shows that, at a 100ms response time, dense mode consolidates only about 5% more VMs than default mode (from ~280 VMs to ~290 VMs), a smaller gain than in the medium request rate case. However, looking at the CPU utilization in Figure 6, at the 90% desired maximum load, dense mode still consolidates about 10% more VMs (from ~240 VMs to ~260 VMs). Considering both the response time and utilization metrics, because there are fewer active contexts under the high request rate workload, the benefit of reducing context switches is less significant than in the medium request rate case.
Figure 5. High Request Rate 95th-Percentile Response Time
(Latency Tolerance 100ms)
Figure 6. High Request Rate Host Utilization
(Desired Maximum Utilization 90%)
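For readers who want to redo the arithmetic, the relative consolidation gain can be computed from the approximate VM counts quoted in the text. The sketch below is only a convenience calculation. The function name and the dictionary labels are our own, and the inputs are rough readings from the figures rather than exact measurements.

```python
def consolidation_gain(default_vms: int, dense_vms: int) -> float:
    """Relative increase in consolidated VMs, as a percentage."""
    if default_vms <= 0:
        raise ValueError("default_vms must be positive")
    return 100.0 * (dense_vms - default_vms) / default_vms


# Approximate (default mode, dense mode) VM counts quoted in the text.
quoted_counts = {
    "medium rate, 100ms latency": (470, 510),
    "medium rate, 90% utilization": (465, 495),
    "high rate, 100ms latency": (280, 290),
    "high rate, 90% utilization": (240, 260),
}

if __name__ == "__main__":
    for case, (default_vms, dense_vms) in quoted_counts.items():
        gain = consolidation_gain(default_vms, dense_vms)
        print(f"{case}: {gain:.1f}% more VMs")
```

Because the quoted counts are approximate (and the dense mode count at 100ms for the medium rate is given only as "more than 510"), the computed percentages are rough and can come out somewhat below the rounded figures used in the text.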
We presented the Dynamic Host-Wide Performance Tuning feature, also known as dense mode. We also showed that a Web-farm-like workload achieves up to a 10% higher consolidation ratio while still meeting a 100ms latency tolerance and 90% maximum host utilization. We emphasized that the improvements do not apply to every type of application. Because of this, you should carefully profile your workloads before enabling dense mode.