Since the introduction of enterprise flash devices about a decade ago, enterprise storage has seen remarkable advances in performance. In the early days, when solid-state devices were expensive and their capacity was relatively limited, enterprise flash was used as a performance tier in high-end storage systems, making up just a small percentage of overall system capacity. Many storage systems still use this hybrid design today, though the amount of flash they contain has grown considerably as device costs have fallen and capacities have increased. Even so, a hybrid combination of flash (for performance) and disk (for capacity) makes sense for many workloads, because a small fraction of the data is often responsible for the vast majority of storage I/O activity. We have seen applications in which more than 90% of I/O can be attributed to less than 10% of the stored data.
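The skew described above can be measured directly from an I/O trace. The sketch below assumes a trace is simply a list of block IDs (one entry per I/O); it ranks blocks by access count and reports what share of all I/O lands on the hottest fraction of them. The function name, its parameters, and the sample trace are hypothetical and for illustration only.

```python
from collections import Counter


def io_share_of_hottest(trace, data_fraction, total_blocks=None):
    """Return the fraction of all I/Os in `trace` that land on the hottest
    `data_fraction` of blocks.

    `trace` is a sequence of block IDs, one per I/O (hypothetical format).
    `total_blocks` is the number of stored blocks; it defaults to the number
    of distinct blocks seen in the trace, which ignores never-accessed data.
    """
    if not trace:
        return 0.0
    counts = Counter(trace)
    total = total_blocks if total_blocks is not None else len(counts)
    # Size of the "hot" slice, rounded down but never less than one block.
    hot_blocks = max(1, int(total * data_fraction))
    # Per-block access counts, hottest first.
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:hot_blocks]) / len(trace)


# Hypothetical trace: block 0 receives 91 I/Os, blocks 1-9 receive one each.
trace = [0] * 91 + list(range(1, 10))
print(io_share_of_hottest(trace, 0.10))  # 0.91
```

In practice, `total_blocks` would be set to the number of stored blocks, since data that is never read or written still occupies capacity and only makes the skew more pronounced.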

Common Uses

Another common use of enterprise flash was as direct-attached storage (DAS) in the server. This deployment model suited environments that required low latency but could tolerate limited capacity and did not need shared storage. Typical use cases for enterprise flash as server-resident DAS included scale-out data platforms, web applications, and online interactive environments such as social media and games.

The Benefits

One area where enterprise flash offers considerable benefits is virtualized computing. Storage I/O is often the main constraint on virtual machine performance. For this reason, much of the adoption of all-flash and hybrid-flash storage systems has been driven by the need to reduce storage I/O latency in virtualized environments such as VMware. Additionally, hyperconverged infrastructure (HCI), which replicates data across direct-attached storage media in multiple host servers, almost always employs a flash-based performance tier. VMware’s Virtual SAN solution offers options for both all-flash and hybrid-flash host configurations.

One use case for enterprise flash that has recently become much more important combines a server-resident, flash-based performance tier with commodity disk-based shared storage (“just a bunch of disks,” or “JBOD”). This deployment model is driven by both economic factors and technical advances. The primary economic driver is increased enterprise cloud adoption. Cloud service providers can typically store enterprise data at a fraction of the cost of traditional on-premises enterprise storage infrastructure, and besides competing with on-premises storage, they compete with each other on cost. For these providers, host-based data tiering is arguably the most economical way to combine the high capacity of disk-based storage with the high performance of enterprise flash.

On the technology side, the advances that made host-based data tiering possible came from collaboration among a number of technology leaders. Beginning in 2015, VMware enabled the integration of third-party data management software in VMware vSphere environments through a new API framework called the vSphere APIs for IO Filtering. This framework gives third-party data management software direct access to I/O within the hypervisor, in the path between the virtual machine and its virtual disk.

JetStream Accelerate

Our team at JetStream Software partnered with VMware in the design of the filter API used for tiering data on host-based flash memory. Our software, JetStream Accelerate, monitors I/O activity and populates the low-latency flash device in the server with the most frequently accessed data. Because it is integrated directly with the hypervisor through the IO Filter API, it is fully supported within the VMware environment and supports all standard VMware cluster functions, such as vMotion, Storage vMotion, snapshots, and DRS.
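The idea of populating host flash with the most frequently accessed data can be sketched as a small frequency-based read cache. This is not JetStream Accelerate's actual algorithm, which is not described here; it is a minimal illustration assuming a fixed flash capacity measured in blocks, read-only traffic, and a hypothetical `backing_read` function standing in for the disk-based shared storage. The class and method names are likewise hypothetical.

```python
from collections import Counter


class HostFlashTier:
    """Toy read cache that keeps the most frequently accessed blocks on flash.

    Hypothetical sketch, not JetStream Accelerate's algorithm: block IDs stand
    in for virtual-disk offsets, and `backing_read` stands in for a read from
    disk-based shared storage. Writes are not modeled.
    """

    def __init__(self, capacity_blocks, backing_read):
        self.capacity = capacity_blocks
        self.backing_read = backing_read
        self.access_counts = Counter()  # observed access frequency per block
        self.flash = {}                 # block ID -> cached data
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        self.access_counts[block_id] += 1
        if block_id in self.flash:
            self.hits += 1
            return self.flash[block_id]
        self.misses += 1
        data = self.backing_read(block_id)
        self._maybe_promote(block_id, data)
        return data

    def _maybe_promote(self, block_id, data):
        if self.capacity <= 0:
            return
        if len(self.flash) < self.capacity:
            self.flash[block_id] = data
            return
        # Evict the coldest cached block only if the new block is hotter.
        coldest = min(self.flash, key=lambda b: self.access_counts[b])
        if self.access_counts[block_id] > self.access_counts[coldest]:
            del self.flash[coldest]
            self.flash[block_id] = data


cache = HostFlashTier(capacity_blocks=2, backing_read=lambda b: f"data-{b}")
for block in [1, 1, 1, 2, 2, 3, 1, 2, 3]:
    cache.read(block)
print(cache.hits, cache.misses)  # 5 4
print(sorted(cache.flash))       # [1, 2]
```

Comparing access counts before evicting keeps a brief burst of reads to a cold block from displacing data that is consistently hot. A production tier would also need to handle writes, age out stale access counts, and cope with VMs migrating between hosts, none of which this sketch models.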

What does this have to do with the cloud? Some of the most enthusiastic adopters of host-based data tiering are public and private cloud operators. They may standardize their cloud infrastructure on VMware, but each VM may run a very different workload with a different I/O profile. When these heterogeneous I/O streams converge on shared storage, they contend with one another, a challenge commonly referred to as the “I/O Blender Problem” that affects even hybrid-flash and all-flash systems. When I/O requests are served from the flash tier within the host server, fewer of them reach shared storage, and the I/O blender effect is significantly reduced. The result is faster, more consistent application performance.
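One common way to picture the I/O blender effect: each VM may issue a perfectly sequential stream, but once many VMs share a storage target, their requests arrive interleaved and the combined stream looks random. The sketch below assumes, for simplicity, that requests from three hypothetical VMs arrive strictly round-robin; real interleaving depends on timing. The function names and block addresses are illustrative only.

```python
from itertools import zip_longest


def blend(streams):
    """Interleave per-VM request streams round-robin (a simplifying
    assumption) into the single stream a shared storage target receives.
    Each request is a (vm_id, lba) tuple."""
    merged = []
    for batch in zip_longest(*streams):
        merged.extend(req for req in batch if req is not None)
    return merged


def sequential_fraction(requests):
    """Fraction of requests whose LBA immediately follows the previous one."""
    if len(requests) < 2:
        return 1.0
    sequential = sum(
        1 for prev, cur in zip(requests, requests[1:]) if cur[1] == prev[1] + 1
    )
    return sequential / (len(requests) - 1)


# Three hypothetical VMs, each reading 8 consecutive blocks in its own region.
streams = [
    [(vm, base + i) for i in range(8)]
    for vm, base in [(0, 0), (1, 1000), (2, 2000)]
]

print([sequential_fraction(s) for s in streams])  # [1.0, 1.0, 1.0]
print(sequential_fraction(blend(streams)))        # 0.0
```

Requests that are served from host-based flash never enter the blended stream at all, which is why tiering at the host reduces the effect rather than just absorbing it.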

Besides the performance benefits of host-based data tiering, using JetStream Accelerate in a VMware environment can deliver economic advantages as well. It turns out that VM density (the number of VMs that a single physical host can support) is affected by I/O latency. Physical CPU and memory resources must of course also be sufficient, but when I/O is constrained, even higher-end hosts will support fewer VMs than their compute resources would otherwise allow. Testing with JetStream Accelerate has shown that VM density can be improved 2x to 3x with no degradation in application performance. For a cloud service provider, this increase in VM density goes straight to the bottom line.

Most importantly, host-based data tiering gives the cloud service provider an extremely economical way to take advantage of flash memory as its cost has declined and capacities have increased. Rather than buying a hybrid-flash storage system from a traditional enterprise IT vendor, the cloud service provider can procure and deploy commodity components including enterprise flash devices and disk-based storage: the “server-flash-plus-JBOD” model.

Today’s IT architect has a rich choice among the various deployment models for enterprise flash: all-flash arrays, hybrid-flash systems, flash as DAS, HCI, “server-flash-plus-JBOD,” etc. As the saying goes, “different horses for different courses.” Some deployment models will be a better fit for certain workloads than others. For enterprise applications provisioned on VMware as a cloud service, using an IO Filter-based solution like JetStream Accelerate to enable host-based data tiering offers a number of compelling economic and technical advantages.