On the Horizon for NVMe Technology: Q&A on the Evolution and Future of NVMe Webcast

Blog

By David Allen and J Metz

Our recent Evolution and Future of NVMe webcast attracted over 700 flash storage technology fans. We had a great time discussing how the NVMe specification will grow in 2018 and highlighting key topics on the roadmap. If you missed the live presentation, you can still view the entire webcast on-demand and download our slides. We didn’t have time to answer every audience question during the live event, so we wrote this Q&A blog to provide a little more insight into the top questions we received.

Is this roadmap reflective of both enterprise and client NVMe SSDs?
Yes, the roadmap encompasses both enterprise and client NVMe SSDs. The base specification ensures that NVMe devices are interoperable, while also including additional features that target specific vendor and market requirements.

Can you comment on which NVMe-oF transports have been gaining traction lately? For example, with the new NVMe-TCP, there must be industry demand for it even though RoCEv2 and iWARP are already supported.
“Gaining traction” is a relative phrase, as many companies in the NVM Express group are looking to use NVMe-oF in their individual solutions. The real indicator is the interest shown by the respective transport standards bodies. For instance, the T11 Fibre Channel group released the FC-NVMe spec in August 2017 and is already working on enhancements for FC-NVMe 2. There is also a working relationship between the NVMe-MI and SNIA Swordfish groups.

How much space is allotted to the Persistent Memory Region? Is it a set amount of storage or can the user select this?
Persistent Memory Regions are typically small since they are backed by the SSD’s DRAM or internal SRAM buffers. Sizes can range from kilobytes to megabytes and are likely smaller than 1 gigabyte. The amount available depends on the design and architecture of the SSD and on what the device advertises; it is not selected by the user.
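
As a rough, hedged illustration of why the region is both small and device-dependent: a persistent memory buffer of this kind is typically exposed to the host through a PCIe BAR, so on a Linux host its size can simply be observed rather than chosen. The PCIe address and BAR index in the sketch below are hypothetical placeholders; check your SSD’s documentation for the real values.

```c
/* Sketch only, not taken from the NVMe specification: report the size of a
 * memory-mappable persistent region exposed as a PCIe BAR on Linux.
 * The device address (0000:01:00.0) and BAR index (resource2) are
 * hypothetical placeholders. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource2";

    int fd = open(bar, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return 1; }

    /* Whatever the device advertises; typically far less than 1 gigabyte. */
    printf("persistent region size: %lld bytes\n", (long long)st.st_size);

    /* Map it read-only just to show it behaves like ordinary memory. */
    uint8_t *pmr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (pmr != MAP_FAILED) {
        printf("first byte: 0x%02x\n", pmr[0]);
        munmap(pmr, st.st_size);
    }

    close(fd);
    return 0;
}
```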

Could you describe the challenges associated with NVMe subsystem connected to a host through bridges/switches?
Bridges and switches are governed more by the PCIe specification itself. Once those devices meet that specification, they provide additional connectivity and paths to the devices. Implemented correctly, bridges/switches introduce only minor latency, which isn’t really impactful. Bridges/switches mostly stay out of the way of the NVMe protocol and must adhere to the physical layer and the PCIe protocol implementation.

In many ways, bridges/switches can actually help by serving as aggregation points before oversubscription occurs.

Will NVMe Management Interface (NVMe-MI) replace many of the existing SSD features in places that monitor the drives or will they complement each other?
They complement each other. Many management applications or layers use the NVMe-MI pieces already in place, while others will continue to use what they have used in the past, such as SCSI Enclosure Services on the enclosure box itself. The idea here is to ensure that multiple management applications have the right tools and paths to the devices themselves. As NVMe-MI matures, the group will consider adopting other industry standards as well as incorporating new enclosures.

Can namespaces be shared? What version of the spec allows this?
Namespaces have been sharable since the very first NVMe specification.

Can you explain the differences between NVMe Sets, namespaces, etc.?
As a simple example, think of these terms as follows:

  • An NVMe Namespace is like a “volume” and can vary in size.
  • An NVMe Set is a fixed allocation of the SSD and pertains to a physical boundary (usually a fixed number of NAND dice).

A Set can have multiple Namespaces of varying sizes, as sketched below.
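
As a purely illustrative model (these structures are not defined by the NVMe specification), the relationship can be pictured like this in C:

```c
/* Illustrative model only; not structures from the NVMe specification. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NS_PER_SET 8              /* arbitrary limit for this sketch */

struct namespace_desc {
    uint32_t nsid;                    /* namespace identifier (the "volume") */
    uint64_t capacity_blocks;         /* size may differ per namespace */
};

struct nvm_set_desc {
    uint16_t set_id;                  /* NVM Set identifier */
    uint16_t nand_die_count;          /* physical boundary backing the Set */
    uint64_t fixed_capacity_blocks;   /* fixed allocation of the SSD */
    uint32_t ns_count;
    struct namespace_desc ns[MAX_NS_PER_SET]; /* varying sizes allowed */
};

int main(void)
{
    /* One Set carved from a hypothetical group of 16 NAND dice, holding
     * two namespaces of different sizes. */
    struct nvm_set_desc set = {
        .set_id = 1,
        .nand_die_count = 16,
        .fixed_capacity_blocks = 1u << 24,
        .ns_count = 2,
        .ns = {
            { .nsid = 1, .capacity_blocks = 1u << 22 },  /* smaller volume */
            { .nsid = 2, .capacity_blocks = 3u << 22 },  /* larger volume  */
        },
    };

    for (uint32_t i = 0; i < set.ns_count; i++)
        printf("Set %u holds namespace %u (%llu blocks)\n",
               (unsigned)set.set_id, (unsigned)set.ns[i].nsid,
               (unsigned long long)set.ns[i].capacity_blocks);
    return 0;
}
```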

Can we have one domain (1-port) in an NVM Subsystem?
Yes, there can be one domain in an NVM Subsystem. There’s no need to explicitly call out domains if you don’t have more than one.

When will you have a production level “base” release? Which ‘best tested’ NVMe release do I install in a World Class Enterprise Data Center?
World Class Enterprise Data Centers have been deploying NVMe since 2014, several years now. New features are added to enhance performance and improve endurance in SSDs, but these features are incremental improvements to the current base spec. Current devices tout NVMe 1.2 or 1.3, but many features are optional. Inquire with your solution provider to determine whether their system specification meets your requirements.

How does I/O Determinism compare to U.2 dual port? Is dual port an implementation of I/O Determinism or is it unrelated? Additionally, what is the relationship between M.2 and NVMe?
M.2 is a popular compact “gum stick” PCIe SSD form factor that comes in varying lengths; it is not part of the NVMe base specification. Other working groups and industry standards bodies define the SSD form factors.

U.2 is a popular, serviceable 2.5-inch SSD form factor, usually in a standard HDD-style case, and it is not related to I/O Determinism (IOD). IOD is an NVMe feature set focused on delivering consistent latencies when accessing an SSD, and it applies to any SSD form factor.

IOD and U.2 dual port are separate concepts; not all dual-port drives are IOD drives. A dual port provides an additional connection point to the SSD itself, enabling high-availability SSDs that offer an active failover mechanism or an active-active path into the SSD. Even at the subsystem level there can be multiple ports in a subsystem, and that is not part of IOD.

What is “deterministic” about I/O Determinism? From your description, it seems like it just improves performance by increased SSD segmentation.
One person’s deterministic device is another person’s low-latency device. Nothing in the real world is totally deterministic, but as these latencies are reduced, devices look much more deterministic.

IOD makes a lot of sense for large systems and massive deployments such as hyperscale data systems and data centers. IOD allows users to control where data locality exists. The priority is less about the direct latency between particular systems and more about ensuring users can avoid certain types of congestion. IOD also helps on a smaller, more local scale. If you have servers with large numbers of drives and you are trying to control where certain types of processes (e.g., garbage collection) take place, you can avoid impacts and conflicts with the natural processes that go on inside the SSDs. It’s not just about the segmentation; it’s about being able to see things at a more macro level rather than only identifying where the bits are placed on the SSD.

Can you elaborate on how data security requirements are being stipulated to work along with the IOD feature?
Data security requirements should work hand in hand with IOD. IOD uses generic NVMe architecture constructs, and there are no new contracts other than the Sets and their implementation, which tie the drive’s physicality and segmentation to specific mandates. Theoretically, only half of the drive could be a self-encrypting (SED) Set, because the spec does not prohibit breaking a drive up into a partial SED. The two would work in conjunction, though that may be an unrealistic implementation.

Is there OS support for IOD?
Current operating systems do not need modification to support IOD. A drive can be divided into, say, four Sets, with Namespaces applied to those Sets; applications simply see the different Namespaces/Sets.
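
As a minimal sketch of that point, an unmodified application simply opens each namespace as an ordinary block device; the /dev/nvme0nX names below are hypothetical examples of one namespace per Set on a single drive.

```c
/* Minimal sketch: no IOD-specific OS support is needed; each namespace
 * appears as an ordinary block device. Device names are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *ns[] = { "/dev/nvme0n1", "/dev/nvme0n2",
                         "/dev/nvme0n3", "/dev/nvme0n4" };

    for (int i = 0; i < 4; i++) {
        int fd = open(ns[i], O_RDONLY);   /* opens like any block device */
        if (fd < 0) {
            perror(ns[i]);
            continue;
        }
        printf("%s: accessible with no special OS support\n", ns[i]);
        close(fd);
    }
    return 0;
}
```
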
How will performance (e.g. latency) differ between RDMA, FC and the new TCP transport implementations?
We are trying to solve an architecture problem, and this is where fabrics become important. If there is zero contention on the wire, you’re going to get line rates for RDMA, Fibre Channel and TCP alike. Latency is affected by more than just the link-to-link relationship between two devices. With oversubscription, overutilization or underutilization, more issues arise than just the time it takes to send bits back and forth from one device to another. Realistically, topologies and architectures will matter more for overall net performance, making a huge difference depending on the size, scale and scope of the deployment. For some programs, a smaller deployment at the top of the rack is going to be the best way of getting the kind of information you’re looking for. In situations with multiple systems, Fibre Channel might be the best approach. For massive scalability, TCP might be the best option. The overall performance impact comes down to how those systems are laid out topologically. No contention and no congestion is the utopian ideal for networks.

If your question wasn’t answered in this blog, please contact us to learn more about the NVMe specification and roadmap. Remember, you can watch the whole webcast on-demand and download the presentation slides. Stay tuned for the full 2018 webcast calendar coming soon!