NVMe Namespaces
What is a Namespace?
In NVMe® technology, a namespace is a collection of logical block addresses (LBAs) accessible to host software. A namespace ID (NSID) is an identifier used by a controller to provide access to a namespace. A namespace does not physically isolate blocks; rather, it isolates the logical blocks addressable by the host software.
In Linux, each namespace appears as a device node with a unique identifier; for example, /dev/nvme0n1 refers to controller 0 and namespace 1.
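To make the naming convention concrete, here is a minimal Python sketch that splits a device name such as /dev/nvme0n1 into the two numbers described above. The function and type names are hypothetical, and the pattern assumes the simple nvme<controller>n<namespace> form only (partition names such as nvme0n1p1 are deliberately rejected).

```python
import re
from typing import NamedTuple


class NvmeBlockDevice(NamedTuple):
    controller: int  # the "0" in nvme0n1
    namespace: int   # the "1" in nvme0n1


# Assumed pattern: optional /dev/ prefix, then nvme<digits>n<digits> and nothing else.
_NAME = re.compile(r"^(?:/dev/)?nvme(\d+)n(\d+)$")


def parse_nvme_device(path: str) -> NvmeBlockDevice:
    """Split a namespace device name into controller and namespace numbers."""
    match = _NAME.match(path)
    if match is None:
        raise ValueError(f"not an NVMe namespace device name: {path!r}")
    return NvmeBlockDevice(int(match.group(1)), int(match.group(2)))


print(parse_nvme_device("/dev/nvme0n1"))
```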
There are many reasons why host software would want to divide an NVMe SSD into multiple namespaces: logical isolation, multi-tenancy, security isolation (encryption per namespace), write protecting a namespace for recovery purposes, overprovisioning to improve write performance and endurance, and so on.
By default, the namespace on an NVMe SSD spans the number of LBAs set by the manufacturer. Each namespace is presented to the host software as a separate target device.
Namespace Size, Capacity and Utilization
The Identify Namespace data structure contains related fields reporting the Namespace Size, Capacity and Utilization:
- The Namespace Size (NSZE) field defines the total size of the namespace in logical blocks (LBA 0 through n-1).
- The Namespace Capacity (NCAP) field defines the maximum number of logical blocks that may be allocated at any point in time.
- The Namespace Utilization (NUSE) field defines the number of logical blocks currently allocated in the namespace. This allows the host software to keep track of how a device is responding to the deallocate command. After a format, the expectation is that this value is zero (no utilization). As data is written, the utilization increases until the device receives a deallocate command (sometimes referred to as “TRIM”).
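The relationship between the three fields can be sketched in a few lines of Python. This is an illustrative model only: the class name, the 512-byte default LBA size and the sample values are assumptions, and the sanity check reflects the expectation that utilization does not exceed capacity and capacity does not exceed size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceUsage:
    nsze: int             # Namespace Size, in logical blocks
    ncap: int             # Namespace Capacity, in logical blocks
    nuse: int             # Namespace Utilization, in logical blocks
    lba_bytes: int = 512  # assumed formatted LBA size

    def __post_init__(self) -> None:
        # Expected ordering of the three fields: NUSE <= NCAP <= NSZE.
        if not 0 <= self.nuse <= self.ncap <= self.nsze:
            raise ValueError("expected 0 <= NUSE <= NCAP <= NSZE")

    @property
    def size_bytes(self) -> int:
        return self.nsze * self.lba_bytes

    @property
    def utilization(self) -> float:
        """Fraction of the allocatable blocks that are currently allocated."""
        return self.nuse / self.ncap if self.ncap else 0.0


# Hypothetical values: a freshly formatted namespace, then the same one after some writes.
fresh = NamespaceUsage(nsze=1000, ncap=1000, nuse=0)
written = NamespaceUsage(nsze=1000, ncap=1000, nuse=250)
print(fresh.utilization, written.utilization)
```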
Identify Command Contains Information about Namespace Capabilities
In addition to the size and utilization of a namespace, another important attribute is the set of supported LBA formats. This tells host software the optimal size of commands to send to the namespace, whether the drive supports protection information (end-to-end data protection), and so on.
Image 1 is a screenshot from nvme-cli in Linux reporting the output of an identify namespace command.
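As a rough illustration of what an LBA format entry carries, the sketch below decodes one 32-bit entry. The bit layout used here (metadata size in bits 15:0, LBA data size as a power-of-two exponent in bits 23:16, relative performance in bits 25:24) is my understanding of the Identify Namespace data structure and should be checked against the specification; the function and type names are hypothetical.

```python
from typing import NamedTuple


class LbaFormat(NamedTuple):
    metadata_bytes: int        # metadata bytes per logical block
    data_bytes: int            # data bytes per logical block (0 = entry unused)
    relative_performance: int  # 0 is best, larger values are worse


def decode_lbaf(raw: int) -> LbaFormat:
    """Decode one LBA format entry under the assumed bit layout."""
    metadata_bytes = raw & 0xFFFF
    exponent = (raw >> 16) & 0xFF
    relative_performance = (raw >> 24) & 0x3
    # An exponent of zero is treated here as "format not available".
    data_bytes = (1 << exponent) if exponent else 0
    return LbaFormat(metadata_bytes, data_bytes, relative_performance)


# Two made-up entries: 512-byte blocks without metadata,
# and 4096-byte blocks with 8 bytes of metadata.
print(decode_lbaf(9 << 16))
print(decode_lbaf((2 << 24) | (12 << 16) | 8))
```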
Note: a namespace attached to two or more controllers is called a shared namespace. Conversely, a namespace attached to only one controller is called a private namespace. The host determines which of the two a namespace is.
How Are Namespaces Managed?
Namespaces are managed with two commands: Namespace Management and Namespace Attachment.
Namespace Management: Create or delete.
Namespace Attachment: Attach or detach.
- A namespace that the host has just created is not yet visible; it must first be attached.
- Each namespace is identified by a corresponding NSID (namespace ID), and that ID is used to attach the namespace to a controller in the subsystem.
Figure 1: Attached Namespaces (Private & Shared)
NS A: Private, attached to controller 0 | NS B: Shared, attached to controller 0 & 1 | NS C: Private, attached to controller 1
In Figure 1, the host has created three namespaces. This showcases not only the abstraction of an NVMe subsystem, but also how private and shared namespaces are used. Each namespace is seen as a separate target device.
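The create-then-attach flow and the layout in Figure 1 can be mimicked with a small in-memory model. This is a toy sketch, not how a controller is implemented: the class and method names are hypothetical, and NSIDs are simply handed out in increasing order.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    # Maps each NSID to the set of controller IDs it is attached to.
    attachments: dict[int, set[int]] = field(default_factory=dict)
    next_nsid: int = 1

    def create_ns(self) -> int:
        """Create a namespace; it starts out attached to no controller."""
        nsid = self.next_nsid
        self.next_nsid += 1
        self.attachments[nsid] = set()
        return nsid

    def attach_ns(self, nsid: int, controller: int) -> None:
        self.attachments[nsid].add(controller)

    def visible_from(self, controller: int) -> list[int]:
        """NSIDs a host would see through the given controller."""
        return sorted(n for n, ctrls in self.attachments.items() if controller in ctrls)

    def is_shared(self, nsid: int) -> bool:
        return len(self.attachments[nsid]) > 1


subsystem = Subsystem()
ns_a, ns_b, ns_c = (subsystem.create_ns() for _ in range(3))
visible_before_attach = subsystem.visible_from(0)  # empty: created but not attached

subsystem.attach_ns(ns_a, 0)   # NS A: private to controller 0
subsystem.attach_ns(ns_b, 0)   # NS B: shared by controllers 0 and 1
subsystem.attach_ns(ns_b, 1)
subsystem.attach_ns(ns_c, 1)   # NS C: private to controller 1
print(subsystem.visible_from(0), subsystem.visible_from(1))
```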
A use case may involve two or more customers sharing one NVMe SSD. Sharing lowers cost by avoiding the purchase of multiple SSDs, but it raises concerns about consistency of service and dedicated performance. The logical separation between tenants allows the owner to tailor each namespace to its tenant’s workload. The SSD can still wear level and share spare area for garbage collection across namespaces. This differs from NVM Sets, where the expectation is physical isolation; namespaces provide only logical isolation.
Figure 2: Multi-Tenant Use Case
SSDs use unwritten LBAs and a certain amount of spare area for garbage collection, wear leveling, etc. Reducing the size of the namespace relative to the amount of NAND flash on the device, referred to as overprovisioning, improves endurance, performance and quality of service.
Example of a 3.84TB NVMe SSD, typical spare area and namespace size:
- Un-provisioned space is not directly accessible by the host; rather, the controller uses this spare area during garbage collection, TRIMs, etc.
- Overprovisioning is useful not only for a single namespace, but also when there are multiple namespaces. The percentage of overprovisioning per namespace may be increased or decreased to suit the workload each namespace encounters most often.
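One common way to express overprovisioning is spare capacity divided by host-visible capacity. The sketch below uses that definition, which is an assumption on my part rather than something this article defines. The example treats a 7.68 TB capacity as the baseline and a 6.14 TB namespace as the host-visible portion, ignoring any factory spare area; the function name is hypothetical.

```python
def overprovisioning_ratio(raw_bytes: int, namespace_bytes: int) -> float:
    """Spare capacity as a fraction of the host-visible capacity."""
    if not 0 < namespace_bytes <= raw_bytes:
        raise ValueError("namespace must be non-empty and no larger than the raw capacity")
    return (raw_bytes - namespace_bytes) / namespace_bytes


TB = 10**12  # decimal terabyte, as used on drive labels

# Shrinking the exposed capacity from 7.68 TB to 6.14 TB adds roughly 25% spare area.
ratio = overprovisioning_ratio(raw_bytes=7_680 * TB // 1000, namespace_bytes=6_140 * TB // 1000)
print(f"{ratio:.1%}")
```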
Steps to overprovision using nvme-cli, per namespace:
- Detach all namespaces from each controller (the spec recommends detaching first, but deleting directly also works).
  - nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0
- Delete each namespace you’ve detached.
  - nvme delete-ns /dev/nvme0 --namespace-id=1
- Create a new namespace at the desired capacity (repeat for each namespace).
  - Going from 7.68TB to 6.14TB
  - nvme create-ns /dev/nvme0 --nsze 11995709440 --ncap 11995709440 --flbas 0 --dps 0 --nmic 0
  - See all options for creating a namespace by issuing ‘nvme create-ns’
- Attach new namespaces to desired controllers.
  - nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
- Reset the device to make the target visible to the host.
  - nvme reset /dev/nvme0
- List the devices to confirm successful overprovisioning.
  - nvme list
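The sequence above can be assembled programmatically. The following Python sketch only builds and prints the command lines; it does not execute anything. It assumes a 512-byte LBA format, a fully provisioned namespace (the same block count for size and capacity), and that the new namespace receives NSID 1 as in the steps above. The helper names are hypothetical.

```python
import shlex


def blocks_for(capacity_bytes: int, lba_bytes: int = 512) -> int:
    """Number of whole logical blocks that fit in the requested capacity."""
    return capacity_bytes // lba_bytes


def reprovision_commands(dev: str, nsid: int, controller: int, blocks: int) -> list[list[str]]:
    """Build the nvme-cli calls from the steps above, in order, without running them."""
    ns = f"--namespace-id={nsid}"
    ctrl = f"--controllers={controller}"
    return [
        ["nvme", "detach-ns", dev, ns, ctrl],
        ["nvme", "delete-ns", dev, ns],
        ["nvme", "create-ns", dev, "--nsze", str(blocks), "--ncap", str(blocks),
         "--flbas", "0", "--dps", "0", "--nmic", "0"],
        ["nvme", "attach-ns", dev, ns, ctrl],
        ["nvme", "reset", dev],
        ["nvme", "list"],
    ]


# Block count taken from the create-ns step above (about 6.14 TB at 512 bytes per block).
commands = reprovision_commands("/dev/nvme0", nsid=1, controller=0, blocks=11995709440)
for command in commands:
    print(shlex.join(command))
```

Printing the commands first makes it easy to review them before running anything, which matters here because deleting a namespace destroys its data.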
Security, encryption per namespace (OPAL):
OPAL is a standard that encrypts LBA ranges and uses an encryption key to control access to those LBAs. Even with only one namespace, multiple locking ranges can be set up for encryption. This may be needed if some data on the drive is more sensitive than the rest.
If multiple namespaces are each assigned to a different tenant, that not only separates who has access to which namespace, but also allows encryption to be applied to a namespace holding more sensitive data.
OPAL provides security from outsiders, other users of a drive and other users within a namespace.
Mobile (read-only, write protect and boot image):
An NVMe controller provides write protection capabilities per namespace:
- Read-only until the next power cycle.
- Read-only until the first power cycle after the write protect feature is disabled.
- Permanently read-only for the lifetime of the drive.
These features provide a range of options for embedded or high-security systems.
While a namespace is write protected, it cannot be edited, formatted, sanitized or deleted.
In mobile devices or desktop computers, the operating system can be secured by placing it in a separate, write-protected (read-only) namespace. All other applications can live in another namespace that allows reads and writes, which helps protect the operating system from accidental or malicious tampering.
Having these capabilities allows control over what your users can see, and what they can modify.
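The write protection behaviour described in this section can be summarised as a small state model. This is a simplified sketch: the enum names and values are hypothetical labels for the options listed above and are not the encodings used by the specification, and the operation names are plain strings chosen for illustration.

```python
from enum import Enum


class WriteProtect(Enum):
    NONE = 0               # namespace is writable
    UNTIL_POWER_CYCLE = 1  # read-only until the next power cycle
    WHILE_ENABLED = 2      # read-only until the feature is disabled again
    PERMANENT = 3          # read-only for the lifetime of the drive


# Operations the article lists as unavailable while a namespace is protected.
BLOCKED_WHEN_PROTECTED = frozenset({"write", "format", "sanitize", "delete"})


def is_allowed(state: WriteProtect, operation: str) -> bool:
    """Reads always pass; modifying operations need an unprotected namespace."""
    return state is WriteProtect.NONE or operation not in BLOCKED_WHEN_PROTECTED


def after_power_cycle(state: WriteProtect) -> WriteProtect:
    """Only the 'until power cycle' option is cleared by a power cycle."""
    if state is WriteProtect.UNTIL_POWER_CYCLE:
        return WriteProtect.NONE
    return state


os_namespace = WriteProtect.PERMANENT  # e.g. a boot image that must never change
print(is_allowed(os_namespace, "read"), is_allowed(os_namespace, "write"))
```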
Optimal Size Granularity:
The NVMe controller may assign namespace capacity in units other than LBAs. For this reason, the allocated capacity does not have to be a multiple of the logical block size, and a namespace may contain some memory that cannot be addressed using LBAs. The goal is to reduce this un-addressable memory. The namespace is fully provisioned if:
Namespace Size = Namespace Capacity,
Namespace Size * Formatted LBA Size = alpha * Namespace Size Granularity (alpha an integer), AND
Namespace Capacity * Formatted LBA Size = beta * Namespace Capacity Granularity (beta an integer).
During namespace creation, the host can consider namespace size granularity and capacity granularity and can reduce the un-addressable memory in namespace capacity and bring it to an optimal size.
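A host-side calculation along these lines might look like the sketch below. It assumes the conditions above mean that the namespace size and capacity, expressed in bytes, must each be a whole number of granularity units, and it rounds a requested size up to the nearest value that satisfies both. The function name and the 1 GiB granularity in the example are hypothetical; `math.lcm` requires Python 3.9 or later.

```python
import math


def fully_provisioned_blocks(requested_bytes: int, lba_bytes: int,
                             size_granularity: int, capacity_granularity: int) -> int:
    """Smallest block count covering the request with size = capacity and
    both byte totals landing on a whole number of granularity units."""
    # One allocation unit that is a whole number of LBAs and of both granularities.
    unit = math.lcm(lba_bytes, size_granularity, capacity_granularity)
    units = -(-requested_bytes // unit)  # ceiling division
    return units * unit // lba_bytes


GIB = 2**30
blocks = fully_provisioned_blocks(
    requested_bytes=100_000_000_000,  # 100 GB requested
    lba_bytes=4096,
    size_granularity=GIB,             # hypothetical reported granularity
    capacity_granularity=GIB,
)
print(blocks, blocks * 4096 // GIB)
```

The same block count would then be passed as both the size and the capacity when creating the namespace.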
Formatting per Namespace:
If multiple namespaces are set up on a drive, it is important to specify which namespace a format should affect. Each namespace can be formatted separately with secure erase or crypto erase, whereas sanitize applies to the entire NVM subsystem. Some drives support only a single LBA format across all namespaces; others allow different LBA sizes and formats per namespace.
Contributors: Jonmichael Hands, Dennis Worley, Lakhveer Kaur – Intel