An introduction to how vSAN stores and reads data in OSA and ESA
In this post, I explain how vSAN stores and reads data in both the OSA (Original Storage Architecture) and the ESA (Express Storage Architecture). I start with the older OSA and then move on to the newer ESA, covering how each one processes and stores data.
OSA can be deployed in either a hybrid or an all-flash configuration. A disk group, which consists of one flash cache device and up to seven capacity devices, provides both performance and capacity to the vSAN cluster. When both the cache and capacity devices are SSDs, the configuration is all-flash. Each host is limited to five disk groups, so each ESXi host can contribute at most 35 capacity devices and 5 cache devices to the vSAN cluster, for a total of 40 supported devices. Keep in mind that only the devices in the capacity tier, whether magnetic disks or flash, count toward vSAN datastore capacity; cache devices do not.
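The per-host limits above are simple arithmetic, and a short Python sketch makes them easy to check. The function and constant names here are illustrative only and are not part of any vSAN API.

```python
# Illustrative constants taken from the OSA limits described above.
MAX_DISK_GROUPS_PER_HOST = 5
CACHE_DEVICES_PER_GROUP = 1
MAX_CAPACITY_DEVICES_PER_GROUP = 7


def host_device_limits(disk_groups=MAX_DISK_GROUPS_PER_HOST,
                       capacity_per_group=MAX_CAPACITY_DEVICES_PER_GROUP):
    """Count the devices one ESXi host contributes for a given OSA layout."""
    if not 1 <= disk_groups <= MAX_DISK_GROUPS_PER_HOST:
        raise ValueError("a host supports 1 to 5 disk groups")
    if not 1 <= capacity_per_group <= MAX_CAPACITY_DEVICES_PER_GROUP:
        raise ValueError("a disk group supports 1 to 7 capacity devices")
    cache = disk_groups * CACHE_DEVICES_PER_GROUP
    capacity = disk_groups * capacity_per_group
    return {
        "cache_devices": cache,
        "capacity_devices": capacity,  # only these count toward datastore capacity
        "total_devices": cache + capacity,
    }


limits = host_device_limits()
print(limits)
```

With the default (maximum) layout this reports 5 cache devices, 35 capacity devices, and 40 devices in total.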
The figure below illustrates an ESXi-Host within a vSAN cluster configured in hybrid mode, with one SSD in the cache tier and seven magnetic disks in the capacity tier.
The next figure illustrates an ESXi-Host within a vSAN cluster configured in all-flash mode, with one SSD in the cache tier and seven SSDs in the capacity tier.
In an all-flash configuration, the cache tier is dedicated exclusively to write buffering and is not used as a read cache. This is because both tiers consist of SSD devices, and the capacity tier can handle reads efficiently. In hybrid configurations, however, the cache tier serves as both read cache and write buffer. The read cache reduces read latency whenever a cache hit occurs.
What is a cache hit?
A cache hit occurs when a requested disk block is read from the cache tier instead of a magnetic device. If the requested block is not present in the cache tier (a cache miss), it has to be read from a magnetic device, which takes more time.
In a hybrid configuration, if a disk group receives a read request for a block that is not currently in the cache, the system looks for that block in the capacity tier. Once the block is located, it is saved in the cache tier for future requests and the response is sent back to the VM.
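The hybrid read path described above can be sketched as a toy model in Python. This is not vSAN code: the class name, the dictionary-based tiers, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class HybridDiskGroup:
    """Toy model of a hybrid disk group's read path (hypothetical, not vSAN code)."""

    def __init__(self, capacity_tier, read_cache_blocks):
        self.capacity_tier = capacity_tier          # block id -> data on magnetic disks
        self.read_cache = OrderedDict()             # block id -> data on the cache SSD
        self.read_cache_blocks = read_cache_blocks  # number of blocks the read cache holds

    def read(self, block_id):
        """Return (data, hit); hit is True when the cache tier served the block."""
        if block_id in self.read_cache:
            self.read_cache.move_to_end(block_id)   # mark as most recently used
            return self.read_cache[block_id], True
        data = self.capacity_tier[block_id]         # slower read from the capacity tier
        self.read_cache[block_id] = data            # keep a copy for future requests
        if len(self.read_cache) > self.read_cache_blocks:
            self.read_cache.popitem(last=False)     # assumed LRU eviction
        return data, False


group = HybridDiskGroup({1: b"a", 2: b"b", 3: b"c"}, read_cache_blocks=2)
first = group.read(1)   # miss: fetched from the capacity tier, then cached
second = group.read(1)  # hit: served from the cache tier
print(first, second)
```

The first read of a block is a miss and the second is a hit, which is exactly the latency difference the read cache is meant to exploit.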
Whenever a disk group receives a write request, the write first goes to the cache tier, which acknowledges it, so writes complete at the speed of the cache device. A hybrid configuration reserves 70 percent of its cache for reads and the remaining 30 percent for buffering writes.
In an all-flash configuration, write requests follow the same path as in a hybrid configuration, but blocks read from the capacity tier are not copied back into the cache tier, because both tiers consist of SSD devices and the capacity tier can handle reads effectively. The entire cache tier is therefore allocated to write buffering.
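To make the two cache layouts concrete, here is a minimal sketch that splits a cache device according to the percentages given above. The function name and the example device size of 400 GB are assumptions for illustration, and the sketch ignores any product-specific limits on usable cache size.

```python
def cache_allocation(cache_gb, all_flash):
    """Split a cache device (size in whole GB) into read cache and write buffer."""
    if cache_gb < 0:
        raise ValueError("cache size must not be negative")
    read_pct = 0 if all_flash else 70          # hybrid: 70% read cache; all-flash: none
    read_gb = cache_gb * read_pct // 100       # integer math avoids float rounding
    return {"read_cache_gb": read_gb, "write_buffer_gb": cache_gb - read_gb}


hybrid = cache_allocation(400, all_flash=False)
all_flash = cache_allocation(400, all_flash=True)
print(hybrid, all_flash)
```

For a hypothetical 400 GB cache device, the hybrid layout yields 280 GB of read cache and 120 GB of write buffer, while the all-flash layout gives all 400 GB to the write buffer.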
vSAN behaves differently in the new architecture (ESA) than in the original one (OSA). So why do we need a new architecture? The original architecture was sufficient for several years, but with technological advances and the arrival of new storage devices such as TLC and QLC flash, the bottleneck has shifted from the storage devices to the software layers above them, which include vSAN. A new architecture is therefore needed to meet new requirements for efficiency, performance, and scalability. Keep in mind that you require a minimum of four NVMe TLC devices for this architecture. QLC is presently unsupported.
The new architecture moves beyond traditional concepts such as disk groups and separate cache and capacity tiers. It uses a single tier in which users claim storage devices into a centralized “storage pool”, and each device contributes both capacity and performance to vSAN. Because every device is used in the same way, unlike in the previous structure, vSAN can improve drive serviceability, simplify data availability management, and ultimately reduce costs. With this architecture, the number of supported storage devices is no longer limited by vSAN itself but by the underlying layer (the physical server).
The new architecture builds on the original architecture and adds new layers to it. A new log-structured layer (the vSAN Log-Structured File System, or LFS) was added to the vSAN stack. It ingests new data quickly and efficiently while preparing the data for a very efficient full stripe write. The vSAN LFS also allows vSAN to store metadata in a highly efficient and scalable manner, and it can write data in a resilient, space-efficient way without compromising performance. In addition, there is a new optimized log-structured object manager (the vSAN Log-Structured Object Manager) with a data structure designed to store write payloads while minimizing the overhead needed for metadata. It is highly parallel and built to send data to the devices efficiently and without contention.
The vSAN LFS processes incoming writes in several steps: it ingests them, coalesces them, packages them with metadata, and writes them to a durable log associated with the object. This durable log is located on a new branch of the data structure within an object called the “performance leg”. As soon as the packaged data has been written to the durable log, a write acknowledgment is sent to the guest VM to minimize latency. For optimal write performance, the performance leg is stored as mirrored components across multiple hosts.
Once a full stripe (512 KB) has been formed, the data is transferred to the capacity leg, and the metadata is efficiently reorganized and stored in the metadata log. The capacity leg, which uses either RAID-5 or RAID-6, provides optimal space efficiency, but it does not reside on a different class of flash device than the performance leg. Both the performance leg and the capacity leg belong to the same pool.
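The ESA write path described above can be sketched as a heavily simplified toy model. It only illustrates the order of events (log the write, acknowledge it, destage full 512 KB stripes later). The class and method names are hypothetical, and mirroring of the performance leg, RAID-5/6 parity, coalescing, and real metadata handling are all left out.

```python
FULL_STRIPE_BYTES = 512 * 1024  # full stripe size mentioned in the text


class ObjectWritePath:
    """Toy model of one object's ESA write path (hypothetical, not vSAN code)."""

    def __init__(self):
        self.performance_leg = []  # durable log: payloads waiting for a full stripe
        self.capacity_leg = []     # full stripes that have been destaged
        self.metadata_log = []     # one simplified record per destaged stripe

    def write(self, payload):
        """Append the payload to the durable log and acknowledge immediately."""
        self.performance_leg.append(payload)
        return "ack"

    def destage(self):
        """Move every complete stripe to the capacity leg; return the stripe count."""
        pending = b"".join(self.performance_leg)
        stripes = 0
        while len(pending) >= FULL_STRIPE_BYTES:
            stripe = pending[:FULL_STRIPE_BYTES]
            pending = pending[FULL_STRIPE_BYTES:]
            self.capacity_leg.append(stripe)
            self.metadata_log.append(
                {"stripe": len(self.capacity_leg) - 1, "bytes": len(stripe)}
            )
            stripes += 1
        # Whatever does not fill a stripe stays in the performance leg.
        self.performance_leg = [pending] if pending else []
        return stripes


path = ObjectWritePath()
acks = [path.write(bytes(128 * 1024)) for _ in range(5)]  # five 128 KB writes
written = path.destage()
print(acks, written)
```

Five 128 KB writes add up to 640 KB, so one full stripe moves to the capacity leg and the remaining 128 KB waits in the performance leg. Every write was acknowledged before any destaging took place, which is the point of the durable log.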