DAS Stack: Let’s continue the journey – About the “Disk”

From a typical internal construction perspective, disks can be broadly classified into three types: the physical hard disk, the flash-based Solid State Disk (SSD), and the virtual disk presented by storage arrays (network disks).

A physical hard disk is usually made of a rigid metal disk (typically aluminum) coated with a magnetic material. The magnetic coating records the data, and the aluminum disk provides the rigid support.

The diagram below gives a quick view of the inside of a typical hard disk drive.

More detail at http://news.bbc.co.uk/2/hi/technology/6677545.stm

Let’s quickly touch on a few areas in a word or two. For further information, http://en.wikipedia.org/wiki/Hard_disk_drive is an excellent source of detail.

The disk (or platters) is held and rotated by the spindle. The Read/Write head moves across the surface of the disk and, of course, reads or writes the disk as it rotates. The head arm holds the Read/Write head(s) and is positioned by the voice coil actuator. The air filter keeps dust and other particles out of the drive compartment.

As a side note, the head “flies” above the disk with a very thin gap between the head and the disk surface. If it ever touches the disk while it spins, the head “crashes” and the disk is essentially useless. Some data recovery technologies can try to recover the data that is not on the crash path. However, as far as I know, it is not practical to extract data from the tracks that lie in the crash zone itself.

Disk Block Layout

This diagram shows a rough view of the disk layout. The disk itself is split into tracks and sectors. As you might guess or know, there are two possibilities with this kind of arrangement. The first is variable track length (or circumference, if you prefer). This is the typical division into concentric circles: the inner circles are smaller and hence have a smaller circumference, while the outer circles are larger and have a greater circumference, yet every track holds the same number of blocks.

The other is a fixed block length, where each block takes up roughly the same length of track wherever it sits, so outer tracks hold more blocks than inner ones. Optical discs such as CDs achieve this with a single spiral track; hard disks keep concentric tracks but group them into zones, with more blocks per track in the outer zones. This is the more standard industry practice. The diagram below shows a typical division into tracks and sectors. Each sector is further divided into blocks, and that division is shown here as well.
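To put numbers on the difference, here is a minimal Python sketch of a zoned layout (often called zoned bit recording). Every figure in it (platter radii, track count, number of zones, blocks per millimetre) is a made-up assumption for illustration, not the geometry of any real drive. It assumes a constant linear density and groups tracks into zones, giving each zone the block count of its shortest track, so outer zones end up with more blocks per track.

```python
import math

# Made-up illustrative geometry (assumptions, not a real drive).
INNER_RADIUS_MM = 15.0
OUTER_RADIUS_MM = 45.0
TRACKS = 1000
ZONES = 4
BLOCKS_PER_MM = 20.0  # assumed constant linear recording density


def track_radius(track: int) -> float:
    """Radius of a track; track 0 is innermost here, tracks evenly spaced."""
    step = (OUTER_RADIUS_MM - INNER_RADIUS_MM) / (TRACKS - 1)
    return INNER_RADIUS_MM + track * step


def blocks_per_track(track: int) -> int:
    """Whole blocks that fit around one track at the assumed density."""
    circumference = 2 * math.pi * track_radius(track)
    return int(circumference * BLOCKS_PER_MM)


def zone_table() -> list[tuple[int, int, int]]:
    """Split tracks into equal zones. Every track in a zone gets the block
    count of the zone's innermost (shortest) track, so all of them fit."""
    per_zone = TRACKS // ZONES
    table = []
    for zone in range(ZONES):
        first = zone * per_zone
        last = TRACKS - 1 if zone == ZONES - 1 else first + per_zone - 1
        table.append((first, last, blocks_per_track(first)))
    return table


for first, last, blocks in zone_table():
    print(f"tracks {first:4d}-{last:4d}: {blocks} blocks per track")
```

Tracks are numbered from the inside out here purely to keep the arithmetic simple.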

The block is the smallest individually addressable unit on the disk. It is usually 512 or 520 bytes, with 512 being the most common. The Advanced Format initiative raises the block size to 4096 (4K) bytes, though the host system might need some support for it. More on this later.
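When a drive uses 4K physical blocks but still presents 512-byte logical blocks to the host (the “512e” flavour of Advanced Format), eight logical blocks share one physical block. The minimal Python sketch below, assuming such a 512e drive, shows why host support matters: a write that does not start and end on a 4K boundary forces the drive to read, modify and rewrite whole physical blocks.

```python
LOGICAL_BLOCK = 512    # bytes per block as seen by the host
PHYSICAL_BLOCK = 4096  # bytes per block on the media (Advanced Format)
RATIO = PHYSICAL_BLOCK // LOGICAL_BLOCK  # 8 logical blocks per physical one


def physical_range(lba: int, count: int) -> tuple[int, int]:
    """(first physical block, number of physical blocks) touched by a
    request for `count` 512-byte logical blocks starting at `lba`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    first = lba // RATIO
    last = (lba + count - 1) // RATIO
    return first, last - first + 1


def is_aligned(lba: int, count: int) -> bool:
    """True when the request starts and ends on a 4K physical boundary,
    so the drive can write whole physical blocks without read-modify-write."""
    return lba % RATIO == 0 and count % RATIO == 0
```

For example, the long-standing habit of starting the first partition at LBA 63 gives `physical_range(63, 8) == (7, 2)` and `is_aligned(63, 8) == False`: a single 4K write straddles two physical blocks. Starting at LBA 64 instead gives `(8, 1)` and `True`.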

Shown at http://computershopper.com/feature/how-it-works-platter-based-hard-drive

The drive has the disk or platter(s) (1), the spindle (2) to hold and spin them, the head arm (3) to hold the Read/Write heads, the voice coil (4) to move the head around, the heads (5), and the head landing zone (9) where the heads can rest without crashing onto the disk during power down. The tracks (6), sectors (8) and blocks (7) are the locations on the disk where data is stored and retrieved.

Typically, the disk is accessed as a set of serially numbered blocks, addressed by Logical Block Address (LBA). The on-disk electronics take care of translating each LBA into a physical location on the disk and storing or retrieving the data there. This also makes life easier for the host system, as it only needs to deal with a logical block number; there is no need to remember which platter, which side, which track and which sector holds the block of interest.
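For drives addressed by the older cylinder/head/sector (CHS) scheme, the classic translation to and from LBA looks like the Python sketch below. The geometry (16 heads per cylinder, 63 sectors per track) is an assumed example; modern drives hide their real, zoned geometry and do the equivalent mapping internally. In CHS terms a “sector” is the per-track unit this post calls a block, and sector numbers start at 1 while cylinders and heads start at 0.

```python
HEADS_PER_CYLINDER = 16  # assumed example geometry
SECTORS_PER_TRACK = 63   # CHS sector numbers run 1..63


def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Classic CHS -> LBA: LBA = (C * HPC + H) * SPT + (S - 1)."""
    if not 0 <= head < HEADS_PER_CYLINDER:
        raise ValueError("head out of range")
    if not 1 <= sector <= SECTORS_PER_TRACK:
        raise ValueError("sector out of range (sectors start at 1)")
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba: int) -> tuple[int, int, int]:
    """Inverse mapping: LBA -> (cylinder, head, sector)."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = lba % SECTORS_PER_TRACK + 1
    return cylinder, head, sector
```

For example, `chs_to_lba(1, 0, 1)` is 1008 (16 * 63), and `lba_to_chs(1008)` returns `(1, 0, 1)`.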

Logical block addressing also lets the on-disk controller map out bad blocks without the host system having to worry about them. In some cases, though, this behavior is unwanted.
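A toy model of that remapping, in Python: the class below is hypothetical and only illustrates the idea of a reserved pool of spare blocks, not how any particular drive’s firmware is organised. The host keeps using the same LBA after the block goes bad; only the controller knows that the data now lives in a spare block.

```python
class RemappingController:
    """Hypothetical drive controller that hides bad blocks from the host by
    redirecting them to a reserved pool of spare blocks."""

    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        self.user_blocks = user_blocks
        # Spares sit just past the range of LBAs the host is allowed to use.
        self.free_spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.remap: dict[int, int] = {}

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.user_blocks:
            raise ValueError(f"LBA {lba} outside host-visible range")

    def mark_bad(self, lba: int) -> int:
        """Redirect `lba` to the next free spare and return that spare."""
        self._check(lba)
        if not self.free_spares:
            raise RuntimeError("spare pool exhausted")
        spare = self.free_spares.pop(0)
        self.remap[lba] = spare
        return spare

    def physical_block(self, lba: int) -> int:
        """Where the data for `lba` actually lives on the media."""
        self._check(lba)
        return self.remap.get(lba, lba)
```

For example, with `RemappingController(100, 4)`, after `mark_bad(10)` the host still reads LBA 10, but `physical_block(10)` now returns 100, the first spare.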

For Solid State Disks, or flash drives, there are no spindles, as you might guess. There are just a few chips under the hood storing and retrieving blocks of information, with a bit of glue logic to attach these storage chips to the bus. The blocks are again accessed by LBA, and the controller translates each LBA into a chip, sector/group and block address depending on the flash chip organization. I will try to cover SSDs in more detail in a separate post. Typical small-scale examples of an SSD are our USB pen drives and SD cards.
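As a hedged illustration of that translation, here is a toy page-level flash translation layer (FTL) in Python. It is a simplified sketch of one common approach, not the design of any particular controller: because a flash page cannot be rewritten in place without first being erased, each write of an LBA goes to a fresh page (spread round-robin across chips) and the previous copy is just marked stale. Real controllers add erase blocks, garbage collection and wear leveling, all omitted here.

```python
class PageMappedFTL:
    """Toy FTL mapping each LBA to a (chip, page) location on flash."""

    def __init__(self, chips: int, pages_per_chip: int) -> None:
        self.chips = chips
        self.pages_per_chip = pages_per_chip
        self.next_page = [0] * chips   # next unwritten page on each chip
        self.next_chip = 0             # round-robin cursor across chips
        self.mapping: dict[int, tuple[int, int]] = {}
        self.stale: set[tuple[int, int]] = set()

    def write(self, lba: int) -> tuple[int, int]:
        """Place a new copy of `lba` on the next chip with a free page."""
        for _ in range(self.chips):
            chip = self.next_chip
            self.next_chip = (self.next_chip + 1) % self.chips
            if self.next_page[chip] < self.pages_per_chip:
                location = (chip, self.next_page[chip])
                self.next_page[chip] += 1
                old = self.mapping.get(lba)
                if old is not None:
                    self.stale.add(old)   # old copy is now garbage
                self.mapping[lba] = location
                return location
        raise RuntimeError("no free pages; a real FTL would garbage-collect")

    def read(self, lba: int) -> tuple[int, int]:
        """Current physical location of `lba`."""
        return self.mapping[lba]
```

For example, on a two-chip device, writing LBA 5, then LBA 6, then LBA 5 again places the copies at `(0, 0)`, `(1, 0)` and `(0, 1)`; `read(5)` then returns `(0, 1)` and `(0, 0)` is marked stale.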


Storage: The DAS stack

Let’s talk about a typical Direct Attached Storage stack from a server system’s perspective.

Let’s cover both the software and hardware layers of the storage stack; this will make it easier to bring in networked storage concepts later.

Let’s take a quick look at the stack for ease of understanding.

The Storage Subsystem

The storage subsystem stack diagram shows software components in blue and hardware components in red. As you might expect, the two sets appear in reverse order relative to each other. In a way, this stack looks just like a networking stack. Let’s walk through the maze.

When an application requests an operation on a file, the request is passed on to the filesystem layers. In Linux, for example, a Virtual File System (VFS) layer receives the request and then switches to the actual filesystem driver to do the job. The filesystem then works its magic on its internal data structures to figure out where and how to serve the request. A good example is looking up an inode and its indirect blocks to find the disk blocks that hold the requested data. Let’s save the details for a later discussion.
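To make the inode example a little more concrete, here is a Python sketch of mapping a byte offset within a file to a disk block number through direct pointers and a single indirect block. The layout (12 direct pointers, 4 KB blocks, 4-byte block numbers) is modelled loosely on ext2 and is an assumption; double and triple indirect blocks are left out. The `read_block` callback and the small in-memory `disk` are hypothetical stand-ins for real disk reads.

```python
from dataclasses import dataclass
from typing import Callable

BLOCK_SIZE = 4096        # assumed filesystem block size in bytes
POINTER_SIZE = 4         # assumed size of one on-disk block number
DIRECT_POINTERS = 12     # ext2-style number of direct pointers
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 with these values


@dataclass
class Inode:
    direct: list[int]      # disk block numbers of the first 12 file blocks
    single_indirect: int   # disk block that holds further block numbers


def lookup_block(inode: Inode, file_offset: int,
                 read_block: Callable[[int], list[int]]) -> int:
    """Return the disk block number that holds byte `file_offset` of the file.
    `read_block(n)` stands in for reading disk block n and decoding it as a
    list of block numbers."""
    index = file_offset // BLOCK_SIZE   # which block of the file
    if index < DIRECT_POINTERS:
        return inode.direct[index]      # found directly in the inode
    index -= DIRECT_POINTERS
    if index < POINTERS_PER_BLOCK:
        pointers = read_block(inode.single_indirect)  # one extra disk read
        return pointers[index]
    raise NotImplementedError("double/triple indirect blocks omitted here")


# Hypothetical in-memory "disk": block 500 is the indirect block.
disk = {500: [1000 + i for i in range(POINTERS_PER_BLOCK)]}
inode = Inode(direct=list(range(100, 112)), single_indirect=500)
print(lookup_block(inode, 0, disk.__getitem__))                     # 100
print(lookup_block(inode, 12 * BLOCK_SIZE + 10, disk.__getitem__))  # 1000
```

Note that the result is just a disk block number, which is all the layers below the filesystem need.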

The filesystem then asks the disk access components, e.g. the SATA or SCSI drivers, for the blocks it is interested in. Once the request arrives here, we talk only in terms of block numbers, be it LBA or some other mechanism; this layer knows nothing about the filesystem at all. Give it a block number and it reads or writes that block. The disk access subsystem then does the best it can, often rearranging the commands into a more sequential order to improve performance. Finally, it places the request packet with the HCI.
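One classic way to do this reordering is an elevator-style algorithm. The Python sketch below uses C-LOOK ordering and then merges requests for consecutive blocks into larger runs; it is a textbook illustration, not the algorithm of any particular operating system’s I/O scheduler.

```python
def clook_order(requests: list[int], head: int) -> list[int]:
    """Order pending block requests C-LOOK style: serve everything at or
    above the current head position in ascending order, then wrap around
    to the lowest outstanding block and continue upward."""
    ahead = sorted(b for b in requests if b >= head)
    behind = sorted(b for b in requests if b < head)
    return ahead + behind


def merge_adjacent(ordered: list[int]) -> list[tuple[int, int]]:
    """Coalesce consecutive block numbers into (start, count) runs."""
    runs: list[tuple[int, int]] = []
    for block in ordered:
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)   # extend the current run
        else:
            runs.append((block, 1))         # start a new run
    return runs
```

For example, with the head at block 53, `clook_order([98, 183, 37, 122, 14, 124, 65, 67], 53)` gives `[65, 67, 98, 122, 124, 183, 14, 37]`, and `merge_adjacent([10, 11, 12, 13, 40])` gives `[(10, 4), (40, 1)]`.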

The host controller interface (HCI), called a host bus adapter (HBA) in some cases such as SCSI, takes in the packet. It accepts the series of commands for the disks, the appropriate buffers and the disk ID as input, and builds a queue that places the commands to the disk along with the disk ID. The queue is then passed on as a bunch of bytes/words to the bus interface driver. In some cases the HCI is a really complex piece of software; for SATA, for example, it contains many layers of its own, such as a transport layer and a link layer.
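As an example of what a command looks like once it is “a bunch of bytes”, the Python sketch below builds a SCSI READ(10) command descriptor block (CDB): a 10-byte command with opcode 0x28, a 32-bit big-endian LBA and a 16-bit transfer length in blocks. The `QueuedCommand` entry and `enqueue_read` helper are hypothetical, showing only the idea of pairing a command with a disk ID and a data buffer; real HBA queue formats differ from controller to controller.

```python
import struct
from dataclasses import dataclass

READ_10 = 0x28  # SCSI READ(10) operation code


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) CDB: opcode, flags byte,
    32-bit big-endian LBA, group number, 16-bit big-endian transfer
    length (in blocks), control byte. Flags/group/control are left at 0."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length does not fit in 16 bits")
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)


@dataclass
class QueuedCommand:
    """Hypothetical queue entry: target disk, raw command bytes and the
    host buffer the data should be transferred into."""
    disk_id: int
    cdb: bytes
    buffer: bytearray


def enqueue_read(queue: list[QueuedCommand], disk_id: int, lba: int,
                 blocks: int, block_size: int = 512) -> QueuedCommand:
    """Append a READ(10) for `blocks` blocks at `lba` on `disk_id`."""
    cmd = QueuedCommand(disk_id, read10_cdb(lba, blocks),
                        bytearray(blocks * block_size))
    queue.append(cmd)
    return cmd
```

For example, `read10_cdb(0x12345678, 8)` yields the bytes `28 00 12 34 56 78 00 00 08 00`.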

Finally, the data to be sent on the bus arrives at the bus drivers. The bus drivers are often simple: they fiddle with a few flags, place the data on the bus (often using DMA, direct memory access) and then let things go. When the DMA transfer is done, an IRQ informs the driver that the operation is complete. The bus driver then returns control to the HCI driver, which might wait for its own IRQ indicating that the command queue was sent successfully to the disk, or might simply return control to the disk driver. The disk driver, in turn, has to wait until the disk finishes the job and an IRQ reports that it is complete. Only then can control reach back to the filesystem, telling it that the job was well done and the day is going well.
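The completion path can be modelled as a chain of layers, each reporting upward only once the layer below it has finished and its own interrupt has arrived. The Python sketch below is a hypothetical simulation, with method calls standing in for IRQs; it is not how any real kernel structures its drivers.

```python
class Layer:
    """One driver layer that reports completion upward only after the layer
    below has finished and (if it needs one) its own IRQ has arrived."""

    def __init__(self, name: str, log: list[str], upper: "Layer | None" = None,
                 has_lower: bool = True, needs_irq: bool = True) -> None:
        self.name = name
        self.log = log
        self.upper = upper
        self.lower_done = not has_lower  # bottom layer has nothing below it
        self.irq_seen = not needs_irq    # top layer waits on no interrupt
        self.finished = False

    def lower_complete(self) -> None:
        self.lower_done = True
        self._maybe_finish()

    def irq(self) -> None:
        self.irq_seen = True
        self._maybe_finish()

    def _maybe_finish(self) -> None:
        if self.lower_done and self.irq_seen and not self.finished:
            self.finished = True
            self.log.append(self.name)
            if self.upper is not None:
                self.upper.lower_complete()  # hand control upward


log: list[str] = []
fs = Layer("filesystem", log, needs_irq=False)
disk = Layer("disk driver", log, upper=fs)
hci = Layer("HCI driver", log, upper=disk)
bus = Layer("bus driver", log, upper=hci, has_lower=False)

bus.irq()   # DMA finished
hci.irq()   # command queue delivered to the disk
disk.irq()  # disk reports the job is done
print(log)
```

Running it logs the bus driver, HCI driver, disk driver and filesystem in that order. If the disk driver’s IRQ were handled before the HCI had finished, the disk driver would simply keep waiting.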

We skipped the RAID component; let’s come back to it in a while.

Talking about the hardware, let’s start with the disks. The disks are essentially stores for the data, with some electronics glued on to transfer data in and out. Often, disks take a single Logical Block Address and return the appropriate block, maintaining an internal LBA to track/sector mapping. In the early days, disks had only minimal electronics, but later Integrated Drive Electronics (IDE) came into the picture. Along similar lines, the Small Computer System Interface (SCSI) came into existence. This is a dinosaur-age story, but it might just amuse you a bit. Although SCSI is a general system interface, it is mostly used for mass storage. Many storage devices speak SCSI commands even when they are not SCSI hardware; USB pen drives, for example, carry SCSI commands over USB. Then came the AT Attachment, or ATA (AT coming from PC-AT), and later Serial ATA, or SATA.

SATA, however, has its own protocol stack, although it supports a legacy mode in which the system can access a SATA subsystem as if it were Parallel ATA (PATA). Because a SATA host controller can operate either in this PATA-compatible mode or in the Advanced Host Controller Interface (AHCI) mode, the controller includes both standard IDE electronics and AHCI components.

Finally, the bus interfaces. Most of today’s systems run on top of PCIe, the Peripheral Component Interconnect Express bus. In a few words, it is an advanced serial I/O interconnect that links the CPU with the peripherals at very high speed. Please visit http://en.wikipedia.org/wiki/PCI_Express for a quick look. I will write some posts on it a bit later.
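To put “very high speed” into numbers, here is a small Python calculation of approximate PCIe throughput per direction. It uses the transfer rates and line encodings of the first three generations (2.5 and 5 GT/s with 8b/10b encoding, 8 GT/s with 128b/130b) and ignores packet and protocol overhead, so real-world throughput is lower.

```python
# Generation -> (transfer rate in GT/s, payload bits, encoded bits)
PCIE_GENERATIONS = {
    1: (2.5, 8, 10),     # 8b/10b line encoding
    2: (5.0, 8, 10),     # 8b/10b line encoding
    3: (8.0, 128, 130),  # 128b/130b line encoding
}


def throughput_mb_s(gen: int, lanes: int = 1) -> float:
    """Approximate data rate per direction in MB/s (10**6 bytes/s) after
    line encoding, ignoring packet and protocol overhead."""
    rate_gt_s, payload_bits, encoded_bits = PCIE_GENERATIONS[gen]
    bits_per_s = rate_gt_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6 * lanes


for gen in PCIE_GENERATIONS:
    print(f"Gen {gen}: x1 {throughput_mb_s(gen):.0f} MB/s, "
          f"x16 {throughput_mb_s(gen, 16):.0f} MB/s")
```

This gives 250 MB/s for a Gen 1 x1 link, 2000 MB/s for Gen 2 x4, and roughly 985 MB/s per Gen 3 lane.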

In the next post, let’s talk about blocks, RAID and stripes.
