A primer to File Systems
(NetDiox R&D Team)
Authored By :
Peter Chacko, Founder & CTO
Netdiox Computing System Pvt. Ltd.
(Linux kernel research/training center)
About the Author:
Peter Chacko has been working on system/networking/storage systems development since 1994. His career spans
The fundamental role of a computer is to run applications using compute and network resources: taking in, processing, and emitting digital data using storage resources or file systems, and optionally transmitting various forms of data across data communication networks using a networking stack. In this context, file systems define the fundamental character of a data processing machine as it relates to how data is created, stored, retrieved, protected and managed. File systems are the key software components of storage systems that abstract the block-level storage media into a hierarchy of directories, making data available to users as files and directories. As operating systems make each
A file system is typically a layer of software that runs as part of the operating system kernel and interfaces with storage
there are hundreds of types of file systems on the market, as there are many different approaches to the task of managing permanent storage.
The layout of user data and of the associated information used to locate it (a.k.a. metadata) depends on the type of file system. A file system can be implemented in different ways, driven mainly by the storage layout on the physical media and by the capabilities the file system is tuned for, such as data reliability, random versus sequential access performance, and primary versus secondary storage use cases. File systems also differ in the hardware environment for which they are optimized. We will first define key storage terms, then briefly cover the general characteristics of file systems, and then discuss a few leading file system technologies.
Key Storage Terms
Disk: A permanent storage medium of a certain size, typically housed in your computer. A disk also has an associated sector (or disk block) size, which is the minimum unit that the disk can read or write at a given time, limited by its hardware implementation. Traditionally the sector size of hard disks has been 512 bytes, although many modern disks use 4096-byte sectors.
Block: The smallest unit writable by the file system. Everything a file system does is composed of operations done at the granularity of blocks. A file system block is always at least as large as a disk sector and is always a multiple of the sector size.
Partition: A subset of all the blocks on a disk. A disk can have several partitions.
Volume: The software abstraction for a collection of blocks on some storage medium (e.g., a disk). A volume may be all of the blocks on a single disk, some portion of the total number of blocks on a disk, or it may span multiple disks or any other aggregation of blocks that software can construct. The term "volume" is used to refer to a disk or partition that has been initialized with a file system. Volume is a
Superblock: The area of a volume where a file system stores its critical volume-wide information. A superblock usually contains information such as how large the volume is, the name of the volume, and file system resource information such as the number of inodes and data blocks.
Metadata: A general term referring to information that is about something but not directly part of it. For example, the size of a file is very important information about a file, but it is not part of the data in the file.
Journaling: A method of ensuring the correctness of file system metadata and the consistency of data even in the presence of power failures or unexpected reboots. For example, leading file systems in the industry such as Athinio
inode: Also known as an index node, a term that originated in Unix. The fundamental on-disk structure in which a file system stores all the necessary metadata about a file. The inode connects the contents of the file with any other data associated with the file. An
Extent: A physically contiguous run of blocks on a disk, described by a starting block number and a length in blocks. For example, an extent might start at block 4000 and continue for 5000 blocks.
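The relationship between sectors, blocks and extents can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from any real file system: the names Extent and blocks_needed are hypothetical, and the 512-byte sector and 4096-byte block sizes are assumed values chosen only for illustration.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512   # bytes; assumed minimum transfer unit of the disk
BLOCK_SIZE = 4096   # bytes; assumed file system block size, a multiple of SECTOR_SIZE


@dataclass(frozen=True)
class Extent:
    """A contiguous run of blocks: a starting block number and a length."""
    start_block: int
    block_count: int

    @property
    def last_block(self) -> int:
        # The extent covers start_block .. last_block inclusive.
        return self.start_block + self.block_count - 1

    def size_in_bytes(self, block_size: int = BLOCK_SIZE) -> int:
        return self.block_count * block_size


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Whole blocks required to hold file_size bytes, rounded up."""
    return (file_size + block_size - 1) // block_size


# The extent from the definition above: starts at block 4000, runs for 5000 blocks.
extent = Extent(start_block=4000, block_count=5000)
```

Because the file system allocates whole blocks, even a one-byte file occupies a full block under these assumptions, which is why block size is a trade-off between wasted space and bookkeeping overhead.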
Key File System Concepts
Files: The primary capability that all file systems must provide is a way to store a named piece of data and to later access that data using the name given to it. We often refer to a named piece of data as a file. This abstraction provides only the most basic level of functionality in a file system. The identifier of a file is called its filename. A file is where a program stores data permanently on disk. If you remove this name entry from the system using the file system's removal operation, the data is no longer accessible unless there are other
The Structure of a File: Given the concept of a file, a file system may impose no structure on the file, or it may enforce a considerable amount of structure on the contents of the file. An unstructured or raw file, often referred to as a stream of bytes, literally has no structure. The file system simply records the size of the file and allows programs to read the bytes in any order and at any location they desire. This is different from structured data, where the file has a fixed structure and the user operates on those structures for various
Directory: Apart from storing a single file as a stream of bytes, a file system must be capable of organizing multiple files. File systems use the term directory or folder to describe a "bucket" that organizes files by name.
Inode: An inode uniquely represents a file object in the file system. The primary purpose of a directory is to manage a list of files and to connect each name in the directory with the associated inode. There are several ways to implement a directory, but the basic concept is the same for each. A directory contains a list of names. Associated with each name is a handle that refers to the contents of that name. Although file systems differ on exactly what constitutes a file name, a directory needs to store both the name and the inode number of the file. The name is the key that the directory searches on when looking for a file, and the inode number is a reference that allows the file system to access the contents of the file and other metadata about the file. When a user wishes to open a particular file, the file system must search the directory to find the requested name. If the name is not present, the file system can return an error such as Name not found. If the file does exist, the file system uses the
Metadata: The name of a file is metadata because it is a piece of information about the file that is not in the stream of bytes that make up the file. There are many pieces of metadata about a file as
"hooked" through this mount operation. File system operations are the actual file system functions invoked when a user performs a certain operation on the file system. Such operations include creating a new file, updating a file, changing metadata, and accessing various attributes or data.
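The directory and inode concepts described above can be sketched as two small tables: a directory that maps names to inode numbers, and an inode table that holds the metadata and contents. This is a toy in-memory model under simplifying assumptions (a single flat directory, file contents stored inside the inode); the names ToyFileSystem, create, lookup and remove are hypothetical and do not correspond to any real file system's interface.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    number: int
    size: int = 0
    data: bytes = b""   # real inodes point at blocks; the toy keeps bytes inline


@dataclass
class ToyFileSystem:
    inodes: dict = field(default_factory=dict)      # inode number -> Inode
    directory: dict = field(default_factory=dict)   # file name -> inode number
    next_inode: int = 1

    def create(self, name: str, data: bytes = b"") -> int:
        if name in self.directory:
            raise FileExistsError(name)
        inode = Inode(self.next_inode, len(data), data)
        self.next_inode += 1
        self.inodes[inode.number] = inode
        self.directory[name] = inode.number
        return inode.number

    def lookup(self, name: str) -> Inode:
        # The name is the search key; the inode number leads to the file.
        if name not in self.directory:
            raise FileNotFoundError(f"Name not found: {name}")
        return self.inodes[self.directory[name]]

    def remove(self, name: str) -> None:
        number = self.lookup(name).number
        del self.directory[name]
        # Free the inode only when no other name still refers to it.
        if number not in self.directory.values():
            del self.inodes[number]


fs = ToyFileSystem()
fs.create("notes.txt", b"hello")
```

Keeping names in the directory and everything else in the inode is what lets several names refer to the same file: removing one name leaves the data reachable through the others.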
We will now discuss different features of the XFS file system to give a sense of a typical enterprise-class file system.
XFS - File system for the Big Iron
XFS was created for environments where large numbers of CPUs and large disk arrays are the norm. It focuses on supporting large files and good streaming I/O performance. It also has some interesting administrative features that were not offered by other file systems in the Linux kernel at the time it was built. Each XFS file system is partitioned into fundamental units called allocation groups (AGs). AGs are somewhat similar to the block groups in
inode consists of two mandatory parts and an optional part: the inode core, the data fork, and the attribute fork. The inode core contains, as expected, the traditional UNIX inode data such as ownership, number of blocks, timestamps, and a few
ID. The data fork contains the previously mentioned extent descriptors or the root of the extent map. The optional attribute fork contains the extended attributes.
Extended attributes, as implemented in Linux, are simple name/value pairs assigned to a file that can be manipulated one at a time. The attribute fork in XFS can either store extended attributes directly in the inode, if the space required for the attributes is small enough, or use the same scheme of extent descriptors as described for the file data above to point to additional disk blocks. Extended attributes are used by Linux to implement features such as access control lists (ACLs) and labels for SELinux.
Inodes in XFS are dynamically allocated, which means that there is no need to statically allocate the expected number of inodes when creating the file system, with the possibility of under or
space or based on the size needed. Clearly this organization offers significant advantages for efficiently finding the right block of disk space for a given file. The only potential drawback to such a scheme is that the B+trees both maintain the same information in different forms. This duplication can cause inconsistencies if, for whatever reason, the two trees get out of sync. Because XFS is journaled, however, such accidental inconsistencies are minimized to
XFS uses extent maps to manage the blocks allocated to a file. An extent map is a list of extents, each described by a starting block address and a length.
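To see how an extent map translates a position in a file into a disk block, consider the following minimal sketch. It is an illustration only: the FileExtent record and find_disk_block function are hypothetical names, offsets are counted in blocks, and a sorted Python list searched with bisect stands in for the tree index a real file system would use.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class FileExtent(NamedTuple):
    file_offset: int   # first logical block of the file covered by this extent
    start_block: int   # first disk block of the extent
    length: int        # number of contiguous blocks


def find_disk_block(extent_map: List[FileExtent], logical_block: int) -> Optional[int]:
    """Return the disk block backing logical_block, or None if it falls in a hole."""
    offsets = [e.file_offset for e in extent_map]
    # Locate the last extent that starts at or before the requested block.
    i = bisect_right(offsets, logical_block) - 1
    if i < 0:
        return None
    e = extent_map[i]
    if logical_block >= e.file_offset + e.length:
        return None   # past the end of that extent: unallocated range
    return e.start_block + (logical_block - e.file_offset)


# Three extents, sorted by file offset; logical blocks 150-199 are a hole.
extent_map = [
    FileExtent(file_offset=0, start_block=4000, length=100),
    FileExtent(file_offset=100, start_block=9000, length=50),
    FileExtent(file_offset=200, start_block=1200, length=10),
]
```

Three records describe 160 blocks here, whereas a per-block map would need 160 entries; that compactness is the main appeal of extents for large, mostly contiguous files.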
Unlike other file systems, XFS maintains the mapping between extents and the file offsets they cover in a B+ tree, which allows XFS to use
Since directory entries are stored in a B+ tree, name lookups are very fast.
To keep a RAID array busy, the file system should submit I/O requests that are large enough to span all disks in the array. In addition, I/O requests should be aligned to stripe boundaries where possible, to avoid
large as a contiguous range of blocks, it is critical that files are allocated as contiguously as possible, to allow large I/O requests to be sent to the storage. The key to achieving large contiguous regions is a method known as
sync request. With delayed allocation, there is a much better approximation of the actual size of the file when deciding about the block placement on disk. In the best case the whole file may be in memory and can be allocated in one contiguous region.
For today’s large file systems, a full file system check on an unclean shutdown is not acceptable because it would take too long. To avoid the requirement for regular file system checks, XFS uses a
Disk quota management is another strong area of XFS.
XFS provides an enhanced implementation of the BSD disk quotas. It supports the normal soft and hard limits for disk space usage and number of inodes as an integral part of the file system. Both the
quotas supported in BSD and other Linux file systems are supported. In addition to group quotas, XFS can alternatively support project quotas, where a project is an arbitrary integer identifier assigned by the system administrator. The project quota mechanism in XFS is used to implement directory tree quotas, where a specified directory and all of the files and subdirectories below it are restricted to using a subset of the available space in the file system.
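The soft and hard limits mentioned above can be illustrated with a small accounting sketch. This is not how XFS implements quotas internally: the Quota record, the charge_blocks function and the project ID 42 are hypothetical, and the grace period that normally applies once a soft limit is exceeded is left out for brevity.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when an allocation would push usage past the hard limit."""


@dataclass
class Quota:
    soft_limit: int   # blocks; may be exceeded temporarily (grace period not modeled)
    hard_limit: int   # blocks; never exceeded
    used: int = 0


def charge_blocks(quota: Quota, blocks: int) -> bool:
    """Charge an allocation to the quota.

    Returns True when usage is now above the soft limit (a warning condition).
    Raises QuotaExceeded, leaving usage unchanged, if the hard limit would be crossed.
    """
    if quota.used + blocks > quota.hard_limit:
        raise QuotaExceeded(f"{quota.used} + {blocks} exceeds {quota.hard_limit} blocks")
    quota.used += blocks
    return quota.used > quota.soft_limit


# Project quotas: an administrator-assigned integer ID selects the quota record.
project_quotas = {42: Quota(soft_limit=80, hard_limit=100)}
```

Under this model every file below a directory tagged with project 42 would be charged against the same record, which is how a project ID can act as a directory tree quota.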
The final area that XFS excels in is its support for parallel I/O. Supporting
locking was essential for XFS. Although most file systems allow the same file to be opened multiple times, there is usually a lock around the
to the file that bypasses the cache.
XFS offers an interesting implementation of a traditional file system. It departs from the standard techniques, trading implementation complexity for performance gains.
The xfs_fsr utility defragments existing XFS file systems. The xfs_bmap utility can be used to interpret the metadata layouts for an XFS file system. The growfs utility allows XFS file systems to be enlarged
XFS is modularized into several parts, each of which is responsible for a separate piece of the file system’s functionality. The central and most important piece of the file system is the space manager. This module manages the file system free space, the allocation of inodes, and the allocation of space within individual files. The I/O manager is responsible for satisfying file I/O requests and depends on the space manager for allocating and keeping track of space for files. The directory manager implements the XFS file system name space. The buffer cache is used by all of these pieces to cache the
contents of frequently accessed blocks from the underlying volume in memory. It is an integrated page and file cache shared by all file systems in the kernel. The transaction manager is used by the other pieces of the file system to make all updates to the metadata of the file system atomic. This enables the quick recovery of the file system after a crash.
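The idea of making metadata updates atomic through a log can be sketched in a few lines. The example below is a toy write-ahead journal, not the XFS log format: the record tuples, the log_transaction and replay functions, and the metadata keys are all hypothetical. It shows only the core rule that recovery applies a transaction's updates if, and only if, its commit record reached the log.

```python
def log_transaction(log: list, txid: int, updates: dict) -> None:
    """Append every update of a transaction to the log, then its commit record."""
    for key, value in updates.items():
        log.append(("update", txid, key, value))
    log.append(("commit", txid))


def replay(log: list, metadata: dict) -> dict:
    """Recovery: apply, in log order, only the updates of committed transactions."""
    committed = {record[1] for record in log if record[0] == "commit"}
    for record in log:
        if record[0] == "update" and record[1] in committed:
            _, _, key, value = record
            metadata[key] = value
    return metadata


log = []
# Transaction 1: creating a file touches the directory and an inode together.
log_transaction(log, 1, {"dir:/a.txt": 7, "inode:7": {"size": 0}})
# Transaction 2: the system "crashes" after one update, before the commit record.
log.append(("update", 2, "dir:/b.txt", 8))

recovered = replay(log, {})
```

After replay, the half-written second transaction leaves no trace, so the directory never points at an inode that was not fully set up. Recovery time depends on the size of the log rather than the size of the file system, which is why a journaled file system can skip the full check after a crash.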
XFS journals metadata updates by first writing them to an
read by the recovery code which is called during a mount operation. XFS metadata modifications use transactions: create, remove, link, unlink, allocate, truncate, and rename operations all require transactions. This means the operation from the standpoint of the file system
An important aspect of journaling is
We hope that this note on XFS has given the reader a start on the exciting world of file systems!
About NetDiox Computing Systems:
NetDiox is a Bangalore (India) based center of research and education focusing on OS technologies, networking, distributed storage systems, virtualization and cloud computing. It has been changing the way computer science is taught and applied to distributed data centers. NetDiox provides high quality training and R&D project opportunities in system software, Linux kernels, virtualization, and networking technologies to brilliant BE/M.Tech/MCA/MSc graduates who are passionate about system/storage/networking software.
NetDiox R&D Team
Netdiox Computing System Pvt. Ltd.
Address: #1514, First Floor,19th Main Road,
Contact No: 08289884406, 09946711160