A Brief Introduction to Clusters


When data is stored on a hard disk, space is allocated in units called clusters. So whether a file is large or small, there will usually be some unused space in its last cluster (unless the file size is an exact integer multiple of the cluster size). Furthermore, that leftover space cannot be used by other files, however small they are: two or more files are not allowed to share a cluster, because this could cause data corruption.

In Microsoft's operating systems (such as DOS and Windows), the smallest unit of file management and storage is called a "cluster".

A file is usually stored in one or more clusters, and it occupies at least one cluster. Conversely, a cluster belongs to at most one file: two files cannot be stored in the same cluster.

The word "cluster" originally means "a group". Here it means a group of sectors (each track on a disk is divided into several arcs of equal size, and these arcs are called sectors). A single sector is too small a unit, so several sectors are combined into a larger unit that is more convenient and flexible to manage. The cluster size can usually be changed: it is set when the volume is formatted with the operating system's (high-level) "format" command.

As an analogy, a file is like a family, and its data are the family members. Clusters are apartment units, and sectors are the rooms that make up each unit. A family can live in one or more units, but a unit can hold only one family.

The file system is the link between the operating system and the drive. When the operating system needs to read a file from the hard disk, it asks the corresponding file system (such as FAT16, FAT32, or NTFS) to open the file. The sector is the smallest physical storage unit on a disk. But the operating system cannot efficiently address such a large number of individual sectors, so it groups adjacent sectors into clusters and manages the clusters instead. Each cluster can contain 2, 4, 8, 16, 32, or 64 sectors. The cluster is therefore a logical concept used by the operating system, whereas the sector is a physical property of the disk.
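
The grouping of adjacent sectors into clusters can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the 512-byte sector size, the zero-based cluster numbering, and the function names are hypothetical choices made for the example, and a real file system also adds an offset for its reserved areas before the first data cluster.

```python
# Assumed sector size for the example; many disks use 512 bytes,
# but this value is an assumption, not something fixed by the text.
SECTOR_SIZE = 512


def cluster_size_bytes(sectors_per_cluster, sector_size=SECTOR_SIZE):
    """Return the size of one cluster in bytes."""
    if sectors_per_cluster <= 0 or sector_size <= 0:
        raise ValueError("sizes must be positive")
    return sectors_per_cluster * sector_size


def cluster_to_sectors(cluster_index, sectors_per_cluster):
    """Return the run of adjacent sector numbers that form one cluster.

    Simplified model: clusters are numbered from 0 and start at sector 0.
    """
    if cluster_index < 0 or sectors_per_cluster <= 0:
        raise ValueError("invalid cluster index or cluster width")
    first = cluster_index * sectors_per_cluster
    return range(first, first + sectors_per_cluster)


if __name__ == "__main__":
    # With 8 sectors per cluster, each cluster is 8 * 512 bytes.
    print(cluster_size_bytes(8))
    # Cluster 3 covers eight consecutive sectors in this model.
    print(list(cluster_to_sectors(3, 8)))
```

Managing the disk this way means the operating system tracks one cluster number instead of eight separate sector numbers in this example.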

To manage disk space better and read data from the hard disk more efficiently, the operating system requires that a cluster hold the contents of only one file. The space occupied by a file is therefore always an integer multiple of the cluster size. Even if the file's actual size is smaller than one cluster, it takes up a whole cluster. If the file's actual size is larger than one cluster, the file is allocated two or more clusters. As a result, a file usually occupies slightly more space than its actual size. Only in the rare case where the file's actual size is an exact integer multiple of the cluster size does the occupied space equal the actual size.
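
The rounding rule described above amounts to a ceiling division. The following Python sketch shows the arithmetic; the function name and the 4096-byte cluster size used in the examples are assumptions made for illustration, and the sketch models only the rule in this article, not any optimizations a particular file system may apply to empty or very small files.

```python
def space_on_disk(file_size, cluster_size):
    """Return (clusters, allocated_bytes, unused_bytes) for a file.

    The file is rounded up to a whole number of clusters, so the
    unused bytes are the leftover space in the last cluster.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    # Ceiling division: round up to the next whole cluster.
    clusters = (file_size + cluster_size - 1) // cluster_size
    allocated = clusters * cluster_size
    return clusters, allocated, allocated - file_size


if __name__ == "__main__":
    cluster = 4096  # assumed cluster size in bytes
    for size in (100, 5000, 8192):
        print(size, space_on_disk(size, cluster))
```

With the assumed 4096-byte cluster, a 100-byte file occupies one cluster and leaves 3996 bytes unused, a 5000-byte file occupies two clusters and leaves 3192 bytes unused, and an 8192-byte file fills exactly two clusters with nothing left over.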