Light-Weight Clones — FAQ
posted on May 3, 2017 by Amit Golander
About a year ago, a light-weight clone API was standardized in the Linux kernel: the formerly Btrfs-specific clone ioctl became the generic FICLONE ioctl, exposed through `cp --reflink`. The storage-agnostic API empowers users to create a logically cloned file at close-to-zero time and capacity overheads:
# cp --reflink dir1/SrcFile dir2/CopyFile
Plexistor has supported the standard LW-Clone API since our M1FS v2.0 release in 2016. Since then, however, we have received several recurring questions, which we would like to address in this FAQ. The three most frequently asked questions are:
- How are LW-Clones different from soft/hard links?
- Can LW-Clones be used in conjunction with direct mmap access (DAX)?
- How much is “close to zero” time and capacity?
How are LW-Clones different from Soft/Hard Links?
Soft and hard links do not create new files. Not even logical ones.
A hard link is an alias (i.e. the same file has two names that can be used interchangeably). A soft link is a stub file (i.e. the second file contains only the indirection information: the path and name of the first file).
After a light-weight clone however, there are two independent files. Changing or removing one does not affect the other.
The following figure illustrates how two linked/cloned files are expected to behave according to the semantics used to create /dir2/file2 from /dir1/file1:
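These semantics can also be checked from the command line. The sketch below uses `cp --reflink=auto`, which falls back to a regular copy on file systems without clone support, so the behavioral contrast holds either way; the directory and file names are arbitrary:

```shell
# Scratch directory (arbitrary name); start fresh so the demo is repeatable
rm -rf /tmp/lwclone-demo
mkdir -p /tmp/lwclone-demo && cd /tmp/lwclone-demo
echo "original data" > file1

# Hard link: one file, two names used interchangeably
ln file1 hardlink2
# Soft link: a stub file that merely points at file1's path
ln -s file1 softlink2
# LW-Clone: a new, independent file
# (--reflink=auto falls back to a plain copy where clones are unsupported)
cp --reflink=auto file1 clone2

# Changing the clone does not affect the original...
echo "clone changed" > clone2
cat file1        # still prints "original data"

# ...whereas writing through the hard link changes "both" files,
# because they are one and the same inode
echo "changed via hardlink" > hardlink2
cat file1        # now prints "changed via hardlink"
```

Removing one of the cloned files is equally safe: `rm clone2` leaves `file1` intact, whereas removing `file1` would leave `softlink2` dangling.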
Can LW-Clones be used in conjunction with direct mmap access (DAX)?
Yes, all our features support both:
- Storage semantics (read, write system calls)
- Memory semantics (mmap followed by DAX load/store machine level instructions)
How much is “close to zero” time and capacity?
“Close to zero” is file system implementation dependent. For Plexistor M1FS, it is under 1 µs of delay and 128 bytes of metadata per cloned file. This is orders of magnitude faster than leading file systems such as XFS and BTRFS, even when using the most modern versions and NVMe devices.
The following table compares the LW-Clone latency for two types of files across the three modern file systems, using the same server and operating system version. Lower is better:
[Table omitted: LW-Clone latency comparison on NVMe & PM]
Note for those wishing to reproduce the performance test:
Latency was measured by averaging 1,000 serial calls of the FICLONE ioctl, each followed by an fsync.
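The measurement loop can be approximated from the shell, as in the rough sketch below. It is not equivalent to the in-process ioctl loop described above: each iteration also pays `cp` process-startup and file-creation costs, so the absolute numbers will come out higher, and `--reflink=auto` silently falls back to a plain copy on file systems without clone support. File names and sizes here are arbitrary.

```shell
# Rough approximation of the benchmark: N serial clones, each followed by a
# per-file sync (analogous to the fsync in the original test).
N=1000
dir=/tmp/lwclone-bench
src="$dir/src"
rm -rf "$dir" && mkdir -p "$dir"
dd if=/dev/zero of="$src" bs=4096 count=1 status=none

start=$(date +%s%N)
for i in $(seq 1 "$N"); do
  cp --reflink=auto "$src" "$dir/clone.$i"
  sync "$dir/clone.$i"      # coreutils sync FILE issues an fsync on that file
done
end=$(date +%s%N)
echo "average latency: $(( (end - start) / N )) ns per clone"
```

Because of the extra per-process overhead, treat the printed figure as an upper bound; a faithful reproduction would issue FICLONE and fsync directly from one process, as in the original test.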