Playing with ZFS on Linux

At Virtual Instruments we are buying a new SAN storage array, and one of the vendors we are evaluating is Tegile. One of the features of their product is in-line compression and de-duplication, which it gets because all of the data is stored on ZFS on the back end. Compression and de-dupe in turn give the hybrid SSD/mechanical disk array very good $/GB numbers.

As part of a potential POC, one thing we’d look at is the space savings from compression and de-dupe, to see whether the numbers match the sales hype. Of course, getting the array physically delivered and moving data onto it is fairly time-consuming, so as a little experiment I thought it’d be fun to fire up my own ZFS filesystem with compression and de-dupe enabled and see what kind of numbers we’d get for our data set.

For the filesystem source data I figured we could use some build artifacts, like the JARs stored in our Nexus artifact repository, or several of our product builds, which comprise many GB of RPMs, ISOs, OVAs, etc.

Here are my notes for setting up ZFS on Ubuntu 12.04:

# install ZFS for Ubuntu
$ sudo apt-get install python-software-properties 
$ sudo apt-add-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install -y ubuntu-zfs

# Get the disk UUID
# NOTE:  If you're using VMware, you need to modify the .VMX file to enable disk UUIDs,
# otherwise the "/dev/disk/..." aliases in the Ubuntu guest OS will not be created.
# See: https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-to-a-Native-ZFS-Root-Filesystem#v-vmware
$ ls -l /dev/disk/by-id 
...
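On the VMware note above: the .VMX setting commonly used to expose disk UUIDs to the guest is `disk.EnableUUID = "TRUE"`. The sketch below is a hypothetical helper (`enable_disk_uuid` is a name made up for this example, not a VMware tool) that appends that line only if no `disk.EnableUUID` entry exists yet. It assumes the VM is powered off while the file is edited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add disk.EnableUUID = "TRUE" to a VMware .vmx file so
# the guest gets /dev/disk/by-id/... aliases. Edit the .vmx only while the VM
# is powered off.
enable_disk_uuid() {
  local vmx=$1
  [ -f "$vmx" ] || { echo "no such file: $vmx" >&2; return 1; }
  if grep -qi '^[[:space:]]*disk\.EnableUUID[[:space:]]*=' "$vmx"; then
    # An entry already exists; show it instead of adding a duplicate
    echo "already present in $vmx:"
    grep -i '^[[:space:]]*disk\.EnableUUID' "$vmx"
  else
    # Make sure the file ends with a newline before appending
    if [ -s "$vmx" ] && [ -n "$(tail -c1 "$vmx")" ]; then
      echo >> "$vmx"
    fi
    printf 'disk.EnableUUID = "TRUE"\n' >> "$vmx"
    echo "added disk.EnableUUID = \"TRUE\" to $vmx"
  fi
}

# Example (the path is hypothetical):
#   enable_disk_uuid ~/vmware/zfs-test/zfs-test.vmx
```

If an existing entry is set to something other than "TRUE", the helper only prints it, so that case still needs a manual edit.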

# Create our first zpool using the UUID that came from the `ls` command
# The "ashift" property forces the zpool to use 4K sectors (2^12 = 4096 bytes)
# instead of the default 512B.
# See: http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives
$ sudo zpool create -f -o ashift=12 data /dev/disk/by-id/wwn-0x6000c297dd61c17d96ed4016a53bd246
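The `ashift` value is the base-2 logarithm of the sector size ZFS uses for the vdev, so 9 means 512-byte sectors and 12 means 4096-byte sectors. Here is a minimal sketch of that arithmetic; `ashift_for` is a made-up helper name and `sda` in the comments is only a placeholder device. The commented commands show the sector sizes the kernel reports, which for some Advanced Format drives is 512 bytes even though the physical sectors are 4K (one reason to set `ashift` explicitly).

```shell
#!/usr/bin/env bash
# ashift = log2(sector size in bytes): 2^9 = 512, 2^12 = 4096.
# ashift_for is a hypothetical helper name used only for this example.
ashift_for() {
  local bytes=$1 n=0
  # Accept only powers of two of at least 512 bytes
  if ! [[ $bytes =~ ^[0-9]+$ ]] || (( bytes < 512 || (bytes & (bytes - 1)) != 0 )); then
    echo "not a valid sector size: $bytes" >&2
    return 1
  fi
  # Count how many doublings of 1 it takes to reach the sector size
  while (( (1 << n) < bytes )); do
    n=$((n + 1))
  done
  echo "$n"
}

# Sector sizes the kernel reports for a disk ("sda" is a placeholder):
#   cat /sys/block/sda/queue/physical_block_size
#   lsblk -o NAME,PHY-SEC,LOG-SEC
# After creating the pool, its ashift should appear in the cached config:
#   sudo zdb -C data | grep ashift
```

For example, `ashift_for 4096` prints `12`, matching the `-o ashift=12` used above.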

# BA-DA-BING! Your ZFS pool is mounted automatically at the root of the
# filesystem, using the pool name you specified above (here, /data)
$ df -klh /data
Filesystem      Size  Used Avail Use% Mounted on
data             98G  128K   98G   1% /data

# COMPRESSION
$ sudo zfs set compression=lz4 data
$ sudo zfs get compressratio
NAME  PROPERTY       VALUE  SOURCE
data  compressratio  1.00x  -
...  ...
<write several GB of data>
...
$ sudo zfs get compressratio
NAME  PROPERTY       VALUE  SOURCE
data  compressratio  1.88x  -

# The "du" command will show you the on-disk (compressed) size of a file

# DEDUPE
# First you can simulate your deduplication savings! :-0
$ sudo zdb -S data
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    29.3K    361M    188M    257M    29.3K    361M    188M    257M
     2      544   3.41M   1.52M   2.91M    1.09K   6.89M   3.10M   5.95M
     4       37    208K    128K    232K      178   1002K    606K   1.08M
     8       17   21.5K   14.5K     68K      173    198K    138K    692K
    16        4      2K      2K     16K       87   43.5K   43.5K    348K
 Total    29.9K    365M    190M    260M    30.8K    369M    192M    265M

dedup = 1.02, compress = 1.93, copies = 1.38, dedup * compress / copies = 1.42

# Then you can turn dedupe on, but it will only affect newly written data
$ sudo zfs set dedup=on data
......

# Get stats about dedupe.
# Note that we didn't get much dedupe here :(
$ sudo zdb -b data

Traversing all blocks to verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:           41407
        bp logical:     429343232      avg:  10368
        bp physical:    215205376      avg:   5197     compression:   2.00
        bp allocated:   368242688      avg:   8893     compression:   1.17
        bp deduped:       5054464    ref>1:    602   deduplication:   1.01
        SPA allocated:  363188224     used:  0.34%

# Other useful commands
$ sudo zfs get all data    # displays all kinds of properties and their settings

# Modify properties
# ex: Disable the recording of access time on the pool:
$ sudo zfs set atime=off data

# Verify that the property has been set on the pool:
$ zfs get atime data
NAME  PROPERTY  VALUE  SOURCE
data  atime     off    local
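To see what the "du" note above means for a particular file, compare its apparent (logical) size with the space actually allocated for it. The sketch below assumes GNU coreutils `du`, whose `--apparent-size` flag reports the logical length instead of allocated blocks; `size_report` is a hypothetical helper name. On a dataset with compression enabled, the on-disk figure is typically the smaller one for compressible files.

```shell
#!/usr/bin/env bash
# Print apparent (logical) vs on-disk (allocated) size, in KiB, for each file.
# Assumes GNU du; size_report is a hypothetical helper name.
size_report() {
  local f apparent ondisk
  for f in "$@"; do
    # du prints "<size>\t<path>"; keep only the size column
    apparent=$(du -k --apparent-size -- "$f" | cut -f1)
    ondisk=$(du -k -- "$f" | cut -f1)
    printf '%s apparent=%sK on-disk=%sK\n' "$f" "$apparent" "$ondisk"
  done
}

# Example (the glob is hypothetical):
#   size_report /data/*.iso
```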

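The ratios zdb prints can be recomputed from the numbers in its own output, which makes them easier to interpret. The awk sketch below assumes zdb computes them as follows (an assumption worth checking against the zdb source for your version): for `zdb -S`, dedup = referenced DSIZE / allocated DSIZE, compress = referenced LSIZE / referenced PSIZE and copies = referenced DSIZE / referenced PSIZE, all from the Total row; for `zdb -b`, compression = bp logical / bp physical (or / bp allocated) and deduplication = 1 + bp deduped / bp allocated. `recompute_zdb_ratios` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Recompute the ratios shown by `zdb -S data` and `zdb -b data` above from the
# figures in their output. The formulas are assumptions about how zdb derives
# them.
recompute_zdb_ratios() {
  awk 'BEGIN {
    # zdb -S: Total row of the simulated DDT histogram, in M units as printed
    alloc_dsize = 260; ref_lsize = 369; ref_psize = 192; ref_dsize = 265
    dedup    = ref_dsize / alloc_dsize
    compress = ref_lsize / ref_psize
    copies   = ref_dsize / ref_psize
    printf "zdb -S: dedup=%.2f compress=%.2f copies=%.2f overall=%.2f\n",
           dedup, compress, copies, dedup * compress / copies

    # zdb -b: byte counts as printed
    logical = 429343232; physical = 215205376
    allocated = 368242688; deduped = 5054464
    printf "zdb -b: compression=%.2f alloc_compression=%.2f dedup=%.2f\n",
           logical / physical, logical / allocated, 1 + deduped / allocated
    printf "zdb -b: allocated-deduped=%d\n", allocated - deduped
  }'
}

recompute_zdb_ratios
```

Under those assumptions the overall simulated ratio simplifies to referenced LSIZE / allocated DSIZE (369M / 260M, about 1.42), i.e. logical data stored per byte allocated. The compress figure comes out as 1.92 rather than 1.93 only because the histogram's M values are already rounded. The `zdb -b` numbers line up as well: SPA allocated (363188224) equals bp allocated minus bp deduped (368242688 - 5054464), consistent with dedupe having saved only about 4.8 MiB here.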
Here are all the useful links I read through for this experiment:
