SATA-RAID and ZFS: Infortrend/ADVUNI OXYGENRAID: top or flop?

SATA RAID devices have become quite popular because they have a low price tag for the storage size offered. SATA disks are said to have a much lower input/output-operations-per-second capability than SCSI/SAS or FC devices. Public opinion also holds that SATA devices fail much earlier than their SCSI/FC counterparts.

And then there is Sun's new filesystem and volume manager: ZFS. ZFS is also capable of creating RAID groups of its own - how does it perform with SATA devices?

For three days I had the opportunity to test an Infortrend aka Advanced Unibyte OXYGENRAID device with 16 spindles of 750 GB each in ZFS environments. Not all the tests I wanted to run could be done in that short time. This isn't meant as scientific work, because I did not have the time to take three or four results per configuration to estimate the error range of my numbers. So take the following figures as a first impression.

So let's begin.

Test equipment

  • Sun X4200 M2 dual-Opteron server with 20 GB RAM, 10 GB used for the ARC cache
  • 2 Sun (QLogic) FC cards installed, 4 Gbit/s capable, connected through a SAN switch fabric to the RAID device
  • SAN switch fabric without other traffic (isolated)
  • Solaris 10 x86 with kernel patch 127112-11 installed (all relevant ZFS patches as of 2008/05/12)
  • Advanced Unibyte (OEM Infortrend) OXYGENRAID, firmware 4F614, 16 x 750 GB SATA Seagate ST3750640AS Barracuda 7200.10

This was a real-world test, so the caches on the system (ARC) and on the RAID device were ON. I explicitly did not want a lab benchmark with all caches turned off.
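
For reference, the 10 GB ARC limit mentioned above is the kind of thing usually set via the zfs_arc_max tunable in /etc/system; a minimal sketch (the exact value and mechanism used on this host are my assumption):

* Assumed setting: cap the ZFS ARC at 10 GB (10 x 2^30 bytes = 0x280000000).
* Takes effect after a reboot.
set zfs:zfs_arc_max = 0x280000000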

Test configurations

The following configurations were tested:
  • 1x1: single disk, configured as JBOD/non-RAID, for reference
  • 1x2r1: RAID1 (mirror) with 2 disks
  • 1x2m: ZFS mirror with 2 disks as JBOD/NRAID
  • 1x3r5: RAID5 with 3 disks
  • 4x3r5: 4 RAID5 sets with 3 disks each, striped via ZFS
  • 1x6r5: RAID5 with 6 disks
  • 1x6z1: ZFS raidz1 with 6 disks (JBOD/NRAID)
  • 2x6r5: 2 RAID5 sets with 6 disks each, striped via ZFS
  • 1x12r5: RAID5 with 12 disks
  • 1x12z1: ZFS raidz1 with 12 disks (JBOD/NRAID)

All devices were accessed via Sun's scsi_vhci (MPxIO) driver, with a logical blocksize setting of 20 (1 MB). For the 12-disk raidz1 test, 6 disks were configured per Infortrend controller, because one Infortrend controller can only handle up to 8 LUNs.

A side note on the raidz1/JBOD configurations: I wanted to know whether raidz1 is usable with this RAID device when the disks are configured just as JBOD; I did not want to test the usability of ZFS' RAID implementation in general. Infortrend's NRAID/JBOD implementation is therefore part of these numbers, too.
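
To illustrate the ZFS side of a few of these configurations, pool creation looks roughly like this (the pool name and the MPxIO device names are made-up placeholders, not the ones actually used):

# 1x2m: ZFS mirror over two NRAID/JBOD LUNs
zpool create tank mirror c4t600D0230001100AAd0 c4t600D0230001100ABd0

# 1x6z1: raidz1 over six NRAID/JBOD LUNs
zpool create tank raidz1 c4t600D0230001100AAd0 c4t600D0230001100ABd0 \
    c4t600D0230001100ACd0 c4t600D0230001100ADd0 \
    c4t600D0230001100AEd0 c4t600D0230001100AFd0

# 2x6r5: plain ZFS stripe (dynamic striping) over two hardware RAID5 LUNs
zpool create tank c4t600D0230001100BAd0 c4t600D0230001100BBd0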

Test method

Sun's filebench tool was used to generate the results.

The following filebench personalities and parameters were used:

  • multistreamread
    • $filesize=10g
  • multistreamwrite
    • $filesize=10g
  • varmail
    • $filesize=10000
    • $nfiles=100000
    • $nthreads=60
  • oltp
    • $filesize=4g

All tests were run for 300 seconds ("run 300").

ZFS was used as filesystem in all scenarios.
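
A single run looked roughly like this, shown here for the varmail personality (the target directory /tank/fb is a placeholder; the parameters are the ones listed above):

# filebench
filebench> load varmail
filebench> set $dir=/tank/fb
filebench> set $filesize=10000
filebench> set $nfiles=100000
filebench> set $nthreads=60
filebench> run 300

The other personalities were driven the same way with their respective parameters.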

multistreamread


The result is not very impressive: more spindles mean more I/O bandwidth. More interesting is the fact that ZFS' mirror performs better with sequential reads than the RAID1 mirror implemented in the RAID device. A 12-disk RAID5 configuration performs a little better than a 4-way ZFS stripe of 3-disk RAID5 sets, but the differences are not substantial. It seems that 85 MB/s is something of a bottleneck for this RAID device. Note that ZFS' RAID configurations show slightly lower bandwidth.

The next graph compares the CPU time used:


Again, nothing impressive. As ZFS' RAID implementation runs on the system's CPUs, it uses more CPU time than access to RAID sets handled by the RAID system itself.

multistreamwrite


The write numbers are more interesting - it seems that our RAID device handles writes very well, better than reads. Note that even with a RAID1 (mirror) configuration, the write bandwidth is higher than in the single-disk configuration. ZFS' mirror does not enhance the write bandwidth in any way. Infortrend's write-back cache seems to do a very good job.


The CPU time numbers are quite similar to the read numbers; again the CPU overhead of ZFS' RAID implementation is visible.

varmail

The varmail scenario is a heavy random I/O scenario with many files in one directory and concurrent access by many threads. If you don't want to use a RAID system just for single-user video streaming, read on.


The graph above shows the total number of operations per second measured. If you prefer actual read/write operations, this is the graph for you:


Main results:

  1. ZFS' mirror does not enhance performance.
  2. More spindles mean more operations per time unit.
  3. ZFS' concatenation/stripe algorithm introduces a performance decrease.
  4. ZFS' RAID implementation in conjunction with Infortrend's JBOD setting is not recommendable for this scenario.

More spindles effectively reduce latency. ZFS' RAID latency is higher than Infortrend's. 19 ms is the lower limit - five times the value Seagate states for one of these disks (Barracuda 7200.10).

The amount of CPU time used by ZFS' RAID implementation is not a big hit (times in microseconds):



oltp

Now for a more complex scenario. "oltp" simulates a transactional database (like Oracle, PostgreSQL, ...) with (very) small database updates, a common shared memory-mapped region and a transaction log file; 230 threads run in parallel. First result: do not use this RAID for that kind of workload - use many spindles if you have to - and immediately bury any idea of using ZFS' raidz1 implementation with this Infortrend device.
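
The invocation followed the same pattern as for the other personalities (the directory is again a placeholder):

filebench> load oltp
filebench> set $dir=/tank/fb
filebench> set $filesize=4g
filebench> run 300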


  1. raidz1 (ZFS raid) does not scale at all with the number of spindles.
  2. ZFS' concatenation/stripe algorithm performs very well with this kind of workload.
  3. These Seagate SATA disks seem to be able to handle 100 oltp ops per second. FC and SAS disks should handle more than 100.

The last graph shows CPU usage:


The CPU time overhead of ZFS raidz1 is considerable for this kind of workload.

Notes

In conjunction with this Infortrend device, ZFS raidz1 performs very badly for anything besides sequential access patterns. As many sources on the internet state that ZFS raidz IS in fact very fast, the culprit is probably a rather dull JBOD implementation in this Infortrend device - and since it is sold as a hardware RAID device, nobody wants to use it as a JBOD enclosure; that's not the job of this box.

For comparison, I would like to repeat this test on the same host system with other devices. The main problem: I don't have access to other FC storage devices at the moment. Next week I'll have an old LSI Fibre Channel disk system with real FC disks, so I will be able to run filebench tests on it.

The tests were done in the datacenter of University of Konstanz.


2 Comments

Thank you for posting your results. I have been working with these Infortrend RAIDs and ZFS for a while now. We got really low performance until we added:

* NOTE: Cache flushing is commonly done as part of the ZIL operations.
* While disabling cache flushing can, at times, make sense, disabling the
* ZIL does not.
* If you tune this parameter, please reference this URL in shell
* script or in an /etc/system comment.
* http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#FLUSH
set zfs:zfs_nocacheflush = 1

to /etc/system. I had talked to Infortrend about how to turn off the syncs on their end (to ignore them, as other RAIDs allow) but they say there is no way, so I have to live on the danger side with a good UPS to back it up :)


We used that setting as well (not only for this test) - otherwise the numbers would have been abysmal. I have also made a note about that setting elsewhere on this blog.

The device is running fairly well; we use it as big mail storage for a campus mail system (University of Konstanz, Germany).

