The benchmarks I did for the MySQL Users Conference almost two years ago were run on RH AS 3 with 8 7200 RPM SATA drives connected via a 3Ware 8500 or similar controller.
In that configuration RAID5 looked horrible compared to RAID10, showing absolutely terrible performance.
This time I tested on a Dell PowerEdge 2850 with 6 10,000 RPM SCSI drives connected to the embedded RAID controller (LSI MegaRAID) with 256MB of battery-backed cache. The operating system was CentOS 4.2 x86_64, with the ext3 file system.
I used SysBench this time to get to the roots of performance, rather than adding MySQL on top, which could have its own problems loading the
drives efficiently. Here are the run options I used after "preparing" a 32GB dataset:
./sysbench --num-threads=128 --max-requests=1000000 --file-total-size=32G --file-extra-flags=direct --test=fileio --init-rng=1 --file-test-mode=
The mode was "rndrd" for random reads, "rndwr" for random writes, or "rndrw" for a random read/write mix.
On RAID5 I first tested the performance of different schedulers. Results are in IO/sec.
First I should note that all schedulers have recently improved, or at least become much closer to each other. A while ago the anticipatory (AS) scheduler was much slower
than its counterparts; now it is OK unless you're doing a read/write mix... which is quite common, however. CFQ and deadline got close, with deadline being just
a bit faster. Some people say CFQ may give better latency, so I'd say: run your own benchmarks and see what works best for you.
Now let's look at the numbers. A 10,000 RPM drive does about 166 rotations/sec. If we assume the next IO requires the drive to rotate 180 degrees on average,
and add some 1ms of seek time, we end up with about 4ms of latency, or 250 random IO/sec per drive. For reads, 6 drives should give us some 1500 IO/sec.
With the deadline scheduler, however, we get some 341 IO/sec per drive - much more than expected. With a 32GB dataset and O_DIRECT we only have the 256MB of cache
on the RAID controller, which should not give such a boost. Maybe 128 outstanding threads allow decent request reordering? I still think it is too good to be true, but
it seems to be the case.
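The back-of-envelope figures above can be sketched in a few lines; the 1ms seek time is the same rough assumption as in the text:

```python
# Theoretical random IO/sec for a 10,000 RPM drive, assuming on average
# half a rotation of rotational latency plus ~1 ms of seek time.
rpm = 10000
rotations_per_sec = rpm / 60.0                      # ~166.7 rotations/sec
half_rotation_ms = 1000.0 / rotations_per_sec / 2   # ~3 ms rotational latency
seek_ms = 1.0                                       # assumed short-seek time
service_time_ms = half_rotation_ms + seek_ms        # ~4 ms per random IO
iops_per_drive = 1000.0 / service_time_ms           # ~250 IO/sec per drive
iops_six_drives = 6 * iops_per_drive                # ~1500 IO/sec for 6 drives
print(round(iops_per_drive), round(iops_six_drives))
```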
Now look at writes: random write speed is almost 6 times worse than reads. Sequential writes are decent on RAID5.
So we have 61 IO/sec per drive in our RAID5 setup. With the expected 2 reads + 2 writes for each random write on a RAID5 volume, this is just where our theory leads:
250/4 = 62.5, quite close to what we get.
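The RAID5 write penalty arithmetic, written out:

```python
# RAID5 small random write penalty: read old data + read old parity,
# then write new data + new parity = 4 physical IOs per logical write.
iops_per_drive = 250                 # theoretical random IO/sec from above
write_penalty = 4
expected_write_iops = iops_per_drive / write_penalty   # 62.5; measured ~61
print(expected_write_iops)
```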
In the read/write workload we get some 125 IO/sec per device, which is decent. With a read/write ratio of 1.5, that is some
50 writes/sec + 74 reads/sec per device. As each write is actually 4 separate IO requests, we end up with some 270 IO/sec per device,
which is once again close to the drive's abilities.
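The same accounting for the mixed workload (the exact sum is 274; the round figure of ~270 in the text reflects the same math):

```python
# Read/write mix per device: ~124 logical IO/sec at a 1.5 read/write ratio
# splits into ~74 reads + ~50 writes; each RAID5 write is 4 physical IOs.
reads_per_sec = 74
writes_per_sec = 50
physical_iops = reads_per_sec + writes_per_sec * 4   # close to the ~250 limit
print(physical_iops)
```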
Now let's see what happens with RAID5 in degraded mode (one of the drives pulled out) as well as in recovery mode (when it resyncs to a new drive). I'll just use the deadline scheduler for this test.
|Workload|rndrd|rndwr|rndrw|
|Regression (times slower)|1.78|1.12|1.26|
So READ performance on RAID5 drops a lot when one of the disks is out - plan on it. With 6 drives there is a 1/6 probability that an IO targets the
missing/failed drive, and each of these IOs has to do 5 IOs to recover the data using the parity. So the expected number of physical IOs is
5/6*1 + 1/6*5 = 1.67 for each random IO submitted to a degraded 6-drive RAID5 volume. Our 1.78 times
lower performance is thus quite expected.
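The expectation above, generalized to an n-drive array:

```python
# Expected physical reads per logical read on a degraded 6-drive RAID5:
# (n-1)/n of reads hit a surviving drive (1 IO); 1/n hit the failed drive
# and must be rebuilt from the n-1 remaining drives (n-1 IOs).
n = 6
expected_ios = (n - 1) / n * 1 + 1.0 / n * (n - 1)
print(round(expected_ios, 2))   # 1.67, vs the observed 1.78x slowdown
```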
The number of writes drops less - we already had to do 4 IOs for each write, and now we just need to do a bit more. I won't count it here, as there are several different cases for writes. The read/write workload falls somewhere in between.
What do you think happens with RAID10? I do not have the data handy, but at first glance it should suffer less: in the worst case you just
get 1 drive instead of two for 1/3 of the blocks. This is right, but also wrong.
In practice performance is limited by the bottleneck - which in this case is a single hard drive. Think about it: let's say you have a workload that requires
500 reads/sec from chunks stored on a certain pair of mirrored drives in the RAID10 volume. Now one of these drives dies and
you are only able to do 250 reads/sec... so your workload becomes twice as slow, even if the other spans of the RAID10 volume are not overloaded.
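The hotspot scenario in numbers, using the ~250 IO/sec per-drive figure from earlier:

```python
# Hotspot example: 500 reads/sec aimed at a single mirror pair in RAID10.
iops_per_drive = 250
healthy_pair = 2 * iops_per_drive    # 500 reads/sec with both mirrors alive
degraded_pair = 1 * iops_per_drive   # 250 reads/sec once one mirror dies
print(healthy_pair // degraded_pair) # that hotspot runs 2x slower
```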
In RAID5, the job of data recovery in degraded mode, even though it is more complicated, is spread across all remaining devices, evening out their load.
Now let's see what data we have for the recovery state - when the failed drive has been replaced with a fresh one and needs to be resynced.
As we can see, READ performance barely decreased from degraded mode, while the write and read/write workloads slowed down more, but still only by about 10%.
It may look strange at first glance, but it can be explained by a very simple fact: you can run recovery in the background with as idle a priority
as you like. I guess this array is just configured to do it as gently as possible. This however means it may take long hours before the
drive is resynced, increasing the chance of a second failure.
I also ran these tests on RAID10, with the deadline scheduler only:
|Workload|rndrd|rndwr|rndrw|
|Ratio vs RAID5 (times faster)|0.70|1.66|1.33|
As we can see, with RAID10 the number of reads is 249 IO/sec per device - quite close to what we computed. I have no good idea why
it is so much slower than RAID5, even with 128 outstanding threads. In theory it should be even a bit faster. There might be certain optimizations
in this RAID controller which cause this effect. I do not know - I hope I'll have a chance to run this benchmark on different arrays and see if they show the same behavior.
Writes are seriously faster than on RAID5, about 102 IO/sec per device, which is however not as good as theory suggests (250/2 = 125 IO/sec),
since in RAID10 each write has to go to 2 devices.
So my impression is that RAID10 in this setup is not as well optimized as RAID5.
Array selection. So what to choose, RAID5 or RAID10?
Of course it all depends on the application. Is space an issue? Do writes constitute a serious portion of your workload?
Which types of reads/writes do you have (RAID5 does much better with sequential writes)? Is reliability a serious concern?
How much space do you have in the RAID enclosure or the case itself?
For our case, if we assume RAID5 scales linearly with the number of drives in it (which is not exactly the case), we'd need 1.7*6 ~= 10 drives
to get the same write performance RAID10 delivers on 6 drives. On the other hand, it would take 10 drives of RAID10 to get the same space RAID5 delivers on 6 drives.
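The sizing trade-off above, under the same (admittedly optimistic) linear-scaling assumption:

```python
# Rough sizing: RAID10 does random writes ~1.7x faster than RAID5 here.
raid10_write_advantage = 1.7
raid5_drives_for_writes = round(raid10_write_advantage * 6)   # ~10 drives
usable_raid5 = 6 - 1                    # 6-drive RAID5 keeps 5 drives of data
raid10_drives_for_space = usable_raid5 * 2   # 10 RAID10 drives, same space
print(raid5_drives_for_writes, raid10_drives_for_space)
```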
I should also put in a note about failure rates. The failure of a second drive before the first one is replaced is quite possible, especially if you save on a
hot spare. RAID5 is in bad shape here: with the loss of a second drive you always lose your data. With RAID10 on 6 drives the probability
is only 1/5 - if drives fail in different mirrors, the mirrors continue to operate on one drive each.
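The 1/5 figure follows directly from the array layout:

```python
# RAID10 on 6 drives = 3 mirror pairs. After the first failure, data is
# lost only if the second failure hits the dead drive's mirror partner:
# 1 of the 5 remaining drives.
remaining_drives = 6 - 1
p_second_failure_fatal = 1.0 / remaining_drives
print(p_second_failure_fatal)   # 0.2
```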
One good site on this matter I can recommend (which is somewhat biased against RAID5, however) is http://www.baarf.com/