| Version 3 (modified by jeremian, 3 years ago) |
|---|
Comparison of the grid/cloud computing frameworks (Hadoop, GridGain, Hazelcast, DAC) - part II
Introduction
What would happen if 60% of your cloud suddenly went down? Can you rely on the fail-over capabilities of the framework of your choice? What about the consistency of your data? How big would the performance impact of node failures be? Continuing our experiments from the previous article, Comparison of the grid/cloud computing frameworks (Hadoop, GridGain, Hazelcast, DAC) - part I, we decided to test the following frameworks:
- Hadoop
- GridGain
- Hazelcast
- DAC
As always, we describe all the methods and results and give you access to all the sources. You should be able to repeat all the tests and obtain similar results. If not, please contact us so that we can revise our report. Moreover, taking into account the comments on the previous article, we have added detailed CPU, memory and network usage from all the machines in our test environment.
You can find all sources used during tests in our code repository: http://dacframe.org/lab
This is part II of our comparison, where we concentrate on fail-over capabilities. The tests involved serious node failures (up to 60% of the nodes went down) and multiple transaction rollbacks.
Test environment
Our test environment consisted of 5 machines (named intel1 - intel5), each with dual quad-core Xeon E5410 2.33GHz processors and 4GB of RAM, which gave us 40 processing units. The only difference between the current test environment and the one used in part I of our comparison is the JVM version, which has been updated to 1.6.0_18. You can see the architecture of the test environment in the following figure:
Methodology
We based our benchmark on the same mathematical problem as in part I of our comparison. Because of that, we can easily compare the results of both tests, which gives us a wider view of the given frameworks.
In this 'fail-over' comparison we used only one test scenario:
- compute problem divided into 2705 tasks (CMBF with arguments: n = 4, level = 10000)
We simulated node failures during our tests in the following order:
- intel4 went down 60 seconds after the beginning of computations
- intel3 went down 180 seconds after the beginning of computations
- intel2 went down 300 seconds after the beginning of computations
All tests were repeated ten times to reduce measurement error.
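The failure schedule above can be expressed as a small timer-driven script. The following is only a minimal sketch, assuming a single controller process that can stop a node by name; the article does not describe how the nodes were actually taken down, so the `stopNode` callback and the class name `FailureSchedule` are hypothetical. It uses Java 8 syntax rather than the 1.6 JVM of the test environment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class FailureSchedule {

    /** Delays, counted from the start of computations, after which each node goes down. */
    static Map<String, Long> articleSchedule() {
        Map<String, Long> delays = new LinkedHashMap<>();
        delays.put("intel4", 60L);
        delays.put("intel3", 180L);
        delays.put("intel2", 300L);
        return delays;
    }

    /**
     * Schedules one call to stopNode per entry. The stopNode callback is a
     * hypothetical hook (for example, something that kills the worker JVM on
     * that host); it is not part of any of the compared frameworks.
     */
    static List<ScheduledFuture<?>> arm(ScheduledExecutorService timer,
                                        Map<String, Long> delays,
                                        TimeUnit unit,
                                        Consumer<String> stopNode) {
        List<ScheduledFuture<?>> pending = new ArrayList<>();
        for (Map.Entry<String, Long> entry : delays.entrySet()) {
            String node = entry.getKey();
            pending.add(timer.schedule(() -> stopNode.accept(node), entry.getValue(), unit));
        }
        return pending;
    }
}
```

Calling `FailureSchedule.arm(timer, FailureSchedule.articleSchedule(), TimeUnit.SECONDS, stopNode)` right when the computation starts would trigger the three failures at 60, 180 and 300 seconds. Passing the time unit as a parameter makes it possible to rehearse the schedule quickly, for instance in milliseconds.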
Results - overview
We have compared the following aspects:
- average time of computations
- cumulative cost (time of computation multiplied by the amount of available processing units)
- cumulative cost – difference with CMBF (difference with the optimal solution: single-threaded version of CMBF)
- total CPU usage
- maximum memory usage
- total network usage
Average time
| | Test I | Test II | Increase (%) |
|---|---|---|---|
| GridGainEx 2.1.1 | 338 310.40 | 568 426.60 | 68.02 |
| Hazelcast 1.8 | 321 922.70 | 501 223.60 | 55.70 |
| DAC 0.9.1 | 299 815.70 | 881 464.00 | 194.00 |
| Hadoop 0.20.1 | 384 331.60 | 1 307 526.80 | 240.21 |
Cumulative cost
| | Test I | Test II | Increase (%) |
|---|---|---|---|
| GridGainEx 2.1.1 | 13 532 416.00 | 13 414 825.60 | -0.87 |
| Hazelcast 1.8 | 12 876 908.00 | 12 339 577.60 | -4.17 |
| DAC 0.9.1 | 11 992 628.00 | 18 423 424.00 | 53.62 |
| Hadoop 0.20.1 | 15 373 264.00 | 25 240 428.80 | 64.18 |
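The Test II values in the table above are consistent with summing, over the whole run, the number of processing units still available at each moment. The sketch below reproduces them under assumptions that this article does not state explicitly: average times are in milliseconds, every machine contributes 8 processing units (40 units over 5 machines), and each failure removes exactly one machine at its scheduled time. The class and method names are hypothetical.

```java
final class CumulativeCost {

    static final int MACHINES = 5;
    static final int UNITS_PER_MACHINE = 8; // assumed: dual quad-core CPU per machine

    /**
     * Cumulative cost = sum over time intervals of (interval length * units available).
     *
     * @param totalTimeMs    average time of computations, assumed to be in milliseconds
     * @param failureTimesMs ascending times at which one machine goes down
     */
    static double cost(double totalTimeMs, double[] failureTimesMs) {
        double cost = 0.0;
        double previous = 0.0;
        int units = MACHINES * UNITS_PER_MACHINE;
        for (double failure : failureTimesMs) {
            if (failure >= totalTimeMs) {
                break; // the run finished before this machine went down
            }
            cost += (failure - previous) * units;
            previous = failure;
            units -= UNITS_PER_MACHINE;
        }
        // Remaining part of the run, executed on the machines that are still up.
        return cost + (totalTimeMs - previous) * units;
    }
}
```

For GridGain, `CumulativeCost.cost(568426.60, new double[] {60000, 180000, 300000})` gives 13 414 825.60 up to floating-point rounding, which matches the table. With no failures, `CumulativeCost.cost(338310.40, new double[0])` reduces to time multiplied by 40 and gives the Test I value of 13 532 416.00. This also explains why the cumulative cost of GridGain and Hazelcast drops slightly in Test II even though their average time grows: most of the extra time is spent on only 16 processing units.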
Cumulative cost – difference with CMBF
| | Test I (%) | Test II (%) | Delta (%) |
|---|---|---|---|
| GridGainEx 2.1.1 | 13.83 | 12.84 | -0.99 |
| Hazelcast 1.8 | 8.31 | 3.79 | -4.52 |
| DAC 0.9.1 | 0.88 | 54.97 | 54.09 |
| Hadoop 0.20.1 | 29.31 | 112.31 | 83.00 |
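The percentages above express how much each framework's cumulative cost exceeds the cost of the single-threaded CMBF run, and the Delta column is the difference between the Test II and Test I percentages. This section does not quote the single-threaded cost itself, so the sketch below, with hypothetical names, shows both the formula and how the baseline can be back-derived from any row of the tables. It is an illustration of the arithmetic, not the authors' tooling.

```java
final class CmbfOverhead {

    /** Percentage by which a framework's cumulative cost exceeds the baseline cost. */
    static double overheadPercent(double cost, double baselineCost) {
        return (cost - baselineCost) / baselineCost * 100.0;
    }

    /** Baseline cost implied by a reported cumulative cost and its reported overhead. */
    static double impliedBaseline(double cost, double overheadPercent) {
        return cost / (1.0 + overheadPercent / 100.0);
    }

    public static void main(String[] args) {
        // Test I rows: cumulative cost and "difference with CMBF" for DAC.
        double baseline = impliedBaseline(11992628.00, 0.88);
        System.out.printf("implied baseline: %.0f%n", baseline);
        // Test II overhead for Hadoop, computed against that implied baseline.
        System.out.printf("Hadoop Test II overhead: %.2f%%%n",
                overheadPercent(25240428.80, baseline));
    }
}
```

Applying `impliedBaseline` to the four Test I rows gives values that all lie close to 11.89 million, agreeing with each other within the rounding of the published percentages, which suggests that a single baseline was used for every framework.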
Cumulative cost – CPU, memory and network usage
| | CPU (user) | CPU (system) | Memory (MB) | Transmitted (kB) |
|---|---|---|---|---|
| GridGainEx 2.1.1 | 149 928.21 | 835.62 | 1 235.52 | 3 228.94 |
| Hazelcast 1.8 | 142 066.11 | 425.79 | 1 997.57 | 768.79 |
| DAC 0.9.1 | 148 730.82 | 1 229.16 | 14 416.35 | 2 204.93 |
| Hadoop 0.20.1 | 168 922.41 | 1 587.27 | 4 327.75 | 1 450.08 |
You will find the detailed methodology (sources, test environment description) and results (all performed test cases with standard deviation and average values) on the following per-framework pages:
CPU
Average CPU usage (%user) gathered on all machines:
Average CPU usage (%system) gathered on all machines:
Memory
Average memory usage gathered on all machines:
Network
Average network usage (received bytes/s) gathered on all machines:
Average network usage (transmitted bytes/s) gathered on all machines:
Summary
Part II concentrates on fail-over capabilities. All frameworks handle node failures properly, but we had to slightly modify our code for Hazelcast to catch new exceptions (the other frameworks resubmit failed tasks by default). Taking the above results into consideration, we can draw the following conclusions:
- Hazelcast and GridGain are the best choices for easily parallelized, low-data, CPU-intensive tasks. They are an even better choice when unexpected node failures can happen.
- Hazelcast consumes the least CPU and network bandwidth.
- GridGain consumes the least memory.
- Hadoop was designed to manipulate large data sets, so its weaker results in this test are understandable.
- DAC with its default settings does not handle node failures efficiently.
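The Hazelcast change mentioned in the summary, catching an exception and submitting the task again, follows a general pattern that can be sketched with the standard `java.util.concurrent` interfaces. This is a minimal illustration only: it does not use the Hazelcast API, the article does not name the exceptions that had to be caught, and the class `Resubmit` and the `maxAttempts` limit are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

final class Resubmit {

    /**
     * Submits the task and waits for its result; if the execution fails
     * (for instance because the node running it went down), the task is
     * submitted again, up to maxAttempts times in total.
     */
    static <T> T callWithResubmit(ExecutorService executor, Callable<T> task, int maxAttempts)
            throws InterruptedException, ExecutionException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ExecutionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return executor.submit(task).get();
            } catch (ExecutionException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}
```

A caller would wrap each task, for example `Resubmit.callWithResubmit(executor, task, 3)`. Bounding the number of attempts keeps a permanently failing task from being resubmitted forever; a real deployment would also need the tasks to be safe to run more than once.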



