
Comparison of grid/cloud computing frameworks (Hadoop, GridGain, Hazelcast, DAC) - part I

Introduction

Some time ago we faced a typical programming challenge: how to perform an enormous number of computations as fast as possible? The answer is simple: divide the problem into smaller ones, compute them in parallel, and gather the results. The overall concept isn't complicated, so it can be achieved in a variety of ways.

One option is to do it yourself. Simply set up an HTTP server to farm out requests RESTfully and accept responses the same way. You could also use AMQP/JMS, RMI, or even old-school CORBA - it depends only on your skills and imagination.

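For illustration, on a single machine the divide/compute/gather pattern boils down to something like the following minimal Java sketch (our own illustration; the compute() method and the task count are placeholders, not code from any of the frameworks below):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class DivideAndGather {

        // Hypothetical unit of work: computes a partial result for one sub-problem.
        static long compute(int taskId) {
            return taskId; // placeholder for the real number crunching
        }

        public static void main(String[] args) throws Exception {
            int tasks = 341; // e.g. one of the task counts used later in this benchmark
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());

            // Divide: submit one sub-problem per task.
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int taskId = i;
                Callable<Long> task = () -> compute(taskId);
                futures.add(pool.submit(task));
            }

            // Gather: collect and combine the partial results.
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            pool.shutdown();
            System.out.println("total = " + total);
        }
    }

Distributing the sub-problems across machines (via HTTP, AMQP/JMS, RMI, and so on) is conceptually the same loop, just with network transport and failure handling added - which is exactly the part the frameworks below take off your hands.
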
However, you do not have to reinvent the wheel. There are several open-source frameworks which can be used almost out of the box: simply download the library, adjust your sources, and watch how fast your problems get solved. But which framework should you use? The answer isn't simple, and it depends heavily on your individual needs. We cannot state that, for example, Hadoop will always be the best choice, but we can point out the advantages and disadvantages of several frameworks so that you can choose the best one for yourself.

We decided to give the following frameworks a try:

  • Hadoop
  • GridGain
  • Hazelcast
  • DAC

We believe that a serious study should be as transparent as possible. This is why we describe all the methods and results and give you access to all the sources. You should be able to repeat all the tests and obtain similar results. If not, please contact us so we can revise our report.

You can find all the sources used in the tests in our code repository: http://dacframe.org/lab

This is part I of our comparison, in which we concentrate on task distribution time. There were no node failures or transaction rollbacks.

Test environment

Our test environment consisted of 5 machines (named intel1 - intel5), each with two quad-core Xeon E5410 2.33GHz CPUs and 4GB of RAM, which gave us 40 processing units. You can see the architecture of the test environment in the following figure:

Methodology

We based our benchmark on the mathematical problem known as 'counting monotone Boolean functions' (CMBF), also known as Dedekind's problem; a minimal brute-force sketch of the problem follows the list below. Why such an unrealistic problem? Because:

  • it is highly CPU-intensive (a single CPU would need more than 800 hours to count all monotone Boolean functions for n = 8)
  • you do not need to solve the whole problem in a benchmark (we chose a fragment that takes a single CPU more than 3 hours to compute)
  • it can easily be divided into an arbitrary number of tasks
  • the generated tasks have different computational needs (a perfect use case for load balancers)

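To make the 'counting' part concrete, here is a minimal brute-force sketch (our own illustration, not the benchmark code from the repository) that counts all monotone Boolean functions of n variables; for n = 4 it prints 168:

    public class CmbfBruteForce {
        public static void main(String[] args) {
            int n = 4;                 // number of Boolean variables
            int points = 1 << n;       // size of the input domain {0,1}^n
            long count = 0;
            // Each candidate function is a truth table encoded in the bits of f.
            for (long f = 0; f < (1L << points); f++) {
                boolean monotone = true;
                // f is monotone iff x <= y (bitwise) implies f(x) <= f(y).
                for (int x = 0; x < points && monotone; x++) {
                    for (int y = 0; y < points; y++) {
                        boolean xLeY = (x & y) == x;        // x <= y pointwise
                        boolean fx = ((f >> x) & 1) == 1;
                        boolean fy = ((f >> y) & 1) == 1;
                        if (xLeY && fx && !fy) { monotone = false; break; }
                    }
                }
                if (monotone) count++;
            }
            System.out.println("M(" + n + ") = " + count); // prints 168 for n = 4
        }
    }

The number of candidate truth tables grows as 2^(2^n), which is why n = 8 is far out of reach for such a naive approach and calls for both a smarter algorithm and a grid.
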
With such a flexible problem at hand, we decided to prepare three test scenarios:

  • compute the problem divided into 33700 tasks (CMBF with arguments: n = 4, level = 1000)
  • compute the problem divided into 2705 tasks (CMBF with arguments: n = 4, level = 10000)
  • compute the problem divided into 341 tasks (CMBF with arguments: n = 4, level = 100000)

Each test was repeated ten times in order to reduce measurement error.

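The averages and standard deviations reported on the per-framework pages can be computed in the usual way; for completeness, a trivial sketch (the timings below are made up):

    public class RunStats {
        public static void main(String[] args) {
            // Hypothetical timings (ms) of the ten repetitions of one test case.
            double[] runs = {61234, 60987, 61502, 61110, 60845,
                             61399, 61050, 60920, 61277, 61163};
            double sum = 0;
            for (double r : runs) sum += r;
            double mean = sum / runs.length;
            double sq = 0;
            for (double r : runs) sq += (r - mean) * (r - mean);
            double stdDev = Math.sqrt(sq / (runs.length - 1)); // sample standard deviation
            System.out.printf("mean = %.1f ms, std dev = %.1f ms%n", mean, stdDev);
        }
    }
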
Results - overview

You can find the average results in the following figure:

  • X-axis: number of tasks the problem was divided into
  • Y-axis: average running time of the algorithm (in milliseconds)

As you can see, all frameworks achieved quite similar results. Note, however, that only the best cases are shown (we ran two variants of the test for GridGain and for Hadoop). Moreover, most frameworks performed best in the middle test case (2705 tasks), with the exception of Hadoop, which achieved its best time with 33700 tasks.

You will find the detailed methodology (sources, test environment description) and results (all performed test cases with standard deviation and average values) on the following per-framework pages:

Cumulative cost

We can also compare the results using a cumulative-cost metric, i.e. the number of processors used multiplied by the average processing time. Moreover, we can compare the cumulative cost with the CPU time needed to compute CMBF on a single processor.

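In code, the metric is a one-liner; the sketch below uses the 40 processing units of our test environment together with hypothetical timings (the real averages are on the per-framework pages):

    public class CumulativeCost {
        public static void main(String[] args) {
            int cpus = 40;                                // processing units in the test environment
            double avgParallelTimeMs = 360000;            // hypothetical average parallel run time
            double singleCpuTimeMs = 3.2 * 3600 * 1000;   // hypothetical single-CPU time (> 3 hours)

            // Cumulative cost: total CPU time paid by the whole grid.
            double cumulativeCostMs = cpus * avgParallelTimeMs;
            // Ratio > 1 means the grid burned more CPU time than a single processor would.
            double overhead = cumulativeCostMs / singleCpuTimeMs;
            System.out.printf("cumulative cost = %.0f CPU-ms (%.2fx the sequential cost)%n",
                    cumulativeCostMs, overhead);
        }
    }
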
This is shown in the following figure:

CPU

CPU usage (%user) gathered on the intel1 machine:


Network

Network usage (received bytes/s) gathered on the intel1 machine:

Network usage (transmitted bytes/s) gathered on the intel1 machine:

Summary

Part I above concentrated on task distribution time. Because of the limitations of the test environment (only 40 CPU units), all frameworks obtained quite similar results. Hadoop, however, distributed tasks 20%-30% more slowly than the other frameworks; given that Hadoop was designed to manipulate large data sets, this result is entirely understandable.

In the upcoming part II we will concentrate on the fail-over capabilities of the selected frameworks.
