Ampool data pipeline on Plexistor SDM @ IMC Summit
posted on May 24, 2016 by Amit Golander
Today was the first day of the In-Memory Computing (IMC) Summit in San Francisco. Approximately 350 Big Data fans gathered, exchanged ideas and walked by the different booths. Hopefully, they will all return tomorrow to attend our talk on the benefits of Memory and Storage convergence for in-memory computing.
In-memory computing frameworks using SDM can be split into two types:
- Existing IMC frameworks –
for which the benefit should be seen out-of-the-box, with no integration work. This type represents most of our public white papers (refer).
- New IMC frameworks, or those willing to put some thought into SDM integration.
Such an IMC framework is the focus of this blog post. Its name is Ampool, and they presented a demo at our booth.
First things first – what is Ampool?
Ampool is a startup building a distributed, in-memory object store optimized for highly concurrent, fast analytical workloads. Ampool software is based on Apache Geode. It uses a peer-to-peer architecture that is highly available and dynamically scalable based on data/query load.
Instead of developing its own storage engine, as many other IMC frameworks do, Ampool adopted Plexistor SDM – getting best-in-class storage while saving considerable development effort and time as a side benefit. Similarly, it has integrated best-in-class data processing and query engines such as Spark and Esgyn, as can be seen in the following figure.
Another nice property of the Ampool solution is its unified, standard access across the data pipeline. Leveraging SDM, this access is not only standard but also blazing fast, cost-efficient, fully persistent and quick to rebuild when needed. The following figure illustrates how data moves between Ingest, ETL, Analytics, etc., and back to the consuming app.
Back to the demo at IMCS’16
Previously, Ampool showed that, running on SDM, they are 6x faster than Tachyon for Spark, and 3-4x faster than HBase for OLTP/OLAP workloads.
The IMCS’16 demo focused on ingest rate: a transactional write of 1000 K-V pairs at a time to a single server.
The methodology used the same hardware and operating system, running once with Ext4 on a Flash SSD and once with Plexistor SDM. The demo screenshot speaks for itself:
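To make the methodology concrete, here is a minimal sketch of such an ingest benchmark. This is not Ampool's actual harness (the demo ran on Ampool/Geode, not Python); the `store` object, key/value naming, and batch loop are all illustrative assumptions. It simply writes batches of 1000 key-value pairs and derives the two numbers the demo compares: throughput and per-batch latency.

```python
import time

def ingest_benchmark(store, num_batches=100, batch_size=1000):
    """Hypothetical sketch of the demo methodology: write batches of
    1000 K-V pairs, recording per-batch latency and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for b in range(num_batches):
        # Build one batch of 1000 key-value pairs.
        batch = {f"key-{b}-{i}": f"value-{b}-{i}" for i in range(batch_size)}
        t0 = time.perf_counter()
        store.update(batch)  # stand-in for a transactional batched put
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = num_batches * batch_size / elapsed      # pairs per second
    avg_latency_ms = 1000 * sum(latencies) / len(latencies)
    return throughput, avg_latency_ms
```

Running the same loop once against an Ext4/SSD-backed store and once against an SDM-backed store, on identical hardware, is what isolates the storage engine as the only variable.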
A 56x improvement in both throughput and latency.
Interestingly enough, heap memory size and CPU utilization are far from saturated. This implies that stronger hardware alone would not have solved the problem.
If you’re in San Francisco tomorrow, you’re welcome to stop by our booth, watch the demo and talk to the Ampool team.