Open Big Data Infrastructure to Everyone

Konstantinos Tsakalozos, Cory Johns, Kevin Monroe, Pete VanderGiessen, Andrew Mcleod, and Antonio Rosales (Canonical Ltd)

The evolution of big data has increased the complexity of the associated software. Big data infrastructures require progressively more time and effort to set up, configure, maintain, and integrate with existing systems. In the absence of a big data “expert”, users are often discouraged from adopting such solutions. Consuming big data infrastructure as a service appears to be a viable alternative, yet it is not without drawbacks. Such an option a) is costly, b) often locks users in to a vendor, and c) is limited to what the vendor decides to make available.

In this paper we present Juju, an open source service modelling approach by Canonical that addresses the above shortcomings. With Juju, users can deploy and maintain their infrastructures on a rich variety of target environments, including almost any cloud, local machines (using containers and VMs), bare-metal systems, and any remote machine the user has SSH access to. The Juju big data community ensures that deploying a big data infrastructure is as simple as running “juju deploy hadoop”, while interfaces among infrastructures allow for easy system integration. In this work we also show how the operational knowledge of complex software such as Apache Spark can be encapsulated in a few hundred lines of code.
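To make the deployment workflow concrete, the following is a minimal sketch of a Juju session; the charm name "spark" and the relation endpoints are illustrative assumptions, and the exact names depend on the charms published by the community at the time of use.

```shell
# Deploy the Hadoop bundle named in the abstract (requires a
# bootstrapped Juju controller; charm names are illustrative).
juju deploy hadoop

# Deploy an additional service, e.g. Apache Spark (assumed charm name).
juju deploy spark

# Integrate the two services via their published interfaces
# (endpoint names here are an assumption for illustration).
juju add-relation spark hadoop

# Observe the model converging: machines provisioned, units installed.
juju status
```

Because the charms encapsulate the operational knowledge, the same commands work unchanged whether the target environment is a public cloud, local containers, or bare metal.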