Apache Spark Small Talk
Apache Spark is a cluster computing framework that grew out of the Apache Hadoop ecosystem. It was developed at the University of California, Berkeley, and was built for "lightning-fast cluster computing", as its official website says. Because Spark addressed several of Hadoop's shortcomings well, most notably the cost of writing intermediate results to disk between jobs, it became popular in no time. It was open sourced in 2010 under a BSD license.

Main Components in Apache Spark

Apache Spark has a few tightly integrated components. Spark Core provides core functionality such as memory management, task scheduling, and fault recovery. The main data abstraction, the Resilient Distributed Dataset (RDD), is also defined in Spark Core. The other components, Spark SQL, Spark Streaming (real-time stream processing), MLlib (the machine learning library), and GraphX, each provide their own distinct functionality. Since we focus on Spark Core functionality and concepts at this stage, we will dive into the other components in later episodes of this series.

Resilient Distributed Dataset ...
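Before the detailed discussion, a minimal sketch of the RDD API may help make the abstraction concrete. This is an illustrative example, not code from this series: it assumes Spark's Scala API is on the classpath and uses a local master for demonstration; the object name RDDSketch and the sample data are made up.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RDDSketch {
  def main(args: Array[String]): Unit = {
    // Run locally on all cores; in a real deployment the master would be a cluster URL.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // parallelize turns a local collection into a distributed dataset (an RDD).
    val numbers = sc.parallelize(1 to 10)

    // Transformations such as filter and map are lazy: they only record lineage,
    // building up a graph of how the data is derived.
    val squaresOfEvens = numbers.filter(_ % 2 == 0).map(n => n * n)

    // Actions such as collect trigger the actual distributed computation.
    println(squaresOfEvens.collect().mkString(", ")) // prints: 4, 16, 36, 64, 100

    sc.stop()
  }
}
```

This split between lazy transformations and eager actions is what lets Spark Core plan work across the cluster and recover from failures by replaying lineage, the fault-recovery role mentioned above.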