a fast, general-purpose distributed computing platform
Spark makes efficient use of memory and can execute an equivalent job 10 to 100 times faster than Hadoop's MapReduce
Spark's creators abstracted away the fact that one is working with a cluster of machines - code reads as if it were using a familiar collections-based API
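that collections-style feel can be shown without a cluster at all: Spark's RDD operations (map, filter, reduce) mirror plain-Python functional operations on an ordinary list. a minimal analogue, with no Spark involved (the Spark equivalent is noted in a comment):

```python
# Plain-Python analogue of Spark's collections-based API.
# In Spark this chain would be roughly:
#   sc.parallelize(data).map(...).filter(...).reduce(...)
from functools import reduce

data = [1, 2, 3, 4, 5]

squared = map(lambda x: x * x, data)           # transform each element
evens = filter(lambda x: x % 2 == 0, squared)  # keep only even squares
total = reduce(lambda a, b: a + b, evens)      # combine into one value

print(total)  # 4 + 16 = 20
```

the point of the abstraction is that the same chain of operations runs unchanged whether the data lives in one process or is partitioned across a cluster.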
definition
Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters
it supports the widely used programming languages Python, Java, Scala, and R
it covers a range of tasks, from SQL to streaming to machine learning
it runs anywhere from a laptop to a large cluster of servers
this makes it an easy system to start with and then scale up to very large-scale big data processing
meaning
unified - it supports a wide range of data analytics tasks through the same engine and a consistent set of APIs:
simple data loading
SQL queries
machine learning and streaming computation
computing engine
Spark handles loading data from storage systems and performing computation on it - it is not meant to be a permanent storage system itself