There are 8 things to consider while designing an Apache Spark enabled application. Porting to a Spark/Hadoop ecosystem is an important step, usually dictated by the need for streaming capabilities and extreme speed of execution. Apache Spark runs its work across a cluster and can sit on top of HDFS, making for a composite architecture. Unless you understand the business process and the incoming data, it would be inefficient to build such an architecture. Remember: from big data volumes comes value, NOT traditional reports.
1. Spark relies on in-memory execution of tasks and in-memory storage of data. Design your system and your processes with this in mind: cache only the datasets you actually reuse, and give anything that may not fit in memory a way to spill rather than fail (a minimal caching sketch follows this list).
2. These days, writing your jobs in Java can be more efficient from a resource standpoint, and Java handles concurrency well on its own. The fact that Spark's APIs are built in Scala does not necessarily mean your code will execute faster in Scala, so it is worthwhile to consider writing in Java (a minimal Java job appears after this list).
3. Whether Spark runs in the cloud or standalone, it uses in-memory space for both data and executors, so think carefully about heap size. Continuously increasing the heap just to get a job to complete can reduce efficiency, for instance by lengthening garbage-collection pauses (see the heap-sizing sketch after this list).
4. Using User Memory is not recommended unless your architecture really demands it for some core, extremely high-speed streaming need, such as detecting fraudulent activity across a huge segment, or surviving the failure of a system within your application cluster.
5. Take advantage of Unified Memory Management (Spark 1.6.x and above is needed). Under this model, execution and storage share a single region and can borrow from each other when needed, rather than one side failing because its fixed share filled up (the relevant settings are sketched after this list).
6. Consider nodes as individual machines. This will help your infrastructure planning, because every Spark executor in an application has the same fixed number of cores and the same fixed heap size (a worked sizing example follows this list).
7. Before reaching for Mesos, consider Hadoop/YARN: if your data already lives in HDFS, YARN ships with the Hadoop distribution you are already running and lets Spark share the cluster with it.
8. Architecture is an art. So imagine, understand, absorb, design, walk through the design, re-design and re-architect, test small, test big, implement by deploying it in the cloud (perhaps the ideal case), and go live.
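For point 1, here is a minimal sketch of designing with memory in mind; the HDFS path and the reuse pattern are hypothetical. The dataset reused by two actions is persisted with MEMORY_AND_DISK, so partitions that do not fit in memory spill to disk instead of failing or being recomputed:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class CacheAwareJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CacheAwareJob");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical HDFS path; this dataset is reused by two actions below.
        JavaRDD<String> events = sc.textFile("hdfs:///data/events");

        // MEMORY_AND_DISK: keep partitions in memory where possible,
        // spill the rest to disk rather than failing or recomputing.
        JavaRDD<String> errors = events
                .filter(line -> line.contains("ERROR"))
                .persist(StorageLevel.MEMORY_AND_DISK());

        long total = errors.count();               // first action materializes the cache
        long distinct = errors.distinct().count(); // second action reuses it

        System.out.println(total + " errors, " + distinct + " distinct");
        sc.stop();
    }
}
```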
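For point 2, a minimal word count in plain Java with Java 8 lambdas, just to show the code stays compact without Scala. The input and output paths are hypothetical:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class JavaWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, Integer> counts = sc
                .textFile("hdfs:///data/input.txt")              // hypothetical path
                .flatMap(line -> Arrays.asList(line.split(" "))) // Spark 1.x flatMap takes an Iterable; 2.x wants an Iterator
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("hdfs:///data/wordcount-output");  // hypothetical path
        sc.stop();
    }
}
```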
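For point 3, a configuration sketch of the trade-off; all sizes here are hypothetical, not recommendations. Several moderately sized executor heaps usually behave better than one huge heap, because very large heaps lengthen garbage-collection pauses:

```java
import org.apache.spark.SparkConf;

public class HeapSizing {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("HeapSizing")
                // ~8 GB per executor heap (hypothetical): big enough to cache,
                // small enough that full GC pauses stay manageable.
                .set("spark.executor.memory", "8g")
                // Extra container headroom on YARN, in MB (Spark 1.x property name).
                .set("spark.yarn.executor.memoryOverhead", "1024");
        // Simply raising spark.executor.memory until the job "fits" trades
        // fewer spills for longer GC pauses -- measure before growing the heap.
    }
}
```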
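For points 4 and 5, the split between the unified region (execution plus storage) and User Memory is controlled by two settings. The property names are the Spark 1.6+ ones; the values below are hypothetical:

```java
import org.apache.spark.SparkConf;

public class UnifiedMemoryConfig {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("UnifiedMemoryConfig")
                // Fraction of the heap shared by execution AND storage
                // (the unified region); the remainder is User Memory.
                .set("spark.memory.fraction", "0.75")
                // Within the unified region, storage below this fraction is
                // protected from eviction; execution may borrow the rest.
                .set("spark.memory.storageFraction", "0.5");
        // Execution can evict cached blocks above the storage floor, and
        // storage can borrow idle execution memory, so neither side fails
        // outright just because its fixed share filled up.
    }
}
```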
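For point 6, a worked sizing sketch for a hypothetical node with 16 cores and 64 GB of RAM; the figures are assumptions to illustrate the arithmetic, not a recommendation:

```java
import org.apache.spark.SparkConf;

public class ExecutorSizing {
    public static void main(String[] args) {
        // Hypothetical node: 16 cores, 64 GB RAM.
        // Reserve 1 core and ~8 GB for the OS and Hadoop daemons, leaving
        // 15 cores and ~56 GB. At 5 cores per executor (a common rule of
        // thumb), that is 3 executors per node, each with roughly
        // 56 / 3 ~= 18 GB before container overhead.
        SparkConf conf = new SparkConf()
                .setAppName("ExecutorSizing")
                .set("spark.executor.cores", "5")
                .set("spark.executor.memory", "16g")   // leaves room for overhead
                .set("spark.executor.instances", "9"); // total: 3 per node x 3 hypothetical nodes
    }
}
```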
Meet me at #DreamForce #df16. Learn how this could benefit you and how to set up a meeting at bit.ly/2dau9xq.
Thank you.