Spark SQL: Performance Tuning
In this article I would like to share my understanding of, and experience with, Spark SQL. We have just completed migrating 300+ tables in my domain from Teradata-based ETL processes to Spark SQL, with HDFS as the backend data storage.
Below are a couple of the many questions we encountered during this migration:
How can we maintain a data distribution like the one Teradata provides through its Primary Index?
Spark supports this through the “distribute by” clause. As the name suggests, it distributes the data based on the…
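To make the idea concrete, here is a minimal conceptual sketch in plain Python (not Spark code) of the hash partitioning that, as far as I understand it, sits behind “distribute by”: rows are hashed on the chosen column, so rows sharing a key value end up in the same partition. The function name, the sample rows, and the use of crc32 as the hash are all assumptions for illustration; Spark uses its own internal hash, and the partition count there is governed by spark.sql.shuffle.partitions.

```python
import zlib
from collections import defaultdict


def distribute_by(rows, key, num_partitions):
    """Assign each row to a partition by hashing the value of its key column.

    Hypothetical helper for illustration only; it mimics the effect of
    something like: SELECT * FROM orders DISTRIBUTE BY customer_id
    """
    partitions = defaultdict(list)
    for row in rows:
        # Stand-in hash (assumption): crc32 is deterministic across runs,
        # but it is not the hash function Spark itself uses.
        hashed = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[hashed % num_partitions].append(row)
    return dict(partitions)


# Hypothetical sample data: customer_id plays the role of the Primary Index.
orders = [
    {"customer_id": 101, "amount": 25.0},
    {"customer_id": 102, "amount": 13.5},
    {"customer_id": 101, "amount": 40.0},
    {"customer_id": 103, "amount": 7.25},
    {"customer_id": 102, "amount": 99.9},
]

parts = distribute_by(orders, "customer_id", num_partitions=4)
for partition_id, partition_rows in sorted(parts.items()):
    # All rows for a given customer_id show up under a single partition id.
    print(partition_id, [r["customer_id"] for r in partition_rows])
```

The point to take away is the co-location guarantee: every row for a given key lands in one partition, which is the property we relied on with the Teradata Primary Index. A skewed key still produces skewed partitions, exactly as it would in Teradata.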