Apache Spark Integration
You can use Apache Spark to process large CSV and JSON source files in MashZone NextGen Explorer.
Apache Spark is a fast, general-purpose cluster computing system. MashZone NextGen Explorer version 10.0 ships with Apache Spark and uses it to process large CSV and JSON source files and to execute queries on those data sources. XML source files are currently not supported. For large source files, queries execute faster on Spark than on the MashZone NextGen Explorer default engine, so overall performance improves in those cases. As a guideline, process data sets with 500,000 lines or more using the integrated Spark engine.
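To check a source file against the 500,000-line guideline before deciding whether to enable Spark, a small script can count its lines. The following is a minimal sketch, not part of MashZone NextGen Explorer; the names count_lines and should_use_spark are hypothetical helpers. It assumes one record per line, which holds for typical CSV files but not necessarily for JSON files that span multiple lines per record.

```python
# Guideline from the text above: 500,000 lines or more favors Spark.
SPARK_LINE_THRESHOLD = 500_000


def count_lines(path):
    """Count the lines in a file without loading it fully into memory."""
    count = 0
    with open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count


def should_use_spark(path, threshold=SPARK_LINE_THRESHOLD):
    """Return True if the file meets or exceeds the line-count guideline."""
    return count_lines(path) >= threshold
```

For example, should_use_spark("sales.csv") returns True only if the (hypothetical) file sales.csv has at least 500,000 lines. The threshold is a rule of thumb, so files near the boundary may perform acceptably with either engine.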
By default, the MashZone NextGen Explorer installation runs the Visual Analytics Server without Apache Spark integration. To enable Apache Spark, perform the following steps.
Procedure
1. Open the config.json configuration file in a text editor.
2. Add the following line and save your changes.
"spark.enabled" : true,
3. To run MashZone NextGen Explorer with Spark, open a command line program and run the following command: <MashZone NextGen Explorer installation>\bin\vaserver --embeddedSpark
The default installation path is C:\SoftwareAG\MashzoneNG\VisualAnalytics.
MashZone NextGen Explorer now runs with Apache Spark enabled.
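The procedure above could also be scripted. The following is a minimal sketch under several assumptions: config.json is strict JSON (no comments) with "spark.enabled" as a top-level key, the location of config.json is supplied by the caller because it is not stated here, and rewriting the file with a JSON library may change its formatting. The function names enable_spark and build_launch_command are hypothetical; only the "spark.enabled" setting, the bin\vaserver path, and the --embeddedSpark option come from the steps above.

```python
import json
from pathlib import PureWindowsPath


def enable_spark(config_path):
    """Set "spark.enabled" to true in the given config.json (steps 1 and 2)."""
    with open(config_path, "r", encoding="utf-8") as handle:
        config = json.load(handle)

    # Assumption: the setting is a top-level key of the JSON object.
    config["spark.enabled"] = True

    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


def build_launch_command(install_dir):
    """Build the command from step 3 as an argument list, without running it."""
    server = PureWindowsPath(install_dir) / "bin" / "vaserver"
    return [str(server), "--embeddedSpark"]
```

With the default installation path, build_launch_command(r"C:\SoftwareAG\MashzoneNG\VisualAnalytics") yields the program path and option from step 3, which can then be passed to subprocess.run on the server machine. Back up config.json before modifying it this way.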