Spark4KNIME - Apache Spark workflows in KNIME

Spark4KNIME is a plugin for KNIME, a graphical, workflow-based data analysis tool. The plugin lets users create workflows for massively scalable data processing in a user-friendly way, i.e. by connecting data sources and data processing operators in a graphical user interface. These workflows are executed in local or distributed mode by Apache Spark, a "fast and general engine for large-scale data processing". As a result, users with little programming experience should be able to build scalable data processing and analysis workflows.

This project was developed by Mr. Oleg Pavlov as part of a practical course at PVS in summer 2015.

Download the source code and executables from GitHub.

MRStreamer is a MapReduce framework that provides an API essentially compatible with that of Apache Hadoop, and extends Hadoop with several advanced features:

  • online / streaming processing, i.e. the ability to output results before all input data has been processed
  • efficient processing on shared-memory computing systems, with the ability to run the same user-defined code without changes.
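To illustrate what "output results before all input data has been processed" means, here is a minimal, generic Python sketch of an online word count in the MapReduce style. It does not use MRStreamer's or Hadoop's API; the function `online_word_count` and its `report_every` parameter are hypothetical names chosen for illustration. The map and reduce steps are applied incrementally, and a snapshot of the partial result is emitted periodically instead of only at the end.

```python
from collections import Counter
from typing import Iterable, Iterator


def online_word_count(lines: Iterable[str], report_every: int = 2) -> Iterator[Counter]:
    """Hypothetical sketch of online (streaming) MapReduce-style aggregation.

    Map: split each input line into (word, 1) pairs.
    Reduce: sum the counts per word into running totals.
    A copy of the partial result is yielded every `report_every` lines,
    so consumers see early results before the whole input is read.
    """
    totals: Counter = Counter()
    processed = 0
    for processed, line in enumerate(lines, start=1):
        # "Map" phase: emit (word, 1) pairs for this record
        pairs = ((word, 1) for word in line.split())
        # "Reduce" phase, applied incrementally to the running totals
        for word, one in pairs:
            totals[word] += one
        if processed % report_every == 0:
            # Early (partial) result
            yield Counter(totals)
    # Final result, unless the last snapshot already covered all input
    if processed % report_every != 0:
        yield Counter(totals)


if __name__ == "__main__":
    for snapshot in online_word_count(["a b", "b c", "c c"], report_every=2):
        print(dict(snapshot))
```

With three input lines and `report_every=2`, the first snapshot reflects only the first two lines, and the second one is the complete result. A real streaming framework would also have to deal with distribution, fault tolerance and result accuracy, which this sketch ignores.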

Overview and download page