NLDSL

NLDSL supports data analysis (and, in the future, other domains) in Python/R by providing Domain Specific Languages (DSLs) for common operations. The DSLs are expanded into Python/R code during editing and do not introduce any dependencies. NLDSL is currently available for Visual Studio Code.

Overview and installation

MRStreamer

MRStreamer is a MapReduce framework whose API is (essentially) compatible with Apache Hadoop, but which extends Hadoop with some advanced features:

  • online / streaming processing, i.e. the ability to output results before all input data has been processed
  • efficient processing on shared-memory computing systems, with the ability to run the same user-defined code without changes.
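
The idea of online processing can be illustrated with a minimal sketch. This is not MRStreamer's API (which follows Apache Hadoop's Java interfaces); the names `map_words` and `streaming_map_reduce` and the `batch_size` parameter are hypothetical and chosen only for illustration. The sketch assumes a word-count job and emits a snapshot of the running totals after every few input records instead of waiting for the end of the input.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def streaming_map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    batch_size: int = 2,
) -> Iterator[Dict[str, int]]:
    """Yield intermediate reduce results while input is still arriving.

    Illustrative only: the reduce step is fixed to summation per key,
    and a snapshot of the totals is emitted every `batch_size` records.
    """
    totals: Counter = Counter()
    seen = 0
    for record in records:
        for key, value in mapper(record):
            totals[key] += value  # incremental reduce (sum per key)
        seen += 1
        if seen % batch_size == 0:
            yield dict(totals)  # partial result, available early
    if seen % batch_size != 0:
        yield dict(totals)  # final result for a trailing partial batch


if __name__ == "__main__":
    lines = ["a b a", "b c", "a"]
    for snapshot in streaming_map_reduce(lines, map_words, batch_size=2):
        print(snapshot)
```

Each yielded dictionary is a valid partial answer for the input seen so far, and the last one is the complete result. A real framework would also have to handle partitioning, parallel workers and non-additive reducers, none of which this sketch attempts.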

Overview and download page

Spark4KNIME - Apache Spark workflows in KNIME

Spark4KNIME is a plugin for KNIME, a graphical, workflow-style data analysis tool. The plugin lets users build workflows for massively scalable data processing in a user-friendly manner, i.e. by connecting data feed and data processing operators in a graphical editor. These workflows are executed in local or distributed mode via the Apache Spark framework, a "fast and general engine for large-scale data processing". As a result, users with little programming experience should be able to create scalable data processing and analysis workflows.

This project was developed by Mr. Oleg Pavlov as part of a practical course at PVS.

Download the source code and executable from GitHub.