EMC released a free version of its Greenplum database with analytical tools for scientists and developers.

The Greenplum Database Community Edition (CE) includes the company's massively parallel processing (MPP) database product for data warehousing and Big Data workloads.

The free EMC Greenplum CE also includes MADlib, an open source algorithm library with mathematical and statistical tools for manipulating data. The library provides analytical models such as naive Bayes, linear regression, logistic regression and k-means clustering.

It also offers Alpine Miner, a graphical workflow model builder for data mining that is designed to move quickly from modeling to scoring.

The EMC Greenplum database uses a shared-nothing massively parallel processing (MPP) architecture, in which data is distributed across segment servers and each segment server owns and separately manages its own distinct portion of the overall data.
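The shared-nothing idea can be illustrated with a minimal Python sketch. It is not Greenplum code: the segment count, the `customer_id` distribution key, the helper names and the use of CRC32 as the hash are all assumptions made for illustration. Each row is assigned to exactly one segment, each segment aggregates only the rows it owns, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment index with a stable hash.

    CRC32 is an illustrative stand-in, not the hash Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Assign every row to exactly one segment based on its key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; the coordinator adds the partials."""
    partials = [
        sum(row[value_field] for row in seg_rows)
        for seg_rows in segments.values()
    ]
    return sum(partials)


# Hypothetical sample data: 100 rows keyed by customer_id.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "customer_id")
total = parallel_sum(segments, "amount")
```

Because no segment needs another segment's rows to compute its partial sum, the per-segment work in this sketch could run on separate machines without shared storage, which is the property the shared-nothing design relies on.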

Luke Lonergan, co-founder and CTO of EMC Greenplum, said: "This project is about empowering developers - they can program using the most popular tools and they have a place to contribute open source extensions to the stack."

The product can be used free of charge for research and development. A commercial Greenplum license is required if the code is used for internal data processing, or if it runs on anything larger than a single physical server with up to two CPU sockets or a single virtual machine with up to eight virtual CPU cores.