Big Data and Better Algorithms


What is H2O?

H2O makes Hadoop do math! H2O scales statistics, machine learning, and math over big data. H2O is extensible: users can assemble their own building blocks from the simple math "legos" in its core. H2O keeps familiar interfaces like R, Excel, and JSON so that big data enthusiasts and experts can explore, munge, model, and score data sets using a range of simple to advanced algorithms.

Data collection is easy. Decision making is hard. H2O makes it fast and easy to derive insights from your data through faster and better predictive modeling.

What does H2O do?

Ad hoc exploration of big data

  • Slice big data into training and test sets, and verify assumptions about the data.
  • R- and Python-like syntax for the Data Manipulation Console.
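
As a rough illustration of the slicing step above, here is a minimal sketch in plain Python. It is not H2O's Data Manipulation Console syntax; the function name train_test_split, the 25% test fraction, and the toy rows are all assumptions made for this example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A private Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data set (hypothetical): each row is (x, y) with y = 2x + 1.
rows = [(x, 2 * x + 1) for x in range(100)]
train, test = train_test_split(rows)

# "Verify assumptions in data": check that the relationship holds on the held-out rows.
assumption_holds = all(y == 2 * x + 1 for x, y in test)
```

The same idea carries over to larger data: fix the seed so the split is repeatable, then run cheap sanity checks on the held-out slice before any modeling.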

Modeling engine with high-powered math algorithms

  • Classification
    • Random Forest
  • Regression
    • GLM/GLMnet
    • Parallel grid search on the parameter space of the regression method
  • Clustering
    • K-Means
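
To make "parallel grid search on the parameter space of the regression method" concrete, the sketch below tries several regularization strengths concurrently and keeps the one with the lowest validation error. It is an illustration only, not H2O's implementation: the one-dimensional ridge regression, the helper names, the toy data, and the use of a local thread pool in place of a cluster are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for the model y ~ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def validation_mse(w, xs, ys):
    """Mean squared error of the slope w on held-out data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lambdas, workers=4):
    """Evaluate every lambda concurrently; return (mse, lambda, slope) of the best."""
    tx, ty = train
    vx, vy = valid

    def evaluate(lam):
        w = fit_ridge_1d(tx, ty, lam)
        return validation_mse(w, vx, vy), lam, w

    # Threads stand in for the worker nodes of a real cluster (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, lambdas))
    return min(results)


# Hypothetical data generated from y = 3x, so no regularization should win.
train = ([1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0])
valid = ([5.0, 6.0], [15.0, 18.0])
best_mse, best_lam, best_w = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
```

Each grid point is independent of the others, which is what makes this kind of search easy to spread across many workers.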

Real-time scoring

  • Ensembles
  • Hundreds of models
  • Hundreds of nanoseconds
  • Embeddable online and offline scoring
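
One way to picture embeddable ensemble scoring is a plain function that averages the predictions of several already-trained models. The sketch below assumes simple linear models with made-up weights; make_linear_model and make_ensemble are hypothetical helpers for this example, not part of H2O, and the sketch makes no claim about scoring latency.

```python
def make_linear_model(weights, bias):
    """Return a scoring function for a fixed linear model."""
    def score(features):
        return bias + sum(w * f for w, f in zip(weights, features))
    return score


def make_ensemble(models):
    """Return a scoring function that averages the predictions of all models."""
    if not models:
        raise ValueError("an ensemble needs at least one model")

    def score(features):
        return sum(model(features) for model in models) / len(models)
    return score


# Three hypothetical models with made-up coefficients.
models = [
    make_linear_model([1.0, 0.0], 0.0),
    make_linear_model([0.0, 1.0], 1.0),
    make_linear_model([0.5, 0.5], 0.5),
]
ensemble = make_ensemble(models)

# Score a single record; the same function can be called online or in a batch loop.
prediction = ensemble([2.0, 4.0])
```

Because the ensemble is just a function with no external dependencies, it can be embedded in an online service or run offline over stored records.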

What is the interface?

A REST API and JSON allow connections from MS Excel, a Google-style search bar, and an integrated R environment for data analysis. R syntax is the default for statistical functions.

How is H2O related to Hadoop?

H2O brings database-like interactivity to Hadoop. It can be installed standalone or on top of an existing Hadoop installation. It leverages data in HDFS and other familiar components of the Hadoop ecosystem, such as Hive and Pig.

Why H2O?

Existing big data stacks are batch-oriented, but search and analytics need to be interactive. Use machines to learn from machine-generated data. And more data beats better algorithms.

H2O Datasheet

A concise summary of the critical information about H2O is available in the H2O Datasheet.