Big Data Sandbox

The "Sandbox" was created in 2014 in the context of the Big Data project overseen by the UNECE High-Level Group for the Modernisation of Official Statistics (HLG-MOS). It provides a shared Hadoop platform on which statistical organisations can collaborate on evaluating and testing new tools, techniques and data sources that could be useful for modern statistical analysis. The initial focus was on understanding the importance of "Big Data" for official statistics, but it soon became clear that the Sandbox has many more uses, including serving as a collaborative platform for international teams of researchers and for training.

130+ users from 29 different countries

System Specifications

The Big Data Sandbox is intended to provide an introduction to the various tools in the Hadoop ecosystem. As such, the hardware is modest in scale but sufficient for proof-of-concept projects. The dedicated hardware and the major components of the supported software stack are listed below:

  • 4 Data/Compute Nodes, each with
    • 2 × 10-core CPUs
    • 128 GB RAM
    • 4 × 4 TB SATA disks
    • 56 Gbit/s InfiniBand network
  • 2 Management/Login Nodes
    • same specification as the data nodes
    • plus 10 Gbit/s Internet connectivity
  • Software:
    • Hortonworks Data Platform
    • RStudio
    • Elasticsearch + Kibana
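At the heart of the Hortonworks Data Platform is Hadoop's MapReduce processing model, which the Sandbox distributes across its data nodes. As a purely illustrative sketch (single-process Python, no cluster involved), a word count expressed in the map/shuffle/reduce style looks like this:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Toy input standing in for files stored on the cluster.
lines = ["big data sandbox", "big data tools"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts == {"big": 2, "data": 2, "sandbox": 1, "tools": 1}
```

On the Sandbox itself the same pattern would run as a distributed job over HDFS data rather than an in-memory list; the sketch only shows the shape of the programming model.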

Funding model

The equipment was self-funded from the proceeds of ICHEC's commercial activities. The activity, including cost recovery for the capital investment and running costs (data centre, software, staff time), is funded through an annual subscription model.