High-performance, data-intensive computing and networking technology has become a vital part of large-scale scientific research projects in areas such as high energy physics, astronomy, space exploration, and the human genome project. One such example is the Large Hadron Collider (LHC) project at CERN, where four major experiment groups will generate on the order of a petabyte of raw data each year from four large underground particle detectors, with data acquisition starting in 2006. Grid technology will play an essential role in constructing worldwide data analysis environments in which thousands of physicists will collaborate and compete in particle physics data analysis at the energy frontier. A multi-tier "Regional Centers" worldwide computing model has been studied by the MONARC project. It consists of a Tier-0 center at CERN, multiple Tier-1 centers in participating countries, tens of Tier-2 centers, and many Tier-3 centers at universities and institutes.
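As a rough illustration of such a multi-tier model, the hierarchy can be represented as a tree of centers and tallied per tier. The fan-out numbers below are invented for the example and are not the MONARC figures:

```python
# Illustrative sketch of a multi-tier "Regional Centers" hierarchy.
# The fan-out counts (3 Tier-1, 3 Tier-2 per Tier-1, 4 Tier-3 per Tier-2)
# are hypothetical, chosen only to make the tree concrete.
from dataclasses import dataclass, field

@dataclass
class Center:
    name: str
    tier: int
    children: list = field(default_factory=list)

def build_hierarchy():
    root = Center("CERN", 0)  # Tier-0 at CERN
    for i in range(3):
        t1 = Center(f"Tier1-{i}", 1)
        root.children.append(t1)
        for j in range(3):
            t2 = Center(f"Tier2-{i}.{j}", 2)
            t1.children.append(t2)
            for k in range(4):
                t2.children.append(Center(f"Tier3-{i}.{j}.{k}", 3))
    return root

def count_by_tier(node, counts=None):
    # Walk the tree and count how many centers exist at each tier.
    counts = counts if counts is not None else {}
    counts[node.tier] = counts.get(node.tier, 0) + 1
    for child in node.children:
        count_by_tier(child, counts)
    return counts
```

With these made-up fan-outs, the tree contains 1 Tier-0, 3 Tier-1, 9 Tier-2, and 36 Tier-3 centers; the point is only that data and load fan out geometrically as one descends the tiers.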
Grid Data Farm is a petascale data-intensive computing project initiated in Japan. The underlying hardware will be a PC cluster on the scale of thousands of nodes, each node providing nearly a terabyte of storage; incoming data from CERN, arriving at a roughly continuous 600 Mbps, will be systematically stored and subjected to intensive processing. The Grid Data Farm will provide the following features for collider data processing, while also serving as a framework for other types of data-intensive scientific applications.
- Global distributed filesystem for petabyte-scale data,
- Parallel I/O and parallel processing for fast data analysis,
- World-wide group-oriented authentication and access control,
- Thousands-node, wide-area resource management and scheduling,
- Multi-tier data sharing and efficient access,
- Program sharing and management,
- System monitoring and administration,
- Fault tolerance, dynamic reconfiguration, and automated data regeneration or re-computation.

The major components of the Grid Data Farm are the Gfarm client, the Gfarm server, and the Gfarm distributed filesystem with Gfarm parallel I/O. The Gfarm filesystem consists of a PC cluster on the scale of thousands of nodes, each node with a local disk and possibly distributed over the Grid; petascale data are spread across the disks in the Gfarm filesystem and managed by the Metadata Management System and the Gfarm filesystem daemon. The Metadata Management System provides a mapping from logical file names to the distributed physical components, and also stores metadata such as a replica catalog and the history necessary to reproduce the data. The Gfarm filesystem daemon provides remote file operations with access control, as well as remote program loading and resource monitoring. Large-scale distributed data are accessed through Gfarm parallel I/O and processed in parallel. The Grid Data Farm middleware is based on Grid-based RPC (GridRPC), an extended variant of the Ninf system, and on lower-level Grid service middleware such as Globus. This makes it easy to register analysis software for large-scale data processing. Load balancing, job scheduling, fault tolerance, and data maintenance are handled transparently or semi-transparently by the system through simple GUIs or a simple shell front end; more sophisticated client interaction is possible using GridRPC.
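A minimal sketch of the logical-to-physical mapping and replica catalog described above might look as follows. The class and method names (`Fragment`, `MetadataCatalog`, `add_replica`) are hypothetical and are not the actual Gfarm metadata interface:

```python
# Toy metadata catalog: maps a logical file name to the physical fragments
# scattered over cluster nodes, and tracks replica locations per fragment.
# All names here are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    host: str    # node that stores this fragment
    path: str    # local path on that node's disk
    index: int   # fragment's position within the logical file

class MetadataCatalog:
    def __init__(self):
        self._files = {}     # logical name -> list of Fragment
        self._replicas = {}  # (logical name, index) -> list of hosts

    def register(self, logical, fragment):
        # Record a fragment; its host is the first replica location.
        self._files.setdefault(logical, []).append(fragment)
        self._replicas.setdefault((logical, fragment.index), []).append(fragment.host)

    def add_replica(self, logical, index, host):
        # Note an additional copy of an existing fragment.
        self._replicas.setdefault((logical, index), []).append(host)

    def lookup(self, logical):
        # Fragments ordered by index, so a reader can reassemble the file.
        return sorted(self._files.get(logical, []), key=lambda f: f.index)

    def replicas(self, logical, index):
        return self._replicas.get((logical, index), [])
```

The real system would additionally record the history needed to regenerate lost fragments; here that bookkeeping is omitted.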
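The owner-computes style of parallel processing over distributed fragments can be sketched as below. In the real system each worker would read the fragment held on its own node's local disk; this toy version stands in for that with in-memory byte strings and local threads:

```python
# Sketch of parallel per-fragment processing with a final combine step.
# In Gfarm, each fragment lives on one node's local disk and is processed
# there; here fragments are simulated as byte strings and workers as threads.
from concurrent.futures import ThreadPoolExecutor

def analyze(fragment_bytes):
    # Stand-in for per-fragment event analysis: just count bytes.
    return len(fragment_bytes)

def parallel_process(fragments):
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(analyze, fragments))
    return sum(partials)  # combine the per-fragment partial results
```

The pattern is the essential point: per-fragment work proceeds independently and in parallel, and only the small partial results are gathered and combined.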
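The idea of registering analysis software and later invoking it through an RPC layer can be caricatured as a dispatch table. The real GridRPC/Ninf interface uses remote server handles and stub generation, all of which are elided in this local sketch:

```python
# Toy GridRPC-style registry: analysis programs are registered by name and
# invoked through a uniform call interface. A real GridRPC call would
# marshal arguments to a remote server; this sketch dispatches locally.
class AnalysisRegistry:
    def __init__(self):
        self._programs = {}

    def register(self, name, func):
        # Publish an analysis routine under a well-known name.
        self._programs[name] = func

    def call(self, name, *args):
        # Look up the routine and invoke it with the supplied arguments.
        return self._programs[name](*args)
```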