The Murchison Widefield Array (MWA) Data Archive consists of dataflow and storage sub-systems distributed across three tiers. At its core is the open-source Next-Generation Archive System (NGAS), initially developed by Andreas Wicenec and his colleagues at ESO. To meet the MWA data challenge, the MWA Archive team has tailored and optimised NGAS to achieve high-throughput data ingestion, efficient dataflow management, cost-effective data storage and access, and processing-aware data migration.
The MWA Dataflow consists of three tiers: Tier 0 is where data are produced, Tier 1 is where data are archived, and Tier 2 is where data are distributed. See the full-size dataflow diagram (created by Andreas Wicenec, last modified by Chen Wu on 2015-06-03).
Tier 0, co-located with the telescope at the Murchison Radio-telescope Observatory (MRO), consists of Online Processing and the Online Archive. Radio frequency voltage samples collected from each tile are transmitted to the receiver, which streams digitised signals to Online Processing at an aggregate rate of 320 Gbps. Online Processing includes an FPGA-enabled Polyphase Filter Bank (PFB) and a GPU-enabled software Correlator. The Correlator outputs in-memory “visibilities” that are immediately ingested by the DataCapture sub-system. DataCapture produces memory-resident data files and uses an NGAS client to push them to the Online Archive, which is managed by an NGAS server.
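To put the 320 Gbps figure in perspective, the short calculation below converts it into bytes per second and into volume per hour. It is only a unit conversion in decimal units (1 GB = 10^9 bytes); the function names are illustrative and not part of any MWA software. Note that this is the raw input rate into Online Processing, not the volume of visibilities that is eventually archived.

```python
def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def volume_terabytes(gbps: float, seconds: float) -> float:
    """Data volume in terabytes accumulated at a constant rate over `seconds`."""
    return gbps_to_gigabytes_per_second(gbps) * seconds / 1000.0


# Aggregate receiver output rate quoted in the text.
RECEIVER_RATE_GBPS = 320.0

rate_gb_per_s = gbps_to_gigabytes_per_second(RECEIVER_RATE_GBPS)
hourly_tb = volume_terabytes(RECEIVER_RATE_GBPS, 3600)

print(f"{rate_gb_per_s:.0f} GB/s, {hourly_tb:.0f} TB per hour of observing")
```

At this rate the digitised signal amounts to 40 GB every second, or 144 TB per hour.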
At Tier 1 (Perth, a city 700 km south of the MRO), the Long-term Archive (LTA) periodically ingests the visibility data stream from the Online Archive (OA) via a 10 Gbps fibre optic link (the shaded arrow from OA to LTA in the diagram), which is part of the Australian high-speed National Broadband Network (NBN). The dotted arrow between OA and LTA represents the transfer of metadata on instruments, observations, and monitor and control information. This transfer is an asynchronous continuous stream supported by the cascading replication streaming environment. The current LTA storage facility, the Pawsey Hierarchical Storage Management (HSM) system, is a combination of magnetic disks and tape libraries provided by the Pawsey Supercomputing Centre, whose mission is to foster scientific and technological innovation through the provision of supercomputing and eResearch services. Both data and metadata at the LTA will be selectively transferred to a set of Mirrored Archives (MAs) at Tier 2. The transfer between the LTA and the MAs is facilitated by a Proxy Archive located at the International Centre for Radio Astronomy Research (ICRAR). An additional copy of the data at the LTA can be migrated to Offline Processing for running compute jobs such as calibration and imaging. Pawsey provides the data-intensive GPU cluster Galaxy, which is used for offline data processing and also schedules data movement.
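As a rough guide to what the 10 Gbps link between OA and LTA implies, the sketch below estimates how long a batch of visibility data would take to transfer. The 1 TB batch size and the 50% link efficiency are assumptions chosen purely for illustration; neither value comes from the MWA system itself.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate transfer time for `size_bytes` over a link of `link_gbps`.

    `efficiency` is the assumed fraction of the nominal line rate that is
    actually achieved (protocol overhead, contention, and so on).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return size_bytes * 8 / usable_bits_per_second


LINK_GBPS = 10.0      # OA to LTA link rate quoted in the text
BATCH_BYTES = 1e12    # assumed batch size: 1 TB (illustrative only)

ideal = transfer_seconds(BATCH_BYTES, LINK_GBPS)
degraded = transfer_seconds(BATCH_BYTES, LINK_GBPS, efficiency=0.5)

print(f"ideal: {ideal / 60:.1f} min, at 50% efficiency: {degraded / 60:.1f} min")
```

Under these assumptions, 1 TB takes about 13 minutes at the full line rate and about 27 minutes at half of it.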
Tier 2 is composed of mirrored archive facilities that subscribe to specific data products at Tier 1 and regularly ingest updated data streams of the relevant data types. Currently, Tier 2 data archive facilities exist or are being constructed at the Massachusetts Institute of Technology (MIT), USA; the Victoria University of Wellington (VUW), New Zealand; the Raman Research Institute (RRI), India; and the University of Melbourne, Australia. Data in transit from Tier 1 to Tier 2 is carried across the Pacific Ocean by Australia's Academic and Research Network (AARNet). These MAs host a subset of the data products originally available in the LTA and provide processing capabilities for local scientists to reduce and analyse data relevant to their research projects. While the LTA in Tier 1 periodically pushes data to the MAs in Tier 2 in an automated fashion, one can also schedule ad-hoc data transfers from the LTA to a Tier 2 machine via User Interfaces (UI). Web interfaces and Python APIs are available for MWA scientists to either synchronously retrieve or asynchronously receive raw visibility data. The MWA Archive team is currently developing an interface compliant with IVOA standards (e.g. SIAP, TAP) for accessing MWA science-ready data products, including image cubes and catalogues.
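The subscribe-and-push relationship between the LTA and the MAs can be pictured with a small model. The sketch below is purely illustrative: the class names, fields, project codes and file identifiers are all hypothetical, and it does not use or represent the NGAS subscription interface. It only shows the idea that each mirror declares which data products it wants and the archive delivers matching products to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A single archived item (all fields are hypothetical)."""
    file_id: str
    product_type: str   # e.g. "visibility" or "image_cube"
    project: str        # hypothetical project code


@dataclass
class MirroredArchive:
    """A Tier 2 mirror with the product types and projects it subscribes to."""
    name: str
    subscribed_types: frozenset
    subscribed_projects: frozenset
    received: list = field(default_factory=list)

    def wants(self, product: DataProduct) -> bool:
        return (product.product_type in self.subscribed_types
                and product.project in self.subscribed_projects)


def push_new_products(products, mirrors):
    """Deliver each product to every mirror whose subscription matches it.

    Returns a mapping from mirror name to the list of delivered file IDs.
    """
    deliveries = {mirror.name: [] for mirror in mirrors}
    for product in products:
        for mirror in mirrors:
            if mirror.wants(product):
                mirror.received.append(product)
                deliveries[mirror.name].append(product.file_id)
    return deliveries


# Hypothetical subscriptions and newly archived products.
mirrors = [
    MirroredArchive("mirror_a", frozenset({"visibility"}), frozenset({"proj1"})),
    MirroredArchive("mirror_b", frozenset({"visibility", "image_cube"}),
                    frozenset({"proj1", "proj2"})),
]
new_products = [
    DataProduct("obs_0001.fits", "visibility", "proj1"),
    DataProduct("obs_0002.fits", "visibility", "proj2"),
    DataProduct("cube_0001.fits", "image_cube", "proj2"),
]

deliveries = push_new_products(new_products, mirrors)
for name, file_ids in deliveries.items():
    print(name, file_ids)
```

In this model a mirror receives only the subset of products it subscribed to, which mirrors the statement that the MAs host a subset of what is available in the LTA.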
The MWA Archive team currently consists of four ICRAR staff: Andreas Wicenec (lead), Chen Wu, Dave Pallot and Alessio Checcucci. Together they have contributed 1 FTE of effort towards developing the MWA data system since February 2012.