High-Performance Software Solutions for Extreme-Scale Petroleum Seismic Data Processing
Seismic datasets are generally terabytes in size, and complicated file formats can make handling them difficult. The ExSeisDat project addresses the high-performance I/O bottlenecks associated with seismic processing, focusing on the Oil and Gas sector, by combining state-of-the-art parallel techniques with I/O hardware and software technologies. It seeks to increase I/O performance and general geophysical development velocity by providing simple-to-use APIs, so that end users can focus on domain-specific development instead of I/O performance. The project consists of a low-level parallel I/O library (ExSeisPIOL) and a high-level seismic workflow library (ExSeisFlow), both of which have C and C++ implementations and can handle multiple file formats, including SEG-Y and seis. They incorporate hardware-specific optimisations, which can drastically increase performance when tuned to utilise existing hardware features. These optimisations do not depend on any one specific hardware platform, but can be enabled during initialisation using compiler flags; the libraries are then tuned internally and automatically. The ExSeisDat libraries are designed to handle medium to very large data files (from 1 GB to tens of thousands of GB), allowing geophysicists to analyse seismic data on an extreme scale without being limited by I/O.
Both libraries will be released in September 2017 under a BSD license.
ExSeisPIOL addresses parallel I/O in a simple, user-friendly manner. The end user only needs to specify the memory limits, data access pattern, and decomposition across processes, while the API internally handles the parallel I/O details, including constraints on asynchronous vs synchronous and collective vs non-collective I/O. It allows file-format-agnostic access to all data, such as individual traces and parameters like source coordinates. The library can be integrated into existing seismic workflows, allowing geophysicists to focus on domain-specific development without concerns about I/O specifics and the distribution of data across multiple processors.
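As a rough illustration of what "decomposition across processes" and "memory limits" mean in practice, the following Python sketch splits a file's traces into one contiguous block per process and then cuts each block into chunks that fit a memory budget. It is not the ExSeisPIOL API: the function names `block_decomposition` and `memory_limited_chunks` and all parameters are hypothetical, and a contiguous block layout is only one of several possible decompositions.

```python
def block_decomposition(n_traces, n_procs, rank):
    """Return (start, count) of the contiguous trace range owned by `rank`.

    Hypothetical helper: assumes a simple block layout in which the
    remainder traces are spread over the lowest-numbered ranks.
    """
    base, extra = divmod(n_traces, n_procs)
    # The first `extra` ranks each take one additional trace.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def memory_limited_chunks(start, count, bytes_per_trace, memory_limit):
    """Yield (start, count) chunks of a trace range that fit in memory_limit bytes.

    At least one trace is always returned per chunk, even if a single
    trace exceeds the limit.
    """
    max_traces = max(1, memory_limit // bytes_per_trace)
    end = start + count
    while start < end:
        n = min(max_traces, end - start)
        yield start, n
        start += n
```

With 10 traces over 4 processes, ranks 0 and 1 each own three traces and ranks 2 and 3 each own two. Each rank would then read its own range chunk by chunk, which is the kind of bookkeeping the abstract describes the library as handling internally.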
ExSeisFlow leverages the I/O performance of ExSeisPIOL to perform seismic processing operations, including 4D binning, trace sorting, noise filtering, and interpolation and regularisation. While ExSeisPIOL requires some user input to access data and distribute it across multiple processors, ExSeisFlow implicitly manages all I/O, as well as other operation details. It internally determines and optimises data access patterns, caching, memory management, and operation scalability. It maximises I/O performance by reading in only the data relevant to the operations performed. For example, when sorting traces, the library reads in only the header data needed to perform the sort, drastically decreasing the overall runtime compared with reading an entire seismic file. I/O optimisation is also supported through a just-in-time model, which allows multiple operations to be performed in a single workflow and enables inter-operation optimisation.
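The header-only sort can be sketched in a few lines of Python. This is an illustration of the idea, not ExSeisFlow code: the function names are hypothetical, and the sketch assumes a standard SEG-Y layout (a 3600-byte file header, 240-byte trace headers, source X and Y stored as big-endian 32-bit integers at bytes 73-80 of each trace header), fixed-length traces with 4-byte samples, no extended textual headers, and no coordinate scaling.

```python
import io
import struct

FILE_HEADER_BYTES = 3600   # 3200-byte textual header + 400-byte binary header
TRACE_HEADER_BYTES = 240
SRC_XY_OFFSET = 72         # source X at bytes 73-76, source Y at 77-80 (1-based)


def read_source_coords(f, n_traces, n_samples, sample_bytes=4):
    """Read only (source X, source Y) from each trace header, skipping samples."""
    trace_bytes = TRACE_HEADER_BYTES + n_samples * sample_bytes
    coords = []
    for i in range(n_traces):
        # Jump straight to the 8 bytes of interest in trace i.
        f.seek(FILE_HEADER_BYTES + i * trace_bytes + SRC_XY_OFFSET)
        coords.append(struct.unpack(">ii", f.read(8)))
    return coords


def sorted_trace_order(coords):
    """Return trace indices ordered by (source X, source Y)."""
    return sorted(range(len(coords)), key=lambda i: coords[i])


def make_synthetic_file(coords, n_samples, sample_bytes=4):
    """Build an in-memory file with the assumed layout, for demonstration only."""
    buf = io.BytesIO()
    buf.write(bytes(FILE_HEADER_BYTES))
    for x, y in coords:
        header = bytearray(TRACE_HEADER_BYTES)
        struct.pack_into(">ii", header, SRC_XY_OFFSET, x, y)
        buf.write(header)
        buf.write(bytes(n_samples * sample_bytes))  # zero-filled sample data
    buf.seek(0)
    return buf


if __name__ == "__main__":
    f = make_synthetic_file([(30, 1), (10, 5), (20, 0), (10, 2)], n_samples=1000)
    print(sorted_trace_order(read_source_coords(f, n_traces=4, n_samples=1000)))
```

Under these assumptions, a trace with 1000 four-byte samples occupies 4240 bytes, of which the sort key needs only 8. A real implementation would read the headers in larger, parallel, collective requests rather than one small seek per trace, but the saving in data volume is the same.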