Exploiting Cloud Object Storage for High-Performance Analytics
The paper presents the AnyBlob; A cloud based object store download manager based on io-uring for query engines that optimizes throughput while minimizing CPU usage. Due the recently improvements on network bandwidth on cloud providers it is more viable to use remote storages for high-performance query analytics. In 2018 AWS introduce instances with 100 Gbit/s networking, which improves severely the latency and close the gap between local file systems and remote storages. Previously researches focus mostly on OLTP databases and using cache to avoid fetch data from remote storage, Anyblob on the other hand demonstrate that even without caching it achieves similar performance to state-of-the-art cloud data warehouses that cache data on local SSDs while improving resource elasticity. Anyblob use io_uring to manage multiple object store downloads per thread asynchronously. With this model, the system does not have to spawn too many threads to download these objects in parallel which wou...