PostgreSQL development happens over email: patches are sent to the mailing list for contribution and code review, and keeping track of multiple patch versions across different pieces of work can get confusing. In this post, I’ll share how I use git worktree to organize the development and review of different PostgreSQL patches. git worktree The git worktree command lets you maintain multiple working directories attached to a single Git repository. This is extremely useful when you need to check out different branches simultaneously without the overhead of cloning the repository multiple times. Each worktree acts as an independent checkout, sharing the same .git directory and object database. Untracked files in one worktree don’t exist in the others, and that isolation becomes really useful during development and review workflows. If you work on multiple patches at the same time, or if you frequently review patch series, it’s common to accumulate different versions of the same patch. For example, after working on the extensio...
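As a minimal sketch of the workflow above (using a throwaway scratch repository in place of an actual PostgreSQL clone, with made-up branch names), one worktree per patch looks like this:

```shell
set -e
# Scratch repository standing in for a PostgreSQL clone
git init demo-repo
cd demo-repo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit"

# One worktree per patch, each on its own branch,
# all sharing the same .git object database
git worktree add ../wt-my-feature -b my-feature
git worktree add ../wt-review-v2 -b review-v2

# Untracked files are isolated: this file exists only in wt-my-feature
touch ../wt-my-feature/scratch.txt

# Shows the main checkout plus both worktrees
git worktree list
```

When a patch is committed or abandoned, `git worktree remove <path>` deletes the checkout while the branch and its commits stay in the shared repository.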
(personal summary based on the paper) The paper presents AnyBlob, a cloud object store download manager for query engines, built on io_uring, that optimizes throughput while minimizing CPU usage. Thanks to recent improvements in network bandwidth from cloud providers, it has become more viable to use remote storage for high-performance query analytics. In 2018, AWS introduced instances with 100 Gbit/s networking, which significantly improves latency and closes the gap between local file systems and remote storage. Previous research focused mostly on OLTP databases and on caching to avoid fetching data from remote storage; AnyBlob, on the other hand, demonstrates that even without caching it achieves performance similar to state-of-the-art cloud data warehouses that cache data on local SSDs, while improving resource elasticity. AnyBlob uses io_uring to manage multiple object store downloads per thread asynchronously. With this model, the system does not have to spawn too many threads to downl...
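AnyBlob itself is implemented on top of the Linux io_uring interface, but the core idea, one thread keeping many downloads in flight by submitting requests and reacting to completions instead of blocking on each one, can be sketched with Python's asyncio as a rough analogue (the object names, sizes, and delays below are invented for illustration):

```python
import asyncio
import random

async def download_object(name: str) -> bytes:
    # Stand-in for an HTTP GET against the object store: the await
    # yields control while the "network" is busy, so a single thread
    # can keep many requests in flight at once.
    await asyncio.sleep(random.uniform(0.001, 0.01))
    return b"x" * 1024  # pretend payload

async def fetch_all(objects: list[str]) -> list[bytes]:
    # Submit every request up front and collect completions as they
    # arrive, analogous to batching submissions on an io_uring queue
    # rather than dedicating one blocking thread per download.
    return await asyncio.gather(*(download_object(o) for o in objects))

if __name__ == "__main__":
    names = [f"part-{i:04d}.parquet" for i in range(64)]
    payloads = asyncio.run(fetch_all(names))
    print(len(payloads))  # 64 downloads driven by a single thread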
(personal summary based on the paper) A common approach to executing queries in database systems is the classical iterator model introduced in the Volcano paper. This model is simple and flexible: each plan node implements a next() method that returns one tuple at a time when called repeatedly. However, this model incurs performance overhead due to the large number of virtual function calls (next()), which often rely on function pointers or interface dispatch, and these calls become expensive when executed millions of times. To address this, some systems reduce the overhead by passing batches of tuples instead of single tuples between operators, enabling vectorized execution and reducing call frequency. The paper introduces a query compilation technique that translates SQL queries into efficient machine code using the LLVM compiler framework. Instead of pulling tuples via next() calls, the execution model pushes data from producers to consumers. This a...
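To make the pull-based iterator model concrete, here is a toy sketch (not code from the paper; the operator names are my own) of a scan feeding a filter through repeated next() calls:

```python
class Scan:
    """Leaf operator: yields rows from an in-memory table one at a time."""
    def __init__(self, rows):
        self.it = iter(rows)

    def next(self):
        return next(self.it, None)  # None signals end of stream

class Filter:
    """Pulls rows from its child and returns only those matching the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

# Every call to next() walks the operator tree for a single tuple --
# exactly the per-tuple call overhead the paper's push-based,
# compiled execution model eliminates.
plan = Filter(Scan([1, 2, 3, 4, 5]), lambda x: x % 2 == 1)
result = []
while (row := plan.next()) is not None:
    result.append(row)
print(result)  # [1, 3, 5]
```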