A few days ago, @eatonphil organized a Postgres Internals wehack group on Discord for people interested in understanding and contributing to the PostgreSQL source code. I had previously tried to understand how SQL expressions are compiled using LLVM, but without much success; with this encouragement I decided to dig deeper into how PostgreSQL compiles SQL code, and here I'll share my notes. Please let me know if something is not right, I'm just a curious guy trying to understand the Postgres source code. How does Postgres decide whether a SQL expression should be compiled? Postgres decides whether to JIT-compile a SQL expression based on the estimated cost of the query: if the estimated cost exceeds the jit_above_cost setting, the query will be compiled using LLVM. Let's see a practical example (make sure that jit is enabled): set jit_above_cost = 200; create table t (a int, b int); insert into...
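A minimal sketch of this kind of experiment, assuming a table t(a, b) like the one being created above; the generate_series data load and the aggregate query are illustrative choices, not necessarily the commands from the original example:

```sql
-- Lower jit_above_cost, create some data, and check EXPLAIN ANALYZE for a
-- "JIT:" section once the plan's estimated cost exceeds the threshold.
set jit = on;
set jit_above_cost = 200;

create table t (a int, b int);
insert into t select i, i % 10 from generate_series(1, 100000) i;  -- row count is illustrative
analyze t;

-- The estimated cost of scanning 100k rows is well above 200, so the executor
-- compiles the expression a + b with LLVM; the EXPLAIN ANALYZE output then
-- includes a JIT section with the number of compiled functions and timings.
explain (analyze, costs) select sum(a + b) from t;
```

With the default jit_above_cost of 100000 this small query would normally run without JIT; lowering the threshold to 200 is just a way to force compilation on a tiny table so the effect is visible.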
(personal summary based on the paper) The paper presents AnyBlob, a cloud object store download manager for query engines, built on io_uring, that optimizes throughput while minimizing CPU usage. Due to recent improvements in network bandwidth at cloud providers, it has become more viable to use remote storage for high-performance analytical queries. In 2018 AWS introduced instances with 100 Gbit/s networking, which dramatically improves latency and closes the gap between local file systems and remote storage. Previous research focused mostly on OLTP databases and on caching to avoid fetching data from remote storage; AnyBlob, on the other hand, demonstrates that even without caching it achieves performance similar to state-of-the-art cloud data warehouses that cache data on local SSDs, while improving resource elasticity. AnyBlob uses io_uring to manage multiple object store downloads per thread asynchronously. With this model, the system does not have to spawn too many threads to downl...
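As a loose illustration of the per-thread asynchronous model that io_uring enables (this is not AnyBlob's code: it uses liburing with local file reads standing in for object store range requests, and the path and request count are made up):

```c
/* One thread keeps several reads in flight and harvests completions as they
 * arrive, instead of blocking one thread per request. */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define N_REQUESTS 8
#define CHUNK_SIZE 4096

int main(void)
{
    struct io_uring ring;
    if (io_uring_queue_init(N_REQUESTS, &ring, 0) < 0) {
        perror("io_uring_queue_init");
        return 1;
    }

    int fd = open("/tmp/blob.bin", O_RDONLY);   /* placeholder for a blob */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static char bufs[N_REQUESTS][CHUNK_SIZE];

    /* Queue N_REQUESTS reads at different offsets without waiting on any. */
    for (int i = 0; i < N_REQUESTS; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i], CHUNK_SIZE, (__u64) i * CHUNK_SIZE);
        io_uring_sqe_set_data(sqe, (void *) (long) i);
    }
    io_uring_submit(&ring);

    /* Reap completions in whatever order they finish. */
    for (int i = 0; i < N_REQUESTS; i++) {
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        printf("request %ld finished: %d bytes\n",
               (long) io_uring_cqe_get_data(cqe), cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}
```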
PostgreSQL development relies on patches sent over email for contribution and code review, and sometimes organizing multiple patch versions of different works can be confusing. In this post, I’ll share how I use git worktree to organize the development and reviewing of different PostgreSQL patches. git worktree The git worktree command allows you to organize multiple working directories associated with a single Git repository. This is extremely useful when you need to check out different branches simultaneously without the overhead of cloning the repository multiple times. Each worktree acts as an independent checkout, sharing the same .git directory and object database. Untracked files in one worktree don’t exist in others. That isolation becomes really useful during development and review workflows. If you work on multiple patches at the same time, or if you frequently review patch series, it’s common to accumulate different versions of the same patch. For example, after working on the extensio...
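A rough sketch of what this workflow can look like; the repository paths, branch names, and patch file names below are purely illustrative:

```sh
# One main checkout of the postgres repo, plus an extra worktree per patch.
cd ~/code/postgres

# Create a worktree (and a new branch) for a patch I'm developing.
git worktree add -b my-feature ../postgres-my-feature master

# Create another worktree on its own branch to review someone else's series.
git worktree add -b review-some-feature ../postgres-review master
cd ../postgres-review
git am ~/patches/v3-0001-some-patch.patch   # apply the series under review

# List all worktrees attached to this repository.
git worktree list

# Clean up once the review is done and the branch is no longer needed.
git worktree remove ../postgres-review
```

Each directory has its own checked-out branch and untracked build artifacts, so switching between your own patch and a review never requires stashing or re-cloning.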