Navigating Google Cloud Platform, or any cloud platform for that matter, can be draining, primarily when you work with cloud services every day. If you manage multiple projects, just accessing the right resource can feel like a nightmare because of the number of clicks involved.
We had an interesting incident at work recently. We run huge databases on PostgreSQL, and to reduce disk fragmentation, we run pg_repack1 on a regular basis. This tool rewrites tables and indexes so that they are stored in contiguous blocks on disk. It reduces the overhead of disk seeks, improves read performance, and reduces the size of the database.
Before we discuss what VACUUM1 does and its implications, we need to understand how data is actually stored on disk. What happens when a tuple is inserted, updated, or deleted? Understanding this will help us understand what VACUUM does, why it’s needed, and its implications.
The most compelling feature of relational database systems is ACID(Atomicity Consistency Isolation Durability). Isolation is achieved using transactions. Isolation is needed to avoid race conditions when concurrent actors act upon the same row.
SQL standard defines isolation with different levels. Every isolation level offers certain guarantees and possible anomalies that can occur.
Databases have always fascinated me, and I have always dreamt of building a simple, hobby database for fun. I have read several blog posts about building redis, git, compiler, and interpreter, but none about building a database. This post is an outcome of a long and treacherous search of posts on making a database and eventually encountering the Bitcask paper.