What is Copy on Write and where is it used?

Recently while working with one of the databases I came across a method that is widely used by many databases to take a snapshot of the memory into a disk. One of the most famous that you have heard of is Redis. The method is called copy on write. Let’s see what is a copy on write and where is it used?

Problem Statement:

You have a system of 40 GB and you are consuming memory around say 35 GB. Now you have to take a snapshot of the whole memory, how will you do it. Just give it a thought.

Solution:

Maybe you can try reading page by page in small chunks and write it to disk but this will take a long time and there is a high chance that most of the pages will change while you are taking the snapshot of one of the pages in memory.

Copy on write to the rescue.

Now, this does is it can take a snapshot of memory very easily. How?

Lets read about how it can do so. In copy on write, there are few principles that are followed.

Memory being partitioned into units called blocks or pages.

Whenever any process demands to read any information a virtual address is given without telling the process. In this way, the system makes sure that any processes can read the same piece of information without knowing about it and this saves memory in this process.

The problem comes when the process tries to update that information. In that scenario, a copy of the actual block is made and the address of the new copy is given back. The process can now update the block while other is reading the same block from another copy.

What is Copy on Write and where is it used?

Now considering this let’s see how COW will solve our problem. First, you can write a program that can start copying the data. What happens is if you try to copy it you will get the same memory and no extra memory is used. If there is something changed a separate copy is created of that and new data will be written to that. So we actually took the snapshot of the whole memory by using very little memory. Keep in mind this memory consumption will become huge if all our pages get a write. In that scenario, you need to have 35GB of pages that need a total of 70GB of the memory resource. Thus our system will get OOM and it will crash.

In Linux, the fork() actually creates a child process using COW method. Thus child shares the same pages, only when the child needs to make some change to the pages a new copy is given to it to edit.

Where is it used?

First of all, it is used to take snapshots of the process that depends on memory heavily like Redis. Aerospike also uses it. Anywhere you want to optimize memory while reading and writing this can be a good method. Mysql uses it for instant snapshots.

This was very basic of copy on write. You can dig it deeper and let us know what you find more.

You can read about it in Redis here.
https://www.learnsteps.com/redis-bgsave-taking-a-lot-of-memory-here-is-the-reason/

If you like the article please share and subscribe.

Also, we have a book for you if you are preparing for DevOps and SRE interview you can find it here.


Gaurav Yadav

Gaurav is cloud infrastructure engineer and a full stack web developer and blogger. Sportsperson by heart and loves football. Scale is something he loves to work for and always keen to learn new tech. Experienced with CI/CD, distributed cloud infrastructure, build systems and lot of SRE Stuff.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.