What the CRaC ?!

If you've been following the news lately in the Java ecosystem (aside from Java's 28th anniversary), you've probably heard of CRaC. Two big announcements were made this week:

If you're wondering what CRaC is all about, I've got you covered. Read on :)

Photo by Josh Calabrese / Unsplash

What is CRaC?!

Explain it like I am 6 years old (Sort of!)

In a world where streaming services are omnipresent, we can stop watching a video whenever we want, and we expect to resume from (almost) the same position where we left off, even on another device. Imagine if we could apply the same idea to our running applications: take a snapshot (pause) of the running state, and restore (resume) it on another server.

In more technical terms

CRaC stands for Coordinated Restore at Checkpoint. It is an OpenJDK project, developed by Azul Systems, that aims to cut JVM startup time. It captures (freezes) the state of a running JVM after all the heavy lifting has been done (loading classes, JIT compilation, code optimizations...) and serializes that state to disk (Checkpoint). The JVM can later be resumed from that state (Restore) and runs exactly as it did at the time of the freeze.

CRaC uses CRIU (Checkpoint/Restore In Userspace) under the hood to perform its magic. CRIU is a project, written in C, that implements checkpoint/restore functionality for Linux, and its maintainers claim it is the most feature-rich and up-to-date with the kernel among the CR implementations for Linux.

CRIU is also the technology behind the experimental docker checkpoint command, which takes a serializable snapshot of a running container so that it can be recreated later (even on another host). Podman has a similar feature with podman container checkpoint. CRIU is likewise integrated with Kubernetes and LXC/LXD.

In the Java space, CRIU is also used in OpenJ9 to improve JVM startup time, and it powers the InstantOn project from Open Liberty.

Showtime

CRaC is only available on Linux, so you'll need a Linux machine to run this demo. I tried to use Docker on Mac but had little success and stumbled upon many issues.

I am also using this simple Spring Boot project, which showcases the upcoming support for CRaC in Spring Framework 6.1. Kudos to the Spring team and @sdeleuze for the amazing work.

First, you need to install the recently released Zulu JDK with CRaC support. You can either install it manually or through SDKMAN!:

$ sdk install java 17.0.7.crac-zulu

Next, we need to build the project by running the command below. This assumes that you have already cloned the project and that it is your current directory.

$ mvn clean verify

Once it finishes building, we can run our app using:

$ java -XX:CRaCCheckpointTo=./crac-files -jar target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar

Notice anything new? The -XX:CRaCCheckpointTo=path JVM argument enables CRaC and sets the directory where the checkpoint image will be stored.
If everything goes as expected, the app should be up and running after a few seconds. Make sure to hit it with a few requests to warm it up:

$ curl localhost:8080
Greetings from Spring Boot!
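
If you'd rather script the warm-up than repeat curl by hand, a small Java client can do the job. This is only a sketch: it assumes the demo answers on http://localhost:8080/ as in the curl call above, and the class name WarmUp and the request count of 50 are arbitrary choices made for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class WarmUp {

    // Sends `count` GET requests to `target` and returns how many came back with HTTP 200.
    static int warmUp(URI target, int count) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        int ok = 0;
        for (int i = 0; i < count; i++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the app started in the previous step is listening on port 8080.
        int ok = warmUp(URI.create("http://localhost:8080/"), 50);
        System.out.println(ok + " of 50 warm-up requests succeeded");
    }
}
```

Saved as WarmUp.java, it can be run directly with java WarmUp.java on JDK 11 or later. The more code paths you exercise before the checkpoint, the more JIT-compiled code ends up in the snapshot.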

Now leave your app running (or run it in the background), and in another terminal, use the jcmd command to trigger the checkpoint:

$ jcmd target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar JDK.checkpoint
201931:
CR: Checkpoint ...

201931 is the PID of our running Spring Boot app, which should now be stopped, as the logs indicate:

2023-05-20T12:02:06.610Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Stopping Spring-managed lifecycle beans before JVM checkpoint
2023-05-20T12:02:06.615Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 2147482623
2023-05-20T12:02:06.617Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Bean 'webServerGracefulShutdown' completed its stop procedure
2023-05-20T12:02:06.617Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 2147481599
2023-05-20T12:02:06.618Z  INFO 201931 --- [Attach Listener] org.eclipse.jetty.server.Server          : Stopped Server@53f3bdbd{STOPPING}[11.0.15,sto=0]
2023-05-20T12:02:06.624Z  INFO 201931 --- [Attach Listener] o.e.jetty.server.AbstractConnector       : Stopped ServerConnector@1a4927d6{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2023-05-20T12:02:06.629Z  INFO 201931 --- [Attach Listener] o.e.j.s.h.ContextHandler.application     : Destroying Spring FrameworkServlet 'dispatcherServlet'
2023-05-20T12:02:06.632Z  INFO 201931 --- [Attach Listener] o.e.jetty.server.handler.ContextHandler  : Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@35399441{application,/,[file:///tmp/jetty-docbase.8080.3095195653033098747/],STOPPED}
2023-05-20T12:02:06.638Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Bean 'webServerStartStop' completed its stop procedure
2023-05-20T12:02:06.639Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase -2147483647
2023-05-20T12:02:06.640Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Bean 'springBootLoggingLifecycle' completed its stop procedure
Killed
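
The log above shows Spring stopping its lifecycle beans (including the Jetty connector on port 8080) before the JVM is checkpointed. That is the "coordinated" part of CRaC: things like open sockets are released before the snapshot is taken and reacquired after restore. Outside of Spring, an application does this through callbacks. The real API lives in the org.crac package (a Resource with beforeCheckpoint and afterRestore methods, registered through Core.getGlobalContext()). The sketch below deliberately does not use it: CheckpointAware, Coordinator and FakeConnector are hypothetical stand-ins that only imitate the pattern, so that the example runs on any JDK without extra dependencies.

```java
import java.util.ArrayList;
import java.util.List;

final class CoordinationDemo {

    // Hypothetical stand-in for the callback pair a checkpoint-aware resource implements.
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    // Hypothetical coordinator: notifies every registered resource around a simulated checkpoint.
    static final class Coordinator {
        private final List<CheckpointAware> resources = new ArrayList<>();

        void register(CheckpointAware resource) {
            resources.add(resource);
        }

        // In a real CRaC run the JVM would be frozen and later restored between the two loops.
        void simulateCheckpointRestore() throws Exception {
            // Release in reverse registration order, so dependents close before what they depend on.
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).beforeCheckpoint();
            }
            // Reacquire in registration order.
            for (CheckpointAware resource : resources) {
                resource.afterRestore();
            }
        }
    }

    // Toy resource that tracks whether its (imaginary) listener socket is open.
    static final class FakeConnector implements CheckpointAware {
        private final int port;
        private boolean open = true;
        final List<String> events = new ArrayList<>();

        FakeConnector(int port) {
            this.port = port;
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint() {
            open = false; // a real connector would close its server socket here
            events.add("closed " + port);
        }

        @Override
        public void afterRestore() {
            open = true; // a real connector would bind its server socket again here
            events.add("reopened " + port);
        }
    }

    public static void main(String[] args) throws Exception {
        Coordinator coordinator = new Coordinator();
        FakeConnector connector = new FakeConnector(8080);
        coordinator.register(connector);
        coordinator.simulateCheckpointRestore();
        System.out.println(connector.events); // prints [closed 8080, reopened 8080]
    }
}
```

In the Spring demo you don't write any of this yourself: the framework's lifecycle processor plays the coordinator role, which is what the DEBUG lines above report.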

Inspecting your files now, you should see a crac-files folder containing a number of .img files. These are the images generated by the checkpoint operation, and they can be inspected with the crit tool. On Ubuntu, crit comes with the criu package, which you can install with apt-get install criu.

crit is pretty handy for checking the content of the images folder. We can, for example, check which process we checkpointed:

$ crit x crac-files ps
    PID   PGID    SID   COMM
 201931 201931 201381   java

We can inspect the checkpointed file descriptors:

$ crit x crac-files fds
          201931
	      0: TTY.36
	      1: TTY.36
	      2: TTY.36
	      3: /root/.sdkman/candidates/java/17.0.7.crac-zulu/lib/modules
	      4: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
	      5: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
	      6: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
	      7: /home/maboullaite/spring-boot-crac-demo/crac-files/perfdata
	      8: /dev/random
	      9: /dev/urandom
	    cwd: /home/maboullaite/spring-boot-crac-demo
	   root: /

We can even dump the content of a single image using crit show:

$ crit show crac-files/core-201931.img
{
    "magic": "CORE",
    "entries": [
        {
            "mtype": "X86_64",
            "thread_info": {
                "clear_tid_addr": "0x7f19b70d5550",
                "gpregs": {
                ...}
         "thread_core": {
                "futex_rla": 139748422014304,
                "futex_rla_len": 24,
                "sched_nice": 0,
                "sched_policy": 0,
                "sas": {
                    "ss_sp": 0,
                    "ss_size": 0,
                    "ss_flags": 2
                },
                "signals_p": {},
                "creds": {
                    "uid": 0,
                    "gid": 0,
                    "euid": 0,
                    "egid": 0,
                    "suid": 0,
                    "sgid": 0,
                    "fsuid": 0,
                    "fsgid": 0,
                    "cap_inh": [
                        0,
                        0
                    ],
                    "cap_prm": [
                        4294967295,
                        511
                    ],
                    "cap_eff": [
                        4294967295,
                        511
                    ],
                    "cap_bnd": [
                        4294967295,
                        511
                    ],
                    "secbits": 0,
                    "groups": [
                        0
                    ]
                },
                "comm": "java"
            }
        }
    ]
}

The crac-files directory also contains log files, which are pretty handy when troubleshooting issues.
To restore our image and run the app from its saved state, we simply run:

$ java -XX:CRaCRestoreFrom=./crac-files

This results in a lightning-fast start compared to the previous one.
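
To put a number on that difference, you can time how long a command takes until the app answers its first request. The following is a rough sketch rather than a proper benchmark: TimeToReady is a made-up helper, it assumes the app answers on http://localhost:8080/ as earlier in this post, and it measures wall-clock time from launching the command to the first HTTP 200, so process launch overhead is included.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class TimeToReady {

    // Polls `target` until it answers with HTTP 200 or `timeout` elapses.
    // Returns the elapsed time in milliseconds, or -1 on timeout.
    static long awaitReady(URI target, Duration timeout) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target)
                .timeout(Duration.ofSeconds(1))
                .GET()
                .build();
        long start = System.nanoTime();
        while (System.nanoTime() - start < timeout.toNanos()) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return (System.nanoTime() - start) / 1_000_000;
                }
            } catch (IOException notUpYet) {
                // Connection refused or timed out: the server is not listening yet.
            }
            Thread.sleep(10);
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TimeToReady.java <command to time...>");
            return;
        }
        // Launch the command to time; it keeps running after this helper exits.
        new ProcessBuilder(args).inheritIO().start();
        long millis = awaitReady(URI.create("http://localhost:8080/"), Duration.ofSeconds(60));
        System.out.println(millis < 0 ? "timed out" : "first response after " + millis + " ms");
    }
}
```

Run it once with the regular java -jar command and once with the -XX:CRaCRestoreFrom command passed as arguments, stopping the app in between, and compare the two figures.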

But what about AoT and Graal Native images?

Well, it is quite different. While native images achieve very fast startup times and a very small memory footprint, they aren't a cure for every problem. Native image generation requires every class you need at runtime to be known at build time for the compilation to succeed, which can be challenging for Java developers. Debugging is another area where native images fall short.
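
To make the build-time constraint concrete, here is a small sketch of the kind of code that needs extra care in a native image. The class name is only known at runtime, so a closed-world build cannot discover it by itself and typically needs reflection configuration. On a regular JVM it just works. The class DynamicLoading and the method newList are made up for illustration.

```java
import java.util.List;

final class DynamicLoading {

    // Instantiates a List implementation whose class name is only known at runtime.
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (List<String>) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The name could just as well come from a config file or an environment variable.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(className);
        list.add("hello");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

Frameworks that rely on reflection, proxies or classpath scanning hit this constantly, which is why they ship dedicated native-image support.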

CRaC (and similar tools) allow us to keep the JVM capabilities we're familiar with while still getting the fast startup needed for many cloud-native workloads. On the other hand, as Thomas brought to my attention, the size of the snapshot is orders of magnitude bigger than the size of a native image.

Final Thoughts

The general availability of CRaC should help boost the adoption of CR technology in the Java space, making the Java language even more modern and better suited to the cloud-native world. Exciting times!
Finally, it is worth mentioning that CR technology is not new: Google uses it to migrate batch jobs in Borg.

Resources