What the CRaC ?!
If you've been following the news lately in the Java ecosystem (aside from Java's 28th anniversary), you should have heard of CRaC. Two big announcements were made this week:
- Azul announced the general availability of, and commercial support for, Azul Zulu Builds of OpenJDK for Java 17 with CRaC functionality.
- The next release of Spring framework, 6.1, will add support for CRaC.
If you are wondering what CRaC is all about, I've got you covered. Read on :)
What is CRaC?!
Explain it like I am 6 years old (Sort of!)
In a world where streaming services are omnipresent, we can stop watching a video whenever we want, and we expect to resume from (almost) the same position where we left off, even on another device. Imagine if we could apply the same idea to our running applications: take a snapshot (pause) of the running state, and restore (resume) it on another server.
In more technical terms
CRaC stands for Coordinated Restore at Checkpoint. It is an OpenJDK project, developed by Azul Systems, that aims to speed up JVM startup. It captures (freezes) the state of a running JVM after all the heavy lifting has been done (loading classes, JIT compilation, code optimizations...) and serializes that state to disk (Checkpoint). The JVM can later be resumed from that state (Restore) and runs exactly as it did at the time of the freeze.
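The "Coordinated" part means the application gets a chance to release things like open sockets before the freeze and to re-acquire them after the resume. Below is a minimal, self-contained simulation of that idea. The names CheckpointAware, FakeConnection and CheckpointCoordinator are hypothetical and made up for this sketch; this is not the org.crac API, and no real checkpoint is taken.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface for this sketch (NOT the org.crac API).
interface CheckpointAware {
    void beforeCheckpoint();
    void afterRestore();
}

// Stand-in for something holding an OS-level resource, such as a socket.
class FakeConnection implements CheckpointAware {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void beforeCheckpoint() {
        open = false; // release the resource before the freeze
    }

    @Override
    public void afterRestore() {
        open = true; // re-acquire the resource after the resume
    }
}

// Hypothetical coordinator that notifies resources around a simulated snapshot.
class CheckpointCoordinator {
    private final List<CheckpointAware> resources = new ArrayList<>();

    void register(CheckpointAware resource) {
        resources.add(resource);
    }

    // `snapshot` stands in for the real freeze/resume, which this sketch does not perform.
    void checkpointAndRestore(Runnable snapshot) {
        resources.forEach(CheckpointAware::beforeCheckpoint);
        snapshot.run();
        resources.forEach(CheckpointAware::afterRestore);
    }
}

class CoordinationDemo {
    public static void main(String[] args) {
        FakeConnection connection = new FakeConnection();
        CheckpointCoordinator coordinator = new CheckpointCoordinator();
        coordinator.register(connection);

        coordinator.checkpointAndRestore(
                () -> System.out.println("open during snapshot: " + connection.isOpen()));
        System.out.println("open after restore: " + connection.isOpen());
        // prints "open during snapshot: false" then "open after restore: true"
    }
}
```

In a real application a framework would typically drive callbacks of this kind, so application code rarely has to write them by hand.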
CRaC uses CRIU (Checkpoint/Restore In Userspace) under the hood to perform its magic. CRIU is a C library facilitating the implementation of checkpoint/restore functionality on Linux, and its maintainers claim it is the most feature-rich option and the most up-to-date with the kernel for implementing CR on Linux.
CRIU is also the technology behind the experimental docker checkpoint command, which takes a serializable snapshot of a running container so that it can be recreated later (even on another host). Podman has a similar feature with podman container checkpoint. CRIU is supported in Kubernetes and LXC/LXD as well.
In the Java space, CRIU is also used in OpenJ9 to improve JVM startup time, and it powers the InstantOn project from Open Liberty.
Showtime
CRaC is only available on Linux, so in order to run this demo you'd need a Linux machine. I tried to use Docker on Mac but had little success and stumbled upon many issues.
I am using this simple Spring Boot project, which showcases the upcoming CRaC support in Spring Framework 6.1. Kudos to the Spring team and @sdeleuze for the amazing work.
First, you need to install the recently released Zulu JDK with CRaC support. You can either install it manually or use SDKMAN:
$ sdk install java 17.0.7.crac-zulu
Next, we need to build the project by running the command below. This assumes that you have already cloned the project and that it is your current directory.
$ mvn clean verify
Once the build finishes, we can run our app using:
$ java -XX:CRaCCheckpointTo=./crac-files -jar target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
Notice anything new? The java argument -XX:CRaCCheckpointTo=path tells the JVM to enable CRaC and defines the path where the image will be stored.
If everything goes as expected, the app should be running after a few seconds. Make sure to hit it with a few requests to warm up the application:
$ curl localhost:8080
Greetings from Spring Boot!
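If you'd rather script the warm-up than repeat curl by hand, here is a small sketch using the JDK's built-in HTTP client. The class name WarmUp and the request count of 50 are arbitrary choices for illustration; the URL simply mirrors the curl call above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class WarmUp {
    // Sends `count` GET requests to `target` and returns how many answered with HTTP 200.
    static int warmUp(URI target, int count) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        int ok = 0;
        for (int i = 0; i < count; i++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) throws Exception {
        // Same endpoint as the curl call; 50 is an arbitrary number of warm-up requests.
        int ok = warmUp(URI.create("http://localhost:8080/"), 50);
        System.out.println(ok + " of 50 requests succeeded");
    }
}
```

How many requests count as "warm enough" depends on the application, so treat the number as something to tune.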
Now leave your app running (or run it in the background), and in another terminal, use the jcmd command to trigger the checkpoint:
$ jcmd target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar JDK.checkpoint
201931:
CR: Checkpoint ...
201931 is the PID of our running Spring Boot app, which should now be stopped, as indicated in the logs:
2023-05-20T12:02:06.610Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Stopping Spring-managed lifecycle beans before JVM checkpoint
2023-05-20T12:02:06.615Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 2147482623
2023-05-20T12:02:06.617Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Bean 'webServerGracefulShutdown' completed its stop procedure
2023-05-20T12:02:06.617Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 2147481599
2023-05-20T12:02:06.618Z INFO 201931 --- [Attach Listener] org.eclipse.jetty.server.Server : Stopped Server@53f3bdbd{STOPPING}[11.0.15,sto=0]
2023-05-20T12:02:06.624Z INFO 201931 --- [Attach Listener] o.e.jetty.server.AbstractConnector : Stopped ServerConnector@1a4927d6{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2023-05-20T12:02:06.629Z INFO 201931 --- [Attach Listener] o.e.j.s.h.ContextHandler.application : Destroying Spring FrameworkServlet 'dispatcherServlet'
2023-05-20T12:02:06.632Z INFO 201931 --- [Attach Listener] o.e.jetty.server.handler.ContextHandler : Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@35399441{application,/,[file:///tmp/jetty-docbase.8080.3095195653033098747/],STOPPED}
2023-05-20T12:02:06.638Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Bean 'webServerStartStop' completed its stop procedure
2023-05-20T12:02:06.639Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase -2147483647
2023-05-20T12:02:06.640Z DEBUG 201931 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor : Bean 'springBootLoggingLifecycle' completed its stop procedure
Killed
Inspecting your files now, you should see a crac-files folder containing a number of .img files. These are the images generated by the checkpoint operation, and they can be inspected with the crit tool. If you are using Ubuntu, you can install the crit command-line tool as part of the criu package using apt-get install criu.
crit is pretty handy for checking the contents of the images folder. We can, for example, check which process we checkpointed:
$ crit x crac-files ps
PID PGID SID COMM
201931 201931 201381 java
We can inspect the file descriptors captured in the checkpoint:
$ crit x crac-files fds
201931
0: TTY.36
1: TTY.36
2: TTY.36
3: /root/.sdkman/candidates/java/17.0.7.crac-zulu/lib/modules
4: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
5: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
6: /home/maboullaite/spring-boot-crac-demo/target/spring-boot-crac-demo-1.0.0-SNAPSHOT.jar
7: /home/maboullaite/spring-boot-crac-demo/crac-files/perfdata
8: /dev/random
9: /dev/urandom
cwd: /home/maboullaite/spring-boot-crac-demo
root: /
We can even extract the contents of one of the images using crit show:
$ crit show crac-files/core-201931.img
{
"magic": "CORE",
"entries": [
{
"mtype": "X86_64",
"thread_info": {
"clear_tid_addr": "0x7f19b70d5550",
"gpregs": {
...}
"thread_core": {
"futex_rla": 139748422014304,
"futex_rla_len": 24,
"sched_nice": 0,
"sched_policy": 0,
"sas": {
"ss_sp": 0,
"ss_size": 0,
"ss_flags": 2
},
"signals_p": {},
"creds": {
"uid": 0,
"gid": 0,
"euid": 0,
"egid": 0,
"suid": 0,
"sgid": 0,
"fsuid": 0,
"fsgid": 0,
"cap_inh": [
0,
0
],
"cap_prm": [
4294967295,
511
],
"cap_eff": [
4294967295,
511
],
"cap_bnd": [
4294967295,
511
],
"secbits": 0,
"groups": [
0
]
},
"comm": "java"
}
}
]
}
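The cap_prm, cap_eff and cap_bnd entries above are easier to read once decoded. The sketch below assumes each array element is a 32-bit word of a Linux capability bitmask, with the lowest-numbered capabilities in the first word; that layout is my assumption, not something the output itself states. The class name CapabilityMask is made up for this example.

```java
class CapabilityMask {
    // Counts the capabilities set across the 32-bit words, lowest word first.
    static int countCapabilities(long... words) {
        int total = 0;
        for (long word : words) {
            total += Long.bitCount(word & 0xFFFFFFFFL);
        }
        return total;
    }

    // Returns true if capability number `cap` is set in the mask.
    static boolean has(int cap, long... words) {
        return ((words[cap / 32] >>> (cap % 32)) & 1L) == 1L;
    }

    public static void main(String[] args) {
        long[] capEff = {4294967295L, 511L}; // values copied from the crit output above

        System.out.println(countCapabilities(capEff)); // 32 + 9 = 41
        System.out.println(has(21, capEff));           // 21 is CAP_SYS_ADMIN on Linux; prints true
    }
}
```

Under that assumption, and together with uid 0 in the creds block, the mask suggests the checkpointed process was running as root with a full capability set.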
The crac-files directory also contains log files, which are pretty handy when troubleshooting issues.
To restore our image and run the app from its saved state, we simply run:
$ java -XX:CRaCRestoreFrom=./crac-files
The result is a lightning-fast start compared to the original launch.
But what about AoT and Graal Native images?
Well, it is quite different. While native images achieve very fast startup times and a very small memory footprint, they aren't the cure to all problems. Native image generation requires that every class you need at runtime be available at build time for the compilation to succeed, which can be a challenge for Java developers. Debugging is another area where native images fall short.
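To make the build-time constraint concrete, here is a small sketch of code that picks a class by name at run time. On a regular JVM this works with no extra setup; a native image build generally has to be told about such classes up front, for example through reflection metadata. The class name ReflectiveLoading is made up for this example.

```java
import java.util.List;

class ReflectiveLoading {
    // Instantiates a List implementation whose class name is only known at run time.
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (List<String>) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The name could just as well come from a config file or an environment variable.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(name);
        list.add("hello");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

Because the class name is just a string here, a static analysis at build time has no reliable way to know which classes must be included.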
CRaC (and similar tools) allows us to still benefit from JVM capabilities that we're familiar with while benefiting from the fast startup needed for many cloud-native workloads. On the other hand, as Thomas brought to my attention, the size of the snapshot is orders of magnitude bigger compared to the size of the native image.
Final Thoughts
The general availability of CRaC should help boost the adoption of CR technology in the Java space, making Java even more modern and more suitable for the cloud-native world. Exciting times!
Finally, it is worth mentioning that CR technology is not new: Google uses it to migrate batch jobs in Borg.