DMTCP: Distributed MultiThreaded CheckPointing
Updated Jun 14, 2026 - C++
Very-Low Overhead Checkpointing System
CERE: Codelet Extractor and REplayer
User space POSIX-like file system in main memory
HOWTO: TCP connection repair
A minimal working example of DMTCP checkpoint-restart inside a Singularity container.
Scans intervals for vampire numbers
Stores process state during checkpointing and restores it at recovery time...
Tools for task-based parallelism with MPI via pbdMPI with automatic checkpoint/restart.
Automated storage and retrieval of results for function calls
Cairn — recoverable long-horizon AI agents. A framework-agnostic reference implementation and benchmark for agent checkpointing, crash recovery, and re-grounding after context loss. Thesis: Checkpoints Are Compactions.
Fault Tolerance framework for High Performance Computing [Supports ULFM, replication and checkpointing]
A program running on a server is checkpointed and stopped, then handed to a client, which restarts it on the client system using DMTCP.
Checkpointing and restore functionality for single-threaded programs
Runtime detection and control of LLM coherence failures (looping, hallucination, context loss). No fine-tuning. Zero iatrogenic harm. 69 experiments across 5 architectures.
Partner-XOR combined Checkpoint/Restart Library
Production-ready Python pipeline to migrate contacts from Mailchimp to HubSpot CRM, with batch processing, offset pagination, checkpoint resume, and field transformation built in.
Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH)