CheckpointIO

class lightning.fabric.plugins.io.checkpoint_io.CheckpointIO[source]

Bases: ABC

Interface to save/load checkpoints as they are saved through the Strategy.

Warning

This is an experimental feature.

Most plugins use the Torch-based IO plugin, TorchCheckpointIO, but some may require particular handling depending on the plugin.

In addition, you can pass a custom CheckpointIO by extending this class and passing it to the Trainer, e.g. Trainer(plugins=[MyCustomCheckpointIO()]).

Note

For some plugins, it is not possible to use a custom checkpoint plugin as checkpointing logic is not modifiable.
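The overall shape of a custom plugin can be sketched as follows. This is an illustrative stand-in, not Lightning code: the class name is hypothetical, it does not actually inherit from CheckpointIO, and it serializes with pickle so the example runs without torch installed; a real plugin would subclass CheckpointIO and typically delegate to torch.save / torch.load.

```python
# Illustrative sketch only: a file-based checkpoint plugin that mirrors the
# CheckpointIO interface. A real plugin would subclass
# lightning.fabric.plugins.io.CheckpointIO and use torch.save / torch.load
# instead of pickle.
import os
import pickle
from pathlib import Path
from typing import Any, Optional, Union


class PickleCheckpointIO:
    """Hypothetical stand-in for a CheckpointIO subclass."""

    def save_checkpoint(
        self,
        checkpoint: dict[str, Any],
        path: Union[str, Path],
        storage_options: Optional[Any] = None,
    ) -> None:
        # Dump the full checkpoint dict to the target path.
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(checkpoint, f)

    def load_checkpoint(
        self,
        path: Union[str, Path],
        *,
        state: Optional[dict[str, Any]] = None,
        map_location: Optional[Any] = None,
        weights_only: Optional[bool] = None,
    ) -> Optional[dict[str, Any]]:
        # Return the checkpoint contents; the caller applies them.
        with open(path, "rb") as f:
            return pickle.load(f)

    def remove_checkpoint(self, path: Union[str, Path]) -> None:
        os.remove(path)

    def teardown(self) -> None:
        pass
```

With a real subclass, the plugin would then be enabled via Trainer(plugins=[PickleCheckpointIO()]).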

abstract load_checkpoint(path, *, state=None, map_location=None, weights_only=None)[source]

Load a checkpoint from a path when resuming, or when loading a checkpoint for the test/validate/predict stages.

Parameters:
  • path (Union[str, Path]) – Path to checkpoint

  • state (Union[Module, Optimizer, dict[str, Union[Module, Optimizer, Any]], None]) – Optional dict to load the checkpoint into.

  • map_location (Optional[Any]) – a function, torch.device, string or a dict specifying how to remap storage locations.

  • weights_only (Optional[bool]) – Defaults to None. If True, restricts loading to state_dicts of plain torch.Tensor and other primitive types. If loading a checkpoint from a trusted source that contains an nn.Module, use weights_only=False. If loading a checkpoint from an untrusted source, we recommend using weights_only=True. For more information, please refer to the PyTorch Developer Notes on Serialization Semantics.

Return type:

Optional[dict[str, Any]]

Returns:

  • A dictionary containing checkpoint contents that still need to be applied by the caller, or

  • None if the checkpoint was fully restored in-place into state.
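This return contract means the caller must branch on the result: None signals that the plugin already restored everything into state, while a dict carries leftover contents for the caller to apply. A minimal sketch of that dispatch, where checkpoint_io and restore are hypothetical placeholders rather than Lightning APIs:

```python
# Sketch of handling load_checkpoint's return contract. `checkpoint_io` is
# any object exposing load_checkpoint; `restore` is a hypothetical helper.
from typing import Any, Optional


def restore(checkpoint_io, path: str, state: dict[str, Any]) -> None:
    remaining: Optional[dict[str, Any]] = checkpoint_io.load_checkpoint(
        path, state=state
    )
    if remaining is None:
        # The plugin restored everything in-place into `state`.
        return
    # Otherwise the caller applies the leftover contents itself.
    for key, value in remaining.items():
        state[key] = value
```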

abstract remove_checkpoint(path)[source]

Remove checkpoint file from the filesystem.

Parameters:

path (Union[str, Path]) – Path to checkpoint

Return type:

None

abstract save_checkpoint(checkpoint, path, storage_options=None)[source]

Save model/training states as a checkpoint file through state-dump and file-write.

Parameters:
  • checkpoint (dict[str, Any]) – dict containing model and trainer state

  • path (Union[str, Path]) – write-target path

  • storage_options (Optional[Any]) – Optional parameters when saving the model/training states.

Return type:

None
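Implementations of save_checkpoint often guard against partially written files. One common pattern, shown here as an illustration and not necessarily what TorchCheckpointIO does, is to write to a temporary file and atomically rename it into place (pickle again stands in for torch.save):

```python
# Illustrative atomic-write pattern for a save_checkpoint implementation.
# pickle is a stand-in for torch.save so the sketch runs without torch.
import os
import pickle
from pathlib import Path
from typing import Any, Optional, Union


def save_checkpoint(
    checkpoint: dict[str, Any],
    path: Union[str, Path],
    storage_options: Optional[Any] = None,
) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".part")
    with open(tmp, "wb") as f:
        pickle.dump(checkpoint, f)
    # os.replace is atomic on POSIX: readers never observe a half-written file.
    os.replace(tmp, path)
```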

teardown()[source]

This method is called to tear down the process.

Return type:

None

property _restore_after_setup: bool

Whether checkpoint restoration should be delayed until after the Strategy setup phase.

Some checkpoint implementations require the distributed environment, device placement, or wrapped modules to be fully initialized before loading state. When this returns True, the Trainer/Strategy will restore the checkpoint only after setup has completed.

This is primarily used by distributed checkpointing backends that depend on collective communication during load.

property requires_state_on_load: bool

Whether the state argument of load_checkpoint is required for loading the checkpoint.

If True, the Trainer will always pass a state dict containing the current model and optimizer to the load_checkpoint method. This is for plugins that need to do in-place restoration of the checkpoint into the provided state objects instead of returning a new checkpoint dict.
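A plugin that returns True here restores directly into the objects it receives. The following stdlib sketch illustrates the idea with a hypothetical Counter standing in for an nn.Module or Optimizer (real plugins would call load_state_dict on the actual torch objects the Trainer passes in state):

```python
# Sketch of in-place restoration driven by requires_state_on_load.
# `Counter` is a hypothetical stand-in for an nn.Module/Optimizer that
# exposes state_dict / load_state_dict; pickle stands in for torch.load.
import pickle
from pathlib import Path
from typing import Any, Optional, Union


class Counter:
    def __init__(self) -> None:
        self.value = 0

    def state_dict(self) -> dict[str, Any]:
        return {"value": self.value}

    def load_state_dict(self, sd: dict[str, Any]) -> None:
        self.value = sd["value"]


class InPlaceCheckpointIO:
    @property
    def requires_state_on_load(self) -> bool:
        # Tell the Trainer it must always pass `state` to load_checkpoint.
        return True

    def load_checkpoint(
        self,
        path: Union[str, Path],
        *,
        state: Optional[dict[str, Any]] = None,
        map_location: Optional[Any] = None,
        weights_only: Optional[bool] = None,
    ) -> Optional[dict[str, Any]]:
        with open(path, "rb") as f:
            contents = pickle.load(f)
        # Restore each stateful object in-place instead of returning the dict.
        for name, obj in state.items():
            if hasattr(obj, "load_state_dict") and name in contents:
                obj.load_state_dict(contents[name])
        return None  # Everything was restored into `state`.
```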