CheckpointIO¶
- class lightning.fabric.plugins.io.checkpoint_io.CheckpointIO[source]¶
Bases: ABC
Interface to save/load checkpoints as they are saved through the Strategy.
Warning
This is an experimental feature.
Most plugins use the Torch-based IO plugin TorchCheckpointIO, but a plugin may require particular handling. In addition, you can pass a custom CheckpointIO by extending this class and passing it to the Trainer, i.e. Trainer(plugins=[MyCustomCheckpointIO()]).
Note
For some plugins, it is not possible to use a custom checkpoint plugin as checkpointing logic is not modifiable.
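The extension pattern described above can be sketched as follows. This is a standalone illustration: the base class here is a stand-in mirroring the interface documented on this page, so the example runs without Lightning installed; a real plugin would subclass lightning.fabric.plugins.io.checkpoint_io.CheckpointIO instead, and the pickle-based storage is a hypothetical choice for illustration only.

```python
import pickle
from abc import ABC, abstractmethod
from typing import Any, Optional


class CheckpointIOSketch(ABC):
    """Stand-in mirroring the CheckpointIO interface (not the real base class)."""

    @abstractmethod
    def save_checkpoint(self, checkpoint: dict, path: str,
                        storage_options: Any = None) -> None: ...

    @abstractmethod
    def load_checkpoint(self, path: str, *, state: Any = None,
                        map_location: Any = None,
                        weights_only: Optional[bool] = None) -> Optional[dict]: ...


class PickleCheckpointIO(CheckpointIOSketch):
    """Hypothetical custom plugin that dumps the checkpoint dict with pickle."""

    def save_checkpoint(self, checkpoint: dict, path: str,
                        storage_options: Any = None) -> None:
        # State-dump and file-write, as the interface describes.
        with open(path, "wb") as f:
            pickle.dump(checkpoint, f)

    def load_checkpoint(self, path, *, state=None, map_location=None,
                        weights_only=None):
        with open(path, "rb") as f:
            # Return the full dict; the caller applies it to model/optimizer.
            return pickle.load(f)
```

With the real base class, such a plugin would be passed to the Trainer as shown in the text, i.e. Trainer(plugins=[PickleCheckpointIO()]).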
- abstract load_checkpoint(path, *, state=None, map_location=None, weights_only=None)[source]¶
Load checkpoint from a path when resuming or loading ckpt for test/validate/predict stages.
- Parameters:
  - state¶ (Union[Module, Optimizer, dict[str, Union[Module, Optimizer, Any]], None]) – Optional dict to load the checkpoint into.
  - map_location¶ (Optional[Any]) – a function, torch.device, string, or a dict specifying how to remap storage locations.
  - weights_only¶ (Optional[bool]) – Defaults to None. If True, restricts loading to state_dicts of plain torch.Tensor and other primitive types. If loading a checkpoint from a trusted source that contains an nn.Module, use weights_only=False. If loading a checkpoint from an untrusted source, we recommend using weights_only=True. For more information, refer to the PyTorch Developer Notes on Serialization Semantics.
- Return type:
  Optional[dict[str, Any]]
- Returns:
  A dictionary containing checkpoint contents that still need to be applied by the caller, or None if the checkpoint was fully restored in-place into state.
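The return contract above (a dict of leftover contents, or None when the plugin restored everything in-place) can be sketched from the caller's side. The function name and state layout here are illustrative, not part of the Lightning API:

```python
from typing import Optional


def apply_loaded_checkpoint(remaining: Optional[dict], state: dict) -> dict:
    """Hypothetical caller-side handling of load_checkpoint's return value."""
    if remaining is None:
        # None signals the plugin already restored everything in-place
        # into `state`; there is nothing left for the caller to apply.
        return state
    # A dict means these contents still need to be applied by the caller.
    state.update(remaining)
    return state
```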
- abstract save_checkpoint(checkpoint, path, storage_options=None)[source]¶
Save model/training states as a checkpoint file through state-dump and file-write.
- property _restore_after_setup: bool¶
Whether checkpoint restoration should be delayed until after the Strategy setup phase.
Some checkpoint implementations require the distributed environment, device placement, or wrapped modules to be fully initialized before loading state. When this returns True, the Trainer/Strategy will restore the checkpoint only after setup has completed.
This is primarily used by distributed checkpointing backends that depend on collective communication during load.
- property requires_state_on_load: bool¶
Whether the state argument of load_checkpoint is required for loading the checkpoint.
If True, the Trainer will always pass a state dict containing the current model and optimizer to the load_checkpoint method. This is for plugins that need to restore the checkpoint in-place into the provided state objects instead of returning a new checkpoint dict.
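A plugin that opts into this behavior might look like the following sketch. The class is hypothetical and does not subclass the real CheckpointIO; the "saved contents" are a hard-coded stand-in for data a real plugin would read from path:

```python
from typing import Any, Optional


class InPlaceCheckpointIO:
    """Hypothetical plugin that restores in-place and advertises it via
    requires_state_on_load (mirroring the property documented above)."""

    @property
    def requires_state_on_load(self) -> bool:
        # Tell the Trainer to always pass the current model/optimizer state.
        return True

    def load_checkpoint(self, path: str, *, state: Any = None,
                        map_location: Any = None,
                        weights_only: Optional[bool] = None) -> Optional[dict]:
        if state is None:
            raise ValueError("this plugin requires `state` to restore into")
        # Stand-in for contents a real plugin would read from `path`.
        saved = {"step": 10}
        state.update(saved)  # restore directly into the provided state
        # Returning None signals the checkpoint was fully restored in-place.
        return None
```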