lixilinx/psgd_torch
PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioners, low-rank approximation preconditioner, and more)
2 Releases
Latest: 5mo ago
PSGD 2.0 release (2.0, Latest)
📋 Changes
- [psgd.py](https://github.com/lixilinx/psgd_torch/blob/master/psgd.py): functional APIs that expose the full flexibility of the methods.
- [wrapped_as_torch_optimizer_for_ddp.py](https://github.com/lixilinx/psgd_torch/blob/master/wrapped_as_torch_optimizer_for_ddp.py): a basic example that wraps momentum whitening as a torch.optim.Optimizer for DDP training.
- [wrapped_as_torch_optimizer_for_dtensor.py](https://github.com/lixilinx/psgd_torch/blob/master/wrapped_as_torch_optimizer_for_dtensor.py): a second basic example that wraps momentum whitening as a torch.optim.Optimizer, this time for DTensor-based distributed training.
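
For readers unfamiliar with the wrapping pattern named in the list above, here is a minimal single-process sketch of what "momentum whitening wrapped as a torch.optim.Optimizer" could look like. It does not use or reproduce the code in psgd.py or the two wrapper files. The class name `DiagWhitenedMomentum`, its hyperparameters, and the diagonal running-average whitening are all assumptions made for illustration, standing in for the Kron, affine, and low-rank preconditioners that this repository actually provides.

```python
import torch


class DiagWhitenedMomentum(torch.optim.Optimizer):
    """Hypothetical optimizer: momentum rescaled by a diagonal whitening estimate.

    Stand-in only: the diagonal running average below is NOT the PSGD
    preconditioner fitting rule.
    """

    def __init__(self, params, lr=0.01, beta=0.9, beta2=0.99, eps=1e-8):
        defaults = dict(lr=lr, beta=beta, beta2=beta2, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:  # lazily create per-parameter buffers
                    state["m"] = torch.zeros_like(p)
                    state["v"] = torch.zeros_like(p)
                m, v = state["m"], state["v"]
                # Exponential moving average of the gradient (momentum).
                m.mul_(group["beta"]).add_(p.grad, alpha=1 - group["beta"])
                # Running second moment of the momentum (diagonal statistic).
                v.mul_(group["beta2"]).addcmul_(m, m, value=1 - group["beta2"])
                # Update with the momentum divided by its running RMS.
                p.addcdiv_(m, v.sqrt().add(group["eps"]), value=-group["lr"])
        return loss


# Usage on a toy quadratic, loss = sum(w^2).
w = torch.nn.Parameter(torch.tensor([3.0, -2.0]))
opt = DiagWhitenedMomentum([w], lr=0.01)
initial_loss = (w ** 2).sum().item()
for _ in range(200):
    opt.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    opt.step()
final_loss = (w ** 2).sum().item()
```

Keeping the momentum and the whitening statistic in `self.state` is what lets the standard `state_dict()` checkpointing of torch.optim.Optimizer cover them. Distributed concerns such as DDP gradient synchronization or DTensor sharding are outside this sketch.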
archived code (1.0)
