train_transformer.py: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int' #1

MauriceCalvert · 2023-01-03T19:52:26Z

Steps to reproduce:
Install on Windows 10, Python 3.9.0
generate.py fails with "Storage device not recognized: mps" (fair enough, Mac != Windows)
Run train_vq_vae.py briefly to fix this (is mps remembered in vq_vae\ludovico-mini\state_dict\last ?)
Edit generate.py and tools.py, and change all devices to CUDA (mps is not supported on Windows)
Uninstall torch, then install the CUDA build of torch
Reduce MIDI files to a small selection to speed things up
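
The manual device edits in the steps above could probably be avoided with a runtime check. A minimal sketch, not the repo's code: pick_device and the in-memory checkpoint are hypothetical names used only for illustration. torch.load with map_location is the usual way to remap tensors saved on another device, though I have not verified that it clears the mps error on every torch version.

```python
import io

import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent in older torch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Stand-in for a saved checkpoint (hypothetical; the repo's files are not used here).
buffer = io.BytesIO()
torch.save(torch.nn.Linear(4, 2).state_dict(), buffer)
buffer.seek(0)

# map_location moves every stored tensor to the chosen device at load time.
state_dict = torch.load(buffer, map_location=device)
```

With something like this, the same scripts would pick cuda on Windows and mps on a Mac without hand-editing device strings.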

train vq-vae:
python train_vq_vae.py
Restored
creating training dataset. progress: 100.00%
creating validation dataset
799 samples in training set
96 samples in validation set
epoch: 0 | progress: 0.13% | recon_error: 0.021541 | perplexity : 3.6868
Validation
validation recon_error: 0.0089
...
epoch: 9 | progress: 0.13% | recon_error: 0.000042 | perplexity : 3.64941
Validation
validation recon_error: 0.0129
epoch: 9 | progress: 0.00% | recon_error: 0.000007 | perplexity : 3.65177

Looks good.

train bachsformer:
python train_transformer.py
100%|███████| 6/6 [00:01<00:00, 4.72it/s]
vocab_size: 16
block_size: 191
number of parameters: 0.82M
running on device cuda
Traceback (most recent call last):
  File "D:\Downloads\bachsformer-main\train_transformer.py", line 91, in <module>
    trainer.run()
  File "D:\Downloads\bachsformer-main\transformer_decoder_only\trainer.py", line 100, in run
    logits, self.loss = model(x, y)
  File "C:\Users\Momo\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Downloads\bachsformer-main\transformer_decoder_only\model.py", line 279, in forward
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
  File "C:\Users\Momo\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py", line 3026, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'
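
For what it's worth, F.cross_entropy with class-index targets expects torch.long (int64) targets, so the error suggests that targets reaches the loss as int32. A minimal sketch of a possible workaround, assuming that is the cause: the shapes only mirror the vocab_size and block_size printed above, batch is an arbitrary value, and the tensors are random stand-ins rather than the repo's data.

```python
import torch
import torch.nn.functional as F

vocab_size, block_size, batch = 16, 191, 2  # batch is an arbitrary choice

# Random stand-ins for the model output and for targets stored as int32.
logits = torch.randn(batch, block_size, vocab_size)
targets = torch.randint(0, vocab_size, (batch, block_size), dtype=torch.int32)

# Cast the class indices to int64 before computing the loss.
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),
    targets.view(-1).long(),
    ignore_index=-1,
)
```

In model.py this would presumably mean changing targets.view(-1) to targets.view(-1).long() on line 279, or casting the targets to long where the dataset is built.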

P.S. Thanks for sharing this; neat, crisp and very instructive
