IndexError during training with Squad dataset and T5-small model #6973

Open
ramtunguturi36 opened this issue Jun 16, 2024 · 1 comment

Describe the bug

I am encountering an IndexError while training a T5-small model on the SQuAD dataset using the transformers and datasets libraries. The error occurs even with a minimal reproducible example, which suggests a potential bug or incompatibility.

Steps to reproduce the bug

1. Install the required libraries: !pip install transformers datasets
2. Run the following code:

!pip install transformers datasets

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TrainingArguments, Trainer, DataCollatorWithPadding

# Load a small, publicly available dataset
from datasets import load_dataset
dataset = load_dataset("squad", split="train[:100]")  # Use a small subset for testing

# Load a pre-trained model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Define a basic data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# Create a trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=data_collator,
)

# Train the model
trainer.train()

Expected behavior


IndexError Traceback (most recent call last)
in <cell line: 34>()
32
33 # Train the model
---> 34 trainer.train()

10 frames
/usr/local/lib/python3.10/dist-packages/datasets/formatting/formatting.py in _check_valid_index_key(key, size)
427 if isinstance(key, int):
428 if (key < 0 and key + size < 0) or (key >= size):
--> 429 raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
430 return
431 elif isinstance(key, slice):

IndexError: Invalid key: 42 is out of bounds for size 0
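
The message reports size 0, which suggests the dataset being indexed has no rows left by the time training starts, even though 100 examples were loaded. A minimal sketch of the integer branch of the check quoted in the traceback follows. It is a standalone copy for illustration, not the library function itself, and `try_index` is a hypothetical helper added here.

```python
def check_valid_index_key(key: int, size: int) -> None:
    """Mirror the integer branch of the bounds check shown in the traceback."""
    if (key < 0 and key + size < 0) or (key >= size):
        raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")


def try_index(key: int, size: int) -> str:
    """Return 'ok' or the error message for a given key and dataset size."""
    try:
        check_valid_index_key(key, size)
        return "ok"
    except IndexError as err:
        return str(err)


print(try_index(42, 100))  # key 42 against a 100-row dataset
print(try_index(42, 0))    # same key against an empty dataset
```

With size 0 every integer key fails this check, so the specific key (42) is incidental; the thing to investigate is why the dataset appears empty.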

Environment info

transformers version: 4.41.2
datasets version: 1.18.4
Python version: 3.10.12

arthasking123 commented Jun 18, 2024

Add remove_unused_columns=False to training_args.
#6535 (comment)
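
For context, a rough sketch of why this flag matters: as far as I understand it, `Trainer` by default drops dataset columns whose names are not parameters of the model's `forward` method. The snippet below imitates that filtering in plain Python. `fake_forward` is a hypothetical, simplified stand-in for the T5 signature, `kept_columns` is an illustrative helper, and the extra column names assume a tokenization step that the reproduction script does not contain.

```python
import inspect


# Hypothetical stand-in for a seq2seq model's forward(); the real signature
# has more parameters, but these are the names relevant to this sketch.
def fake_forward(input_ids=None, attention_mask=None,
                 decoder_input_ids=None, labels=None):
    pass


# Column names of the raw SQuAD dataset as loaded in the report.
squad_columns = ["id", "title", "context", "question", "answers"]

# Assumed columns after a tokenization step (not present in the report).
tokenized_columns = squad_columns + ["input_ids", "attention_mask", "labels"]


def kept_columns(forward_fn, columns):
    """Return the columns whose names match a parameter of forward_fn."""
    accepted = set(inspect.signature(forward_fn).parameters)
    return [c for c in columns if c in accepted]


raw_kept = kept_columns(fake_forward, squad_columns)
tokenized_kept = kept_columns(fake_forward, tokenized_columns)

print(raw_kept)        # columns surviving from the raw dataset
print(tokenized_kept)  # columns surviving after the assumed tokenization
```

With the raw SQuAD columns nothing survives the filter, which would be consistent with the "size 0" in the error. Disabling the filter alone may not be enough, though, since the collator still expects tokenized inputs rather than raw strings.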
