
Evaluation Error with Gemini/VertexAI: InvalidArgument due to Empty Text Content #996

Open
baptiste-pasquier opened this issue May 28, 2024 · 0 comments · May be fixed by #994
Labels
bug Something isn't working


[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

When evaluating a dataset with Gemini, an InvalidArgument error occurs indicating that the text content is empty. The issue arises specifically when using the evaluate function with VertexAIEmbeddings from LangChain: empty strings in the dataset trigger the error during the embedding step. The same evaluation runs without errors with OpenAI embeddings. The answer_similarity metric is used for evaluation.

Ragas version: 0.1.8
Python version: 3.10.13

Code to Reproduce

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_similarity

data = {
    "answer": [
        "Paris is the capital of France.",
        "",
        "Paris is the capital of France.",
        "",
    ],
    "ground_truth": [
        "Paris",
        "Paris",
        "",
        "",
    ],
}
dataset = Dataset.from_dict(data)

With OpenAI (no error):

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

result = evaluate(
    dataset,
    metrics=[answer_similarity],
    llm=llm,
    embeddings=embeddings,
)
result.to_pandas()

Output:

answer                           ground_truth  answer_similarity
Paris is the capital of France.  Paris         0.493884
""                               Paris         0.175841
Paris is the capital of France.  ""            0.101094
""                               ""            1.000000
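For context, the answer_similarity score above is, as I understand it, essentially the cosine similarity between the embeddings of answer and ground_truth (which is why two empty strings score 1.0: OpenAI still returns an embedding for "", and identical vectors have similarity 1). A minimal stdlib sketch of that computation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0, orthogonal vectors 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # -> 0.0
```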

With Gemini (error):

from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

llm = ChatVertexAI(model_name="gemini-1.5-flash-preview-0514")
embeddings = VertexAIEmbeddings(model_name="text-embedding-004")

result = evaluate(
    dataset,
    metrics=[answer_similarity],
    llm=llm,
    embeddings=embeddings,
)
result.to_pandas()

Error trace

Traceback (most recent call last):
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/executor.py", line 79, in _aresults
    r = await future
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/executor.py", line 38, in sema_coro
    return await coro
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/metrics/base.py", line 125, in ascore
    raise e
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/metrics/base.py", line 121, in ascore
    score = await self._ascore(row=row, callbacks=group_cm, is_async=is_async)
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/metrics/_answer_similarity.py", line 65, in _ascore
    embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/embeddings/base.py", line 23, in embed_text
    embs = await self.embed_texts([text], is_async=is_async)
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/embeddings/base.py", line 33, in embed_texts
    return await aembed_documents_with_retry(texts)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 142, in async_wrapped
    return await fn(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 58, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 110, in iter
    result = await action(retry_state)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 78, in inner
    return fn(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 410, in exc_check
    raise retry_exc.reraise()
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 183, in reraise
    raise self.last_attempt.result()
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/_asyncio.py", line 61, in __call__
    result = await fn(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/src/ragas/embeddings/base.py", line 64, in aembed_documents
    return await self.embeddings.aembed_documents(texts)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_core/embeddings/embeddings.py", line 21, in aembed_documents
    return await run_in_executor(None, self.embed_documents, texts)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 514, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 380, in embed_documents
    return self.embed(texts, batch_size, "RETRIEVAL_DOCUMENT")
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 363, in embed
    embeddings.extend(t.result())
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 197, in _get_embeddings_with_retry
    return self._get_text_embeddings_with_retry(
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 228, in _get_text_embeddings_with_retry
    embeddings = self.instance["get_embeddings_with_retry"](requests)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 330, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 467, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 368, in iter
    result = action(retry_state)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 390, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/baptiste.pasquier/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 470, in __call__
    result = fn(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/vertexai/language_models/_language_models.py", line 2145, in get_embeddings
    prediction_response = self._endpoint.predict(
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/google/cloud/aiplatform/models.py", line 1614, in predict
    prediction_response = self._prediction_client.predict(
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/google/cloud/aiplatform_v1/services/prediction_service/client.py", line 851, in predict
    response = rpc(
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/baptiste.pasquier/Github/ragas/.venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 The text content is empty.
@baptiste-pasquier baptiste-pasquier added the bug Something isn't working label May 28, 2024
@baptiste-pasquier baptiste-pasquier changed the title Evaluation Error with Gemini: InvalidArgument due to Empty Text Content Evaluation Error with Gemini/VertexAI: InvalidArgument due to Empty Text Content May 28, 2024