feat: Add support for multimodal on ChatMessages with ContentPart #7913

Closed · wants to merge 8 commits

Conversation

@CarlosFerLo (Contributor) commented Jun 22, 2024

Related Issues

Proposed Changes:

I have added a new data class called 'ContentPart', as suggested in #7848, that holds any kind of content you might want to add to a 'ChatMessage'. For now only text, image URLs, and base64-encoded images are implemented, since the original goal of this PR is to support passing images to the OpenAI Chat API. The class supports serialization and OpenAI-specific formatting.
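For illustration, a rough sketch of the idea; the field names and method bodies below are simplified and may differ from the actual diff:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

from haystack.dataclasses import ByteStream


@dataclass
class ContentPart:
    """Simplified sketch of the proposed content container; only one field is set per instance."""

    text: Optional[str] = None
    image_url: Optional[str] = None
    image_base64: Optional[ByteStream] = None

    def to_openai_format(self) -> Dict[str, Any]:
        # Convert to the content-part dict format the OpenAI Chat API expects.
        if self.text is not None:
            return {"type": "text", "text": self.text}
        if self.image_url is not None:
            return {"type": "image_url", "image_url": {"url": self.image_url}}
        if self.image_base64 is not None:
            # Assumption for this sketch: the ByteStream holds the base64 text itself.
            b64 = self.image_base64.data.decode("utf-8")
            return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        raise ValueError("ContentPart holds no content")
```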

Added support for different content types on 'ChatMessage': 'str', 'ContentPart', and 'List[Union[str, ContentPart]]'. All serialization and OpenAI-specific formatting is handled.

Added support for these new 'ChatMessage' content types in 'ChatPromptBuilder'.
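Putting the pieces together, usage would look roughly like this. This assumes 'from_user' accepts the new content types and that 'ChatPromptBuilder' renders template variables inside list content, as proposed here; the URL and variable names are placeholders:

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

# ContentPart is the class sketched above (not an existing Haystack import).
# A multimodal user message mixing plain text and an image URL.
message = ChatMessage.from_user(
    [
        "Describe what you see in this picture: {{ hint }}",
        ContentPart(image_url="https://example.com/picture.jpg"),
    ]
)

builder = ChatPromptBuilder(template=[message])
result = builder.run(hint="focus on the foreground")
# result["prompt"] is a list of ChatMessage objects with the template variables rendered.
```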

How did you test it?

Added unit tests for all new functionality in both 'ChatMessage' and 'ChatPromptBuilder', and created a new file containing unit tests for 'ContentPart'.

Notes for the reviewer

  • I noticed that some of the 'from_user'-style class methods on 'ChatMessage' allow passing metadata while others don't. Should we add it to all of them?
  • I don't know whether you want me to support other content types such as audio or video.
  • 'ByteStream' does not ensure that its content is actually bytes: you can pass it a string, and it then breaks when calling the 'to_string' method. I patched it, but I believe we should think of a better solution.
  • When working with base64 images, the 'from_base64' method only accepts a 'ByteStream'; we could also make it accept 'str' and convert it to a 'ByteStream' internally.
  • The OpenAI Vision API documentation (here) says it supports formats other than JPEG, but it does not give any examples of how to encode the base64 image (see the 'to_openai_format' method of 'ContentPart' for context). I do not know how to check whether it works; one possible encoding is sketched below.
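On that last point, one common approach (untested here for non-JPEG formats, which is exactly the open question) is to build a data URL whose media type comes from the image's MIME type:

```python
import base64

from haystack.dataclasses import ByteStream


def to_openai_image_part(image: ByteStream) -> dict:
    """Encode an image ByteStream as an OpenAI vision content part.

    Falls back to image/jpeg when no mime_type is set; whether the API accepts
    every format this way is the question raised above.
    """
    mime_type = image.mime_type or "image/jpeg"
    b64_data = base64.b64encode(image.data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{b64_data}"},
    }
```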

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes ✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ✅
  • I ran pre-commit hooks and fixed any issue ✅

@github-actions github-actions bot added the topic:tests and type:documentation labels Jun 22, 2024
@CarlosFerLo CarlosFerLo marked this pull request as ready for review June 26, 2024 13:38
@CarlosFerLo CarlosFerLo requested review from a team as code owners June 26, 2024 13:38
@CarlosFerLo CarlosFerLo requested review from dfokina and Amnah199 and removed request for a team June 26, 2024 13:38
@silvanocerza (Contributor) left a comment


Please use the approach I suggested in this comment.

@coveralls (Collaborator) commented Jun 26, 2024

Pull Request Test Coverage Report for Build 9680828964

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 37 unchanged lines in 6 files lost coverage.
  • Overall coverage decreased (-0.2%) to 89.744%

Files with coverage reduction (new missed lines, resulting coverage):

  • components/evaluators/document_map.py: 1 (96.15%)
  • components/evaluators/document_mrr.py: 1 (95.45%)
  • tracing/datadog.py: 1 (94.59%)
  • components/generators/openai.py: 2 (96.34%)
  • dataclasses/chat_message.py: 4 (94.94%)
  • components/builders/chat_prompt_builder.py: 28 (77.05%)
Totals:
Change from base Build 9615573436: -0.2%
Covered Lines: 6843
Relevant Lines: 7625

💛 - Coveralls

@CarlosFerLo (Contributor, Author)

Hi @silvanocerza,

Your comment for reference:

We can keep it much simpler.

As of now models can receive and generate the following:

  • text
  • image
  • audio
  • video
  • heterogeneous list of all the above

We have all the necessary abstractions to define the above. str obviously for text. haystack.dataclasses.ByteStream for image, audio and video. The list is List[Union[str, ByteStream]] then.

Given that, we say that the ChatMessage.content type should be Union[str, ByteStream, List[Union[str, ByteStream]]].

This abstracts at a high level all the supported types of data a model receives and generates. If model X needs its input, or generates its output, in a certain format, its Generator will handle the conversion, but that's an implementation detail.

Introducing new classes or new abstractions is not the way to go in my opinion.

I replied to your comment a while ago raising my concern about how to tell which data type a given string or 'ByteStream' holds. How do you suggest we solve that?

I'm not a very experienced coder, so I do not know how to implement this with your approach. If you are willing to give me some guidance, I have no problem implementing your solution; as I said, it results in a simpler API, but I cannot think of a way to implement it.

I will tag other members of our initial conversation so they can give their opinion on the matter.
@lbux @vblagoje

@silvanocerza (Contributor)

@CarlosFerLo
ByteStream was designed exactly to solve the problem of handling multimodality in Haystack. It's generic enough to store any kind of binary data. The mime_type field can be used to know which kind of data we're dealing with, whether it's an image, audio, video, or anything else.

The idea is to keep the abstraction level higher than what is proposed in this PR.

If some Generator needs to send the ByteStream data encoded one way or another, it will handle that internally; there's no need to add another abstraction for that.
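To make that concrete, a rough sketch of what such a conversion inside a Generator could look like (illustrative only; this is not what any Generator currently does):

```python
import base64
from typing import List, Union

from haystack.dataclasses import ByteStream


def _to_openai_content(
    content: Union[str, ByteStream, List[Union[str, ByteStream]]]
) -> Union[str, list]:
    # Plain text messages stay as-is.
    if isinstance(content, str):
        return content
    parts = content if isinstance(content, list) else [content]
    openai_parts = []
    for part in parts:
        if isinstance(part, str):
            openai_parts.append({"type": "text", "text": part})
        elif (part.mime_type or "").startswith("image/"):
            # Dispatch on mime_type: images become base64 data URLs.
            b64 = base64.b64encode(part.data).decode("utf-8")
            openai_parts.append(
                {"type": "image_url", "image_url": {"url": f"data:{part.mime_type};base64,{b64}"}}
            )
        else:
            raise ValueError(f"Unsupported mime_type: {part.mime_type}")
    return openai_parts
```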

@CarlosFerLo (Contributor, Author)

Okay, I did not know what that field in ByteStream was intended for. Just one last question: how can we represent image URLs? If we go with a plain string, it is more natural, but we might run into problems differentiating URLs from raw text; if we go with a ByteStream, I believe we would be using a tool for something it wasn't meant for. I will close this PR and start implementing your solution right away.

@silvanocerza (Contributor)

ChatMessage has a meta field; that information could be stored there too if necessary.
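For example, something like this (the 'image_urls' meta key is made up for illustration; nothing prescribes that name):

```python
from haystack.dataclasses import ChatMessage, ChatRole

# Carry the image URL alongside the text via meta; a Generator that supports
# vision input could look for this (hypothetical) key and build the request accordingly.
message = ChatMessage(
    content="What is shown in this picture?",
    role=ChatRole.USER,
    name=None,
    meta={"image_urls": ["https://example.com/picture.png"]},
)
```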

@CarlosFerLo (Contributor, Author)

@silvanocerza What if we add a prefix or something to the URLs, to avoid duplicating information?

@silvanocerza (Contributor)

Like http? 😁

@CarlosFerLo (Contributor, Author)

Like an 'image_url' prefix. 'http' may appear in regular text. We could find a prefix that is unusual and, together with the URL structure, use it to differentiate between URLs and text.

Labels
topic:tests, type:documentation
Development

Successfully merging this pull request may close these issues.

ChatMessage content being str-only doesn't allow user to pass image
3 participants