
Ensure 'name' on initial message #2635

Status: Open. Wants to merge 3 commits into main.
Conversation

@marklysze (Collaborator) commented May 9, 2024

Why are these changes needed?

When initiating a chat through ConversableAgent's initiate_chat, the passed-in message doesn't get the name of the initiating agent attached to it. The name is therefore not passed through to the LLM, which cannot use that information.

So, running this block of code:

```python
cathy = ConversableAgent(
    "cathy",
    system_message="Your name is Cathy and you are a part of a duo of comedians.",
    llm_config={"config_list": config_list, "cache_seed": None},
    human_input_mode="NEVER",  # Never ask for human input.
)

joe = ConversableAgent(
    "joe",
    system_message="Your name is Joe and you are a part of a duo of comedians.",
    llm_config={"config_list": config_list, "cache_seed": None},
    human_input_mode="NEVER",  # Never ask for human input.
)

result = joe.initiate_chat(cathy, message="Tell me a joke and include both of our names in it!", max_turns=2)
```

will result in a list of messages to the LLM like this:

```python
[
    {'content': 'Your name is Cathy and you are a part of a duo of comedians.', 'role': 'system'},
    {'content': 'Tell me a joke and include both of our names in it!', 'role': 'user'}
]
```

I would expect there to be a name key, with the value joe, attached to the second message as that agent initiated the chat.

The response from the LLM is:
'Why did Cathy and her comedy partner decide to open a bakery together?\n\nBecause they realized they "dough"n\'t just make people laugh, they can also make them delicious pastries!'

... and it can be seen that it doesn't reference Joe's name.

By including the name key/value in the messages, like this:

```python
[
    {'content': 'Your name is Cathy and you are a part of a duo of comedians.', 'role': 'system'},
    {'content': 'Tell me a joke and include both of our names in it!', 'role': 'user', 'name': 'joe'}
]
```

... the LLM can use the name and returns this:
'Why did Cathy and Joe go to the comedy club?\n\nBecause they heard it was a great place for two funny people to crack jokes and make people laugh!'

... indicating that the name field is utilised (and LLMs aren't great at jokes!).

To fix this, ConversableAgent's _append_oai_message has been updated to add the name field if it doesn't already exist and the message is not a function/tool message. To support this addition, an is_sending parameter on the function indicates whether the self agent or the conversation_id agent is the sender and, hence, whose name to attach to the message.
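As a rough sketch (a hypothetical standalone helper, not the actual _append_oai_message implementation), the name-attaching logic amounts to:

```python
def attach_sender_name(message: dict, sender_name: str) -> dict:
    """Return a copy of message with 'name' set to sender_name, unless the
    message already carries a name or is a function/tool message.
    (Illustrative only; the real change lives inside ConversableAgent.)"""
    msg = dict(message)  # avoid mutating the caller's dict
    is_tool_msg = (
        "function_call" in msg
        or "tool_calls" in msg
        or msg.get("role") in ("function", "tool")
    )
    if "name" not in msg and not is_tool_msg:
        msg["name"] = sender_name
    return msg

# The initiating agent's name gets stamped onto the initial message:
initial = {"content": "Tell me a joke and include both of our names in it!", "role": "user"}
stamped = attach_sender_name(initial, "joe")
```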

I have not added this to documentation or updated tests for this. Please let me know if it needs specific tests added.

Related issue number

No related issue.

Checks

@marklysze (Collaborator, Author):

I've updated existing test cases that now show the name on the messages.

@ekzhu (Collaborator) commented May 10, 2024

Thanks! I think this addition is useful! Have you tested on non-OpenAI but OpenAI-compatible endpoints? E.g., LiteLLM + Ollama.

@marklysze (Collaborator, Author):

> Thanks! I think this addition is useful! Have you tested on non-OpenAI but OpenAI-compatible endpoints? E.g., LiteLLM + Ollama.

I haven't yet. I've been trying to clarify with the LiteLLM devs, and they don't think Ollama takes it in. I will run this test program and see if it does utilise name. If not, then we definitely need to consider getting the name into the content, either through message transforms or some other way.

@ekzhu added the llm label ("issues related to LLM") May 11, 2024
@marklysze (Collaborator, Author) commented May 11, 2024

Okay, testing with LiteLLM + Ollama...

Name included in messages:

```python
[
    {'content': 'Your name is Cathy and you are a part of a duo of comedians.', 'role': 'system'},
    {'content': 'Tell me a joke and include both of our names in it!', 'role': 'user', 'name': 'joe'}
]
```

Output:

joe (to cathy):

Tell me a joke and include both of our names in it!

--------------------------------------------------------------------------------
cathy (to joe):

Here's one:

Why did Cathy and I go to the doctor?

Because we were feeling a little "off-beat"! Get it? Off-beat, like a comedy duo that's always trying to be funny!

Your turn, can you come up with a joke about us too?

So, no name included.


Tried again with a more direct question.

joe (to cathy):

Tell me my name.

--------------------------------------------------------------------------------
cathy (to joe):

Your name is Cathy, and you're one half of the hilarious comedy duo, along with your partner in crime (and laughter)!

Again, no name included and it also referred to Joe as Cathy. I noticed this on a couple of different message sets.


Now, will try with adding the name directly into the content.

```python
[
    {'content': 'You are a person named Cathy.', 'role': 'system'},
    {'content': 'joe said:\nTell me my name.', 'role': 'user', 'name': 'joe'}
]
```

joe (to cathy):

joe said:
Tell me my name.

--------------------------------------------------------------------------------
cathy (to joe):

Your name is Cathy.

--------------------------------------------------------------------------------

So, adding joe said: in the message didn't help.


Now, adding a more direct system message about there being multiple people in the conversation.

```python
[
    {'content': 'You are a person named Cathy. You are in a conversation with other people.', 'role': 'system'},
    {'content': 'joe said:\nTell me my name.', 'role': 'user', 'name': 'joe'}
]
```

joe (to cathy):

joe said:
Tell me my name.

--------------------------------------------------------------------------------
cathy (to joe):

Joe, your friend asked you to tell him his own name? I think there might be some confusion here! But if Joe really wants to know, I can tell him that his name is... Joe!

--------------------------------------------------------------------------------

This shows improvement, but it shows that the cathy agent thinks Joe was asking someone else.


So, we'll try a different set of messages to try and get them to have a conversation with each other. In this one we put in who is speaking to who.

```python
[
    {'content': 'You are a person named Cathy. You are in a conversation with other people.', 'role': 'system'},
    {'content': 'Joe says to Cathy:\nTell me my name.', 'role': 'user', 'name': 'joe'}
]
```

And by adding Joe says to Cathy: the result is

joe (to cathy):

Joe says to Cathy:
Tell me my name.

--------------------------------------------------------------------------------
cathy (to joe):

Hi Joe! Your name is Joe, right?

--------------------------------------------------------------------------------
joe (to cathy):

That's correct! My name is indeed Joe. Thanks for reminding me!

--------------------------------------------------------------------------------
cathy (to joe):

You're welcome, Joe! I'm glad I could help you remember your own name.

--------------------------------------------------------------------------------

This looks much better: they act as their namesakes and feel like they are interacting with each other.

I also ran this again with the system message reduced back to You are a person named Cathy. and got similar output; it felt like a conversation with each other and they replied with the name Joe.

Okay, so I think we can make the following observations for LiteLLM + Ollama and likely other non-OpenAI setups:

  1. The name field isn't assisting the LLM in knowing who is speaking.
  2. Without the name field, the names and, I believe, the conversation direction need to be in the content field, such as Joe said to Cathy: or Joe said to you:.

I'll continue to try different system messages and content adjustments, but I think we would need to be able to transform messages and be able to put in the sender name and the recipient name.
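A minimal transform along those lines might look like the following (illustrative only; the function name and shape are hypothetical, not an existing AutoGen API):

```python
def inject_speaker_direction(messages: list[dict], recipient_name: str) -> list[dict]:
    """Prepend '<Sender> says to <Recipient>:' to each non-system message so
    that models which ignore the 'name' field still see who is addressing whom.
    Sketch only; assumes plain OpenAI-style message dicts."""
    transformed = []
    for msg in messages:
        msg = dict(msg)  # copy so the original history is untouched
        sender = msg.get("name")
        if msg.get("role") != "system" and sender and msg.get("content"):
            msg["content"] = (
                f"{sender.capitalize()} says to {recipient_name.capitalize()}:\n"
                f"{msg['content']}"
            )
        transformed.append(msg)
    return transformed

# Example: the 'Tell me my name.' exchange from above, addressed to cathy
out = inject_speaker_direction(
    [
        {'content': 'You are a person named Cathy.', 'role': 'system'},
        {'content': 'Tell me my name.', 'role': 'user', 'name': 'joe'},
    ],
    'cathy',
)
```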

Here's the joke one again with Joe says to Cathy: added:

joe (to cathy):

Joe says to Cathy:
Tell me a joke and include both of our names in it!

--------------------------------------------------------------------------------
cathy (to joe):

Cathy: Okay, Joe! Here's one:

Why did I, Cathy, bring my comedy partner, Joe, to the doctor?

Because we were feeling a little "off-beat" and wanted to get to the root of our comedic problems! (get it? off-beat, like in music, but also we're comedians... haha)

How was that, Joe? Did I make you laugh?

--------------------------------------------------------------------------------
joe (to cathy):

Joe: Ahahahaha! Oh man, Cathy, that's a great one! You really "punched" me with that joke! I love how you incorporated our names and the whole off-beat music pun. You're a comedic genius!

By the way, want to come up with another one? Maybe something about our latest comedy routine or our crazy fans?

--------------------------------------------------------------------------------
cathy (to joe):

Cathy: Ahahahaha, thanks Joe! I'm glad I could "hit the right note" with that one!

Oh, absolutely! Let's do it! What about this: Why did we, Cathy and Joe, have to kick our crazy fan out of our comedy club?

(Wait for it...)

Because she was trying to "steal the show"... by stealing all the props and improvising her own routine on stage!

Haha, what do you think? Did I "bring the house down" with that one?

--------------------------------------------------------------------------------

@ekzhu (Collaborator) commented May 11, 2024

Thanks for the update! This is indeed very interesting. It looks like adding the name field doesn't break the setup, so that's good.

For endpoints that bark at the name field, we can build a translation layer through model clients.

@ekzhu (Collaborator) commented May 13, 2024

Does the name field also play nicely with LM Studio and the Mistral AI API? Probably not used, but hopefully not breaking.

@marklysze (Collaborator, Author):

> Does the name field also plays nice with LM Studio and Mistral AI API -- probably not used, but not breaking.

In LM Studio it's accepted, but it's not being used.

For Mistral AI API, I'll check that.

On a side note with LM Studio: it won't accept messages where content is blank, which does happen sometimes with local LLMs; I'm investigating that.

@marklysze (Collaborator, Author):

> Does the name field also plays nice with LM Studio and Mistral AI API -- probably not used, but not breaking.

@ekzhu, I've tested the Mistral models through Together.ai and the "name" field is being accepted (not breaking), but it isn't being used. The only way to get the name known is to inject it into the content itself.

However, testing it through Mistral.ai's API, it did fail when any message had the name key.

Through this testing, it seems to me that having the name key on the message isn't useful for non-OpenAI scenarios. Was there any plan, or are there ideas, for being able to have the name key removed? It seems like a broad requirement.

@ekzhu (Collaborator) commented May 21, 2024

> Through the testing, to me it seems that having the name key on the message isn't useful for non-OpenAI scenarios. Was there any plan or ideas on being able to have the name key removed? It seems like a broad requirement.

I feel one way to do this is to have a built-in client for each API endpoint, similar to AzureOpenAI, so the user can specify the API type using api_style.
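Whatever form such a client takes, the translation step itself is small; here is a sketch (hypothetical helper, assuming plain OpenAI-style message dicts) of stripping the name key before forwarding to an endpoint that rejects it:

```python
def strip_name_field(messages: list[dict]) -> list[dict]:
    """Drop the 'name' key before sending to endpoints (e.g. Mistral.ai's API,
    which fails on it per the testing above); other keys pass through unchanged.
    Illustrative sketch, not an AutoGen API."""
    return [{k: v for k, v in m.items() if k != "name"} for m in messages]

cleaned = strip_name_field(
    [{'role': 'user', 'content': 'Tell me my name.', 'name': 'joe'}]
)
```

A per-endpoint client could apply this (or the content-injection transform) in its create call, keeping the name field available for endpoints that do support it.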

@marklysze (Collaborator, Author):

@ekzhu, I've created a PR #2748 regarding Mistral.AI.

Separately, the changes in this PR would be good to have reviewed, as the code currently does not add name to the initial message created through initiate_chat.

@ekzhu (Collaborator) left a review comment:

Thanks! The analysis of alternative APIs is super helpful. We can merge this once #2748 is addressed, so that the changes in this PR are not going to break two-agent chats for the Mistral AI API.

@marklysze (Collaborator, Author):

Included "name" on messages in the select speaker nested chat, as per this comment.
