Skip thinking section in Claude tool call response #226

jackmpcollins · 2024-05-23T06:37:54Z

Skip the <thinking> / chain-of-thought section when parsing the response from Claude. ~~This seems to always be included when tools are provided in the request.~~ The models do not reliably start their answers with the <thinking> tag, so the changes here are only a partial fix for this. Seems like the only full solution is to not use streaming responses so the whole response can inform how to parse.

This is only an issue for return type annotations that union str/StreamedStr with a structured object or FunctionCall/ParallelFunctionCall, because the potential for string output means a tool call cannot be forced.

Potential full solution: if a union of tool call and string return type is given, then stream the whole response before determining how to parse it. This would essentially disable streaming in this case, which means StreamedStr/Iterable[T]/ParallelFunctionCall would arrive all at once instead of as the parts are generated.

Issue #220

In future, an AnthropicAssistantMessage could be added with a thinking: str attribute. This could be registered with message_to_anthropic_message so the thinking string is persisted and serialized back to the model. This could be exposed to users for use in @chatprompt (non-anthropic models would treat it like AssistantMessage and ignore thinking).

jackmpcollins · 2024-05-28T01:22:13Z

When using tools, Claude will often show its “chain of thought”, i.e. the step-by-step reasoning it uses to break down the problem and decide which tools to use. The Claude 3 Opus model will do this if tool_choice is set to auto (this is the default value, see Forcing tool use), and Sonnet and Haiku can be prompted into doing it.

It’s important to note that while the tags are a common convention Claude uses to denote its chain of thought, the exact format (such as what this XML tag is named) may change over time. Your code should treat the chain of thought like any other assistant-generated text, and not rely on the presence or specific formatting of the tags.

https://docs.anthropic.com/en/docs/tool-use#error-handling

mnicstruwig · 2024-05-28T10:26:18Z

Not sure if it's related, but I'm also unable to manually handle function calls (although as you'll see below, it would also likely apply to prompt_chain if it were working correctly):

from magentic import ParallelFunctionCall, StreamedStr, prompt, prompt_chain, chatprompt, SystemMessage, UserMessage, AssistantMessage, FunctionResultMessage, FunctionCall
from magentic.chat_model.anthropic_chat_model import AnthropicChatModel

def get_weather(city: str) -> str:
    return f"The weather in {city} is 20°C."


function_call = FunctionCall(function=get_weather, city="Cape Town")

messages = [
    SystemMessage("You are helpful."),
    UserMessage("What's the weather like in Cape Town?"),
    AssistantMessage(function_call),
    FunctionResultMessage(function_call=function_call, content="The weather in Cape Town is 20°C.")
]

@chatprompt(
    *messages,
    functions=[get_weather],
    model=AnthropicChatModel(
        model="claude-3-opus-20240229",
        temperature=0.2,
    )
)
def _llm() -> FunctionCall | StreamedStr: ...

response = _llm()
response

Which leads to the following Anthropic API error:

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.2.content.0.tool_result.content.0: Input should be a valid dictionary or object to extract fields from'}}

What's bizarre is that if I modify the way the function results get handled, to return a string, rather than converting into an object / dict:

@message_to_anthropic_message.register(FunctionResultMessage)
def _(message: FunctionResultMessage[Any]) -> ToolsBetaMessageParam:
    function_schema = function_schema_for_type(type(message.content))

    return {
        "role": AnthropicMessageRole.USER.value,
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": message.function_call._unique_id,
                "content": function_schema.serialize_args(message.content) #json.loads(function_schema.serialize_args(message.content)),
            }
        ],
    }

Then it works. The Anthropic docs seems to suggest that the answers can now be specified as a string, or as a list of nested content blocks.

I'm wondering if this changed with the public beta of tool use? Either way, it seems to work this way now.

Would probably be a good test case to include in the future!

Skip thinking section in Claude tool call response

b2dcbb1

jackmpcollins self-assigned this May 23, 2024

jackmpcollins mentioned this pull request May 23, 2024

Confusing function calling with AnthropicChatModel #220

Open

jackmpcollins added 9 commits May 25, 2024 22:19

Merge branch 'main' into skip-anthropic-claude-thinking

17d5c97

Refactor into _extract_thinking functions

05ebe7e

Add peek, apeek, adropwhile functions

6b087c8

Reduce iteration code using new functions

41e84d7

Fix: in -> == comparison

92fe834

Add tests for AnthropicChatModel with thinking

afe73f2

Test all claude-3 models for thinking

30a2cf2

Merge branch 'main' into skip-anthropic-claude-thinking

3f0d497

Merge branch 'main' into skip-anthropic-claude-thinking

20e5533

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip thinking section in Claude tool call response #226

Skip thinking section in Claude tool call response #226

jackmpcollins commented May 23, 2024 •

edited

Loading

jackmpcollins commented May 28, 2024

mnicstruwig commented May 28, 2024 •

edited

Loading

Skip thinking section in Claude tool call response #226

Are you sure you want to change the base?

Skip thinking section in Claude tool call response #226

Conversation

jackmpcollins commented May 23, 2024 • edited Loading

jackmpcollins commented May 28, 2024

mnicstruwig commented May 28, 2024 • edited Loading

jackmpcollins commented May 23, 2024 •

edited

Loading

mnicstruwig commented May 28, 2024 •

edited

Loading