Improve LLM model tool annotation #659

dghirardo · 2024-06-05T22:09:01Z

Issue #637

This pull request integrates the tool_tailor gem (https://github.com/kieranklaassen/tool_tailor) to generate tool annotations automatically.

Changes:

All previously used JSON files for tool annotations have been removed.
Tool-specific directories have been eliminated, resulting in a cleaner and more straightforward directory structure.
The :ANNOTATIONS_PATH constant has been replaced with :FUNCTIONS, an array of symbols corresponding to the tool's functions.
Methods for validating constants (:NAME, :FUNCTIONS) have been implemented to ensure consistency and correctness.
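
To make the last two points concrete, here is a minimal sketch of how a tool class might declare and validate these constants. The `ToolSketch` module, the `validate_constants!` helper, and the error messages are hypothetical and not taken from this PR; only the `NAME` and `FUNCTIONS` constant names come from the description above.

```ruby
# Hypothetical sketch: not the actual Langchain::Tool::Base implementation.
module ToolSketch
  class Base
    # Raises unless the subclass declares a String NAME and a non-empty
    # FUNCTIONS array of symbols that all name defined instance methods.
    def self.validate_constants!
      unless const_defined?(:NAME) && const_get(:NAME).is_a?(String)
        raise ArgumentError, "#{name} must define a String NAME constant"
      end

      functions = const_defined?(:FUNCTIONS) ? const_get(:FUNCTIONS) : nil
      unless functions.is_a?(Array) && !functions.empty? && functions.all?(Symbol)
        raise ArgumentError, "#{name} must define FUNCTIONS as a non-empty array of symbols"
      end

      missing = functions.reject { |fn| method_defined?(fn) }
      raise ArgumentError, "#{name} lists undefined functions: #{missing.inspect}" unless missing.empty?

      true
    end
  end

  class Calculator < Base
    NAME = "calculator"
    FUNCTIONS = [:execute]

    def execute(input:)
      Integer(input) * 2 # placeholder body for the sketch
    end
  end

  class Broken < Base
    NAME = "broken"
    FUNCTIONS = [:does_not_exist] # names a method that is never defined
  end
end
```

Calling `ToolSketch::Calculator.validate_constants!` passes, while `ToolSketch::Broken.validate_constants!` raises because its `FUNCTIONS` entry has no matching method.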

andreibondarev · 2024-06-09T00:53:30Z

lib/langchain/tool/base.rb

-          self.class.const_get(:ANNOTATIONS_PATH)
-        )
-      )
+      functions.map do |function|


What do you think about just using all of the public_instance_methods?

Yes, that could be a good idea, and I had already considered it. However, I noticed that the execute_search method in google_search.rb is not private, even though it's not meant to be used directly (it wasn't present in the old JSON annotations file).

This mistake could easily be repeated, so I thought that explicitly listing the methods would make errors harder to introduce and provide a clear overview of the available tools. What do you think?

@dghirardo In the case of google_search.rb specifically, I'm not even convinced that we need the def self.execute_search method?

Yes, after looking at the code, it seems we don't need it. So, should we use public_instance_methods or keep the FUNCTIONS array?
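
The trade-off discussed in this thread can be sketched as follows. `SearchSketch` and its methods are hypothetical stand-ins, not code from google_search.rb.

```ruby
# Hypothetical tool used only to compare the two discovery strategies.
class SearchSketch
  FUNCTIONS = [:execute]

  def execute(input:)
    run_query(input)
  end

  # Helper that was meant to be internal but was left public by mistake.
  def run_query(input)
    "results for #{input}"
  end

  private

  def api_key
    "secret"
  end
end

# Strategy 1: expose every public instance method defined on the class itself.
auto_discovered = SearchSketch.public_instance_methods(false).sort

# Strategy 2: expose only what the explicit allowlist names.
allowlisted = SearchSketch::FUNCTIONS
```

Here `auto_discovered` includes the accidental `run_query` helper, while `allowlisted` does not. Note that a class-level method defined with `def self.` would not show up in `public_instance_methods` at all, since that call only lists instance methods.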

lib/langchain/tool/calculator.rb

andreibondarev · 2024-06-09T01:29:18Z

Before:

irb(main):003> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
  "function"=>
   {"name"=>"news_retriever__get_everything",
    "description"=>"News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.",
    "parameters"=>
     {"type"=>"object",
      "properties"=>
       {"q"=>
         {"type"=>"string",
          "description"=>
           "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to.", "enum"=>["title", "description", "content"]},
        "sources"=>
         {"type"=>"string",
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.", "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in.", "enum"=>["relevancy", "popularity", "publishedAt"]},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page (request). 5 is the default, 100 is the maximum."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results if the total results found is greater than the page size."}}}}},

After:

irb(main):013> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
  "function"=>
   {"name"=>"news_retriever__get_everything",
    "description"=>"Retrieve all news",
    "parameters"=>
     {"type"=>"object",
      "properties"=>
       {"q"=>{"type"=>"string", "description"=>"Keywords or phrases to search for in the article title and body."},
        "search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to. The possible options are: title, description, content."},
        "sources"=>
         {"type"=>"string",
          "description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
        "domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
        "exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
        "from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
        "to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
        "language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for. Possible options: ar, de, en, es, fr, he, it, nl, no, pt, ru, se, ud, zh."},
        "sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in. Possible options: relevancy, popularity, publishedAt."},
        "page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page. 20 is the API's default, 100 is the maximum. Our default is 5."},
        "page"=>{"type"=>"integer", "description"=>"Use this to page through the results."}},
      "required"=>[]}}},

Upon a quick glance -- the enum: param is missing. I wonder if we can annotate enum options in YARD docs 🤔
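
One way YARD-style comments could carry enum options is a custom tag placed next to `@param`. The sketch below uses a made-up `@values name [a, b, c]` line format and a plain regex parser instead of the YARD library; both are assumptions for illustration, not tool_tailor's actual behavior.

```ruby
# Hypothetical comment parser: the "@values" line format is an assumption.
def properties_from_comment(comment)
  param_tag = /@param\s+(\w+)\s+\[(\w+)\]\s+(.+)/
  values_tag = /@values\s+(\w+)\s+\[([^\]]*)\]/

  properties = {}
  comment.each_line do |line|
    if (m = param_tag.match(line))
      properties[m[1]] = {"type" => m[2].downcase, "description" => m[3].strip}
    elsif (m = values_tag.match(line))
      # Attach the enum only to a parameter already declared via @param.
      properties[m[1]]["enum"] = m[2].split(",").map(&:strip) if properties.key?(m[1])
    end
  end
  properties
end

comment = <<~DOC
  # @param sort_by [String] The order to sort the articles in.
  # @values sort_by [relevancy, popularity, publishedAt]
  # @param page [Integer] Use this to page through the results.
DOC

schema = properties_from_comment(comment)
```

With this input, `schema["sort_by"]` gains an `"enum"` key while `schema["page"]` keeps only its type and description.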

andreibondarev · 2024-06-09T18:58:04Z

@dghirardo Btw -- @palladius told me that you guys met at the RubyDay! 🫶

dghirardo · 2024-06-13T20:26:10Z

@andreibondarev I am currently working on a pull request for the tool_tailor gem to add support for the enum parameter. What do you think about the proposed solution?

dghirardo · 2024-06-28T21:12:52Z

Hi @andreibondarev, tool_tailor has merged my pull request, so the enum property is now supported once the gem is updated. However, I noticed another issue: when a parameter is a collection (such as an array or a hash), you can't describe its internal structure.

Example (Tavily tool):

# @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains.

While [Array] is supported, [Array<String>] is not.
So, it’s not possible to:

Define that the array items must be Strings.
Describe extra properties for the String values, such as enum.

You can only use the description field.

Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments.
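
For the first limitation, a rough sketch of how a type string such as `Array<String>` could be translated into a JSON Schema fragment is shown below. The `json_schema_for` helper and its type table are hypothetical; this is not how tool_tailor parses types.

```ruby
# Hypothetical helper: maps a YARD-style type string to a JSON Schema fragment.
def json_schema_for(type_string)
  scalars = {
    "String" => "string",
    "Integer" => "integer",
    "Float" => "number",
    "Boolean" => "boolean"
  }

  if (m = /\AArray<(.+)>\z/.match(type_string))
    # Recurse so that nested forms like Array<Array<Integer>> also work.
    {"type" => "array", "items" => json_schema_for(m[1])}
  elsif type_string == "Array"
    {"type" => "array"}
  elsif type_string == "Hash"
    {"type" => "object"}
  else
    {"type" => scalars.fetch(type_string) { raise ArgumentError, "unsupported type: #{type_string}" }}
  end
end

exclude_domains_schema = json_schema_for("Array<String>")
```

This only covers homogeneous arrays. It does not address the second limitation (attaching properties such as enum to the items) or describing hash keys, which is where comment-based annotations become hard to read.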

andreibondarev · 2024-06-28T21:40:41Z

Hmm... I think we may want to consider some sort of a decorator pattern, a la:

llm_callable :get_everything,
  description: "News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs",
  properties: {
    q: {
      type: :string,
      description: "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."
    }
  }

def get_everything(q:, ...)
  ...
end

The really tricky thing here is figuring out a developer-friendly interface.
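
As a rough illustration of what such an interface could look like end to end, here is a minimal sketch. The `LlmCallable` module, the `required:` keyword, the `to_openai_tools(prefix)` signature, and the `NewsSketch` class are all hypothetical; nothing here exists in the gem.

```ruby
# Hypothetical sketch of the `llm_callable` idea; not part of the gem.
module LlmCallable
  # Class-level macro that records the schema for one callable method.
  def llm_callable(method_name, description:, properties: {}, required: [])
    llm_functions[method_name] = {
      description: description,
      properties: properties,
      required: required
    }
  end

  def llm_functions
    @llm_functions ||= {}
  end

  # Builds an OpenAI-style tools array shaped like the output shown earlier in this thread.
  def to_openai_tools(prefix)
    llm_functions.map do |method_name, spec|
      {
        "type" => "function",
        "function" => {
          "name" => "#{prefix}__#{method_name}",
          "description" => spec[:description],
          "parameters" => {
            "type" => "object",
            "properties" => spec[:properties],
            "required" => spec[:required].map(&:to_s)
          }
        }
      }
    end
  end
end

class NewsSketch
  extend LlmCallable

  llm_callable :get_everything,
    description: "Search through news articles.",
    properties: {
      q: {type: :string, description: "Keywords or phrases to search for."},
      sort_by: {type: :string, enum: %w[relevancy popularity publishedAt]}
    },
    required: [:q]

  def get_everything(q:, sort_by: "relevancy")
    "searching #{q} by #{sort_by}" # placeholder body for the sketch
  end
end

tools = NewsSketch.to_openai_tools("news_retriever")
```

Because the schema is an ordinary Ruby hash, enum values and nested item definitions can be written directly, at the cost of repeating parameter names outside the method signature.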

dghirardo added 2 commits June 5, 2024 23:02

Integrated tool_tailor for automated tool annotations

8eccb9d

Removed tool folders and json annotations

d7332be

dghirardo changed the title from "Feature/auto tool annotations" to "Improve LLM model tool annotation" on Jun 5, 2024

Merge branch 'main' into feature/auto-tool-annotations

417ed63

andreibondarev reviewed Jun 9, 2024

dghirardo mentioned this pull request Jun 13, 2024

Add Support for enum parameter via custom YARD @values tag kieranklaassen/tool_tailor#1

Closed

dghirardo added 2 commits June 25, 2024 22:06

Merge branch 'main' into feature/auto-tool-annotations

52d118d

Fixed standardrb linting error

92c9bd0

Merge branch 'main' into feature/auto-tool-annotations

fd04d18
