-
-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve LLM model tool annotation #659
base: main
Are you sure you want to change the base?
Improve LLM model tool annotation #659
Conversation
self.class.const_get(:ANNOTATIONS_PATH) | ||
) | ||
) | ||
functions.map do |function| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about just using all of the public_instance_methods
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that could be a good idea, I had already considered it. However, I noticed the execute_search
method in google_search.rb
is not private, even though it's not meant to be used directly (it wasn't present in the old JSON annotations file).
This error could easily be repeated, so I thought explicitly defining the methods would make it difficult to introduce errors and provide a clear overview of available tools. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dghirardo In the case of google_search.rb
specifically, I'm not even convinced that we need the def self.execute_search
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, after looking at the code, it seems we don't need it. So, should we use public_instance_methods
or keep the FUNCTIONS
array?
Before: irb(main):003> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
"function"=>
{"name"=>"news_retriever__get_everything",
"description"=>"News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs.",
"parameters"=>
{"type"=>"object",
"properties"=>
{"q"=>
{"type"=>"string",
"description"=>
"Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."},
"search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to.", "enum"=>["title", "description", "content"]},
"sources"=>
{"type"=>"string",
"description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
"domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
"exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
"from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
"to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
"language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for.", "enum"=>["ar", "de", "en", "es", "fr", "he", "it", "nl", "no", "pt", "ru", "sv", "ud", "zh"]},
"sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in.", "enum"=>["relevancy", "popularity", "publishedAt"]},
"page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page (request). 5 is the default, 100 is the maximum."},
"page"=>{"type"=>"integer", "description"=>"Use this to page through the results if the total results found is greater than the page size."}}}}}, After: irb(main):013> Langchain::Tool::NewsRetriever.new(api_key: ENV["NEWS_API_KEY"]).to_openai_tools
=>
[{"type"=>"function",
"function"=>
{"name"=>"news_retriever__get_everything",
"description"=>"Retrieve all news",
"parameters"=>
{"type"=>"object",
"properties"=>
{"q"=>{"type"=>"string", "description"=>"Keywords or phrases to search for in the article title and body."},
"search_in"=>{"type"=>"string", "description"=>"The fields to restrict your q search to. The possible options are: title, description, content."},
"sources"=>
{"type"=>"string",
"description"=>"A comma-seperated string of identifiers (maximum 20) for the news sources or blogs you want headlines from. Use the /sources endpoint to locate these programmatically or look at the sources index."},
"domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to restrict the search to."},
"exclude_domains"=>{"type"=>"string", "description"=>"A comma-seperated string of domains (eg bbc.co.uk, techcrunch.com, engadget.com) to remove from the results."},
"from"=>{"type"=>"string", "description"=>"A date and optional time for the oldest article allowed. This should be in ISO 8601 format."},
"to"=>{"type"=>"string", "description"=>"A date and optional time for the newest article allowed. This should be in ISO 8601 format."},
"language"=>{"type"=>"string", "description"=>"The 2-letter ISO-639-1 code of the language you want to get headlines for. Possible options: ar, de, en, es, fr, he, it, nl, no, pt, ru, se, ud, zh."},
"sort_by"=>{"type"=>"string", "description"=>"The order to sort the articles in. Possible options: relevancy, popularity, publishedAt."},
"page_size"=>{"type"=>"integer", "description"=>"The number of results to return per page. 20 is the API's default, 100 is the maximum. Our default is 5."},
"page"=>{"type"=>"integer", "description"=>"Use this to page through the results."}},
"required"=>[]}}}, Upon a quick glance -- the |
@dghirardo Btw -- @palladius told me that you guys met at the RubyDay! 🫶 |
@andreibondarev I am currently working on a pull request for the |
Hi @andreibondarev, Example (Tavily tool): # @param exclude_domains [Array<String>] A list of domains to specifically exclude from the search results. Default is None, which doesn't exclude any domains. While
You can only use the description field. Even if we find a solution, I think nested parameters with multiple levels are not easy to describe with YARD comments. |
Hmm... I think we may want to consider some sort of a decorator pattern, a la: llm_callable :get_everything,
description: "News Retriever: Search through millions of articles from over 150,000 large and small news sources and blogs",
properties: {
q: {
type: :string,
description: "Keywords or phrases to search for in the article title and body. Surround phrases with quotes (\") for exact match. Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Must be URL-encoded."
}
def get_everything(q:, ...)
... The really tricky thing here is figuring out a developer-friendly interface. |
Issue #637
This pull request implements the
tool_tailor
gem (https://github.com/kieranklaassen/tool_tailor) to automate the generation of tool annotations.Changes:
:ANNOTATIONS_PATH
constant has been replaced with:FUNCTIONS
, an array of symbols corresponding to the tool's functions.:NAME
,:FUNCTIONS
) have been implemented to ensure consistency and correctness.