Add GPTSwarm (Graph-based Workflow) #2460

Open: mczhuge wants to merge 1 commit into main

mczhuge
Contributor

@mczhuge mczhuge commented Jun 16, 2024

Sorry for the late response. I have been on vacation for the last 10 days.

I have merged GPTSwarm (ICML 2024 Oral Presentation) into this codebase. More details can be found in the Paper and the Code.

Despite this progress, there are five areas that require further enhancement to improve users' experiences with OpenDevin:

  1. agentskill Functionality: The agentskill function currently returns nothing. This issue arises because OpenDevin receives observations from the terminal. GAIA requires numerous file readers to retrieve return information. Consequently, the current OD GPTSwarm v1.0 is impacted. For instance, the original GPTSwarm can easily achieve 32.07% in GAIA, but due to the lack of multiple file readers, the best performance of OD GPTSwarm is also 32.07%.

  2. Stateless to Stateful Transition: The original GPTSwarm is stateless, whereas the current OD GPTSwarm could be improved by making it stateful. This improvement requires collaboration with others. It is possible that Q1 could be resolved by this transition, as OD receives observations from terminal output rather than a definitive return from a function. Does this assessment seem accurate?

  3. Web Navigation Skills: My initial use of web navigation skills resulted in very long context returns, which negatively impacted GAIA's performance. Consequently, I refrained from using it. However, as stated in the original paper and our previous paper (which also emphasized the importance of web navigation for performance improvement but did not include it), web navigation can significantly enhance performance.

  4. Web Searching Solutions: In line with the original GPTSwarm paper, I spent $40 to acquire searchapi for web searching. Could you recommend any free and high-quality web searching alternatives?

  5. Autonomous Pattern and SOP Integration: The current agent operates in a fully autonomous manner, but at times, an SOP or a human-predefined workflow (similar to MetaGPT) proves beneficial. For example, CodeActAgent resolves a GAIA problem using $5, whereas GPTSwarm solved 53 problems for under $5. Further improvements to OD GPTSwarm could enable OpenDevin to incorporate graph-based SOP and related functionalities.

I think further improving OD GPTSwarm could give OpenDevin graph-based SOPs and related functions.

@tobitege
Collaborator

Just about your last question, have you looked at Perplexity.ai? They also have online search and an API etc.

@yufansong
Collaborator

The agentskill function currently returns nothing, which makes it hard for GPTSwarm to get a return value. This significantly influences the performance.

Can you explain more about this? What do you mean by "the agentskill function currently returns nothing"?

but since lacking many file readers

For example, what file readers are needed?

GPTSwarm is stateless, which requires further improvement with others' help

Will take a look at this, but I may check your paper first. I see you didn't implement the step function.

it can easily achieve 32.07% using original GPTSwarm, but since lacking many file readers, the best performance of OD GPTSwarm is 32.07%.

Actually I am wondering: if agent skills and web navigation are currently not well done, what does OpenDevin contribute here? I am surprised that OpenDevin can achieve that performance given the problems above 🤔

I hope others can continue to develop and improve.

Will help soon when I am free.

@yufansong
Collaborator

yufansong commented Jun 16, 2024

@mczhuge If you have no time in the future, maybe I will directly push on your PR. Or I can only add some review comments. WDYT?

Contributor

@neubig neubig left a comment

This looks interesting! I hope we can get this agent incorporated.

However, I am a bit worried about all the dependencies that we would inherit from gptswarm:
https://github.com/metauto-ai/GPTSwarm/blob/ac058b80158027da8ffad8ed0bd464dc952eaa49/pyproject.toml#L23-L66

Maybe we can separate out gptswarm to be a different group in poetry that is optionally installed only if we want to use gptswarm.
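
An optional group usually pairs with a guarded import at the point of use, so the default install never touches the extra packages. A minimal sketch, assuming the extra dependencies live in a Poetry group named `gptswarm` and that the package imports as `swarm`. Both names and the helper `require_optional` are assumptions for illustration, not existing OpenDevin code.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, group: str) -> ModuleType:
    """Import an optional top-level module, or fail with an install hint.

    `module_name` and `group` are illustrative; adjust them to the real
    module name and the real Poetry group.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install the optional group first, e.g. `poetry install --with {group}`."
        )
    return importlib.import_module(module_name)


# The agent module would call this lazily, only when the agent is selected:
# swarm = require_optional("swarm", "gptswarm")
```

With this shape, users who never select the agent never need its dependencies, and those who do get an actionable error instead of a bare `ModuleNotFoundError`.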

@neubig
Contributor

neubig commented Jun 16, 2024

@mczhuge or @yufansong , would either of you be able to see this to completion now?

@mczhuge
Contributor Author

mczhuge commented Jun 17, 2024

Just about your last question, have you looked at Perplexity.ai? They also have online search and an API etc.

Thanks, any simple code demo to use?

@mczhuge
Contributor Author

mczhuge commented Jun 17, 2024

The agentskill function currently returns nothing, which makes it hard for GPTSwarm to get a return value. This significantly influences the performance.

Can you explain more about this? What do you mean by "the agentskill function currently returns nothing"?

but since lacking many file readers

For example, what file readers are needed?

GPTSwarm is stateless, which requires further improvement with others' help

Will take a look at this, but I may check your paper first. I see you didn't implement the step function.

it can easily achieve 32.07% using original GPTSwarm, but since lacking many file readers, the best performance of OD GPTSwarm is 32.07%.

Actually I am wondering: if agent skills and web navigation are currently not well done, what does OpenDevin contribute here? I am surprised that OpenDevin can achieve that performance given the problems above 🤔

I hope others can continue to develop and improve.

Will help soon when I am free.

I rewrote the comment a bit: #2460 (comment)

Regarding The `agentskill` function currently returns nothing, which makes it hard for GPTSwarm to get a return value. This significantly influences the performance:

I think the mechanism of OD is to output information in the terminal, and then CodeActAgent gets such information as an observation from the terminal. Right? Currently, I do not follow this and just want to directly get a return value from a function. However, I think OD GPTSwarm could also follow the CodeActAgent observation style.
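
To make the two styles concrete, here is a minimal sketch of bridging them: a wrapper that runs a skill which only prints and hands back the printed text as if it were a return value. The names `call_with_observation` and `print_only_skill` are hypothetical and this is not how OpenDevin's runtime is implemented; it only illustrates the difference between a terminal observation and a function return.

```python
import contextlib
import io
from typing import Any, Callable


def call_with_observation(skill: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Run a skill and return its result as text.

    If the skill returns a value, that value wins; otherwise whatever it
    printed to stdout is returned, mimicking a terminal observation.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = skill(*args, **kwargs)
    return str(result) if result is not None else buffer.getvalue()


def print_only_skill(path: str) -> None:
    # Hypothetical stand-in for a skill that reports only via the terminal.
    print(f"[contents of {path}]")


observation = call_with_observation(print_only_skill, "a.txt")
```

A graph node could then consume `observation` like any other string input, whichever style the underlying skill uses.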

Regarding For example, what file readers are needed?:

The original GPTSwarm uses many different file readers, but in #1914 and the follow-up #2016 some of them were deleted. This likely makes at least 2-5 otherwise solvable GAIA questions unsolvable.

Regarding GPTSwarm is stateless (me) and I find you didn't implement the `step` function:

The current OD GPTSwarm follows a predefined human SOP or workflow in a graph-based design. I think OD GPTSwarm could fully follow the agenthub style (it may not be hard), but I am personally not 100% familiar with this, so I think help is needed here.
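
For readers unfamiliar with the idea, a predefined graph-based workflow can be sketched in a few lines. This is not GPTSwarm's API or the code in this PR; `run_graph` and the node names are made up for illustration, and the example uses only the standard library (`graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# A node receives the outputs of all nodes run so far and returns its own output.
Node = Callable[[Dict[str, str]], str]


def run_graph(nodes: Dict[str, Node], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every node after its predecessors, in topological order."""
    outputs: Dict[str, str] = {}
    # `deps` maps each node name to the names it depends on.
    for name in TopologicalSorter(deps).static_order():
        outputs[name] = nodes[name](outputs)
    return outputs


# A toy SOP: read a file and search the web, then answer from both.
nodes: Dict[str, Node] = {
    "read": lambda out: "raw text",
    "search": lambda out: "web results",
    "answer": lambda out: f"answer from {out['read']} and {out['search']}",
}
deps = {"read": [], "search": [], "answer": ["read", "search"]}

result = run_graph(nodes, deps)
```

In an agenthub-style agent, each call to `step` could advance one node of such a graph instead of running the whole graph at once; that mapping is a design assumption, not something the PR already does.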

Regarding if currently agent skill and web navigation are not well done:

I think the current web navigation is good but specific to some benchmarks. For GAIA or many applications, just returning compressed but informative observations is enough.
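
As a rough illustration of "compressed but informative", the simplest baseline is to keep the head and tail of a long page and mark what was dropped. The function name and the default budget below are assumptions; a real system would more likely summarize or extract the relevant passages.

```python
def compress_observation(text: str, max_chars: int = 2000) -> str:
    """Keep the head and tail of a long observation and mark the elided middle.

    `max_chars` bounds the kept content; the marker line adds a few characters.
    """
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n...[{omitted} characters omitted]...\n{text[-half:]}"


page = "a" * 10 + "b" * 10 + "c" * 10
short = compress_observation(page, max_chars=10)
```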

@mczhuge
Contributor Author

mczhuge commented Jun 17, 2024

@mczhuge If you have no time in the future, maybe I will directly push on your PR. Or I can only add some review comments. WDYT?

Of course you can!! 🥰 The first several days as an intern may require some time for onboarding and settling in. I also have time to discuss more and may also contribute some code. If you could help push this PR forward, that would be super great! Two potential new features may come to OpenDevin if we finish this PR:

  • Introduce Graph Agent Design in OD: This is also present in GPTSwarm, LangGraph, and DSPy.
  • Potential Codebase for OD Self-Evolution: This would enable the OpenDevin agent to evolve similarly to GPTSwarm (as in the paper, by allowing node and edge optimizations), giving the OD framework greater potential.

@mczhuge
Contributor Author

mczhuge commented Jun 17, 2024

This looks interesting! I hope we can get this agent incorporated.

However, I am a bit worried about all the dependencies that we would inherit from gptswarm: https://github.com/metauto-ai/GPTSwarm/blob/ac058b80158027da8ffad8ed0bd464dc952eaa49/pyproject.toml#L23-L66

Maybe we can separate out gptswarm to be a different group in poetry that is optionally installed only if we want to use gptswarm.

Hi @neubig, some of them have already been merged into OD. See #1914 and #2016. Currently, we only need to improve the usage.

@tobitege
Collaborator

Just about your last question, have you looked at Perplexity.ai? They also have online search and an API etc.

Thanks, any simple code demo to use?

There is an actual Python package that would make it easier:
https://pypi.org/project/PerplexiPy/

Their sample code on that page reads like this:

client = PerplexityClient()
print(client.query('What is the meaning of 42?'))
for result in client.queryStreamable('List of all US presidents'):
    print(result)

Hope this helps. 😃

@mczhuge mczhuge changed the title from "Add GPTSwarm" to "Add GPTSwarm (Graph-based Workflow)" Jun 17, 2024
@yufansong
Collaborator

@mczhuge or @yufansong , would either of you be able to see this to completion now?

If @mczhuge has no time, I am glad to take this task.

@mczhuge
Contributor Author

mczhuge commented Jun 17, 2024

@mczhuge or @yufansong , would either of you be able to see this to completion now?

If @mczhuge has no time, I am glad to take this task.

Great, Yufan! Please go ahead; I am in intern orientation.

NOTE: GPTSwarm need manually export...
TODO: Fix it
export SANDBOX_ENV_OPENAI_API_KEY="sk-***"
"""
OPENAI_API_KEY = os.getenv(
    'OPENAI_API_KEY', os.getenv('SANDBOX_ENV_OPENAI_API_KEY', '')
)
Collaborator

Within the sandbox, this would be exported again without the prefix, i.e. as "OPENAI_API_KEY".
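
A minimal sketch of the behavior described here, assuming the sandbox forwards every host variable that starts with `SANDBOX_ENV_` under the name that remains after the prefix is removed. The function `sandbox_env` is hypothetical and only models that mapping; it is not OpenDevin's implementation.

```python
from typing import Dict, Mapping

PREFIX = "SANDBOX_ENV_"


def sandbox_env(host_env: Mapping[str, str]) -> Dict[str, str]:
    """Model the variables visible inside the sandbox.

    Only host variables carrying the prefix are forwarded, with the
    prefix stripped from their names.
    """
    return {
        name[len(PREFIX):]: value
        for name, value in host_env.items()
        if name.startswith(PREFIX)
    }


host = {"SANDBOX_ENV_OPENAI_API_KEY": "sk-test", "HOME": "/root"}
inside = sandbox_env(host)
```

Under that assumption, code running inside the sandbox could read `OPENAI_API_KEY` directly, and the fallback to the prefixed name would only matter for code running on the host.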

4 participants