Add C specific harness generation logic #337

Open
DavidKorczynski opened this issue Jun 15, 2024 · 1 comment
@DavidKorczynski
Collaborator

DavidKorczynski commented Jun 15, 2024

The default C++ logic has some limitations for C projects that cause failures, e.g. during builds. A C-specific flow would help alleviate these and also open the door to adding more harness-generation workflows. The majority of this will need changes in prompt_builder, mostly by adding a new prompt class.
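As a rough illustration of what "adding a new prompt class" could look like, here is a minimal sketch. All class and function names below (`PromptBuilder`, `DefaultBuilder`, `CSpecificBuilder`, `select_builder`) and the prompt wording are hypothetical stand-ins, not the actual prompt_builder interface.

```python
from abc import ABC, abstractmethod


class PromptBuilder(ABC):
    """Hypothetical base class; the real prompt_builder interface may differ."""

    @abstractmethod
    def build(self, function_signature: str, project: str) -> str:
        """Returns the prompt text for one target function."""


class DefaultBuilder(PromptBuilder):
    """Stand-in for the existing default (C++-oriented) flow."""

    def build(self, function_signature: str, project: str) -> str:
        return (f"Write a C++ fuzz harness for project {project}.\n"
                f"Target function: {function_signature}\n")


class CSpecificBuilder(PromptBuilder):
    """Stand-in for a C-only flow added alongside the default."""

    def build(self, function_signature: str, project: str) -> str:
        return (f"Write a fuzz harness in pure C for project {project}.\n"
                "Do not use any C++ headers or syntax.\n"
                f"Target function: {function_signature}\n")


def select_builder(language: str, use_c_flow: bool = False) -> PromptBuilder:
    """Opt-in selection, so existing experiments keep using the default."""
    if use_c_flow and language == "c":
        return CSpecificBuilder()
    return DefaultBuilder()
```

Making the C flow opt-in is one way to keep it non-intrusive: unless `use_c_flow` is set, every experiment still gets the default builder.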

I'd like to take the following high-level steps to achieve this:

  1. Add a new flow with C-specific logic that fits into the current system without being intrusive (i.e. existing experiments continue to run unchanged), and that shows improvements in local runs compared to the existing default builder.
  2. Integrate it into the CI so we can run experiments with the C-specific flow.
  3. Continue expanding the C-specific flow.
  4. Migrate so we can run multiple different prompts in the same experiment. This will be interesting: I expect there will be no clear general winner, and each prompt will instead have its own set of targets it performs well on. We can use this to guide further research. I think there should be a larger spread to account for prompt comparison not being one-dimensional (which is better or worse) but multi-dimensional (x performs better in scenarios m, z, v and y performs better in a, b, c).
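
The multi-dimensional comparison in step 4 could be approached by grouping targets by the prompt that does best on each, rather than ranking prompts by one aggregate number. The sketch below assumes a hypothetical `scores` mapping of prompt name to per-target score where higher is better; the function name and the data shape are illustrative only.

```python
from collections import defaultdict


def targets_won_by_prompt(scores):
    """Groups targets by the prompt that scored best on each.

    `scores` maps prompt name -> {target name -> score}; higher is better.
    A target with tied best scores is credited to every tied prompt.
    """
    targets = set()
    for per_target in scores.values():
        targets.update(per_target)

    wins = defaultdict(list)
    for target in sorted(targets):
        best = max(per_target[target]
                   for per_target in scores.values()
                   if target in per_target)
        for prompt, per_target in scores.items():
            if per_target.get(target) == best:
                wins[prompt].append(target)
    return dict(wins)
```

The result is a per-prompt list of targets, which makes "x performs better in m, z, v" style observations directly visible.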
@DavidKorczynski
Collaborator Author

The first step has been achieved with #338

DavidKorczynski added a commit that referenced this issue Jun 18, 2024
This implements the first step of
#337

Adds a harness generation flow that, in comparison to the existing
default builder:
- Provides repository link for the target project.
- Is C-specific and uses no C++-specific language or similar.
- Includes post-processing on the generated code to add certain header
files we always want in the harnesses.
- Adds constraints on header files the LLM should include in the
harnesses. Does this by providing absolute paths to header files in the
OSS-Fuzz containers.
- Uses some new fuzz introspector APIs to help with context.
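
The header-related points above could be sketched roughly as follows. This is
not the code from the PR: the function names, the header list, and the prompt
wording are all assumptions made for illustration.

```python
# Illustrative choice of headers; the set actually used by the PR is not stated.
REQUIRED_HEADERS = ("stdint.h", "stddef.h", "stdlib.h")


def add_required_headers(harness_source, headers=REQUIRED_HEADERS):
    """Prepends any required #include lines missing from a generated harness."""
    missing = [h for h in headers
               if f"#include <{h}>" not in harness_source]
    if not missing:
        return harness_source
    prefix = "".join(f"#include <{h}>\n" for h in missing)
    return prefix + harness_source


def format_header_constraints(header_paths):
    """Builds a prompt fragment restricting includes to known absolute paths."""
    lines = ["Only include project headers from this list, "
             "using these exact paths:"]
    lines.extend(f'#include "{path}"' for path in header_paths)
    return "\n".join(lines)
```

Running the post-processing step twice leaves the harness unchanged, since
headers that are already present are skipped.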

This PR was made to have no intrusion on the existing workflow, i.e.
experiments can continue as they are running now. However, there are
several improvements that can be made and I prefer to have these in
follow-up PRs:

1) The fixing logic relies on the default prompt builder, because the
code fixer creates a new prompt builder:
https://github.com/google/oss-fuzz-gen/blob/09d2235f3957c4d43367ecbd7f3f88147b487abf/llm_toolkit/code_fixer.py#L408
This in fact means that the C++ default logic is also used for fixing
JVM targets. In the medium term I would like to change the flow so that
code fixing reuses the prompt builder that was used for the main harness
generation. I think the prompt builder should move closer to a "harness
generator" abstraction with more knowledge of the target under
analysis. I prefer to do this later, as the PR is already big.
2) Integrate into the CI so we can run experiments with either or both
harness generation flows.
3) Add new features to the prompt builder.
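
One possible shape for the change described in item 1 is to pass the builder
that generated the harness into the fixing step, instead of constructing a
default one there. The sketch below is an assumption about how that could
look; `DefaultBuilder`, `CSpecificBuilder`, `build_fixer_prompt`, and
`fix_prompt_for` are hypothetical names, not the code_fixer API.

```python
class DefaultBuilder:
    """Stand-in for the default (C++-oriented) builder."""
    language = "C++"

    def build_fixer_prompt(self, code, errors):
        error_text = "\n".join(errors)
        return (f"Fix this {self.language} fuzz harness.\n"
                f"Build errors:\n{error_text}\n"
                f"Code:\n{code}\n")


class CSpecificBuilder(DefaultBuilder):
    """Stand-in for the C-specific builder; only the language differs here."""
    language = "C"


def fix_prompt_for(code, errors, builder):
    """Builds the fixing prompt with the builder used for generation.

    The caller supplies `builder`, so the fixer never falls back to a
    default that may not match the target language.
    """
    return builder.build_fixer_prompt(code, errors)
```

With this shape, a harness generated by the C-specific flow is also fixed
with C-specific wording.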

Ref: #337

---------

Signed-off-by: David Korczynski <[email protected]>