Add C specific harness generation logic #337

DavidKorczynski · 2024-06-15T18:36:50Z

The default C++ logic has some limitations for C projects that cause failures, e.g. during builds. A C-specific flow would help alleviate these and also open the door to adding more harness-generation workflows. The majority of this will need changes in prompt_builder, simply by adding a new prompt class.

I'd like to take the following high-level steps to achieve this:

1. Add a new flow with C-specific logic that fits into the current system without being intrusive (i.e. existing experiments will continue as before), where the C-specific flow shows improvements in local runs in comparison to the existing default builder.
2. Integrate it in the CI so we can run experiments with the C-specific flow.
3. Continue expanding on the C-specific flow.
4. Migrate so we can run multiple different prompts on the same experiment. This will be interesting: I expect we will end up in a situation where there is no clear general winner, but each prompt has its own set of targets it performs well on. We can use this to guide research further. I think there should be a larger spread to account for prompts not necessarily being a 1-dimensional comparison (which is better/worse) but rather a multi-dimensional one (x performs better in scenarios m, z, v and y performs better in a, b, c).
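
To make the multi-dimensional comparison in the last step concrete, here is a minimal sketch of grouping targets by the prompt that performed best on each. The function name, the tuple layout, and the use of a single coverage number as the score are all assumptions for illustration; they are not taken from oss-fuzz-gen.

```python
from collections import defaultdict


def best_prompt_per_target(results):
    """Group targets by the prompt that scored highest on each of them.

    `results` is an iterable of (prompt_name, target, score) tuples.
    The tuple layout and the scalar score are hypothetical; a real
    experiment would likely track several metrics per run.
    """
    best = {}
    for prompt, target, score in results:
        # Keep only the highest-scoring prompt seen so far for this target.
        if target not in best or score > best[target][1]:
            best[target] = (prompt, score)

    wins = defaultdict(list)
    for target, (prompt, _) in best.items():
        wins[prompt].append(target)
    return {prompt: sorted(targets) for prompt, targets in wins.items()}
```

With a result like this, "x performs better in m, z, v and y performs better in a, b, c" becomes a mapping from prompt name to the list of targets it wins, rather than a single ranking.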

DavidKorczynski · 2024-06-15T21:15:40Z

The first step has been achieved with #338

This implements the first step of #337.

Adds a harness generation flow that, in comparison to the existing default builder:

- Provides a repository link for the target project.
- Is C-specific; uses no CPP code language or similar.
- Includes post-processing on the generated code to add certain header files we always want in the harnesses.
- Adds constraints on the header files the LLM should include in the harnesses. Does this by providing absolute paths to header files in the OSS-Fuzz containers.
- Uses some new fuzz introspector APIs to help with context.

This PR was made to have no intrusion on the existing workflow, i.e. experiments can continue as they are running now. However, there are several improvements that can be made and I prefer to have these in follow-up PRs:

1) Fixing logic relies on the default prompt builder. This is because the code fixer creates a new prompt builder: https://github.com/google/oss-fuzz-gen/blob/09d2235f3957c4d43367ecbd7f3f88147b487abf/llm_toolkit/code_fixer.py#L408 This in fact means that the C++ default logic is used for fixing JVM targets. I would like to change the flow here in the medium term such that the code fixing logic reuses the prompt builder we used for the main harness generation. I think this should be changed so the prompt builder comes closer to a "harness generator" abstraction and has more knowledge of the target under analysis. But I prefer to do this later, as the PR is already big.
2) Integrate so we can run experiments in the CI with both or either of the harness generation flows.
3) Add new features to the prompt builder.

Ref: #337

Signed-off-by: David Korczynski <[email protected]>
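
The header post-processing step described above could look roughly like the following. This is a sketch under stated assumptions, not the code from #338: the function name, the regular expression, and the choice to prepend missing headers in angle-bracket form are all hypothetical.

```python
import re

# Matches `#include <x>` and `#include "x"` lines and captures the header name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def add_required_headers(harness_source, required_headers):
    """Prepend any required header that the generated harness lacks.

    Hypothetical helper: the header list and the insertion position
    (top of the file) are assumptions made for this sketch.
    """
    present = set(INCLUDE_RE.findall(harness_source))
    missing = [h for h in required_headers if h not in present]
    if not missing:
        return harness_source
    prefix = "".join(f"#include <{h}>\n" for h in missing)
    return prefix + harness_source
```

Because headers that are already present are skipped, the function is idempotent: running it twice on the same harness leaves the output unchanged.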

DavidKorczynski self-assigned this Jun 15, 2024

This was referenced Jun 15, 2024

Add C-specific prompt #338

Merged

project_src: fix C harness identification #333

Merged
