Prompt Preparation of Kosmos-2 Object Detection Fine-tuning #1576

Open
KevinHooah opened this issue Jun 14, 2024 · 1 comment


KevinHooah commented Jun 14, 2024

Describe
Model I am using: Kosmos-2

Hi! I am fine-tuning the Kosmos-2 model for my own application. In short, the target may appear multiple times in an image (e.g., cars in a parking lot), but some images contain only one target.

Right now, I am preparing the dataset as follows:

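    # {target} is a placeholder for the target noun (e.g. "car").
    if len(bboxes) > 1:
        # Several annotated boxes: use the plural phrase.
        text = "<grounding>" + "<phrase> several {target}s</phrase>"
    else:
        # One annotated box: use the singular phrase.
        text = "<grounding>" + "<phrase> a {target}</phrase>"
    data_list.append({'bbox': [bboxes], 'image': image, 'text': text})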

In this code, bboxes holds the human-annotated bounding boxes as a list of lists of tuples, and {target} is a placeholder for my target, which is a noun.
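For the multi-box case, one thing worth checking is whether the training target actually contains every annotated box after the phrase. Below is a minimal sketch of how a grounded target string with several boxes for one phrase might be assembled. It assumes the format I understand the Hugging Face Kosmos-2 processor to use: a 32 x 32 grid of location bins, `<patch_index_XXXX>` tokens for the top-left and bottom-right bins, and `</delimiter_of_multi_objects/>` between boxes that belong to the same phrase. The helper names `box_to_patch_tokens` and `build_grounded_text` are hypothetical, the boxes are assumed to be normalized to [0, 1], and the exact spacing around tokens should be verified against the tokenizer actually in use.

```python
import math

# Assumption: Kosmos-2 discretizes the image into a 32 x 32 grid of location bins.
NUM_BINS = 32
# Assumption: this token separates several boxes that belong to one phrase.
MULTI_BOX_DELIM = "</delimiter_of_multi_objects/>"


def box_to_patch_tokens(box, num_bins=NUM_BINS):
    """Map a normalized (x1, y1, x2, y2) box to its two location tokens."""
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"expected a normalized box with x1 < x2 and y1 < y2, got {box}")
    # The top-left bin is floored; the bottom-right bin is the last bin the box touches.
    upper_left = math.floor(y1 * num_bins) * num_bins + math.floor(x1 * num_bins)
    lower_right = math.ceil(y2 * num_bins - 1) * num_bins + math.ceil(x2 * num_bins - 1)
    return f"<patch_index_{upper_left:04d}><patch_index_{lower_right:04d}>"


def build_grounded_text(target, boxes):
    """Build one grounded phrase whose <object> span lists every box."""
    # Naive pluralization with "s"; irregular nouns need their own handling.
    phrase = f" a {target}" if len(boxes) == 1 else f" several {target}s"
    locations = MULTI_BOX_DELIM.join(box_to_patch_tokens(b) for b in boxes)
    return f"<grounding><phrase>{phrase}</phrase><object>{locations}</object>"


if __name__ == "__main__":
    # Two hypothetical cars: one in the top-left quadrant, one in the bottom-right.
    print(build_grounded_text("car", [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]))
```

If the training targets only ever contain a single pair of location tokens per phrase, the model has no signal to emit more than one box, whatever the prompt wording says.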

When I train the model with such prompts, it still outputs exactly one bounding box for the target, even when there are multiple targets in the image.

For example, if the target is "car", the model outputs a bounding box for only one of the cars in the image.
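To confirm that the single box comes from the generated text itself and not from a later post-processing step, the raw decoded output can be inspected directly. The following is a rough sketch that counts how many boxes follow each phrase. It assumes the output uses `<phrase>...</phrase><object>...</object>` spans with two `<patch_index_XXXX>` tokens per box; `count_boxes_per_phrase` is a hypothetical helper, not part of any library.

```python
import re

# Assumption: each grounded phrase is followed by one <object>...</object> span.
OBJECT_SPAN = re.compile(r"<phrase>(.*?)</phrase>\s*<object>(.*?)</object>")
# Assumption: each box is encoded as two consecutive <patch_index_XXXX> tokens.
PATCH_TOKEN = re.compile(r"<patch_index_(\d+)>")


def count_boxes_per_phrase(generated_text):
    """Return a list of (phrase, number_of_boxes) pairs found in the text."""
    counts = []
    for phrase, locations in OBJECT_SPAN.findall(generated_text):
        n_tokens = len(PATCH_TOKEN.findall(locations))
        # Two location tokens (top-left, bottom-right) make up one box.
        counts.append((phrase.strip(), n_tokens // 2))
    return counts
```

Running this over a batch of generations shows at a glance whether the model ever emits more than one box per phrase.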

May I ask how I can solve this issue?

Note
"Car" is a random example; the actual target is something we believe is rare in the Kosmos-2 pre-training data.

@pengzhiliang
Contributor

Hello, as you know, we haven't fine-tuned this model on any specific object detection dataset, so we cannot control how many bboxes the model will generate; it could be one or multiple.
Perhaps you can try some different prompts:
Describe this image in detail. a {target} / several {targets}
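One possible reading of this suggestion, sketched as code so the variants can be compared side by side. How the description prefix, the `<grounding>` tag, and the `<phrase>` tags should be combined is an assumption here, since the suggestion above does not spell it out; `build_prompt_variants` is a hypothetical helper.

```python
def build_prompt_variants(target, plural=None):
    """Return singular and plural prompt candidates for one target noun."""
    # Naive default plural; pass `plural` explicitly for irregular nouns.
    plural = plural or f"{target}s"
    description = "Describe this image in detail."
    return {
        # Assumption: the description prefix sits between <grounding> and the phrase.
        "singular": f"<grounding>{description}<phrase> a {target}</phrase>",
        "plural": f"<grounding>{description}<phrase> several {plural}</phrase>",
    }
```

Each variant can then be used in place of the original text field to see whether the number of generated boxes changes.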
