[Question]: Extending NER tags of Hunflair #3447

skywalker2202 · 2024-04-26T14:26:15Z

Question

I want to fine-tune the hunflair-gene model and extend the tags of the original model. The hunflair-gene tag dictionary contains the following items: ['<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>'].

However, calling "previous_tag_dictionary.span_labels()" raises "AttributeError: 'Dictionary' object has no attribute 'span_labels'".

from flair.models import SequenceTagger

previous_tagger = SequenceTagger.load("hunflair-gene")
previous_tag_dictionary = previous_tagger.label_dictionary
previous_tag_dictionary.get_items()

outputs ['<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>'].
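
The list above holds position-prefixed tags (S-, B-, I-, E-) plus special items, not the entity type itself. A minimal sketch in plain Python, without Flair, of recovering the span-level labels from such a list. `span_labels_from_tags` is a hypothetical helper name, not part of Flair, and it assumes every two-character prefix from the set below is a BIOES marker:

```python
# Items that are not entity types (assumed from the dictionary output above).
SPECIAL_ITEMS = {"<unk>", "<START>", "<STOP>", "O"}
# BIOES position prefixes (assumption: no entity type itself starts with these).
PREFIXES = ("S-", "B-", "I-", "E-")


def span_labels_from_tags(tags):
    """Return the distinct entity types behind prefixed tags, in first-seen order."""
    labels = []
    for tag in tags:
        if tag in SPECIAL_ITEMS:
            continue
        # Drop the two-character position prefix if present.
        label = tag[2:] if tag.startswith(PREFIXES) else tag
        if label not in labels:
            labels.append(label)
    return labels


old_tags = ['<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>']
print(span_labels_from_tags(old_tags))  # ['Gene']
```

Labels that carry no prefix, such as LIG and REC, pass through unchanged, so the same helper could be applied to a mixed list.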

My annotated corpus contains two tags, LIG and REC. I converted it to a column corpus and created a new tag dictionary from it:

from flair.datasets import ColumnCorpus

columns = {0: 'text', 1: 'ner'}
corpus = ColumnCorpus(config["data_folder"], columns, train_file='train.txt', dev_file='val.txt', test_file="test.txt")
new_tag_dictionary = corpus.make_label_dictionary(label_type='ner', add_unk=False)
new_tag_dictionary.get_items()

which outputs

2024-04-26 16:16:18,169 Dictionary created for label 'ner' with 2 values: LIG (seen 719 times), REC (seen 296 times)
['LIG', 'REC']

I want to fine-tune hunflair-gene on the new dataset. As I understand it, I need to create a new tag dictionary that combines the old and new tags. When I try the following

for old_tag in previous_tag_dictionary.get_items():
    new_tag_dictionary.add_item(str(old_tag))

print(f"Updated tag dictionary : {new_tag_dictionary}")

it outputs

Updated tag dictionary : Dictionary with 10 tags: LIG, REC, <unk>, O, S-Gene, B-Gene, I-Gene, E-Gene, <START>, <STOP>

However, when I do

tagger_new = SequenceTagger(
    hidden_size=256,
    embeddings=previous_tagger.embeddings,
    tag_dictionary=new_tag_dictionary,
    tag_type='ner',
)

it outputs

2024-04-26 16:16:31,545 SequenceTagger predicts: Dictionary with 37 tags: O, S-LIG, B-LIG, E-LIG, I-LIG, S-REC, B-REC, E-REC, I-REC, S-O, B-O, E-O, I-O, S-S-Gene, B-S-Gene, E-S-Gene, I-S-Gene, S-B-Gene, B-B-Gene, E-B-Gene, I-B-Gene, S-I-Gene, B-I-Gene, E-I-Gene, I-I-Gene, S-E-Gene, B-E-Gene, E-E-Gene, I-E-Gene, S-<START>, B-<START>, E-<START>, I-<START>, S-<STOP>, B-<STOP>, E-<STOP>, I-<STOP>
That is far too many tags: the old BIOES tags and special items have each been prefixed again. Any help will be appreciated.
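
The 37 tags in the log are consistent with every dictionary entry except `<unk>` being expanded into S-, B-, E- and I- variants, plus a single O. That rule is inferred from the log output above, not taken from Flair's source code, and `expand_to_bioes` is a hypothetical stand-in written only to reproduce the count:

```python
def expand_to_bioes(items):
    """Mimic the expansion seen in the log: one 'O' plus four prefixed tags per item."""
    tags = ["O"]
    for item in items:
        if item == "<unk>":  # assumption: the unknown item is skipped, as the log suggests
            continue
        for prefix in ("S-", "B-", "E-", "I-"):
            tags.append(prefix + item)
    return tags


# The merged dictionary from the post: span-level labels mixed with prefixed tags.
merged = ['LIG', 'REC', '<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>']
# A dictionary that holds only entity types.
span_level = ['LIG', 'REC', 'Gene']

print(len(expand_to_bioes(merged)))      # 37
print(len(expand_to_bioes(span_level)))  # 13
```

If that reading is right, adding only the entity type Gene to the new dictionary, instead of the already-prefixed items S-Gene, B-Gene, I-Gene, E-Gene and the special items, would give 13 tags rather than 37.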

skywalker2202 added the question label (Further information is requested) Apr 26, 2024

skywalker2202 changed the title ~~[Question]: Extending NER tags of Hunflair~~ [Issue]: Extending NER tags of Hunflair Apr 26, 2024

skywalker2202 changed the title ~~[Issue]: Extending NER tags of Hunflair~~ [Question]: Extending NER tags of Hunflair Apr 26, 2024
