Question
I wanted to fine-tune the hunflair-gene model and extend the tags of the original model. Loading the model and listing its labels with
from flair.models import SequenceTagger
previous_tagger = SequenceTagger.load("hunflair-gene")
previous_tag_dictionary = previous_tagger.label_dictionary
previous_tag_dictionary.get_items()
outputs
['<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>']
However, calling "previous_tag_dictionary.span_labels()" raises "AttributeError: 'Dictionary' object has no attribute 'span_labels'".
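Because `span_labels()` is not available on this dictionary, one possible workaround is to derive the span-level labels from the BIOES-prefixed items by hand. The following is a minimal sketch in plain Python, not part of the Flair API; the helper name `span_labels_from_bioes` is hypothetical, and it assumes that every entity tag carries a `B-`, `I-`, `E-`, or `S-` prefix and that `O`, `<unk>`, `<START>`, and `<STOP>` are the only special entries.

```python
def span_labels_from_bioes(tags):
    """Collapse BIOES-prefixed tags such as 'S-Gene' into span labels such as 'Gene'.

    Hypothetical helper: assumes the special entries listed below are the
    only non-entity items in the dictionary.
    """
    special = {"O", "<unk>", "<START>", "<STOP>"}
    labels = []
    for tag in tags:
        if tag in special:
            continue
        # Drop a leading B-/I-/E-/S- prefix if present.
        if len(tag) > 2 and tag[0] in "BIES" and tag[1] == "-":
            tag = tag[2:]
        # Keep first-seen order and avoid duplicates.
        if tag not in labels:
            labels.append(tag)
    return labels


# Items as returned by previous_tag_dictionary.get_items() in the question.
old_items = ['<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene', '<START>', '<STOP>']
print(span_labels_from_bioes(old_items))  # ['Gene']
```

The result is the single span label `Gene`, which is the form in which the corpus-derived dictionary lists `LIG` and `REC`.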
I have my own annotated corpus, which contains two tags: LIG and REC. I converted it to a column corpus and created a new tag dictionary from it:
from flair.datasets import ColumnCorpus
columns = {0: 'text', 1: 'ner'}
corpus = ColumnCorpus(config["data_folder"], columns, train_file='train.txt', dev_file='val.txt', test_file="test.txt")
new_tag_dictionary = corpus.make_label_dictionary(label_type='ner', add_unk=False)
new_tag_dictionary.get_items()
which outputs
`2024-04-26 16:16:18,169 Dictionary created for label 'ner' with 2 values: LIG (seen 719 times), REC (seen 296 times)
['LIG', 'REC']
`
I want to fine-tune hunflair-gene on the new dataset. As I understand it, I need to create a new tag dictionary. When I try the following
for old_tag in previous_tag_dictionary.get_items():
    new_tag_dictionary.add_item(str(old_tag))
print(f"Updated tag dictionary : {new_tag_dictionary}")
it outputs
Updated tag dictionary : Dictionary with 10 tags: LIG, REC, <unk>, O, S-Gene, B-Gene, I-Gene, E-Gene, <START>, <STOP>
However, when I do
tagger_new = SequenceTagger(
    hidden_size=256,
    embeddings=previous_tagger.embeddings,
    tag_dictionary=new_tag_dictionary,
    tag_type='ner',
)
it outputs
2024-04-26 16:16:31,545 SequenceTagger predicts: Dictionary with 37 tags: O, S-LIG, B-LIG, E-LIG, I-LIG, S-REC, B-REC, E-REC, I-REC, S-O, B-O, E-O, I-O, S-S-Gene, B-S-Gene, E-S-Gene, I-S-Gene, S-B-Gene, B-B-Gene, E-B-Gene, I-B-Gene, S-I-Gene, B-I-Gene, E-I-Gene, I-I-Gene, S-E-Gene, B-E-Gene, E-E-Gene, I-E-Gene, S-<START>, B-<START>, E-<START>, I-<START>, S-<STOP>, B-<STOP>, E-<STOP>, I-<STOP>
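That is far more tags than expected: the old BIOES tags and the special tokens appear to have been prefixed a second time (for example S-S-Gene and B-<START>). Any help will be appreciated.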
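The count of 37 is consistent with every item in the merged dictionary except `<unk>` being treated as a span label and expanded into `S-`, `B-`, `E-`, and `I-` variants, plus a single `O`. The sketch below reproduces that arithmetic in plain Python. It is inferred from the logged output only, not taken from Flair's source, and `expand_bioes` is a hypothetical name.

```python
def expand_bioes(span_labels):
    """Mimic the expansion suggested by the log: one 'O' plus four
    prefixed tags per label, skipping '<unk>'. Assumption, not Flair code."""
    tags = ["O"]
    for label in span_labels:
        if label == "<unk>":
            continue
        for prefix in ("S-", "B-", "E-", "I-"):
            tags.append(prefix + label)
    return tags


# The merged dictionary from the question: 2 new labels + 8 old items.
merged = ['LIG', 'REC', '<unk>', 'O', 'S-Gene', 'B-Gene', 'I-Gene', 'E-Gene',
          '<START>', '<STOP>']
double_prefixed = expand_bioes(merged)
print(len(double_prefixed))   # 37
print(double_prefixed[:5])    # ['O', 'S-LIG', 'B-LIG', 'E-LIG', 'I-LIG']

# Passing span-level labels only would give 1 + 4 * 3 tags.
print(len(expand_bioes(['LIG', 'REC', 'Gene'])))  # 13
```

If this reading is right, the dictionary passed as `tag_dictionary` would need to contain only span-level labels such as `LIG`, `REC`, and `Gene`, without BIOES prefixes or the `<START>`/`<STOP>` entries. Whether the pretrained weights can then be reused with the enlarged tag set is a separate question that this sketch does not address.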