[Bug]: Custom trained model not correctly loaded when using FlairEmbeddings #3459

Open
petermartens1992 opened this issue May 19, 2024 · 2 comments · May be fixed by #3477
Labels
bug Something isn't working

Comments


petermartens1992 commented May 19, 2024

Describe the bug

The F-scores reported after training a SequenceTagger model with FlairEmbeddings are much higher than those obtained when the model is loaded from file and evaluated again. The predictions of the loaded model are also far worse than the initial F-scores suggest.

The issue doesn't appear when initializing a new model and only loading the state from file:

sequence_tagger.load_state_dict(SequenceTagger.load(model_path).state_dict())

Also, when the FlairEmbeddings are removed, the F-scores from training and evaluation align again.
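For the control run without FlairEmbeddings, I simply drop them from the embedding stack (a minimal sketch; everything else stays as in the full script below):

# control experiment: same stack minus the two FlairEmbeddings
embedding_types: List[TokenEmbeddings] = [
    WordEmbeddings(configs["train"]["language"] + "-wiki"),
    CharacterEmbeddings(),
]
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)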

Here is the code to reproduce the issue (using flair 0.13.1):

import logging
import os

from typing import List

from flair.datasets import UD_ENGLISH
from flair.models import SequenceTagger
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings, FlairEmbeddings, CharacterEmbeddings
from flair.trainers import ModelTrainer

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

language = "nl"
model_type = "upos"

configs = {}
configs["train"] = {
    "language": language,
    "model_type": model_type,
    "model_source": "ma",  # short source tag, used in the output model name below
    "tag_format": "BIOES",
    "rnn_hidden_size": 256,
    "rnn_layers": 1,
    "use_crf": True,
    "word_dropout": 0.2,
    "dropout": 0.2,
    "learning_rate": 0.1,
    "mini_batch_size": 32,
    "mini_batch_chunk_size": 32,
    "train_initial_hidden_state": True,
    "embeddings_storage_ratio": 1.0,
    "fine_tune_flair_embeddings": False,
}

output_model_name = configs["train"]["language"] + "-flair-" + configs["train"]["model_source"] + "-example"

output_path = os.path.expanduser("~/models/flair/" + configs["train"]["model_type"] + "/" + output_model_name)
model_path = output_path + "/best-model.pt"
logger.info("model path: " + model_path)

# 1. what label do we want to predict?
label_type = configs["train"]["model_type"]

# 2. load the corpus
corpus = UD_ENGLISH().downsample(0.1)
print(corpus)

# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embeddings
embedding_types: List[TokenEmbeddings] = [
    WordEmbeddings(configs["train"]["language"] + "-wiki"),

    CharacterEmbeddings(),

    FlairEmbeddings(configs["train"]["language"] + "-forward", fine_tune=configs["train"]["fine_tune_flair_embeddings"]),
    FlairEmbeddings(configs["train"]["language"] + "-backward", fine_tune=configs["train"]["fine_tune_flair_embeddings"])
]
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
sequence_tagger = SequenceTagger(hidden_size=256,
                                 embeddings=embeddings,
                                 tag_dictionary=label_dict,
                                 tag_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(sequence_tagger, corpus)

# 7. start training
trainer.train(
    output_path,
    learning_rate=0.1,
    mini_batch_size=configs["train"]["mini_batch_size"],
    mini_batch_chunk_size=configs["train"]["mini_batch_chunk_size"],
    max_epochs=5
)

logger.info("evaluating model via load_state_dict...")
sequence_tagger.load_state_dict(SequenceTagger.load(model_path).state_dict())
logger.info(f'Model: "{sequence_tagger}"')

result = sequence_tagger.evaluate(
    data_points=corpus.test,
    gold_label_type=label_type,
    mini_batch_size=configs["train"]["mini_batch_size"],
    mini_batch_chunk_size=configs["train"]["mini_batch_chunk_size"],
    return_loss=False,
)

logger.info("detailed results:")
logger.info(result.detailed_results)
logger.info("main score:")
logger.info(str(result.main_score))
logger.info("classification report:")
logger.info(str(result.classification_report))
logger.info("scores:")
logger.info(str(result.scores))

logger.info("evaluating model via load...")
sequence_tagger = SequenceTagger.load(model_path)  # bad results when loading the model in the default way if the model contains flair embeddings
result = sequence_tagger.evaluate(
    data_points=corpus.test,
    gold_label_type=label_type,
    mini_batch_size=configs["train"]["mini_batch_size"],
    mini_batch_chunk_size=configs["train"]["mini_batch_chunk_size"],
    return_loss=False,
)

logger.info("detailed results:")
logger.info(result.detailed_results)
logger.info("main score:")
logger.info(str(result.main_score))
logger.info("classification report:")
logger.info(str(result.classification_report))
logger.info("scores:")
logger.info(str(result.scores))

To Reproduce

Run the script above.

Expected behavior

F-scores and predictions from a model loaded via SequenceTagger.load() should match those reported after training.

Logs and Stack traces

No response

Screenshots

No response

Additional Context

No response

Environment

flair 0.13.1, Python 3.11, macOS, CPU.

petermartens1992 (Author) commented:

Here are the relevant logs:

2024-05-19 22:43:56,556 Reading data from /Users/peter/.flair/datasets/ud_english
2024-05-19 22:43:56,556 Train: /Users/peter/.flair/datasets/ud_english/en_ewt-ud-train.conllu
2024-05-19 22:43:56,556 Dev: /Users/peter/.flair/datasets/ud_english/en_ewt-ud-dev.conllu
2024-05-19 22:43:56,556 Test: /Users/peter/.flair/datasets/ud_english/en_ewt-ud-test.conllu
2024-05-19 22:44:00,014 Warning: An empty Sentence was created! Are there empty strings in your dataset?
2024-05-19 22:44:00,205 Warning: An empty Sentence was created! Are there empty strings in your dataset?
2024-05-19 22:44:00,397 Warning: An empty Sentence was created! Are there empty strings in your dataset?
Corpus: 1254 train + 200 dev + 208 test sentences
2024-05-19 22:44:00,400 Computing label dictionary. Progress:
1254it [00:00, 66655.56it/s]
2024-05-19 22:44:00,432 Dictionary created for label 'upos' with 17 values: NOUN (seen 3195 times), PUNCT (seen 2235 times), VERB (seen 2213 times), PRON (seen 1803 times), ADP (seen 1707 times), DET (seen 1525 times), PROPN (seen 1257 times), AUX (seen 1183 times), ADJ (seen 1169 times), ADV (seen 950 times), CCONJ (seen 635 times), PART (seen 515 times), SCONJ (seen 447 times), NUM (seen 418 times), X (seen 84 times), SYM (seen 80 times), INTJ (seen 65 times)
Dictionary with 17 tags: NOUN, PUNCT, VERB, PRON, ADP, DET, PROPN, AUX, ADJ, ADV, CCONJ, PART, SCONJ, NUM, X, SYM, INTJ
2024-05-19 22:44:00,433 : INFO : loading KeyedVectors object from /Users/peter/.flair/embeddings/nl-wiki-fasttext-300d-1M
2024-05-19 22:44:01,882 : INFO : loading vectors from /Users/peter/.flair/embeddings/nl-wiki-fasttext-300d-1M.vectors.npy with mmap=None
2024-05-19 22:44:02,014 : INFO : setting ignored attribute vectors_norm to None
2024-05-19 22:44:04,410 SequenceTagger predicts: Dictionary with 17 tags: NOUN, PUNCT, VERB, PRON, ADP, DET, PROPN, AUX, ADJ, ADV, CCONJ, PART, SCONJ, NUM, X, SYM, INTJ
/Users/peter/virtualenvs/learner/lib/python3.11/site-packages/flair/trainers/trainer.py:84: UserWarning: There should be no best model saved at epoch 1 except there is a model from previous trainings in your training folder. All previous best models will be deleted.
  warnings.warn(
2024-05-19 22:44:04,517 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,518 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'nl-wiki'
      (embedding): Embedding(871022, 300)
    )
    (list_embedding_1): CharacterEmbeddings(
      (char_embedding): Embedding(275, 25)
      (char_rnn): LSTM(25, 25, bidirectional=True)
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(7632, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_3): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(7632, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True)
  (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=19, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2024-05-19 22:44:04,525 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Corpus: 1254 train + 200 dev + 208 test sentences
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Train:  1254 sentences
2024-05-19 22:44:04,526         (train_with_dev=False, train_with_test=False)
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Training Params:
2024-05-19 22:44:04,526  - learning_rate: "0.1"
2024-05-19 22:44:04,526  - mini_batch_size: "32"
2024-05-19 22:44:04,526  - max_epochs: "5"
2024-05-19 22:44:04,526  - shuffle: "True"
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Plugins:
2024-05-19 22:44:04,526  - AnnealOnPlateau | patience: '3', anneal_factor: '0.5', min_learning_rate: '0.0001'
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Final evaluation on model from best epoch (best-model.pt)
2024-05-19 22:44:04,526  - metric: "('micro avg', 'f1-score')"
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Computation:
2024-05-19 22:44:04,526  - compute on device: cpu
2024-05-19 22:44:04,526  - embedding storage: cpu
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 Model training base path: "/Users/peter/my_workspace/apps/kbo/test/data/models/flair/upos/nl-flair-ma-example"
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:04,526 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:10,450 epoch 1 - iter 4/40 - loss 3.11703381 - time (sec): 5.92 - samples/sec: 383.87 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:14,846 epoch 1 - iter 8/40 - loss 2.99080354 - time (sec): 10.32 - samples/sec: 401.19 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:20,196 epoch 1 - iter 12/40 - loss 2.87754428 - time (sec): 15.67 - samples/sec: 387.12 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:25,859 epoch 1 - iter 16/40 - loss 2.75815843 - time (sec): 21.33 - samples/sec: 387.43 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:29,887 epoch 1 - iter 20/40 - loss 2.68082195 - time (sec): 25.36 - samples/sec: 400.89 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:33,934 epoch 1 - iter 24/40 - loss 2.57787456 - time (sec): 29.41 - samples/sec: 412.85 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:39,227 epoch 1 - iter 28/40 - loss 2.49197920 - time (sec): 34.70 - samples/sec: 406.96 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:43,540 epoch 1 - iter 32/40 - loss 2.41119918 - time (sec): 39.01 - samples/sec: 413.96 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:48,207 epoch 1 - iter 36/40 - loss 2.34644491 - time (sec): 43.68 - samples/sec: 412.01 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:52,580 epoch 1 - iter 40/40 - loss 2.29401442 - time (sec): 48.05 - samples/sec: 405.40 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:44:52,580 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:52,580 EPOCH 1 done: loss 2.2940 - lr: 0.100000
100%|██████████| 4/4 [00:04<00:00,  1.01s/it]
2024-05-19 22:44:56,614 DEV : loss 1.432646632194519 - f1-score (micro avg)  0.5663
2024-05-19 22:44:56,617  - 0 epochs without improvement
2024-05-19 22:44:56,617 saving best model
2024-05-19 22:44:58,177 ----------------------------------------------------------------------------------------------------
2024-05-19 22:44:59,135 epoch 2 - iter 4/40 - loss 1.53838072 - time (sec): 0.96 - samples/sec: 1857.98 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:00,217 epoch 2 - iter 8/40 - loss 1.51262140 - time (sec): 2.04 - samples/sec: 1882.93 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:01,382 epoch 2 - iter 12/40 - loss 1.48790458 - time (sec): 3.20 - samples/sec: 1875.50 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:02,513 epoch 2 - iter 16/40 - loss 1.46567892 - time (sec): 4.33 - samples/sec: 1829.43 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:03,558 epoch 2 - iter 20/40 - loss 1.42265719 - time (sec): 5.38 - samples/sec: 1874.88 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:04,631 epoch 2 - iter 24/40 - loss 1.40934408 - time (sec): 6.45 - samples/sec: 1865.27 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:05,678 epoch 2 - iter 28/40 - loss 1.39057144 - time (sec): 7.50 - samples/sec: 1865.12 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:06,912 epoch 2 - iter 32/40 - loss 1.36770397 - time (sec): 8.73 - samples/sec: 1836.66 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:07,901 epoch 2 - iter 36/40 - loss 1.35132077 - time (sec): 9.72 - samples/sec: 1831.59 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:08,690 epoch 2 - iter 40/40 - loss 1.33653705 - time (sec): 10.51 - samples/sec: 1853.19 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:08,690 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:08,690 EPOCH 2 done: loss 1.3365 - lr: 0.100000
100%|██████████| 4/4 [00:00<00:00, 11.95it/s]
2024-05-19 22:45:09,034 DEV : loss 1.0176340341567993 - f1-score (micro avg)  0.6583
2024-05-19 22:45:09,037  - 0 epochs without improvement
2024-05-19 22:45:09,037 saving best model
2024-05-19 22:45:10,331 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:11,378 epoch 3 - iter 4/40 - loss 1.14705407 - time (sec): 1.05 - samples/sec: 2073.74 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:12,608 epoch 3 - iter 8/40 - loss 1.16093182 - time (sec): 2.28 - samples/sec: 1856.18 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:13,705 epoch 3 - iter 12/40 - loss 1.14477524 - time (sec): 3.37 - samples/sec: 1886.09 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:14,779 epoch 3 - iter 16/40 - loss 1.12281497 - time (sec): 4.45 - samples/sec: 1888.73 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:15,929 epoch 3 - iter 20/40 - loss 1.08546015 - time (sec): 5.60 - samples/sec: 1862.61 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:16,970 epoch 3 - iter 24/40 - loss 1.06956752 - time (sec): 6.64 - samples/sec: 1856.05 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:17,963 epoch 3 - iter 28/40 - loss 1.06257897 - time (sec): 7.63 - samples/sec: 1856.63 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:18,929 epoch 3 - iter 32/40 - loss 1.05611767 - time (sec): 8.60 - samples/sec: 1882.86 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:19,978 epoch 3 - iter 36/40 - loss 1.04633613 - time (sec): 9.65 - samples/sec: 1870.56 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:20,723 epoch 3 - iter 40/40 - loss 1.04027266 - time (sec): 10.39 - samples/sec: 1874.55 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:20,723 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:20,724 EPOCH 3 done: loss 1.0403 - lr: 0.100000
100%|██████████| 4/4 [00:00<00:00, 11.97it/s]
2024-05-19 22:45:21,067 DEV : loss 0.8021050691604614 - f1-score (micro avg)  0.7358
2024-05-19 22:45:21,070  - 0 epochs without improvement
2024-05-19 22:45:21,070 saving best model
2024-05-19 22:45:22,321 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:23,512 epoch 4 - iter 4/40 - loss 0.92632577 - time (sec): 1.19 - samples/sec: 1655.39 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:24,656 epoch 4 - iter 8/40 - loss 0.92491874 - time (sec): 2.33 - samples/sec: 1708.90 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:25,599 epoch 4 - iter 12/40 - loss 0.91281804 - time (sec): 3.28 - samples/sec: 1796.95 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:26,599 epoch 4 - iter 16/40 - loss 0.89991942 - time (sec): 4.28 - samples/sec: 1831.22 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:27,758 epoch 4 - iter 20/40 - loss 0.90474803 - time (sec): 5.44 - samples/sec: 1825.32 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:28,758 epoch 4 - iter 24/40 - loss 0.89691188 - time (sec): 6.44 - samples/sec: 1840.54 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:29,845 epoch 4 - iter 28/40 - loss 0.90225204 - time (sec): 7.52 - samples/sec: 1854.31 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:30,880 epoch 4 - iter 32/40 - loss 0.89831665 - time (sec): 8.56 - samples/sec: 1882.58 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:31,891 epoch 4 - iter 36/40 - loss 0.88889293 - time (sec): 9.57 - samples/sec: 1893.29 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:32,649 epoch 4 - iter 40/40 - loss 0.88337025 - time (sec): 10.33 - samples/sec: 1886.31 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:32,649 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:32,649 EPOCH 4 done: loss 0.8834 - lr: 0.100000
100%|██████████| 4/4 [00:00<00:00, 12.38it/s]
2024-05-19 22:45:32,982 DEV : loss 0.6276795268058777 - f1-score (micro avg)  0.7951
2024-05-19 22:45:32,985  - 0 epochs without improvement
2024-05-19 22:45:32,985 saving best model
2024-05-19 22:45:34,207 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:35,325 epoch 5 - iter 4/40 - loss 0.77127447 - time (sec): 1.12 - samples/sec: 1793.88 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:36,589 epoch 5 - iter 8/40 - loss 0.77054641 - time (sec): 2.38 - samples/sec: 1689.22 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:37,772 epoch 5 - iter 12/40 - loss 0.78947517 - time (sec): 3.56 - samples/sec: 1728.50 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:38,709 epoch 5 - iter 16/40 - loss 0.79137096 - time (sec): 4.50 - samples/sec: 1793.57 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:39,783 epoch 5 - iter 20/40 - loss 0.78869078 - time (sec): 5.58 - samples/sec: 1823.11 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:40,954 epoch 5 - iter 24/40 - loss 0.78397897 - time (sec): 6.75 - samples/sec: 1847.86 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:41,856 epoch 5 - iter 28/40 - loss 0.78268129 - time (sec): 7.65 - samples/sec: 1868.55 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:42,925 epoch 5 - iter 32/40 - loss 0.77881323 - time (sec): 8.72 - samples/sec: 1854.61 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:43,898 epoch 5 - iter 36/40 - loss 0.77771590 - time (sec): 9.69 - samples/sec: 1868.88 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:44,674 epoch 5 - iter 40/40 - loss 0.77460620 - time (sec): 10.47 - samples/sec: 1861.30 - lr: 0.100000 - momentum: 0.000000
2024-05-19 22:45:44,674 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:44,674 EPOCH 5 done: loss 0.7746 - lr: 0.100000
100%|██████████| 4/4 [00:00<00:00, 11.58it/s]
2024-05-19 22:45:45,030 DEV : loss 0.5352615118026733 - f1-score (micro avg)  0.8308
2024-05-19 22:45:45,034  - 0 epochs without improvement
2024-05-19 22:45:45,034 saving best model
2024-05-19 22:45:47,636 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:47,636 Loading model from best epoch ...
2024-05-19 22:45:53,663
Results:
- F-score (micro) 0.8416
- F-score (macro) 0.6953
- Accuracy 0.8416

By class:
              precision    recall  f1-score   support

        NOUN     0.7795    0.8407    0.8090       408
       PUNCT     0.9834    0.9944    0.9889       357
        VERB     0.7199    0.7893    0.7530       280
        PRON     0.9504    0.9312    0.9407       247
       PROPN     0.8519    0.7897    0.8196       233
         DET     0.9559    0.9701    0.9630       201
         AUX     0.8549    0.9538    0.9016       173
         ADP     0.9573    0.8441    0.8971       186
         ADJ     0.6899    0.5000    0.5798       178
         ADV     0.5267    0.6930    0.5985       114
       CCONJ     1.0000    0.9412    0.9697        85
        PART     0.8143    0.9344    0.8702        61
         NUM     0.9302    0.8511    0.8889        47
       SCONJ     0.6944    0.5556    0.6173        45
           X     0.0000    0.0000    0.0000         9
         SYM     0.5000    0.1429    0.2222         7
        INTJ     0.0000    0.0000    0.0000         8

    accuracy                         0.8416      2639
   macro avg     0.7182    0.6901    0.6953      2639
weighted avg     0.8405    0.8416    0.8381      2639

2024-05-19 22:45:53,663 ----------------------------------------------------------------------------------------------------
2024-05-19 22:45:53,665 : INFO : model trained
2024-05-19 22:45:53,665 : INFO : evaluating model via load_state_dict...
2024-05-19 22:45:54,676 : INFO : Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'nl-wiki'
      (embedding): Embedding(871022, 300)
    )
    (list_embedding_1): CharacterEmbeddings(
      (char_embedding): Embedding(275, 25)
      (char_rnn): LSTM(25, 25, bidirectional=True)
    )
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(7632, 100)
        (rnn): LSTM(100, 2048)
      )
    )
    (list_embedding_3): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(7632, 100)
        (rnn): LSTM(100, 2048)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True)
  (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=19, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2024-05-19 22:46:00,338 : INFO : detailed results:
2024-05-19 22:46:00,338 : INFO :
Results:
- F-score (micro) 0.8416
- F-score (macro) 0.6953
- Accuracy 0.8416

By class:
              precision    recall  f1-score   support

        NOUN     0.7795    0.8407    0.8090       408
       PUNCT     0.9834    0.9944    0.9889       357
        VERB     0.7199    0.7893    0.7530       280
        PRON     0.9504    0.9312    0.9407       247
       PROPN     0.8519    0.7897    0.8196       233
         DET     0.9559    0.9701    0.9630       201
         AUX     0.8549    0.9538    0.9016       173
         ADP     0.9573    0.8441    0.8971       186
         ADJ     0.6899    0.5000    0.5798       178
         ADV     0.5267    0.6930    0.5985       114
       CCONJ     1.0000    0.9412    0.9697        85
        PART     0.8143    0.9344    0.8702        61
         NUM     0.9302    0.8511    0.8889        47
       SCONJ     0.6944    0.5556    0.6173        45
           X     0.0000    0.0000    0.0000         9
         SYM     0.5000    0.1429    0.2222         7
        INTJ     0.0000    0.0000    0.0000         8

    accuracy                         0.8416      2639
   macro avg     0.7182    0.6901    0.6953      2639
weighted avg     0.8405    0.8416    0.8381      2639

2024-05-19 22:46:00,338 : INFO : evaluating model via load...
2024-05-19 22:46:06,972 : INFO : detailed results:
2024-05-19 22:46:06,972 : INFO :
Results:
- F-score (micro) 0.551
- F-score (macro) 0.4142
- Accuracy 0.551

By class:
              precision    recall  f1-score   support

        NOUN     0.5299    0.5858    0.5565       408
       PUNCT     0.8939    0.6134    0.7276       357
        VERB     0.4667    0.4750    0.4708       280
        PRON     0.6534    0.6640    0.6586       247
         DET     0.6263    0.8756    0.7303       201
         ADJ     0.3226    0.5056    0.3939       178
         ADP     0.5973    0.7258    0.6553       186
         AUX     0.5749    0.6879    0.6263       173
         ADV     0.1569    0.2807    0.2013       114
       PROPN     0.7719    0.1888    0.3034       233
       CCONJ     0.9000    0.8471    0.8727        85
         NUM     0.6000    0.4468    0.5122        47
        PART     0.6000    0.0492    0.0909        61
       SCONJ     0.5385    0.1556    0.2414        45
        INTJ     0.0000    0.0000    0.0000         8
           X     0.0000    0.0000    0.0000         9
         SYM     0.0000    0.0000    0.0000         7

    accuracy                         0.5510      2639
   macro avg     0.4843    0.4177    0.4142      2639
weighted avg     0.6004    0.5510    0.5444      2639

helpmefindaname (Collaborator) commented:

Hi @petermartens1992,
I think #3477 should fix this issue. Can you please check?
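To verify once the fix is installed, rerunning the evaluation through the default loading path should then match the training scores (a quick sketch, reusing corpus, model_path and label_type from your script above):

tagger = SequenceTagger.load(model_path)
result = tagger.evaluate(
    data_points=corpus.test,
    gold_label_type=label_type,
    mini_batch_size=32,
)
print(result.main_score)  # should now match the F-score reported after training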
