Deep Character CNN LSTM Encoder with Classification and Similarity Models

chiragjn/deep-char-cnn-lstm

Deep Character CNN LSTM Encoder with Classification and Similarity Models

In Keras

Overall Idea:

  • Convolve over character embeddings with different kernel sizes
  • Concat the pooled outputs to get the char-word embedding
  • Pass it through a Dense layer with a residual connection
  • Optionally concat it with a separate word embedding
  • Pass the sequence of obtained word embeddings through an LSTM encoder
  • Train with a contrastive loss function (see References)
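The steps above can be sketched in Keras. This is a minimal sketch assuming TensorFlow 2's Keras API; the layer sizes, kernel sizes, and the `word_encoder`/`encoder` names are illustrative and not taken from this repository's model code:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

CHARSET_SIZE = 100   # number of distinct characters
MAX_WORD_LEN = 16    # characters per word (padded)
MAX_SEQ_LEN = 20     # words per sequence (padded)
CHAR_EMB_DIM = 24
KERNEL_SIZES = (2, 3, 4)
FILTERS = 32

# Per-word encoder: convolve char embeddings with several kernel sizes,
# max-pool each feature map, and concat into one char-word embedding.
char_in = layers.Input(shape=(MAX_WORD_LEN,), dtype="int32")
char_emb = layers.Embedding(CHARSET_SIZE, CHAR_EMB_DIM)(char_in)
pooled = [
    layers.GlobalMaxPooling1D()(
        layers.Conv1D(FILTERS, k, activation="tanh", padding="same")(char_emb)
    )
    for k in KERNEL_SIZES
]
concat = layers.Concatenate()(pooled)  # (batch, FILTERS * len(KERNEL_SIZES))

# Dense layer with a residual connection (a separate word embedding could
# optionally be concatenated here as well).
dense = layers.Dense(FILTERS * len(KERNEL_SIZES), activation="relu")(concat)
word_vec = layers.Add()([concat, dense])
word_encoder = Model(char_in, word_vec)

# Apply the word encoder to every word, then run an LSTM over the sequence.
seq_in = layers.Input(shape=(MAX_SEQ_LEN, MAX_WORD_LEN), dtype="int32")
word_vecs = layers.TimeDistributed(word_encoder)(seq_in)
sentence_vec = layers.LSTM(128)(word_vecs)
encoder = Model(seq_in, sentence_vec)
```

The resulting `encoder` maps a batch of character-id matrices of shape (words, chars) to fixed-size sentence vectors, which a classification head or a similarity/contrastive loss can then consume.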

Work in Progress

  • TODO: Add loading utils
  • TODO: Add preprocessing and padding utils
  • TODO: Add batching utils
  • TODO: Add model training code
  • TODO: Add model continue-training code
  • TODO: Test Similarity implementation on Quora similar pair dataset
  • TODO: Test Classification implementation on Kaggle Toxic internet comments dataset
  • TODO: Tune Hyperparameters and try different modifications to architectures
  • TODO: Take Hyperparameters using argparse
  • TODO: Add tensorboard and tfdbg support

Example Usage:

from model import ClassifierModel, SimilarityModel

classifier = ClassifierModel(vocab_size=10000,
                             charset_size=100,
                             num_classes=5,
                             mode=ClassifierModel.MULTILABEL,
                             char_kernel_sizes=(3,),
                             encoder_hidden_units=128,
                             bidirectional=False)
classifier.compile_model()

similarity_model = SimilarityModel(vocab_size=10000,
                                   charset_size=100,
                                   num_negative_samples=1)
similarity_model.compile_model()

References:

Overall Idea

  1. Siamese Recurrent Architectures for Learning Sentence Similarity (2016)

Encoder architecture heavily inspired by

  1. Character-Aware Neural Language Models (2015), Kim et al.
  2. dpressel/baseline

Loss function taken from

  1. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval (2014)
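The loss in that paper is a softmax over smoothed cosine similarities, where the positive pair competes against sampled negative pairs. A minimal NumPy sketch; `gamma` is the smoothing factor from the paper, while the function name and array layout are illustrative:

```python
import numpy as np

def contrastive_softmax_loss(query, positive, negatives, gamma=10.0):
    """CDSSM-style loss: negative log-probability of the positive pair
    under a softmax over gamma-scaled cosine similarities.

    query, positive: (batch, dim); negatives: list of (batch, dim) arrays.
    """
    def cos(a, b):
        return np.sum(a * b, axis=-1) / (
            np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
        )

    # Similarity of the query to the positive first, then to each negative.
    sims = np.stack([cos(query, positive)] + [cos(query, n) for n in negatives],
                    axis=-1)
    logits = gamma * sims
    # log P(positive | query) via log-softmax over the candidates.
    log_prob = logits[..., 0] - np.log(np.sum(np.exp(logits), axis=-1))
    return -log_prob.mean()
```

Minimizing this pushes the query embedding toward the positive and away from the sampled negatives, which matches the `num_negative_samples` parameter of `SimilarityModel` above.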

Other Contrastive Loss functions to try

  1. StarSpace: Embed All The Things! (2017), Wu et al.
  2. Comparison of loss functions for deep embedding
