Character-Based Language Models Through Variational Sentence and Word Embeddings

NLP, Language Model, 2018

Language models have come of age recently with the introduction of Long Short-Term Memory (LSTM) based encoders and decoders and the advent of the attention mechanism. These models, however, generate one word at a time and cannot account for character-level similarities and differences. In this project we propose a novel character-based hierarchical variational autoencoder framework that learns word and sentence embeddings jointly. We couple this with an attention mechanism over the latent word embeddings to realize an end-to-end autoencoder framework.
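The core idea, sampling latent word embeddings variationally and then attending over them to form a sentence embedding, can be illustrated with a minimal numpy sketch. All dimensions, the dot-product attention scoring, and the mean-pooling query here are illustrative assumptions, not the project's actual architecture; see the report for details.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar, rng):
    # VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def softmax(x):
    # Numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: a 5-word sentence with 8-dimensional latent word embeddings.
# In the real model, mu and logvar would come from a character-level word encoder.
num_words, d = 5, 8
mu = rng.standard_normal((num_words, d))      # word-level posterior means
logvar = rng.standard_normal((num_words, d))  # word-level posterior log-variances

z_words = reparameterize(mu, logvar, rng)     # sampled latent word embeddings

# Attention over the latent word embeddings: score each word against a query
# vector, then pool the words into a single sentence-level embedding.
query = rng.standard_normal(d)                # hypothetical learned query
scores = z_words @ query / np.sqrt(d)         # scaled dot-product scores
weights = softmax(scores)                     # attention weights, sum to 1
sentence_embedding = weights @ z_words        # weighted pool, shape (d,)

assert sentence_embedding.shape == (d,)
assert np.isclose(weights.sum(), 1.0)
```

In the full model these pieces would be trained end-to-end with a reconstruction loss plus KL-divergence terms at both the word and sentence levels; the sketch only shows the forward pass through the sampling and attention steps.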

Access the GitHub Repo.

Read the Report.