In the age of smartphones and enterprise chat rooms, most information in companies is distributed not through verbal communication but through emails, databases, and internal portals. Because of the growing volume of text data being produced, text analysts use text summarization techniques to understand the contents of a text collection without reading all of it. In this project, I propose using text generation neural networks to label topics based not only on the topic keywords but also on the content of the documents associated with the topic. I use an encoder-decoder architecture: the encoder first receives the keywords and a sample of the documents as input and outputs a context vector, an intermediate representation of the “meaning” of the topic; the decoder then generates the topic label from that vector.
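As a rough illustration of the encoder input described above, the sketch below concatenates a topic's keywords with a random sample of its documents into one token sequence. The function name `build_encoder_input`, the `<kw>` and `<doc>` marker tokens, the sample size, and the truncation length are all assumptions made for this example, not details taken from the project.

```python
import random

def build_encoder_input(keywords, documents, n_docs=3, max_tokens=64, seed=0):
    """Join topic keywords and a sample of topic documents into one token list.

    Hypothetical helper: the marker tokens and the limits are illustrative only.
    """
    rng = random.Random(seed)
    # Sample at most n_docs documents so the input stays short.
    sample = rng.sample(documents, min(n_docs, len(documents)))
    tokens = ["<kw>"] + list(keywords)
    for doc in sample:
        tokens.append("<doc>")  # marks the start of each sampled document
        tokens.extend(doc.lower().split())
    # Truncate to the maximum length the encoder is assumed to accept.
    return tokens[:max_tokens]

topic_keywords = ["battery", "charger", "phone"]
topic_documents = [
    "The phone battery drains quickly after the update",
    "My charger stopped working with the new phone",
    "Battery life is much shorter than advertised",
    "Looking for a replacement charger cable",
]

encoder_input = build_encoder_input(topic_keywords, topic_documents)
print(encoder_input)
```

In a real system the resulting token list would be mapped to vocabulary indices before being passed to the encoder; that step is omitted here.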
The model is trained on a mix of datasets, including Wikipedia and news datasets, where the goal is to generate the title of an article given its content and a set of keywords. The model is used in a text analysis system to give the analyst a more concise summary of the collection, showing only the labels rather than lists of keywords. It is intended to considerably improve searching large, complex text collections and to help analysts visualize, explore, and understand large collections of texts. In addition, it is intended to assist authors in researching and writing texts by providing background information and suggesting links to relevant websites.
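The training objective above (article content plus keywords in, title out) can be sketched as the construction of source/target pairs. This is a minimal example under stated assumptions: the text does not say how the keywords are obtained, so a simple term-frequency heuristic stands in for that step, and the names `extract_keywords`, `make_training_pair`, the `<sep>` token, and the sample article are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "by"}

def extract_keywords(text, k=3):
    """Return the k most frequent non-stopword terms (stand-in for topic keywords)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]

def make_training_pair(article):
    """Build one (source, target) example: keywords + content -> title."""
    keywords = extract_keywords(article["content"])
    source = " ".join(keywords) + " <sep> " + article["content"]
    return {"source": source, "target": article["title"]}

# Made-up article used only to exercise the functions.
article = {
    "title": "Solar power",
    "content": (
        "Solar power is the conversion of sunlight into electricity. "
        "Solar panels capture sunlight, and solar farms feed electricity to the grid."
    ),
}

pair = make_training_pair(article)
print(pair["source"])
print(pair["target"])
```

At inference time the same source format would be filled with a topic's keywords and sampled documents instead of a single article, so that the generated "title" serves as the topic label.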