Abstract
Recent advances in computational linguistics and machine learning,
including a variety of toolboxes for Natural Language Processing (NLP),
facilitate the analysis of vast electronic corpora for a multitude of
objectives. Research papers published as electronic text files in
different journals offer windows into trending topics and developments,
and NLP allows us to extract information and insight about these trends.
This project applies Latent Dirichlet Allocation (LDA) Topic Modeling
for bibliometric analyses of all abstracts in selected high-impact
(Impact Factor > 0.9) journals in hydrology. Topic modeling
uses statistical algorithms to extract semantic information from a
collection of texts and has emerged as a quantitative method for
assessing large volumes of text. The generated topics are
interpretable based on our prior knowledge of hydrology and related
sub-disciplines. Comparative topic trend, term, and document-level
cluster analyses for different time periods were performed. These
analyses revealed that topics such as climate change research have gained
popularity in hydrology over the last decade. An inter-topic correlation
analysis also revealed the nature of information exchange and absorption
between various communities within the hydrology domain. The primary
objective of this work is to allow researchers to explore new branches
and connections in the hydrology literature, and to facilitate
comprehensive and inclusive literature reviews. We aim to combine these
results with the probability distributions over topics, journals, and
authors to create an ontology that scientists and environmental
consultants can use to explore relevant literature based on topics and
topic relationships.
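
To make the idea of LDA topic modeling concrete, the following is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the pipeline used in this project: the inference method (collapsed Gibbs sampling), the hyperparameters alpha and beta, the function names fit_lda and top_terms, and the toy tokenized "abstracts" are all hypothetical choices made for the example. It returns a topic distribution for each document and a term distribution for each topic, which are the quantities that trend, term, and correlation analyses would build on.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized).

    docs is a list of token lists. Returns (vocab, theta, phi), where
    theta[d][k] is the weight of topic k in document d and phi[k][v] is
    the weight of vocabulary term v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates of the two distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_vocab * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, phi


def top_terms(phi, vocab, n=3):
    """Return the n highest-weight terms for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Hypothetical, already-tokenized toy "abstracts" (not data from this study).
docs = [
    ["climate", "change", "precipitation", "trend", "climate"],
    ["groundwater", "aquifer", "recharge", "flow", "aquifer"],
    ["climate", "drought", "precipitation", "change"],
    ["aquifer", "groundwater", "flow", "recharge", "pumping"],
]

vocab, theta, phi = fit_lda(docs, n_topics=2)
for k, terms in enumerate(top_terms(phi, vocab)):
    print(f"topic {k}: {', '.join(terms)}")
```

On a real corpus, the rows of theta can be averaged by publication year to trace topic trends, and correlations between its columns give one simple measure of inter-topic association. Established libraries provide faster and better-tested LDA implementations than this sketch.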