Understanding and tracing semantics of concepts to application domains emerging from source code, documentation, and tests

Zaki Pauzi; Andrea Capiluppi; Cezar Sas

doi:10.22541/au.172527329.98954722/v1

Abstract

As software artifacts continuously evolve and increase in number, trace links grow more complex and the need for automated traceability increases. Besides tracing components across different artifacts, tracing to application domains is critical for understanding how semantics are classified and what they cover (i.e., which application domains are present in each artifact). In this paper, we propose using natural language processing (NLP) to map concepts emerging from software artifacts to application domains, and to trace these between artifacts. We extracted corpus keywords from source code, documentation, and tests, and ran an optimised Latent Dirichlet Allocation (LDA) to generate the concepts emerging from each artifact. We then calculated the similarity score of each concept against each application domain, and ranked the differences between these scores for each pair of artifacts. Results show that the ranking of the inverse of the difference represents the strength of tracing in semantics, and that different embeddings yield varying results. We observed the strong applicability of our method and its replicability by other researchers and practitioners, particularly in detecting synchronised application domains that are traced between artifacts.
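The scoring and ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for real embeddings, the function names are hypothetical, and averaging the concept-to-domain similarities per artifact is an assumption made only to keep the example small. Sorting by ascending difference is equivalent to ranking by the inverse of the difference.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def domain_scores(concepts, domains):
    """Score one artifact against every domain.

    Assumption: the artifact's score for a domain is the mean similarity
    of its concept vectors to that domain's vector.
    """
    return {
        name: sum(cosine(c, vec) for c in concepts) / len(concepts)
        for name, vec in domains.items()
    }


def rank_trace_strength(scores_a, scores_b):
    """Rank domains for a pair of artifacts, strongest trace first.

    A smaller absolute score difference means a larger inverse,
    so sorting by ascending difference gives the same order.
    """
    diffs = {d: abs(scores_a[d] - scores_b[d]) for d in scores_a}
    return sorted(diffs, key=diffs.get)


# Toy data (hypothetical): one vector per application domain and per concept.
domains = {
    "finance": [1.0, 0.0, 0.0],
    "graphics": [0.0, 1.0, 0.0],
    "networking": [0.0, 0.0, 1.0],
}
code_concepts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
test_concepts = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

code_scores = domain_scores(code_concepts, domains)
test_scores = domain_scores(test_concepts, domains)
ranking = rank_trace_strength(code_scores, test_scores)
print(ranking[0])  # domain with the smallest score difference between the two artifacts
```

In this toy setup the source code and the tests score identically on "finance", so that domain ranks first as the most strongly traced between the two artifacts.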

27 Aug 2024: Submitted to Journal of Software: Evolution and Process

28 Aug 2024: Submission Checks Completed

28 Aug 2024: Assigned to Editor

12 Dec 2024: Reviewer(s) Assigned

Peer review status: UNDER REVIEW