Natural Language Processing to Assure the Quality of User Stories (Michiel Blommaert)


In Agile Software Engineering, working software is valued over comprehensive documentation ( Currently, practitioners of agile development practices often misinterpret this as ‘no documentation’. Consequently, user stories are almost the only kind of requirements documentation maintained in the application of agile development methods.

A user story describes a feature required to be provided by a software system in terms of a standard template structure. User stories are always formulated from the perspective of the person that desires the feature. A typical template for formulating user stories is:

As a type of user, I want some goal so that some reason

For example, related to requirements for a car navigation system, a user story could be:

As a driver, I want to be notified of traffic jams on my route, so that I avoid losing time.

Past research has shown that the documentation as a separate user story of each system feature desired by some system stakeholder might cause problems when at a later stage the agile development team loses the high-level view of the various system functions and their interaction. The problem gets magnified when the composition of the team changes, e.g., when new members join the team and need to comprehend the high-level architecture of the system under development. Even for moderately complex software, the number of user stories easily exceeds human capacity of overview and understanding. Furthermore, the more user stories are created, the more details are required in the acceptance criteria which are used for testing the system. Acceptance criteria are metrics that define when the system’s functionality is acceptable in response to a user story. If user stories are not integrated and aligned with each other, they create ambiguity, making it harder to decide when a system meets its requirements.

To alleviate the problems mentioned above, the use of conceptual models is suggested. Conceptual models are visual representations that are commonly used for understanding, analyzing and communicating system requirements. In ‘traditional’ (i.e., not agile) software development methods, a conceptual model is a graphical model with well-defined syntax and semantics that describes the domain to be supported by a software system (e.g., the domain of car navigation). This description is independent of the manner in which the software system is implemented. In other words, a conceptual model captures the function of a domain (e.g., how to use car navigation to achieve a purpose) and its behavior (e.g., how car navigation is used during a car trip), as well as its construction (e.g., what components does car navigation consist of and how are they related -car, GPS satellite network, GPS receiver, etc.) and its operation (e.g., GPS satellites broadcast signals about their current time and position), but does not prescribe decisions that should be left to the discretion of system developers (e.g., which track algorithm the GPS receiver runs to calculate the location, speed and direction of the car).

What is currently missing in the state-of-the-art is the automated generation of conceptual models that show both the functional (de)composition of the domain (i.e., function and behavior)) and the constructional (de)composition of the domain (i.e., construction and operation). Further, there is no empirical study that investigates the costs (e.g., effort to understand the conceptual model) and benefits (e.g., increased understanding of dependencies and other relationships between user stories) of using conceptual models generated from user stories. In other words, showing feasibility of automated generation of conceptual models from user stories is one thing; demonstrating the economic value of such approach in terms of increased quality and productivity of agile software development is something else, at least equally important for the valorization of the research.

Thesis description

At UGentMIS we initiated a line of research in which we investigate, together with researchers from Saint Louis University (USA) and a high-tech USA company, whether Natural Language Processing (NLP) techniques could be used to automatically generate conceptual models from user stories. The NLP-based generation of models has proven to be a challenging area, giving rise to many research questions, some of which we wish to address via master dissertation research. The current thesis topic results from this initiative.

This thesis will look into techniques to assure the quality of user stories. The success of NLP heuristics used for the recognition of concepts in user stories that are needed to build conceptual models, strongly depends on the quality of the user stories. In this context, quality refers to how well the user story fits the user story template, i.e., the correctness and completeness of filling the user story template. A starting point is the technique developed by Lucassen et al. (2016) and later formalised in the GRIMM method developed at Utrecht University (contacts will be provided by the thesis supervisor).

The purpose of the thesis is to evaluate how this technique (and possibly other methods that are discovered through literature study) can be applied as a pre-processing step to the US2CM (User Story To Conceptual Model) text-to-model algorithm that we are developing at UGentMIS – SLU. The eventual result of the thesis is a tailoring of the quality assurance technique to the specific text-to-model algorithm that we develop and an assessment of the improvement in terms of model quality that results from the pre-processing (as compared to a situation without pre-processing). We expect results to be publishable and invite the thesis student to co-author any paper resulting from the thesis.


Lucassen, G., Dalpiaz, F., van der Werf, J.M.E., and S. Brinkkemper. 2016. Improving agile requirements: the quality user story framework and tool. Requirements Engineering (21:3), pp. 383-403

By Geert Poels

Geert Poels is head of UGentMIS, which he founded in 2005. He is Professor of Management Information Systems (since October 1, 2012) and member of the professorial staff of the Faculty of Economics and Business Administration, Ghent University (since October 1, 2002). He is member of the University Research Council. He also teaches and directs master dissertation research at IC Institute (Beersel, Belgium). Within UGentMIS he coordinates the Enterprise Modelling research cluster. His personal research and research together with academic scholars, companies, post-doctoral researchers, PhD students and Master students relates to six lines of research: (1) Design of a Business Process Architecture Description Language – with colleagues from UCLM, Spain; (2) Automated modelling of User Stories and Behaviour-Driven Design scenarios – with colleagues from SLU, United States & UGent PhD students Abhimanyu Gupta and Anis Amna; (3) Ontological analysis and design of Value Models and their integration into Enterprise Architecture – with colleagues from The Open University of The Netherlands and UNIBZ, Italy; in collaboration with the Dutch company VDMbee; and with Master students from IC Institute; (4) Redesigning the conceptual model of COBIT for IT Governance – with colleagues from Antwerp Management School; in collaboration with PwC and ISACA & UGent PhD student Dirk Steuperaert; (5) Enterprise Modelling for tactical information systems design, the 'informal enterprise', and capability management – with professors Renata Petrevska and Mijalche Santa from the universities of Bitola/Prilep and Skopje, Macedonia, respectively, and with Master students from IC Institute; (6) Enterprise Modelling for digital innovation – with UGent/VUB post-doctoral researcher Michaël Verdonck, UGent Master students and colleagues from the Vrije Universiteit Amsterdam. Apart from these research lines, Geert Poels supervises UGent PhD research on digital marketplaces (Thomas Derave), technology disruption (Patrick Luyts), cybersecurity (Hossein Abroshan, Steve Ahouanmenou), enterprise systems (Adnan Kraljic, Tarik Kraljic), and GDPR (Georgios Georgiadis, Abdel-Jaouad Aberkane). In the period 2005-2020 he was promoter of 13 completed PhD research projects (11 at UGent and 2 at KU Leuven), while currently he is promoter of 11 ongoing PhD projects. Mid 2020 he has 126 publications listed in Web of Science with an h-index of 15. His Scopus h-index is 19. His Google Scholar h-index is 32 with over 3600 citations recorded.

Leave a comment