An instrument for reviewing the completeness of experimental plans for controlled experiments using human subjects in software engineering

FONSECA, Liliane Sheyla da Silva

Bibliographic details
Year of defense:	2016
Main author:	FONSECA, Liliane Sheyla da Silva
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defending institution:	Universidade Federal de Pernambuco UFPE Brasil Programa de Pos Graduacao em Ciencia da Computacao
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (translated from Portuguese):	Software engineering; Experimental software engineering; Controlled experiments; Human factors
Access link:	https://repositorio.ufpe.br/handle/123456789/22421
Abstract:	It is widely accepted in software engineering that well-made experimental plans are recipes for successful experiments and help experimenters avoid interference during experiments. Although a number of tools are available to help researchers write experiment reports for scientific publications, few studies focus on how to assess study protocols with respect to completeness and scientific quality. As a result, designing controlled experiments using human subjects has been a challenge for many experimenters in software engineering, because a large variety of factors must be addressed in the plan to avoid introducing bias. The main aim of this thesis is to define an instrument that helps experimenters, especially beginners, review their experimental planning and assess whether they have produced an experimental plan that is complete and includes all possible factors to minimize bias and issues. The instrument is a checklist whose design is based on experimental best practices and experts’ experience in planning and conducting controlled experiments using subjects. To collect the best practices, a systematic mapping study was conducted to identify support mechanisms (processes, tools, guidelines, among others) used to plan and conduct empirical studies in the empirical software engineering community, and an informal literature review was carried out to find which support mechanisms are generally used in other fields. Moreover, we performed a qualitative study to understand how empirical software engineering experts plan their experiments. The instrument has been evaluated through four empirical studies, each of which explored it from a different perspective with software engineering researchers at different levels of experience. The instrument was assessed regarding the items that participants found useful, inter-rater agreement, inter-rater reliability, and criterion validity, using a fully crossed design.
Two controlled experiments were performed to assess whether using the instrument reduces the chance of forgetting to include something important during the experiment planning phase, compared with ad hoc practices. Additionally, the acceptance of the instrument was assessed in all four studies. In total, 35 participants took part in four different kinds of assessment of the instrument. In the first study, 75.76% of the items were judged useful by two experts; the remaining items were discussed and adjusted. The second study revealed that the instrument helped beginners assess experimental plans in the same way as the experts. We found a strong correlation between the overall completeness scores of the experimental plans and the recommendation on whether the experiment should proceed, and whether it is likely to be successful. In Studies 3 and 4, the proportion of correct items found by participants using the instrument was greater than that found by participants using ad hoc practices. The instrument has high acceptance among participants. Although the results are positive, more assessments in different settings are required to generalize them. Use of the instrument by experimenters, especially beginners, helps them review the key factors included in the experimental plan, thus helping to reduce potential confounding factors in the experiment. Reviewing an experimental plan is not a direct evaluation of the quality of the experiment itself, but it allows changes to be made to improve the experiment before it is performed.
