Data Sets

Study Data (replication)

The study data may be used (e.g. for replication purposes) as long as it is not redistributed and any use cites or acknowledges the original TSE publication:

Walid Maalej, Martin P. Robillard. Patterns of Knowledge in API Reference Documentation. IEEE Transactions on Software Engineering, 39 (9), 2013.

You may share the data with collaborators under your direct supervision provided you can ensure they respect the terms of this simple agreement (non-distribution, credit).

Please note that the Java and .NET licenses prohibit us from redistributing the actual documentation texts. Therefore, the dataset includes a pointer (URL) to each documentation unit analyzed, in addition to the metadata and the analysis results.

You can download the data here (.csv file).

The schema is flat:
id -> a meaningless key
api -> .NET or Java
name -> name of the element
url -> URL of the docs
type -> Class, Method, Field, etc.
coder1 -> ID of one coder
coder2 -> ID of the other coder
fun1 -> whether coder 1 said the doc contains functionality
fun2 -> whether coder 2 said the doc contains functionality
funR -> our resolution in case of disagreement (see paper)
… -> the same three columns (coder 1, coder 2, resolution) for each of the remaining knowledge types
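
As an illustration of how the flat schema might be consumed, here is a minimal Python sketch that derives a final label per documentation unit and the raw agreement rate between the two coders for the functionality columns. The sample rows, the 0/1 value encoding, the empty funR when the coders agree, and the helper names (load_rows, final_label, agreement_rate) are all assumptions made for this example; check the actual .csv file and the paper for the real encoding.

```python
import csv
import io

# Hypothetical sample rows. The value encoding (0/1) and the empty funR
# on agreement are assumptions, not taken from the real data file.
SAMPLE = """id,api,name,url,type,coder1,coder2,fun1,fun2,funR
1,Java,java.util.List,http://example.org/a,Class,c1,c2,1,1,
2,.NET,System.String,http://example.org/b,Class,c1,c3,1,0,1
3,Java,java.util.Map.get,http://example.org/c,Method,c2,c3,0,0,
"""


def load_rows(handle):
    """Read the flat schema into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def final_label(row, prefix):
    """Return the agreed value, or the resolution column on disagreement."""
    first, second = row[prefix + "1"], row[prefix + "2"]
    return first if first == second else row[prefix + "R"]


def agreement_rate(rows, prefix):
    """Fraction of rows on which both coders gave the same value."""
    agreed = sum(1 for r in rows if r[prefix + "1"] == r[prefix + "2"])
    return agreed / len(rows)


rows = load_rows(io.StringIO(SAMPLE))
labels = [final_label(r, "fun") for r in rows]
rate = agreement_rate(rows, "fun")
```

To run this against the downloaded file, replace io.StringIO(SAMPLE) with an open file handle, e.g. open(path, newline=""), where path is wherever you saved the .csv file. The same two helpers apply to any other knowledge type by changing the column prefix.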

Data Model

The following diagram shows the data structure of the API documentation and the included knowledge types.

[Figure: data model diagram (CAD010_Data-Model_xparent)]