During the past week I have been filling out an ethics form for my research project (which now goes by the name “Visualizing Execution of Database Queries”). One of the main sections of this form covers the purpose and background of the project and, since it gives a clear description of the project and our future plans, I am pasting it here:
Software developers use diagrams to understand existing code, to create or modify designs, and to communicate with each other. During solo explorations or peer-to-peer meetings, developers tend to use informal, ad-hoc notations in which the meanings of diagram elements depend on the context. One of the few exceptions is the field of Data Management, where the use of entity-relationship diagrams to represent the structure of relational databases is very common. Such diagrams usually conform to one of a handful of notational variants derived from the one originally introduced in Chen’s 1976 paper “The Entity-Relationship Model: Toward a Unified View of Data”. These notations are also commonly used by automatic tools that extract and display structural information from databases. We refer to the output diagrams of these tools as reverse engineering visualizations to differentiate them from diagrams created manually using a whiteboard, paper, or computer-based tools.
In contrast, instructors and textbooks in introductory database courses usually explain the behaviour of database queries by showing a series of intermediate tables, each with a few rows of data. As well as showing the logical steps in the execution of the query, such tables also display information about the structure of the part of the database that the query is accessing. This notation is widely believed to be essential in training novices to reason about how queries will behave, and to get them to the point where, like more experienced developers, they only need the query and details about the database’s structure in order to predict a query’s output.
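To make the "series of intermediate tables" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `author` and `book` tables, their rows, and the `intermediate_tables` function are all hypothetical, made up for illustration. The three steps follow the logical order in which textbooks usually present a query (join, then filter, then project), which is not necessarily how a real database engine executes it.

```python
import sqlite3


def intermediate_tables():
    """Return (label, column_names, rows) for each logical step of one query.

    The schema and data are invented for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER,
                           title TEXT, year INTEGER);
        INSERT INTO author VALUES (1, 'Chen'), (2, 'Codd');
        INSERT INTO book VALUES (1, 1, 'ER Model', 1976),
                                (2, 2, 'Relational Model', 1970),
                                (3, 2, 'Normal Forms', 1971);
    """)

    join = "FROM author JOIN book ON book.author_id = author.id"
    # Each step re-runs the query with one more clause applied, so every
    # result is the "intermediate table" an instructor would draw.
    steps = [
        ("1. join", f"SELECT * {join} ORDER BY book.id"),
        ("2. filter", f"SELECT * {join} WHERE book.year > 1970 ORDER BY book.id"),
        ("3. project", f"SELECT author.name, book.title {join} "
                       "WHERE book.year > 1970 ORDER BY book.id"),
    ]

    results = []
    for label, sql in steps:
        cursor = conn.execute(sql)
        # cursor.description carries the column names, i.e. the structural
        # information that the intermediate table displays alongside the data.
        columns = [column[0] for column in cursor.description]
        results.append((label, columns, cursor.fetchall()))
    conn.close()
    return results
```

Looping over the returned list and printing each label, its column names, and its rows gives the kind of step-by-step view described above: the column headers expose the structure of the part of the database being accessed, while the shrinking set of rows shows the logical progress of the query.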
We therefore propose a two-stage study. The first stage will explore how developers with varying levels of experience visualize query execution when no constraints are imposed. The questions we seek to answer are:
* What notations (formal or otherwise) do developers choose to use to visualize the execution of a database query?
* How do notational choices change if the developers are encouraged to use a particular notation, e.g., entity-relationship diagrams or intermediate tables?
* Do the notations change if the developers do not usually represent queries as SQL statements? (It is increasingly common for developers to interact with databases through some sort of abstraction layer in software, such as an object/relational mapping library.)
* Which parts of the initial information necessary to reason about a database query can be easily represented?
* In what order are the different parts of the visualization drawn?
* How comprehensible are one developer’s drawings to another developer? How comprehensible are a developer’s drawings to the same developer after some time (hours or days) has elapsed?
The answers to these questions will be used to drive the design of a simple query visualization tool for use in teaching and application development. The second stage of the study will consist of evaluating this tool by having developers use it to answer questions about the execution of queries analogous to those used in the first stage.
Any comments will be appreciated (and if anyone wants to participate in the study I will be very happy to schedule a meeting). Suggestions about how to make the study fun for potential participants are very welcome (I am considering free cookies and coffee).