Purpose and background of my research project

During the past week I have been filling out an ethics form for my research project (which now goes by the name “Visualizing Execution of Database Queries”). One of the main sections of this form covers the purpose and background of the project and, since it gives a clear description of the project and our future plans, I am pasting it here:

Software developers use diagrams to understand existing code, to create or modify designs, and to communicate with each other. During solo explorations or peer-to-peer meetings, developers tend to use informal, ad-hoc notations in which the meanings of diagram elements depend on the context. One of the few exceptions is the field of Data Management, where the use of entity-relationship diagrams to represent the structure of relational databases is very common. Such diagrams usually conform to one of a handful of notational variants derived from the one originally introduced in Chen’s 1976 paper “The Entity-Relationship Model: Toward a Unified View of Data”. These notations are also commonly used by automatic tools that extract and display structural information from databases. We refer to the output diagrams of these tools as reverse engineering visualizations to differentiate them from diagrams created manually using a white board, paper, or computer-based tools.

In contrast, instructors and textbooks in introductory database courses usually explain the behaviour of database queries by showing a series of intermediate tables, each with a few rows of data. As well as showing the logical steps in the execution of the query, such tables also display information about the structure of the part of the database that the query is accessing. This notation is widely believed to be essential in training novices to reason about how queries will behave, and to get them to the point where, like more experienced developers, they only need the query and details about the database’s structure in order to predict a query’s output.
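To make the intermediate-table notation concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `people` and `cities` tables, their rows, and the `show` helper are all hypothetical, invented only for illustration. The steps shown are the logical ones a textbook would draw (join, filter, project), not necessarily the plan a database engine would actually execute.

```python
import sqlite3

# Hypothetical toy schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    INSERT INTO cities VALUES (1, 'Toronto'), (2, 'Havana');
    INSERT INTO people VALUES (1, 'Ana', 1), (2, 'Ben', 2), (3, 'Cy', 1);
""")


def show(title, sql):
    """Run one logical step and print its result as a small table."""
    cursor = conn.execute(sql)
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    print(title)
    print("  " + " | ".join(header))
    for row in rows:
        print("  " + " | ".join(str(value) for value in row))
    return rows


# Shared FROM/JOIN clause for the three logical steps.
JOINED = "FROM people p JOIN cities c ON p.city_id = c.id"

# Step 1: the joined table, before any filtering.
step1 = show(
    "1. Join people with cities",
    f"SELECT p.id AS person_id, p.name AS person, c.name AS city {JOINED} ORDER BY p.id",
)

# Step 2: keep only the rows that satisfy the WHERE condition.
step2 = show(
    "2. Keep rows where city is Toronto",
    f"SELECT p.id AS person_id, p.name AS person, c.name AS city {JOINED} "
    "WHERE c.name = 'Toronto' ORDER BY p.id",
)

# Step 3: project the columns named in the SELECT list.
step3 = show(
    "3. Project the person column",
    f"SELECT p.name AS person {JOINED} WHERE c.name = 'Toronto' ORDER BY p.id",
)
```

Each printed table corresponds to one of the intermediate tables an instructor might draw on the board, and the column headers carry the structural information about the part of the database the query touches.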

We therefore propose a two-stage study. The first stage will explore how developers with varying levels of experience visualize query execution when no constraints are imposed. The questions we seek to answer are:

* What notations (formal or otherwise) do developers choose to use to visualize the execution of a database query?

* How do notational choices change if the developers are encouraged to use a particular notation, e.g., entity-relationship diagrams or intermediate tables?

* Do the notations change if developers do not usually represent queries as SQL statements? (It is increasingly common for developers to interact with databases through some sort of abstraction layer in software, such as an object/relational mapping library.)

* Which parts of the initial information necessary to reason about a database query can be easily represented?

* In what order are the different parts of the visualization drawn?

* How comprehensible are one developer’s drawings to another developer? How comprehensible are a developer’s drawings to the same developer after some time (hours or days) has elapsed?

The answers to these questions will be used to drive the design of a simple query visualization tool for use in teaching and application development. The second part of the study will consist of evaluating this tool by having developers use it to answer questions about the execution of queries analogous to those used in the first half of the study.

Any comments will be appreciated (and if anyone wants to participate in the study I will be very happy to schedule a meeting). Suggestions about how to make the study fun for potential participants are very welcome (I am considering free cookies and coffee).

3 responses to this post.

  1. […] tools don’t meet their real needs. In a similar vein, Zuzel Vera Pacheco plans to look at how developers visualize the execution of SQL queries for her Master’s thesis. When you see a query like: SELECT left.name, right.name FROM people […]

  2. After talking with graduate students from the Software Engineering group (the group in which I am working at the University of Toronto), and a couple of meetings with my supervisor, this is the final version, which was submitted last Wednesday to the Office of Research Ethics:

    Rationale: “Instructors and textbooks in introductory database courses usually explain the behavior of database queries by showing a series of intermediate tables, each with a few rows of data. As well as showing the logical steps in the execution of the query, such tables also display information about the structure of the part of the database that the query is accessing. This notation is widely believed to be essential in training novices to reason about how queries will behave, and to get them to the point where, like more experienced programmers, they only need the query and details about the database’s structure in order to predict a query’s output.

    In the field of Data Management, entity-relationship diagrams are the standard notation to represent the structure of relational databases. Such diagrams usually conform to one of a handful of notational variants derived from the one originally introduced in Chen’s 1976 paper “The Entity-Relationship Model: Toward a Unified View of Data”. These notations are also commonly used by automatic tools that extract and display structural information from databases. In contrast, there is no standard diagram notation for representing the execution of database queries that reverse-engineering tools can use.

    Despite the existence of standard notations, the use of ad-hoc notations is common during software development activities. Thus, the study of these ad-hoc notations can be particularly important in the design of diagram notations to be used by reverse-engineering tools.

    We therefore propose a two-stage study. The first stage will explore how programmers, with varying levels of experience, visualize query execution to communicate with each other. The questions we seek to answer are:

    * What notations (formal or otherwise) do programmers choose to use to visualize the execution of a database query?
    * Do the notations change if the programmer does not usually represent queries as SQL statements? (It is increasingly common for programmers to interact with databases through some sort of abstraction layer in software, such as an object/relational mapping library.)
    * In what order are the different parts of the query represented?
    * How comprehensible are one programmer’s drawings to another programmer?

    The answers to these questions will be used to drive the design of a simple query visualization tool for use in application development. The second part of the study will consist of evaluating this tool by having programmers use it to answer questions about the execution of queries analogous to those used in the first half of the study.”

    Methods: “The data for this study will be gathered in two stages:

    Stage 1 – Participants will be presented with a database query in SQL, or in a programming language like Ruby or Python, and will be asked to draw a diagram that represents the execution of the query. This will be repeated a number of times, using different queries (see Appendix C). The data collected in this stage will be analyzed in order to find the common characteristics of the notations used, as well as what information is included and the order in which the different parts of the diagrams are drawn. The results of this stage will drive the design of a query visualization tool that can be configured to use different notations.

    Stage 2 – Another group of participants will be presented with a diagram produced by the query visualization tool. Based on a short description of the notation used, they will be asked to determine what query is represented by the diagram. This will be repeated a number of times, using diagrams that correspond to different notations and queries (see Appendix D). The results of this stage will be used to evaluate the notations used by the query visualization tool.

    The interaction with the participants in both stages will take place in person, and the diagrams will be drawn on a tablet PC that allows us to record the order in which the different parts of the diagrams are drawn. Depending on the availability of a tablet PC, the drawings will be photographed or recorded.”

    Thanks for the help!

  3. […] by zuzelvp in M.Sc. Thesis. Leave a Comment Last Wednesday I submitted the ethics forms for the two-stage study that I will be conducting between May and December. During the rest of […]
