Analysing the Data I

In the past month I have been going through the data using different strategies. Some of the strategies consist of detailed analyses of the notations, while others try to derive broad-view conclusions.

Example of a detailed analysis:

By counting the number of diagrams/participants that present a set of related characteristics, it is possible to derive results like the following:

  • In at least one of the queries with a WHERE COLUMN_NAME OPERATOR VALUE filter:
    • 15 participants represented the SQL equal operator as =.
    • 16 participants omitted the equal operator (i.e. only the column name and the value were used).
    • 6 participants represented the SQL equal operator with symbols other than equal (i.e. ≡, ==, :, ||).
    • 5 participants represented the SQL equal operator with a word (i.e. is or IS).
    • 20 participants represented the SQL greater than operator as >.
    • 2 participants represented the SQL greater than operator as < (in both cases it was an error).
    • 5 participants represented the SQL greater than operator with words (i.e. greater than, GREATER THAN, is greater than, over, or older than).
  • In the first query of the experiment (where both the equal and the greater than operators are included in the WHERE clause):
    • 8 participants chose equal as the default operator (i.e. equal omitted and greater than present).
    • 21 participants explicitly represented both the equal and the greater than SQL operators.
  • From the above counts we can conclude that, even though 53% of the participants omitted the equal operator, we cannot consider it a default comparison operator. For queries whose only comparison operator was equal, both omitting it and writing it as = were common; but when it appeared together with another comparison operator, most participants represented the equal operator explicitly. (A sketch of this kind of tally follows below.)
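To make the counting concrete, here is a minimal sketch of the per-characteristic tally in Python. The record layout and label names are hypothetical, invented for illustration; the actual coding of the diagrams is richer:

```python
# Hypothetical sketch of the tally: the labels and records are invented
# for illustration; only the counting logic matters.
from collections import defaultdict

# One record per diagram: (participant, query, labels applied to it).
records = [
    (1, "Q1", {"equal as =", "greater than as >"}),
    (2, "Q1", {"equal omitted", "greater than as >"}),
    (2, "Q3", {"equal as word"}),
    # ... one entry for every diagram in the study
]

participants_per_label = defaultdict(set)
for participant, _query, labels in records:
    for label in labels:
        participants_per_label[label].add(participant)

# "In at least one query" counts = size of each participant set.
for label, people in sorted(participants_per_label.items()):
    print(f"{label}: {len(people)} participants")
```

Because each participant is stored in a set, a participant who used the same representation in several queries is still counted only once.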

Example of a broad-view analysis:

To analyse the general structure of the diagrams it is useful to have all of them visible at the same time. For example, the diagrams representing the first query of the experiment are arranged by similarity in the following picture:

Legend of the picture (notation categories):

  • SQL equal operator represented as =
  • SQL equal operator represented with another symbol (≡, ==, :, ||)
  • SQL equal operator represented with a word (is, IS)
  • Omitted SQL equal operator
  • SQL greater than operator represented as >
  • SQL greater than operator represented as < (error)
  • SQL greater than operator represented with words (greater than, GREATER THAN, is greater than, over)
  • SQL greater than operator represented as older than (data specific)
  • SQL equal operator as default comparison operator (> present and = omitted in the same diagram)

Since the picture does not have enough resolution, let’s use an equivalent diagram.

In that diagram a red circle is used to mark which participants represented the WHERE COLUMN_NAME OPERATOR VALUE filters with a table like the following:

The diagram also shows the distribution of participants who, in the first query, represented the structure of the database with a table (blue), those who represented it with a list of field names inside a box (pink), and those who did not represent it (green).

Among the diagrams collected in the experiment there is not a single pair that uses the same notation. This confirms how diverse the ad hoc notations used by programmers are. However, it is possible to identify groups of around 10 participants that used similar notations.
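The grouping itself was done visually, but one way to make “similar notations” concrete is to score the overlap between the label sets of two diagrams, for example with Jaccard similarity. This is a hypothetical sketch, not the procedure used in the study:

```python
# Hypothetical similarity score between two diagrams, each described by
# the set of notation labels assigned to it during coding.
def jaccard(labels_a: set, labels_b: set) -> float:
    """Fraction of all distinct labels that the two diagrams share."""
    if not labels_a and not labels_b:
        return 1.0
    return len(labels_a & labels_b) / len(labels_a | labels_b)

d1 = {"equal as =", "greater than as >", "schema as table"}
d2 = {"equal omitted", "greater than as >", "schema as table"}
print(jaccard(d1, d2))  # 0.5: two of the four distinct labels are shared
```

Ordering the diagrams by such a score would approximate the visual arrangement by similarity described above.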

Thoughts?

First preliminary results

Today I presented the first preliminary results of my research at the last meeting of the Scientific Writing Course. You can download the slides from here, or see them online at SlideShare.

Random draw!

Thanks a lot to all the people who participated in the study, or helped to publicize it. The collection of data for the first stage is finished 🙂

Enjoy the video!

This was an amazing experience for me, and I hope you enjoyed it. Right now I am preparing the first slides with preliminary results, which I will be presenting at the Scientific Writing Course on July 6th. The slides will be posted here in the blog in two weeks. My task for what is left of June and July is to analyze all the data you have provided, so we can determine how programmers represent database queries using diagrams.

Thanks to everyone 😀

Précis: Comparative ease of use of a diagrammatic vs. an iconic query language

A. N. Badre, T. Catarci, A. Massari, and G. Santucci: Comparative ease of use of a diagrammatic vs. an iconic query language. In Interfaces to Databases. Electronic Series Workshop in Computing, Springer, pages 1-14 (1996).

Few experiments in the database field have evaluated visual query systems with respect to accuracy and time scores. The authors carried out a study to compare QBD* and QBI, specifically for the query writing task.

Two groups of sixteen participants were formed based on the results of a background questionnaire. Each group attended a short training session and then used one of the visual query systems to represent six queries. The ANOVA test reported significant differences in the time scores for the following sets of data: all the participants; the participants familiar with databases; and the queries with cycles or at least four entities. The participants who used QBD* spent less time, except when the query contained cycles. Comments from the participants pointed out that the use of AND as a default operator was unclear in QBD*.

The authors concluded that, when more than three entities were involved, QBI performance suffered because the query was not constructed in steps. The effect of cycles in the query for participants working with QBD* was determined to be due to the representation of correspondences between attributes and entities: when an entity occurred multiple times, a number was added to the corresponding attribute names. The authors conjecture that the results favor interfaces that offer multiple notations and interaction mechanisms. However, it is not clear how users would react to a hybrid system, because each participant used only one of the visual query systems.

Précis: Visual Query Systems — A taxonomy

As I mentioned last week, Tiziana Catarci, Maria F. Costabile, Stefano Levialdi and Carlo Batini did a lot of work in the area of visual query systems. They were mainly interested in the task of writing queries using visual systems, but their research has a lot in common with my thesis project. Though I cannot write a précis for each of their papers, I will mention some details before including today’s précis.

Let’s start with the paper that provided the big picture about their research: What happened when database researchers met usability?, written by Tiziana Catarci in 2000. It is interesting to note that at the beginning of their Ph.D. work they intended to use entity-relationship diagrams as a database query interface, which was the origin of Query by Diagram (QBD or QBD*). This brought me back to the initial ideas about my thesis project. At that point the authors focused on the kinds of queries that the system would be able to express, but later on they conducted empirical studies to compare users’ performance while writing queries in SQL, QBD* and QBI. Next week’s précis (the last mandatory one for the scientific writing course) will cover the comparison between QBD* and QBI (a diagrammatic vs. an iconic system).

Today I will be looking at a taxonomy published by these authors in 1992. I am also planning to finish reading a longer paper they published in 1997 that covers the classification of visual query systems as well. It is important to note that not only are the papers themselves relevant to my research, but so are the references provided in them. Without any more introduction, here is today’s précis:

Batini, C., Catarci, T., Costabile, M. F., and Levialdi, S.: Visual Query Systems: A Taxonomy. In Proceedings of the IFIP TC2/WG 2.6 Second Working Conference on Visual Database Systems II. E. Knuth and L. M. Wegner, Eds. IFIP Transactions, vol. A-7. North-Holland Publishing Co., Amsterdam, The Netherlands, pages 153-168 (1992).

Query systems make possible the representation of data models and requests. There is a clear division between query systems that use programming-like languages and those that use visual representations. The authors propose a taxonomy of visual query systems intended to help analyze the influence of their features on human-computer interaction. The taxonomy is based on the operators available in the query language, the notations, and the classes of users.

According to their notation, visual query systems are classified as tabular, diagrammatic, iconic, or hybrid. Tabular representations are used by QBE, ESCHER, R2 and EMBS to display queries in 2D. The diagrammatic approach usually expresses the database schema through geometrical figures and connections, and queries are represented by selecting the relevant elements and connections. The authors present QBD* as an example of a system that uses this kind of diagram. The use of icons in query systems is illustrated with the description of ICONICBROWSER. A relevant aspect of these iconic systems is that the data model is not explicit. The authors also describe SICON, a hybrid system combining both diagrams and icons. Unfortunately, the figures with examples from these languages are not visible in the digital version of the paper.

Categories related to the availability of query operators and the classes of users are also discussed. However, the relationship between these categories and the examples of existing query systems is not analyzed in detail.

Précis: Why a Diagram is (Sometimes) …

Next week I will be posting the précis of one paper from the field of Visual Query Systems. I found that the work done by Tiziana Catarci, Maria F. Costabile, Stefano Levialdi and Carlo Batini is closely related to my research project. I will be able to compare the notations used by the visual query systems they classified with the notations used by the participants of my study.

Following is this week’s précis. This paper has been cited by many authors in the area of visual representations.

Jill H. Larkin, Herbert A. Simon: Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cognitive Science, Vol. 11, No. 1, pages 65-100 (1987).

Diagrams are used to assist in the solution of problems in physics and engineering. The authors compare sentential and diagrammatic representations. Their main objective is to analyze the computational efficiency of informationally equivalent representations in terms of search, recognition and inference cost.

The authors define a sentential representation as a sequence of expressions; in contrast, the elements of a diagrammatic representation are located in a plane in which the concept of adjacency is richer. To illustrate their analysis, they use two examples, one from physics and the other from geometry. They modify the problem definitions starting with natural language versions, followed by the sentential representations and then the diagrammatic representations.

The main conclusions clearly point to the benefits of diagrammatic representations in the recognition and search processes, emphasizing their ability to reduce the use of identifying labels. No differences were found in the inference process. Unfortunately, detailed analyses were conducted mainly for problems with considerable spatial information; other kinds of problems were described only briefly.

A major contribution of this paper is the analysis framework. The use of data structures and inferential rules made possible a detailed analysis of efficiency similar to those applied to determine the computational complexity of algorithms. However, it was necessary to use a simplified model of the focus transitions between parts of the representations.

Starting to Analyze the Data

At the moment I have data from 20 participants (thanks a lot to those who gave me a bit of their time). I cannot describe the diagrams yet, to avoid influencing the next 10 participants. However, I can start talking about the first step of data analysis. During the next weeks, I will be extracting short descriptions (labels) that characterize the diagrams. To separate incidental aspects of the notations from significant regularities, I will go through the data several times until I have created a table like the following:

Characteristic | Number of Participants | Number of Queries | Number of Diagrams | Number of First Attempts | Number of Second Attempts
A              |                        |                   |                    |                          |
B              |                        |                   |                    |                          |
C              |                        |                   |                    |                          |
…              |                        |                   |                    |                          |
Z              |                        |                   |                    |                          |
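As a sketch of how the rows of this table could be tallied (the record layout and the characteristic names A, B, C are hypothetical placeholders):

```python
# Hypothetical sketch: one record per diagram, tallied into the table above.
from collections import defaultdict

# (participant, query, attempt, characteristics exhibited by the diagram)
diagrams = [
    (1, "Q1", 1, {"A", "C"}),
    (1, "Q1", 2, {"A"}),
    (2, "Q2", 1, {"B", "C"}),
    # ... filled in while going through the data several times
]

rows = defaultdict(lambda: {"participants": set(), "queries": set(),
                            "diagrams": 0, "first": 0, "second": 0})
for participant, query, attempt, characteristics in diagrams:
    for c in characteristics:
        row = rows[c]
        row["participants"].add(participant)  # distinct participants
        row["queries"].add(query)             # distinct queries
        row["diagrams"] += 1                  # every diagram counts
        if attempt == 1:
            row["first"] += 1
        elif attempt == 2:
            row["second"] += 1

for c in sorted(rows):
    r = rows[c]
    print(c, len(r["participants"]), len(r["queries"]),
          r["diagrams"], r["first"], r["second"])
```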

I am eager to see the final version of this table 🙂

Précis: Usability Analysis of Visual Programming Environments …

Before changing topic, here is a paper that goes more deeply into cognitive dimensions.

T. R. G. Green and M. Petre: Usability Analysis of Visual Programming Environments: a ‘cognitive dimensions’ framework. Journal of Visual Languages and Computing, Vol. 7, No. 2, pages 131-174 (1996).

The complexity of programming environments is not handled well by HCI techniques, which focus on low-level details. In an attempt to illustrate the cognitive dimensions approach, the authors perform an exhaustive analysis of the visual programming languages LabView and Prograph. These are also compared with Basic to emphasize differences with respect to text-based languages.

The authors characterize LabView, Prograph and Basic in terms of thirteen cognitive dimensions. The way information is presented is described by the following dimensions: closeness of mapping, abstraction gradient, role-expressiveness, secondary notation, hidden dependencies and diffuseness. Other dimensions, related more to the effort required from the user, include consistency, visibility, error-proneness, hard mental operations, premature commitment, progressive evaluation, and viscosity. In general, the dimensions are used to organize the analysis, but this does not diminish the need for references or experimental data.

The interrelations between different dimensions are explored: specifically, the influence of the level of abstraction on resistance to local changes; the distance between the notation and the problem domain; the complexity of mental operations; and the visibility of code portions and dependencies. However, future work is needed to define the relations between dimensions precisely and to reduce the overlap between them.

The authors conclude that cognitive dimensions should be used in combination with GOMS and programming walkthrough analyses. These two other approaches would make it possible to evaluate low-level details of interaction and to assess how much knowledge is required from users.

Précis: Cognitive dimensions of notations

This week’s précis is not related to how developers use diagrams but to possible methods for evaluating notations according to their usability.

Green, T. R. G.: Cognitive dimensions of notations. In Proceedings of the Fifth Conference of the British Computer Society, Human-Computer Interaction Specialist Group on People and Computers V (Univ. of Nottingham), pages 443-460. A. Sutcliffe and L. Macaulay, Eds. Cambridge University Press (1989).

Usability studies have been described in a significant number of papers, but the need for general methodologies persists. The author proposes the concept of cognitive dimensions as a partially developed tool for evaluating, in the context of programming languages, how well a notation can assist its users.

Cognitive dimensions are defined as attributes that describe the structure in which information is presented. The principal examples of dimensions discussed in this paper are related to the visibility of dependencies, the resistance to local changes, the risk of premature commitments, the differentiation between roles, and the inability to avoid complex mental operations. However, this set of dimensions is not complete and should be refined taking into account the iterative and unpredictable order that governs the design process.

Based on the characterization of object-oriented languages through cognitive dimensions, the author concludes that these dimensions can be used to reason about practical problems. Moreover, he emphasizes that the environment supporting the notation must be taken into account.

This is an introductory paper that illustrates with examples the implications of particular features in programming language notations. Though the methodology behind cognitive dimensions is not clearly defined, the vocabulary presented is still in use.

Précis: Let’s Go to the Whiteboard …

The following is the first précis I had to write for the Scientific Writing course that I am taking this summer:

Cherubini, M., Venolia, G., DeLine, R., and Ko, A. J.: Let’s go to the whiteboard: how and why software developers use drawings. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 557-566. ACM (2007).

Despite existing knowledge about the use of diagrams in other fields, their application in software development has not been explored. The authors carried out a study to determine the reasons that lead developers to draw diagrams, and to reveal details about the context in which the diagrams are produced. The study was conducted at Microsoft Corporation and included nine interviews with developers who use diagrams often. To validate the feedback, the authors developed a survey and administered it to 427 employees of the corporation.

During the interviews, eight software development tasks were identified as those in which diagrams are typically used: the comprehension of existing code; the design prior to the implementation of a feature or a bug fix; discussions about complex changes; spontaneous meetings among developers; the training of novices; interactions with customers; discussions with other interested parties; and the creation of documentation.

Based on an analysis of quantitative data from the survey, the authors concluded that standard notations, computerized drawing tools, and reverse engineering tools were used infrequently. Across the eight software tasks, whiteboards were the most common means of producing diagrams. However, among all the activities, the use of reverse engineering tools was most frequent in the creation of documentation, the analysis of complex designs, the training of novices, and the study of existing code.

Unfortunately, other results derived from the interviews will need to be validated because the number of participants was small.
