Over the past month I have been going through the data using different strategies. Some of the strategies consist of detailed analyses of the notations, while others try to derive broad-view conclusions.
Example of a detailed analysis:
By counting the number of diagrams/participants that presented a few related characteristics, it is possible to derive results like the following:
- In at least one of the queries with a WHERE COLUMN_NAME OPERATOR VALUE filter:
  - 15 participants represented the SQL equal operator as =.
  - 16 participants omitted the equal operator (i.e. only the column name and the value were used).
  - 6 participants represented the SQL equal operator with symbols other than equal (i.e. ≡, ==, :, ||).
  - 5 participants represented the SQL equal operator with a word (i.e. is or IS).
  - 20 participants represented the SQL greater than operator as >.
  - 2 participants represented the SQL greater than operator as < (in both cases it was an error).
  - 5 participants represented the SQL greater than operator with words (i.e. greater than, GREATER THAN, is greater than, over, or older than).
- In the first query of the experiment (where both the equal and the greater than operators are included in the WHERE clause):
  - 8 participants chose equal as the default operator (i.e. equal omitted and greater than present).
  - 21 participants explicitly represented both the equal and the greater than SQL operators.
- From the above counts we can conclude that, even though 53% of the participants omitted the equal operator in at least one query, it cannot be considered a default comparison operator. For queries whose only comparison operator was equal, both omitting it and representing it as = are common, but when it appeared alongside another comparison operator, most participants represented it explicitly.
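The percentages behind that conclusion can be reproduced with a short tally. The following is only a minimal sketch: the counts are copied from the list above, while the total of 30 participants is an assumption inferred from the 53% figure (16 / 30 ≈ 0.53) and is not stated explicitly. The names `TOTAL_PARTICIPANTS`, `share` and `summarize` are made up for illustration.

```python
# Assumption: 30 participants in total, inferred from "53%" (16 / 30).
TOTAL_PARTICIPANTS = 30

# Participants per representation of the equal operator, counted over
# "at least one of the queries", so one participant can appear in
# several categories and the shares may add up to more than 100%.
equal_operator_counts = {
    "represented as =": 15,
    "omitted": 16,
    "other symbol": 6,
    "word (is, IS)": 5,
}

# Counts for the first query only (both operators in the WHERE clause).
first_query_counts = {
    "equal omitted, greater than present": 8,
    "both operators explicit": 21,
}


def share(count, total=TOTAL_PARTICIPANTS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def summarize(counts, total=TOTAL_PARTICIPANTS):
    """Map each category label to its percentage of participants."""
    return {label: share(n, total) for label, n in counts.items()}


if __name__ == "__main__":
    for title, counts in [
        ("Equal operator, any query", equal_operator_counts),
        ("First query", first_query_counts),
    ]:
        print(title)
        for label, pct in summarize(counts).items():
            print(f"  {label}: {pct}%")
```

Under that assumption, the omission rate drops from about half of the participants across all queries to roughly a quarter in the first query, which is the contrast the conclusion relies on.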
Example of a broad-view analysis:
To analyse the general structure of the diagrams, it is useful to have all of them visible at the same time. For example, the diagrams representing the first query of the experiment are arranged by similarity in the following picture:
Notation:
- SQL equal operator represented as =
- SQL equal operator represented as another symbol (≡, ==, :, װ)
- SQL equal operator represented with a word (is, IS)
- Omitted SQL equal operator
- SQL greater than operator represented as >
- SQL greater than operator represented as < (error)
- SQL greater than operator represented with words (greater than, GREATER THAN, is greater than, over)
- SQL greater than operator represented as older than (data specific)
- SQL equal operator as default comparison operator (> present and = omitted in the same diagram)
Since the picture does not have enough resolution, let’s use an equivalent diagram.
In that diagram a red circle is used to mark which participants represented the WHERE COLUMN_NAME OPERATOR VALUE filters with a table like the following:
The diagram also shows the distribution of participants who, in the first query, represented the structure of the database with a table (blue), those who represented it with a list of field names inside a box (pink), and those who did not represent it (green).
Among the diagrams collected in the experiment there is not a single pair that uses the same notation. This confirms how diverse the ad hoc notations used by programmers are. However, it is possible to identify groups of around 10 participants that used similar notations.
Thoughts?