As part of my research, I may have to conduct a survey among Debian contributors. The word “survey” usually elicits frowns because surveys are often misconducted. MJ has taken the time to draft up some advice to surveyors.
Problems with surveys generally fall into one of two categories: content and presentation. I’ll refrain from making statements about content (Wikipedia has some stuff on questionnaire construction) and instead concentrate on presentation in the following.
Commonly in the digital age, surveys are administered via a web page or e-mail. In my recent Ph.D. transfer report, I identified a number of shortcomings with these approaches:
Asking Debian contributors to click radio buttons on a web page is a bit like expecting a mountain biking champion to ride a tricycle across a paddock: painful, if not offensive. Furthermore, web surveys can only be taken while on-line, when most of us have better things to do.
E-mail surveys address some of these problems, but create new ones: answers cannot be constrained to a domain (think multiple-choice), character set and formatting issues make evaluation difficult, and it’s impossible to prevent users from attaching comments or modifying responses.
In thinking about the issue, I came up with a third means to administer a survey: a console tool. Think of a Debian package which provides a console application controlled by a study-specific data file. The data file specifies the questions and their answer domains, and the tool presents those to the participant. Since most of Debian happens on the console anyway, such an approach to surveys seems more appropriate.
Interaction with the survey tool would be as easy as pressing the 2 or 4 keys to select one of the multiple choices, and the tool would immediately move on to the next question (and not wait for the user to hit enter). Obviously, n and p should allow navigation back and forth across the set, and c would spawn a text editor to give the user a chance to attach a comment to his/her current response, in which s/he might criticise the question or provide additional information. Finally, the tool should be able to pick up where it left off, should the user choose to exit/suspend the survey for now. Integration with debconf or another interface abstraction is also worth consideration.
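To make the key handling a bit more concrete, here is a minimal Python sketch of the navigation logic alone. The names Question, Session, handle_key and replay are made up for illustration and are not part of any existing tool. The sketch assumes keypresses arrive as single-character strings and leaves out the terminal handling, the c comment editor and resuming a suspended session.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    choices: list  # answer domain; choice i is selected with the key str(i + 1)


@dataclass
class Session:
    questions: list
    position: int = 0
    # Maps question index -> index of the chosen answer.
    answers: dict = field(default_factory=dict)

    def handle_key(self, key):
        """Apply one keypress; keys without a meaning here are ignored."""
        last = len(self.questions) - 1
        if key == "n":
            self.position = min(self.position + 1, last)
        elif key == "p":
            self.position = max(self.position - 1, 0)
        elif len(key) == 1 and key in "123456789":
            choice = int(key) - 1
            if choice < len(self.questions[self.position].choices):
                self.answers[self.position] = choice
                # Move on immediately, without waiting for enter.
                self.position = min(self.position + 1, last)


def replay(questions, keys):
    """Feed a sequence of keys to a fresh session and return it."""
    session = Session(questions)
    for key in keys:
        session.handle_key(key)
    return session
```

Keeping this logic free of any terminal code is deliberate: a console front end (for instance one built on Python's standard curses module) or a debconf front end would only have to translate its input into calls to handle_key.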
There is more to it: people change their minds and should thus be able to amend responses. With their consent, it might be valuable to track such changes and inquire about their motivations.
As I was thinking about how to realise this, I suddenly arrived at version control: use Git as a backend storage. The set of cool features this would enable seems to be endless: it works off-line and can be used to track the aforementioned changes, but also offers the possibility to create a squashed result in case the participant prefers to submit only the final result. Furthermore, switching between anonymous submissions and submissions authenticated by a GPG signature is trivial.
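The difference between a tracked history and a squashed result can be illustrated without Git. In the following Python sketch, a plain in-memory list stands in for the commit history; it does not touch Git at all, and Revision, squash and changes are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    question_id: str
    answer: str
    reason: str = ""  # optional motivation for a change, if the participant gives one


def squash(history):
    """Collapse a full revision history to the final answer per question."""
    final = {}
    for rev in history:
        final[rev.question_id] = rev.answer  # later revisions overwrite earlier ones
    return final


def changes(history):
    """Return the revisions that amended an earlier answer to the same question."""
    seen = {}
    amended = []
    for rev in history:
        if rev.question_id in seen and seen[rev.question_id] != rev.answer:
            amended.append(rev)
        seen[rev.question_id] = rev.answer
    return amended
```

A participant who consents to tracking would submit the whole history, which is what changes inspects. One who does not would submit only what squash returns.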
In addition, the survey tool should be able to display questions according to previous responses (control flow). For instance, if the survey determines that a given user is a contributor to the bug tracking system, but not a project member, it wouldn’t make sense to ask when s/he received his/her Debian account. Furthermore, questions could be created dynamically from context, so that the survey can drill into depth depending on previous responses, rather than asking the same questions of all participants.
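As a rough sketch of such control flow in Python, each question could carry an optional condition over the answers given so far. The "when" key, the question ids and the closing "comments" question are assumptions made up for this example; a real data file format would need a declarative way to express the condition rather than a Python lambda.

```python
def next_question(questions, answers):
    """Return the first unanswered question whose condition holds, or None."""
    for q in questions:
        if q["id"] in answers:
            continue
        condition = q.get("when")  # hypothetical key: predicate over earlier answers
        if condition is None or condition(answers):
            return q
    return None


QUESTIONS = [
    {"id": "bts", "text": "Do you contribute to the bug tracking system?"},
    {"id": "member", "text": "Are you a project member?"},
    {
        "id": "account_date",
        "text": "When did you receive your Debian account?",
        # Only asked of project members.
        "when": lambda a: a.get("member") == "yes",
    },
    {"id": "comments", "text": "Anything else you would like to add?"},
]
```

With answers of {"bts": "yes", "member": "no"}, the account question is skipped and the tool moves straight on to the last one.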
I am currently applying for funding to outsource the development of such a tool. If you are interested in coding it up and getting paid for it, speak to me. Here are some more specifications to keep in mind before jumping on:
- the result must be released under a Free licence.
- the tool should be implemented in Python and use PyGit. Missing Git bindings should be implemented in and contributed to PyGit.
- the logic should be in a reusable module, and the application a thin layer on top of that.
- data files must be able to specify at least multiple-choice, Likert-scale and free-form-answer questions.
- data files should be able to encode conditional flow.
- data files should encode whether submissions can/must/mustn’t be anonymous.
- data files should encode policy whether changes can/must/mustn’t be tracked.
- it would be nice if data files could encode dynamic questions assembled at run-time.
- questions and answers must be translatable using standard .po files.
- in addition to the console interface, a debconf interface would be nice.
- every response results in a Git commit object, and commit messages may be automated or queried from the user, depending on context and configuration.
- data files should provide parent/seed SHA-1 hashes such that every participant essentially commits to a branch off the same parent, using e.g. uuencoded bundles.
- submission should take place via git-push or git-send-email, depending on whether a net connection exists or not.
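To give an idea of what the data file requirements above might translate to, here is a small Python sketch that validates an already-parsed study description. Everything in it is an assumption for illustration: the field names (anonymity, tracking, type, choices), the three type labels and the helper names are invented, and the on-disk format is deliberately left open.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    CAN = "can"
    MUST = "must"
    MUST_NOT = "mustn't"


# The three question kinds a data file must at least support.
QUESTION_TYPES = {"multiple-choice", "likert", "free-form"}


@dataclass
class Study:
    anonymity: Policy  # may/must/mustn't submissions be anonymous?
    tracking: Policy   # may/must/mustn't changes be tracked?
    questions: list


def load_study(data):
    """Build a Study from an already-parsed mapping, rejecting unknown values."""
    for q in data["questions"]:
        if q["type"] not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {q['type']!r}")
        if q["type"] != "free-form" and not q.get("choices"):
            raise ValueError(f"question {q['id']!r} needs an answer domain")
    return Study(
        anonymity=Policy(data["anonymity"]),  # raises ValueError on unknown policy
        tracking=Policy(data["tracking"]),
        questions=data["questions"],
    )


def may_submit(study, signed):
    """Check a submission against the study's anonymity policy."""
    if study.anonymity is Policy.MUST:
        return not signed
    if study.anonymity is Policy.MUST_NOT:
        return signed
    return True
```

Such checks would live in the reusable module rather than in the console application, so that a debconf front end gets them for free.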
These are likely to be incomplete, but should convey the basic picture. Feedback is always welcome!
NP: Oceansize: Frames
Update: James Andrewartha pointed me to purity, which asks multiple-choice questions on the console. It has the kind of interface which I envision.
Also, Chris Lamb suggested this personality survey as a base line. Well, actually he just suggested I look into it.

