SoSci Survey

Additional Variables in the Data Set

The data set contains additional variables before (to the left of) and after (to the right of) the variables for your questions. This chapter briefly describes their meaning.

Note: Some variables must explicitly be enabled before starting the download.

Note: For privacy reasons, recording data from the user's browser (browser, referer, IP address, etc.) needs to be activated before collecting data.

Interview Identification

  • CASE Unique number for the interview. These numbers are assigned in the order in which the interviews were started (also see PHP function caseNumber()).
    Note: A new number is assigned each time someone retrieves the survey. If the person does not click “next”, or immediately retrieves the survey another time, this results in a void data case. By default, such data cases are deleted. Case numbers are also assigned when testing the questionnaire during development. Therefore, case numbers usually do not start with one (1) and may not be consecutive (e.g., 123, 125, 130, 131, 132, …).
    Note: To ensure unique case numbers within a survey, the counter cannot be reset. Resetting it would not offer any advantage for analysis anyway.
  • SERIAL If the survey was started using a personalized link (authentication code), the participant's code is listed here (also see PHP function caseSerial()).
  • REF If the questionnaire link contained a reference (see The Questionnaire's URL), the reference's text is stored here (also see PHP function reference()).
  • QUESTNNR The ID of the questionnaire that was used. This ID is set when assembling the questionnaire. A value “del:<number>” means that the questionnaire used to collect the data has been deleted.
  • MODE Indicates how the interview was started:
    • “interview” means that someone visited the survey URL
    • “pretest” flags cases from the pretest (with question IDs visible and feedback option)
    • “orgtest” flags cases from the pretest using the final layout
    • “admin” means that the survey has been started by the project administrator as a preview (Starten, i.e., “start”)
    • “debug” flags cases started by the project administrator in debug mode (Im Debug-Modus starten, i.e., “start in debug mode”)
  • LANGUAGE Language of the interview. This variable is included in multi-language surveys or if the option Download variables which have not been used in the questionnaire was selected. If the interview language changes during the interview, this is the language used last.
  • STARTED Time when the participant started the interview.
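
For analysis, cases from testing are usually separated from regular interviews by their MODE value. The following is a minimal sketch of that step, assuming the data set was exported as tab-separated text with a header row; the sample rows and the helper name real_interviews are made up for illustration.

```python
import csv
import io


def real_interviews(rows):
    """Keep only cases started via the survey URL (MODE is "interview")."""
    return [row for row in rows if row["MODE"] == "interview"]


# Hypothetical export: tab-separated, one header row (an assumption).
export = io.StringIO(
    "CASE\tMODE\tSTARTED\n"
    "123\tpretest\t2015-01-10 09:00:00\n"
    "125\tinterview\t2015-01-11 10:15:00\n"
    "130\tadmin\t2015-01-11 11:00:00\n"
    "131\tinterview\t2015-01-11 12:30:00\n"
)
rows = list(csv.DictReader(export, delimiter="\t"))
kept = real_interviews(rows)
print([row["CASE"] for row in kept])
```

The gaps in the CASE column of the sample mirror the non-consecutive numbering described above.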

Interview Progress

These variables are placed at the data set's end.

  • LASTDATA Time when the participant most recently clicked the “next” button and, thereby, updated the data case. The interval between STARTED and LASTDATA may deviate from the sum of dwell times because it fully includes the web server's processing times.
  • FINISHED Whether the participant reached the goodbye page (1) or not (0).
  • LASTPAGE The page most recently answered (and sent via Next) by the participant. The number is equivalent to the page number in the questionnaire (Questionnaire Assembling).
  • MAXPAGE The highest page number answered by the participant at any time during the interview. This number is usually identical to LASTPAGE, but it is not reduced (a) if the participant uses the back button (e.g., to check the welcome page for contact details after completing the questionnaire) or (b) if the questionnaire uses backlinks via goToPage(). It is therefore not necessarily the page that the participant handled last.
  • MISSING The percentage of answers omitted by the participant (0 to 100). Only questions and items that have been shown to the participant are counted. Therefore, someone who dropped out early may still have answered all questions up to that page (0% missing). This variable is useful to identify participants who just viewed the questionnaire.
    • Please note that leaving all boxes unchecked in a checkbox question (multiple selection) is a valid answer. Therefore, even void cases may not reach 100%.
    • “Don't know” options are counted as valid answers as well.
    • For text inputs, the answer is counted as missing if the respondent types nothing (or only spaces). Keep this in mind when text answers are optional (e.g., when the respondent may leave the text field empty instead of writing a zero).
    • For free text inputs within a selection (single or multiple choice), an option's empty text input (e.g., “Other: ___”) is only counted as missing if the corresponding option was selected.
  • MISSREL Percentage of missing answers, weighted by the other participants' answering behavior. Questions that are rarely answered (e.g., voluntary text questions) are mostly irrelevant for this value, while questions that most participants have answered weigh more heavily. The linear weighting factor for a question/item is the number of answers given to this question/item divided by how often the question/item has been asked.
    Note: This value may vary, depending on the subset of data retrieved.
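
The weighting behind MISSREL can be illustrated with a small sketch. It is not SoSci Survey's actual implementation: the weighting factor follows the description above (answers given divided by times asked), but combining the weights into a percentage as "weight of omitted items divided by weight of shown items" is an assumption, and the function names and data layout (None for a shown but unanswered item, absent key for an item not shown) are made up for illustration.

```python
def item_weights(cases):
    """Weight per item: answers given divided by how often it was asked."""
    asked = {}
    answered = {}
    for case in cases:
        for item, value in case.items():
            asked[item] = asked.get(item, 0) + 1
            if value is not None:
                answered[item] = answered.get(item, 0) + 1
    return {item: answered.get(item, 0) / n for item, n in asked.items()}


def missing_weighted(case, weights):
    """Weighted percentage of omitted answers among the items shown.

    Assumption: omitted weight / shown weight * 100.
    """
    shown = sum(weights[item] for item in case)
    if shown == 0:
        return 0.0
    omitted = sum(weights[item] for item, value in case.items() if value is None)
    return 100 * omitted / shown


# Hypothetical data: item A is answered by most, item B (voluntary text) rarely.
cases = [
    {"A": 1, "B": None},
    {"A": 2, "B": None},
    {"A": None, "B": "some text"},
    {"A": 3, "B": None},
]
weights = item_weights(cases)
for case in cases:
    print(missing_weighted(case, weights))
```

In this sample, every case omits one of two items (50% unweighted), but omitting the commonly answered item A counts more than omitting the rarely answered item B. Because the weights depend on all cases passed in, the result changes with the subset of data used, which matches the note above.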

Dwell Times

The handling times for each page are available if the option Download the time spent per page has been selected in variable selection.

  • TIMEnnn The variables TIME001 etc. store the time (in seconds) that a participant stayed on a questionnaire page. If the participant visits the page multiple times (e.g., using the back button), these times are summed up. Generally, dwell times are rather imprecise, as they include loading times.
  • TIME_SUM The sum of dwell times (in seconds) after correction for breaks. If the participant suspends the interview and returns later, it looks as if he or she stayed on the page for a long time (hours or even days). Such times are replaced by the median dwell time of the other participants on that page. A dwell time is treated as a break if
    • it is longer than 2 hours or
    • it exceeds the page's median dwell time by more than 3 interquartile ranges (IQR) divided by 1.34 (which corresponds to more than 3 standard deviations in a normally distributed sample)
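
The two break criteria can be sketched as follows. This is an illustration under stated assumptions, not the software's own code: the page median here is computed over all participants (including the case being corrected) for simplicity, the quartiles use the default method of Python's statistics.quantiles, and the function names and sample values are hypothetical.

```python
from statistics import median, quantiles

TWO_HOURS = 2 * 60 * 60  # first break criterion, in seconds


def page_limits(times):
    """Return (median, upper limit) for one page's dwell times."""
    med = median(times)
    q1, _, q3 = quantiles(times, n=4)  # quartiles; needs at least 2 values
    return med, med + 3 * (q3 - q1) / 1.34


def corrected_time_sum(case, sample):
    """Sum one case's dwell times, replacing breaks by the page median.

    case:   dict page name -> dwell time in seconds for one participant
    sample: dict page name -> list of dwell times of all participants
    """
    total = 0
    for page, seconds in case.items():
        med, limit = page_limits(sample[page])
        is_break = seconds > TWO_HOURS or seconds > limit
        total += med if is_break else seconds
    return total


# Hypothetical dwell times; one participant paused for 2.5 hours on page 2.
sample = {
    "TIME001": [10, 12, 11, 13, 12],
    "TIME002": [30, 28, 35, 32, 9000],
}
case = {"TIME001": 12, "TIME002": 9000}
print(corrected_time_sum(case, sample))
```

Using the median and IQR rather than the mean and standard deviation keeps the limit robust: a single very long break barely shifts the median, whereas it would inflate a mean considerably.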

Quality Indicators

Data quality in online surveys is usually quite good. Data cleaning, however, is necessary in almost every survey. When the option Variables selection → Download data quality parameters is used, SoSci Survey provides variables to support data cleaning:

  • DEG_MISS Negative points for missing answers
  • DEG_TIME Negative points for extremely fast completion
  • DEGRADE The sum of negative points DEG_MISS and DEG_TIME (the sum is calculated before rounding and may therefore deviate from the rounded values' sum)

The points system is scaled so that values of more than 100 points (DEGRADE) indicate low-quality data. Data quality, however, is not a dichotomous attribute. The points are therefore continuously distributed and show a “long tail”. If you prefer stricter filtering, a threshold of 75 or even 50 points may be useful; for more liberal filtering, a threshold of 200 may be appropriate.
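
Applying such a threshold might look like the following sketch. The case records and the helper name split_by_degrade are hypothetical; only the variable name DEGRADE and the thresholds come from the text above.

```python
def split_by_degrade(cases, threshold=100):
    """Split cases into (kept, flagged); more than `threshold` points is flagged."""
    kept = [c for c in cases if c["DEGRADE"] <= threshold]
    flagged = [c for c in cases if c["DEGRADE"] > threshold]
    return kept, flagged


# Hypothetical cases with made-up DEGRADE values.
cases = [
    {"CASE": 125, "DEGRADE": 12},
    {"CASE": 131, "DEGRADE": 80},
    {"CASE": 132, "DEGRADE": 240},
]
for threshold in (50, 100, 200):
    kept, flagged = split_by_degrade(cases, threshold)
    print(threshold, [c["CASE"] for c in kept], [c["CASE"] for c in flagged])
```

Returning the flagged cases instead of discarding them makes it easier to inspect borderline cases before deciding on a threshold.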

There are at least two important sources for low-quality data:

  • People who just want to view the questionnaire
  • Participants who lose motivation after a few pages

There are some indicators to identify such cases. First of all, LASTPAGE and FINISHED show whether the participant dropped out early. The percentage of missing answers (MISSING or MISSREL) is another indicator of the participant's carefulness and of data cases that stem from “just looking”. The time required to complete the survey is an inaccurate indicator of data quality, but it reliably identifies cases where the participants did not even read the questions.

Detailed documentation on how the indicators are calculated is currently available in German only: Maluspunkte

Note that the quality indicators DEG_TIME and DEG_MISS (and especially the combined value DEGRADE) have proven non-optimal in further research. In the future, a more elaborate quality indicator, as described in this working paper, shall become available in SoSci Survey: Too Fast, too Straight, too Weird

External Information

The following variables will only be included in the data set if the appropriate option was enabled. Further, this data is only recorded if enabled under Project Settings → Privacy options.

Include the variables when downloading the data set

  • S_IP IP address of the participant [REMOTE_ADDR]. This may allow inferences about the location, but is completely useless for identifying people who completed the questionnaire twice.
  • S_LANG The language (e.g., “en” or “de”) as set in the browser [HTTP_ACCEPT_LANGUAGE].
    Note: This is nothing more than a browser setting that does not necessarily indicate the user's true language or residence.
  • S_REFERR Referer: where did the participant come from [HTTP_REFERER]? Where did he or she find the link to the survey?
  • S_BROWSR The ID sent by the browser [HTTP_USER_AGENT]. Note that the participant could easily manipulate the browser ID.
en/results/variables.txt · Last modified: 11.01.2015 15:39 by admin
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported