en:create:functions:statistic [21.01.2016 13:54] – [Mode: Value that Occurs Most Frequently] hesslabor | en:create:functions:statistic [29.01.2025 08:51] (current) – admin |
---|
====== statistic() ====== | ====== statistic() ====== |
| |
''mixed **statistic**(string //statistic//, array|string //variables//, mixed //option//, [boolean //alldata//])'' | ''mixed **statistic**(string //statistics//, array|string //variables//, mixed //option//, [boolean //AllData//])'' |
| |
The function statistic() can determine specific univariate data from the data record (across all previous questionnaires). | The statistic() function can be used to obtain univariate characteristic values from the data set (across all previous interviews). |
| |
| * //Statistic//\\ Which statistic should be determined? |
| * ''%%'count'%%'' -- Count the frequency of the value given as ''//option//''. |
| * ''%%'percent'%%'' -- Percentage of the value specified as ''//Option//''. |
| * ''%%'frequencies'%%'' -- frequencies for all response codes in the dataset (as an array). |
| * '''crosscount''' -- Count the frequency of co-occurrence of two values in two variables. The two variables are to be specified as an array (or separated by a comma), as are their values specified as ''//option//''. |
| * ''%%'mode'%%'' -- most frequently occurring value. |
| * ''%%'min'%%'' -- Smallest value. |
| * ''%%'max'%%'' -- Largest value. |
| * ''%%'mean'%%'' -- Arithmetic mean of the values. |
| * ''%%'sd'%%'' -- Standard deviation of the values (with Bessel's correction). |
| * ''%%'groupmean'%%'' -- Arithmetic mean of the values of a subgroup. The subgroup is defined by ''//Option//'', a string consisting of the variable name and the code of the cases to be included, e.g. '''AB01=2'''. |
| * ''%%'filter'%%'' -- Specifies which cases to use in further calls to the function ''statistic()'' (see [[#teildatensaetze_auswerten|below]] for details). |
| |
* //statistic//\\ Which statistic should be calculated? | * //Variables//\\ Specifies for which variable(s) the statistic is to be calculated. The identifiers of the individual variables can be found in the **Variables overview**. If the statistic requires several variables, these can be specified either as a comma-separated string or as an array. |
* '''count''' -- counts the frequency of the value specified as ''//option//''. | * //Option//\\ Some statistics require or allow a third specification, which is given with this parameter (see below). |
* '''percent''' -- percentage of the value specified as ''//option//''. | * //AllData//\\ This optional specification determines that not only the completed interviews but all interviews are included in the statistics. |
* '''crosscount''' -- counts the frequency of the joint occurrence of two values in two variables. The two variables should be specified as an array (or separated with a comma), as well as their values that are specified as ''//option//''. | |
* '''mode''' -- most commonly occurring value. | |
* '''min''' -- lowest value. | |
* '''max''' -- highest value. | |
* '''mean''' -- arithmetic mean of the values. | |
* //variables//\\ Determines which variable(s) the statistic should be calculated for. The IDs of the individual variables can be found in the **Variables Overview**. If the statistic requires multiple variables, these can be given as a comma-separated string or as an array. | |
* //option//\\ Some statistics call for or allow a third entry which is set with this parameter (see below). | |
* //alldata//\\ This entry is optional and determines that all questionnaires be entered into the statistics; not just those that have been completed. | |
| |
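As a hedged illustration of the ''%%'groupmean'%%'' and ''%%'sd'%%'' statistics listed above, the following sketch reports a subgroup mean and the overall standard deviation. The variable IDs (''BB01_03'' as a rating item, ''SD01'' with code 1 for women) are assumptions for this example, and the option string follows the '''AB01=2''' pattern given in the list.

<code php>
// Assumption: BB01_03 is a numeric rating item, SD01 = 1 codes female respondents
$meanWomen = statistic('groupmean', 'BB01_03', 'SD01=1');  // mean rating within the subgroup
$sdAll = statistic('sd', 'BB01_03');                        // standard deviation over all completed interviews
html('
  <p>Average rating among women so far: '.$meanWomen.'</p>
  <p>Standard deviation of all ratings: '.$sdAll.'</p>
');
</code>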
**Note:** If ''true'' is not explicitly specified for the parameter //alldata//, only completed questionnaires are included when calculating the statistical values. | ===== Notes ===== |
| |
**Note:** Test data collected during questionnaire development and pretesting is only included if the current questionnaire is part of the test as well. If the questionnaire is being carried out as part of the regular data collection, ''statistic()'' only counts data from the regular data collection. | **Important:** Only completed interviews are included in the calculation of statistical values unless ''true'' is explicitly specified for the parameter //AllData//. |
| |
| **Important:** Test data from questionnaire development and pretest are only counted if the current interview is also part of the test. If the interview is conducted as part of the regular data collection, ''statistic()'' only counts data from the regular data collection. |
| |
===== Frequency Count ===== | **Note:** The data from the current interview is not taken into account by ''statistic()''. |
| |
When counting the frequency (''count''), a third argument can be specified: which value the frequency should be determined for. If a third value is not given, the number of valid responses is output. Missing data is not counted. | **Note:** The use of ''statistic()'' may be inefficient. If the questionnaire has to search the whole dataset several times in several ''statistic()'' calls, a warning will be displayed first. If there are more than 10 computationally intensive calls, ''statistic()'' will no longer return results. Use ''statistic('load', ...)'' to load the data in advance to avoid this problem. |
| |
For example, in the questionnaire there is a question where the respondent selects their gender (1=female, 2=male, -9=no input). The number of women who entered the third value ''1'' can be determined like so: | **Tip:** The function ''statistic()'' can be used to close the questionnaire after reaching a predefined quota ([[:en:survey:quota]]) and either display a message to further participants or redirect them to the quota stop link of a panel provider. |
| |
| **Tip:** If you do not want to count all completed interviews (e.g. if dropouts were redirected to another page using ''[[:en:create:functions:redirect]]''), it makes sense to copy the variable to be counted further back in the questionnaire into an [[:en:create:questions:internal]]. |
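The quota tip above could be sketched roughly as follows. This is a minimal sketch, not a complete quota solution: the quota of 100, the variable ''SD01'' with code 1, and the target URL are placeholders, and ''redirect()'' is assumed to accept the target URL as its first argument. Keep in mind that ''statistic()'' does not count the current interview.

<code php>
// Assumptions: SD01 = 1 codes women; quota size and URL are placeholders
$quota = 100;
$countWomen = statistic('count', 'SD01', 1);  // completed interviews with SD01 = 1
$quotaFull = ($countWomen >= $quota);
if ($quotaFull) {
  // Send further participants to the quota-stop page (hypothetical URL)
  redirect('https://www.example.com/quota-full');
}
</code>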
| |
| |
| ===== Frequency Count I ===== |
| |
| As a third argument in a frequency count ('''count'''), you can specify for which value you want to determine the frequency. If you do not specify a third value, the number of valid answers is output. Missing data are not counted. |
| |
| For example, if you have a selection for the gender (1=female, 2=male, -9=not specified), you can determine the number of women by specifying the third value ''1'': |
| |
<code php> | <code php> |
$numberwomen = statistic('count', 'SD01', 1); // frequency of women (1) | $countWomen = statistic('count', 'SD01', 1); // frequency of women (1) |
$numbermen = statistic('count', 'SD01', 2); // frequency of men (2) | $countMen = statistic('count', 'SD01', 2); // frequency of men (2) |
$numbercompleted = statistic('count', 'SD01'); // number of valid data | $countDone = statistic('count', 'SD01'); // number of valid responses |
$numberall = statistic('count', 'SD01', false, true); // all data records | $countAll = statistic('count', 'SD01', false, true); // all records |
html(' | html(' |
<p>So far, '.$numberall.' people | <p>So far in this survey, '.$countAll.' people |
specified their gender in this survey, but the questionnaire was | have provided information about their gender, but the |
only completed in '.$numbercompleted.' cases.</p> | interview was completed in only '.$countDone.' cases.</p> |
<p>The questionnaires completed are made up of '. | <p>The completed interviews include '. |
$numberwomen.' women and '. | $countWomen.' women and '. |
$numbermen.' men.</p> | $countMen.' men.</p> |
'); | '); |
question('SD01'); // question about the respondent's gender | question('SD01'); // Question about one's own gender |
</code> | </code> |
| |
| |
===== Multivariate Frequency ===== | ===== Frequency Count II ===== |
| |
The '''crosscount''' statistic counts the cases (like in cross-tabulations) in which multiple variables apply. | The ''%%'frequencies'%%'' statistic returns all possible values with one call. |
| |
Instead of a single variable, two or more variables are specified as an array or separated with a comma ('',''). The values being counted for each variable are specified as the third parameter //option//. Only cases which have specified the first value for the first variable, the second value for the second variable and so on are counted. | **Note:** The array only contains entries for response codes that occur at least once in the data set. Therefore, check whether the array key is present before using it, for example with the ''??'' operator. |
| |
<code php> | <code php> |
$nYoungFemale = statistic('crosscount', 'SD01,SD02', '2,1'); // variables and values in a list with commas ... | $freq = statistic('frequencies', 'SD01'); // frequencies |
$nGrownFemale = statistic('crosscount', array('SD01','SD02'), array(2,2)); // ... or in arrays | $numberWomen = ($freq[1] ?? 0); |
| $numberMen = ($freq[2] ?? 0); |
html(' | html(' |
<p>So far, '.$nYoungFemale.' people have stated in this survey | <p>The completed interviews include '. |
that they are female and in age group 1 (up to 18 years old). | $numberWomen.' women and '. |
'.$nGrownFemale.' women stated they were older than 19 years old.</p> | $numberMen.' men.</p> |
'); | '); |
question('SD01'); // question about the respondent's gender | question('SD01'); // Question about one's own gender |
question('SD02'); // question about the respondent's age | |
</code> | </code> |
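Because ''%%'frequencies'%%'' returns one array for the whole variable, the result can also be listed in a loop. The sketch below assumes, in line with the example above, that the array is keyed by response code with the count as value; which codes appear (for instance, whether missing-data codes are included) is not specified here and should be checked against real data.

<code php>
$freq = statistic('frequencies', 'SD01');  // response code => count
$rows = '';
foreach ($freq as $code => $n) {
  // getValueText() translates the numeric code into the answer text
  $rows .= '<li>'.getValueText('SD01', $code).': '.$n.'</li>';
}
html('<ul>'.$rows.'</ul>');
</code>

Codes that nobody has selected yet simply do not appear in the list, so no ''??'' check is needed inside the loop.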
| |
| |
===== Valid Percent ===== | ===== Multivariate frequency ===== |
| |
The output is the percentage of a value within all valid data. The value to be counted must be given as the third argument. | With the '''crosscount''' statistic, one can count (as in a crosstab) the cases where several variables apply. |
| |
| Instead of a single variable, specify 2 or more variables as an array or separated by a comma ('',''). The third parameter ''//option//'' is used to specify which values are counted for each variable. Only cases are counted that have specified the first value for the first variable, the second value for the second variable, and so on. |
| |
<code php> | <code php> |
$numberwomen = statistic('percent', 'SD01', 1); // percentage of women | $nYoungFemale = statistic('crosscount', 'SD01,SD02', '2,1'); // Variables and values as comma list ... |
| $nGrownFemale = statistic('crosscount', ['SD01','SD02'], [2,2]); // ... or as arrays |
html(' | html(' |
<p>So far, '. | <p>So far in this survey, '.$nYoungFemale.' people |
$numberwomen.' women have taken part in this survey.</p> | have indicated that they are female and in age group 1 (up to 18 years). |
| '.$nGrownFemale.' women indicated an age of 19 years or older.</p> |
'); | '); |
question('SD01'); // question about the respondent's gender | question('SD01'); // Question about one's own gender |
| question('SD02'); // Question about one's own age |
</code> | </code> |
| |
| |
===== Mode: Value that Occurs Most Frequently ===== | |
| |
This returns the value that has been selected most frequently so far. If multiple values have been selected equally often then these are returned separated by a comma. | ===== Valid percent ===== |
| |
As a third argument (in this instance a Boolean), it is possible to specify if invalid values (no answer etc.) should also be counted. | The output is the share of a value in all valid entries. The third argument must be the value to be counted. |
| |
<code php> | <code php> |
$mode = statistic('mode', 'AB01_02', true); | $proportionWomen = statistic('percent', 'SD01', 1); // proportion of women. |
$modes = explode(',', $mode); // separate multiple values | html(' |
if (count($modes) > 1) { | <p>The share of women among the participants so far is '. |
// multiple values stated most frequently | $proportionWomen.'.</p> |
| '); |
| question('SD01'); // Question about one's own gender |
| </code> |
| |
| |
| ===== Mode, most frequently specified value ===== |
| |
| Returns the value that has been selected most often so far. If multiple values have been selected equally often, then they are returned separated by a comma. |
| |
| As a third argument (in this case of type Boolean) you can specify whether invalid values (no response, etc.) are also counted. |
| |
| <code php> |
| $modus = statistic('mode', 'AB01_02', true); |
| $modi = explode(',', $modus); // Separate multiple values |
| if (count($modi) > 1) { |
| // Several most frequently mentioned values |
html(' | html(' |
<p>Multiple answers were selected equally often.</p> | <p>Several answers were chosen equally often.</p> |
'); | '); |
} else { | } else { |
// answer options text (statistic() only provides the numeric code) | // Texts of the answer options (statistic() returns only the numeric code) |
$text = getValueText('AB01_02', $mode); | $text = getValueText('AB01_02', $modus); |
html(' | html(' |
<p>The most common answer for this question was: '.$text.'.</p> | <p>The most common answer to this question was: '.$text.'.</p> |
'); | '); |
} | } |
</code> | </code> |
| |
| |
===== Min, Max and Mean of the Valid Data ===== | |
| |
The statistics '''min''', '''mean''' and '''max''' only calculate a correct value if numerical values exist for the question. Text input that is not a number is ignored, unless ''true'' is passed as the third parameter to specify that invalid values should also be included in the statistics. | ===== Min, max and mean value of the valid entries ===== |
| |
If no valid values are available, 0 is returned for '''mean''', and ''false'' for '''min''' and '''max'''. | The statistics '''min''', '''mean''' and '''max''' will only calculate a correct value if numeric values are available for the question. In the case of a text entry, entries that are not numbers will be ignored -- unless it is specified as a third parameter (''true'') that invalid values should also be included in the statistics. |
| |
| If there are no valid values so far, 0 is returned for '''mean''', and ''false'' is returned for '''min''' and '''max'''. |
| |
<code php> | <code php> |
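$mean = statistic('mean', 'BB01_03'); | $mean = statistic('mean', 'BB01_03'); |
$min = statistic('min', 'BB01_03'); // lowest rating so far | $min = statistic('min', 'BB01_03'); // lowest rating so far |
$max = statistic('max', 'BB01_03'); // highest rating so far | $max = statistic('max', 'BB01_03'); // highest rating so far |
html(' | html(' |
<p>Participants have given the programme | <p>Participants have given the program an |
an average rating of '.$mean.' so far.</p> | average rating of '.$mean.'.</p> |
<p>The ratings lie between '.$min.' and '.$max.'.</p> | <p>Ratings range from '.$min.' to '.$max.'.</p> |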
'); | '); |
</code> | </code> |
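Since '''min''' and '''max''' return ''false'' while no valid values exist, it can be worth checking for that case before printing a range. A minimal sketch, reusing the rating item ''BB01_03'' from the example above (the message texts are placeholders):

<code php>
$min = statistic('min', 'BB01_03');
$max = statistic('max', 'BB01_03');
// Strict comparison, so that a legitimate value of 0 is not mistaken for "no data"
if (($min === false) || ($max === false)) {
  $rangeText = 'No ratings have been submitted yet.';
} else {
  $rangeText = 'Ratings range from '.$min.' to '.$max.'.';
}
html('<p>'.$rangeText.'</p>');
</code>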
| |
| |
| |
| ===== Evaluate partial data sets ===== |
| |
| ''statistic('filter', ...)'' sets a filter that is applied to all further calls of ''statistic()''. Optionally, the //Variables// needed in subsequent calls can be specified as the second parameter; they are then loaded right away, which speeds up those calls. |
| |
| The number of cases that match the filter is returned. The fourth parameter //AllData// only affects the return value, but not the further counting. |
| |
| <code php> |
| // Statistics on female participants only (SD02 = 1) |
| // The RT variables are loaded immediately to reduce latencies |
| $n = statistic('filter', ['RT02_01', 'RT02_02', 'RT02_03'], 'SD02==1'); |
| // Mean of ratings (women only) |
| $mean1 = statistic('mean', 'RT02_01'); |
| $mean2 = statistic('mean', 'RT02_02'); |
| $mean3 = statistic('mean', 'RT02_03'); |
| </code> |
| |
| The filter allows common comparison operators (''>'', ''>='', ''<'', ''%%<=%%'', ''!='', ''==''), parentheses and Boolean operators (''AND'', ''&&'', ''OR'', ''||'', ''NOT'', ''!''). |
| |
| **Note:** Each comparison must be between one variable and a constant value (a number or a string), e.g. ''SD02==2''. Comparisons between two variables (''SD03>SD04'') are not supported. |
| |
| <code php> |
| // Statistics only on female participants (SD02 = 1) aged 35 and over (SD03 >= 35). |
| $n = statistic('filter', false, '(SD02==1) AND (SD03 >= 35)'); |
| </code> |
| |
| Besides the variable names, ''QUESTNNR'', ''CASE'' and ''LANGUAGE'' can be used for the filter. |
| |
| <code php> |
| // Statistics only on female participants (SD02 = 1) aged 35 years and older (SD03 >= 35) in the German language version. |
| $n = statistic('filter', false, '(SD02==1) AND (SD03 >= 35) AND (LANGUAGE == "ger")'); |
| </code> |
| |
| When comparing with text values, the text must be enclosed in quotation marks. For example, the following code would consider all cases that have the same reference (REF) as the current interview. |
| |
| <code php> |
| $n = statistic('filter', false, 'REF=="'.reference().'"'); |
| </code> |
| |
| The dot (''.'') is PHP's string concatenation operator: it joins ''%%REF=="%%'' with the current reference and a closing quotation mark. If the current interview was started with the reference ABC, the third parameter evaluates to ''%%REF=="ABC"%%''. |