In this glossary you will find only a very short description of each concept. If you need a more detailed explanation, please follow one of the external links.
- Analysis of variance
- Research data are often based on samples drawn from a larger population of cases. This is true of most types of questionnaire surveys, among other examples. When, on the basis of the analysed results of such sample surveys, we conclude that the same results are valid for the population from which the sample has been drawn, we are making a generalisation with some degree of uncertainty attached. Analysis of variance is a method that allows us to determine whether differences between the means of groups of cases in a sample are too great to be due to random sampling errors. This can, for instance, help us to determine whether observed differences between the income of men and women (in a sample) are great enough to conclude that these differences are also present in the population from which the sample has been drawn. In other words, the method tells us whether the variable Gender has any impact on the variable Income, or more generally, whether a non-metric variable (which divides the cases into groups) is a factor in deciding the values that the cases have on a separate and metric variable. Analysis of variance looks at the total variance of the metric variable and tries to determine how much of this variance is due to, or can be explained by, the non-metric grouping variable. The method consists of the following building blocks: 1) total variance, 2) within-group variance and 3) between-group variance. Total variance is the sum of the squared differences between the data values of each of the units and the mean value of all the units. This total variance can be broken down into within-group variance, which is the sum of the squared differences between the data values of each of the units and the mean of the group to which the unit belongs, and between-group variance, which is the sum of the squared differences between each of the group means and the mean of all the units, with each group mean counted once for every unit in the group.
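The breakdown of total variance into within-group and between-group parts can be checked with a few lines of code. The following is a minimal Python sketch, not a full analysis of variance: the income figures and group names are invented for illustration, and no significance test is computed. The between-group term counts each group mean once per unit in the group, which is what makes the two parts add up to the total.

```python
# Hypothetical incomes for two groups (illustrative values, not ESS data).
groups = {
    "men": [30.0, 34.0, 38.0],
    "women": [26.0, 30.0, 31.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Total: squared distance of every unit from the mean of all units.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

ss_within = 0.0   # squared distance of every unit from its own group mean
ss_between = 0.0  # squared distance of each group mean from the overall mean,
                  # counted once per unit in the group
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_within += sum((v - group_mean) ** 2 for v in values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2

print("total:", ss_total)
print("within:", ss_within)
print("between:", ss_between)
```

With these invented figures the within-group and between-group sums add up to the total, as the definition requires.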
- Asymmetrical distribution
- If you split the distribution in half at its mean, then the distribution of the two sides of this central point would not be the same (i.e., not symmetrical) and the distribution would be considered skewed. In a symmetrical distribution, the two sides of this central point would be the same (i.e., symmetrical).
- The regression coefficient b is a parameter estimate of a regression equation that measures the increase or decrease in the dependent variable for a one-unit difference in the independent variable. In other words, b shows how sensitive the dependent variable is to changes in the independent variable.
- Standardised b shows by how many standard deviations the dependent variable changes when the independent variable increases by 1 standard deviation. The beta coefficient should be used when comparing the relative explanatory power of several independent variables.
- Birth cohort
- A birth cohort is normally defined as consisting of all those who were born in the region or country of interest in a certain calendar year. In this document, however, we redefine a birth cohort so that it consists of all those who were born in a certain calendar year and were living in the country or countries of interest at the time of the second ESS interview round.
- Box-and-whisker plot
- The distribution can also be shown by means of a box-and-whisker plot. This form of presentation is based on a group of statistical measures which are known as median measures. All such measures are based on a sorted distribution, i.e. a distribution in which the cases have been sorted from the lowest to the highest value. The first quartile is the data value of the case where 25 % of the cases have lower values and 75 % of the cases have greater values. The third quartile is the data value of the case where 75 % of the cases have lower values and 25 % of the cases have greater values. The inter-quartile range is the distance from the first to the third quartile, i.e. the variation area of the half of the cases that lies at the centre of the distribution. The box-and-whisker plot consists of a rectangular box divided by one vertical line, with one horizontal line (whisker) extending from either end. The left and right ends of the box mark the first and third quartiles respectively. The dividing line at the centre represents the median. The length of the box is equal to the inter-quartile range and contains the middle half of the cases included in the sorted distribution. The end points of the two extending lines are determined by the data values of the most extreme cases, but do not extend more than one inter-quartile range from either end of the box (i.e. from the first and third quartiles). The maximum length of each of these lines, therefore, is equal to the length of the box itself. If there are no cases this far from either quartile, the lines will be shorter. If the maximum or minimum values of the entire distribution are beyond the end points of the line, a cross will indicate their position.
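The measures behind the plot can be computed directly. This is a minimal Python sketch under stated assumptions: the data values are invented, the quartiles are taken with the standard library's `statistics.quantiles` using its "inclusive" method (other quartile conventions give slightly different values), and the whiskers follow the rule described above of at most one inter-quartile range from the box. Some other software uses 1.5 inter-quartile ranges instead.

```python
import statistics

# Hypothetical data values, sorted from lowest to highest.
data = sorted([5, 1, 30, 3, 7, 2, 8, 4, 6])

# First quartile, median and third quartile of the sorted distribution.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # length of the box

# Whiskers reach the most extreme cases that lie within one
# inter-quartile range of the box ends.
low_limit = q1 - iqr
high_limit = q3 + iqr
whisker_low = min(v for v in data if v >= low_limit)
whisker_high = max(v for v in data if v <= high_limit)

# Cases beyond the whiskers would be marked separately in the plot.
outliers = [v for v in data if v < low_limit or v > high_limit]

print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```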
- By cases we mean the objects about which a data set contains information. If we are working with opinion polls or other forms of interview data, the cases will be the individual respondents. If data have been collected about the municipalities of a county or about the countries of the world, the cases will be the geographical areas, i.e. the municipalities or countries.
- Central tendency
- The central tendency is a number summarising the average value of a set of scores. The mode, the median and the mean are the commonly used central tendency statistics.
- Chi-square distribution
- A family of distributions, each of which has different degrees of freedom, on which the chi-square test statistic is based.
- Chi-square test
- A test of statistical significance based on a comparison of the observed cell frequencies of a joint contingency table with frequencies that would be expected under the null hypothesis of no relationship.
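As an illustration, the comparison of observed and expected frequencies can be sketched in Python. The 2 x 2 table below is invented, and the sketch only computes the test statistic; it does not look up the corresponding probability in the chi-square distribution.

```python
# Hypothetical 2 x 2 contingency table of observed cell frequencies.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under the null hypothesis of no relationship:
# (row total * column total) / total number of cases.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over cells.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi_square, degrees_of_freedom)
```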
- Compute is used for creating new variables on the basis of variables already present in the data matrix. It is possible to do calculations with existing variables, such as calculating the sums of the values on several variables, percentaging, multiplying variables by a constant, etc. The results of such arithmetic operations are saved as new variables, which can then be used just like the other variables in the data set.
- Conditional mean
- A conditional mean value is the mean value of a variable for a group of respondents whose members have a particular combination of values on other variables, whereas an overall mean is the mean value of a variable for all respondents.
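The difference between a conditional mean and an overall mean can be shown with a small Python sketch. The respondents, the variable names `gender` and `income`, and the values are all invented for illustration.

```python
# Hypothetical respondents (illustrative values only).
respondents = [
    {"gender": "male", "income": 30.0},
    {"gender": "male", "income": 34.0},
    {"gender": "male", "income": 38.0},
    {"gender": "female", "income": 26.0},
    {"gender": "female", "income": 30.0},
    {"gender": "female", "income": 31.0},
]

def mean(values):
    return sum(values) / len(values)

# Overall mean: all respondents, irrespective of other variables.
overall_mean = mean([r["income"] for r in respondents])

# Conditional means: one mean per value of the grouping variable.
conditional_means = {}
for value in ("male", "female"):
    incomes = [r["income"] for r in respondents if r["gender"] == value]
    conditional_means[value] = mean(incomes)

print(overall_mean, conditional_means)
```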
- Confidence interval
- Research data are very often based on samples drawn from a larger population of cases. This applies to most types of questionnaire surveys, for instance. When we assume, on the basis of results from the analysis of such sample surveys, that the same results apply to the population from which the samples are drawn, we are making generalisations with some degree of uncertainty attached. The confidence interval is a method of estimating the uncertainty associated with computing mean values in sample data. We normally use a confidence interval with a confidence level of 95 %. This means that there is a 5 % chance of being wrong if we assume that the mean value of the population lies within the confidence interval computed from the sample.
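A 95 % confidence interval for a mean can be sketched as follows in Python. The sample values are invented, and the sketch assumes the normal approximation (the multiplier 1.96); for small samples a value from the t distribution would normally replace 1.96.

```python
import math
import statistics

# Hypothetical sample values (illustrative only).
sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 7.0, 3.0]

n = len(sample)
mean = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Assumption: normal approximation, so 1.96 standard errors on either
# side of the sample mean give a 95 % interval.
z = 1.96
lower = mean - z * standard_error
upper = mean + z * standard_error

print(f"mean = {mean:.2f}, 95 % CI = [{lower:.2f}, {upper:.2f}]")
```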
- Constants in linear functions
Some of you may remember that a line on a plane can be expressed as an equation. Assume first that there are two variables, one that is measured along the plane’s vertical axis and whose values are symbolised by the letter y, and one that is measured along the horizontal axis and whose values are symbolised by the letter x. The function that represents a linear association between these two variable values can be expressed as follows: y = a + b∙x
Here, a and b symbolise constants (fixed numbers). The number b indicates how much the variable value y increases or decreases as x changes. When x increases by 1 unit, y increases by b units. To see this, assume, for instance, that x has the initial value 5. Insert this value into the equation. You get: y = a + b∙5. Then, let the value x increase by one unit to x = 5 + 1 and insert this in the equation instead of the former value. Now the equation can be written y = a + b∙5 + b∙1. Thus, by letting x increase by 1 unit, we have made y increase by b∙1 = b units. This implies that if b is equal to, say, 2, y will increase by 2 units whenever x increases by 1 unit, or if b is negative and equal to, say, -0.5, y will decrease by 0.5 units whenever x increases by 1 unit.
Figure A. Graphic presentation of a linear function
The latter case is illustrated in Figure A, where we have drawn a line in accordance with the function y = 7 - 0.5∙x (where we have inserted the randomly chosen value 7 for a).
As explained above, x and y are variables, which means that they can take on a whole range of different values, while 7 and -0.5 are constants, i.e. values that determine the position of the line on the plane and which cannot change without causing the position of the line to change. In Figure A, the x-values range between 0 and 10, while the y-values range between 2 and 7. For each value that x takes on between 0 and 10, the equation and the corresponding line assign a unique numerical value to y, as shown in the figure. Thus, if x = 0, y takes the value of 7, which is the value we have given to the constant a. Check this by inserting 0 in place of x in the equation y = 7 - 0.5∙x and then computing the value of the expression on the right-hand side of the equals sign. (You should get y = 7.) This illustrates the important point that the constant a in the equation y = a + b∙x is identical to the value that y takes on when x = 0. From a graphical point of view (see Figure A), a can be interpreted as the distance between two points on the vertical axis, namely the distance between its zero point (y = 0) and the point where this axis and the line given by the equation meet each other. (This assumes that the vertical axis crosses the horizontal axis at the latter’s zero point.) For this reason, the constant a is often called the intercept.
Now for the graphical interpretation of the constant b: We repeat the exercise from the numerical example presented above by starting from the point on the line in Figure A where x = 5 (as marked by vertical line k). We then increase the x-value by 1 unit to x = 6 as we move downwards along the line (the new x-value is marked by vertical line l). This change in x makes the corresponding value of y decrease from 4.5 (marked by horizontal line m) to 4 (marked by horizontal line n), i.e. it ‘increases’ by -0.5 units (decreases by 0.5 units). Thus, Figure A confirms what we just saw in our numerical example: b is the change in y that takes place when x increases by 1 unit, provided that the changes occur along the line that is determined by the function y = a + b∙x. If b is positive, y increases whenever x increases, and if b has a negative numerical value, y decreases when x increases.
Note also that b can be interpreted as a measure of the steepness of the line. The more y changes when x is increased by 1 unit, the steeper the line gets.
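The numerical checks in this entry can be reproduced with a short Python sketch of the function y = 7 - 0.5∙x. The function name `linear` is chosen here for illustration only.

```python
def linear(x, a=7.0, b=-0.5):
    """Return y = a + b*x; the defaults give the line y = 7 - 0.5*x."""
    return a + b * x

# The intercept a is the value of y when x = 0.
print(linear(0))              # 7.0

# Moving from x = 5 to x = 6 changes y by b = -0.5.
print(linear(5), linear(6))   # 4.5 4.0
print(linear(6) - linear(5))  # -0.5

# At the right-hand end of the plotted range, x = 10.
print(linear(10))             # 2.0
```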
- Correlation is another word for association between variables. There are many measures of correlation between different types of variables, but most often the word is used to designate linear association between metric variables. This type of correlation between two variables is measured by the Pearson correlation coefficient, which varies between -1 and 1. A coefficient value of 0 means no correlation. A coefficient of -1 or 1 means that if we plot the observations on a plane with one variable measured along each of the two axes, all observations would lie on a straight line. In regression terminology this corresponds to a situation where all observations lie on the (linear) regression line and all residuals have the value 0.
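The Pearson correlation coefficient can be computed from its definition, as in the following Python sketch. The data are invented: the first pair of variables lies exactly on the line y = 7 - 0.5∙x, and the second pair is constructed to have no linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# All observations on a downward-sloping straight line.
r_line = pearson_r([0, 2, 4, 6], [7.0, 6.0, 5.0, 4.0])

# No linear association between the two variables.
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])

print(r_line, r_none)
```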
- Dependent and independent variables
- The idea behind these concepts is that the values of some variables may be affected by the values of other variables, and that this relation makes the former dependent on the latter, which, from the perspective of this particular relationship, are therefore called independent. In practice, the analysts are the ones who determine which variables shall be treated as dependent and which shall be treated as independent.
- Descriptive statistics
- Descriptive statistics is a branch of statistics that denotes any of the many techniques used to summarize a set of data. The techniques are commonly classified as: 1) Graphical description (graphs) 2) Tabular description (frequency, cross table) 3) Parametric description (central tendency, statistical variability).
- Dichotomies are variables with only two values, e.g. the variable Gender with the two values Male and Female.
- Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors. The eigenvalue for a given factor reflects the variance in all the variables, which is accounted for by that factor. A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored. Note that the eigenvalues associated with the unrotated and rotated solution will differ, though their total will be the same.
- Factor analysis
- The objective of this technique is to explain most of the variability among a number of observable random variables in terms of a smaller number of unobservable random variables called factors. The observable random variables are modelled as linear combinations of the factors, plus 'error' terms. The main applications of factor analytic techniques are: 1. to reduce the number of variables and 2. to detect structure in the relationships between variables.
- Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors. The factor loadings are the correlation coefficients between the variables and factors. Factor loadings are the basis for imputing a label to different factors. Analogous to Pearson's r, the squared factor loading is the percentage of variance in the variable, explained by a factor. The sum of the squared factor loadings for all factors for a given variable is the variance in that variable accounted for by all the factors, and this is called the communality. In complete principal components analysis, with no factors dropped, communality is equal to 1.0, or 100% of the variance of the given variable.
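The relation between factor loadings, communalities and eigenvalues can be illustrated with a small Python sketch. The loadings below are invented (three variables, two factors) and are not the result of an actual factor analysis; the sketch only shows the sums of squared loadings taken across factors and across variables.

```python
# Hypothetical factor loadings: one row per variable, one column per factor.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
]

# Communality of a variable: sum of its squared loadings over all factors.
communalities = [sum(l ** 2 for l in row) for row in loadings]

# Eigenvalue of a factor: sum of its squared loadings over all variables.
eigenvalues = [sum(l ** 2 for l in column) for column in zip(*loadings)]

print(communalities)
print(eigenvalues)
```

Both sets of sums add up to the same total, because each squared loading is counted exactly once in either case.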
- Frequency is a method of describing how the cases are distributed over the different data values on a particular variable. The Frequency table gives an overview of the number of cases that have each of the values on a variable. Frequency tables are most suitable for variables with few data values.
- A function is a mathematical equation in which the values of one dependent variable are seen as uniquely determined by the values of one or more other independent variables. The function y = a + b∙x, for instance, expresses the dependent variable y as a linear function of the independent variable x.
- Identification number
- Unique number given to each member of a survey sample. The identification numbers of the ESS survey sample members are stored as the variable ‘idno’.
- An index variable is a variable that in one way or another summarises information about several other variables. Index variables are most frequently used where the data set includes several measures of the same basic phenomenon, e.g. political participation, status, etc. We can combine these measures or indicators in one index to create a variable which gives an overall impression of the basic phenomenon. But note that many authors use the word scale (e.g. a summated scale) rather than the word index to denote variables that are composed of several measures of the same basic phenomenon.
- Kurtosis is a statistic that shows how the distribution of a variable deviates from the normal distribution. The normal distribution of a variable is a bell-shaped symmetric curve where approximately 2/3 of the cases are within 1 standard deviation on either side of the mean value and approximately 95 % of the cases fall within 2 standard deviations. Kurtosis is a measure of the degree to which a variable meets this condition - whether it has a more concentrated (more peaked) or more even (flat) distribution. A positive kurtosis tells us that the distribution of the variable is more peaked than the normal distribution. A negative kurtosis tells us that the distribution is less peaked than the normal distribution. (Skewness tells us whether the variable meets the normal distribution curve's symmetry requirement.)
- When respondents answer a Likert questionnaire item, they normally specify their level of agreement with a statement on a five-point scale which ranges from ‘strongly disagree’ to ‘strongly agree’ through ‘disagree’, ‘neither agree nor disagree’, and ‘agree’.
- Listwise deletion is a method used to exclude cases with missing values on the specified variable(s). The cases used in the analysis are cases without missing values on the variable(s) specified.
- Logical operators
Use these relational logical operators in If statements in SPSS commands:
Equal to: EQ or =
Not equal to: NE or ~= or ¬= or <>
Less than: LT or <
Less than or equal to: LE or <=
Greater than: GT or >
Greater than or equal to: GE or >=
Two or more relations can be logically joined using the logical operators AND and OR. Logical operators combine relations according to the following rules:
- The ampersand (&) symbol is a valid substitute for the logical operator AND. The vertical bar ( | ) is a valid substitute for the logical operator OR.
- Only one logical operator can be used to combine two relations. However, multiple relations can be combined into a complex logical expression.
- Regardless of the number of relations and logical operators used to build a logical expression, the result is either true, false or indeterminate because of missing values.
- Operators or expressions cannot be implied. For example, X EQ 1 OR 2 is illegal; you must specify X EQ 1 OR X EQ 2.
- The ANY and RANGE functions can be used to simplify complex expressions.
AND: Both relations must be true for the complex expression to be true.
OR: If either relation is true, the complex expression is true.
The following table lists the outcomes for AND and OR combinations.
Logical outcomes
true AND true = true; true OR true = true
true AND false = false; true OR false = true
false AND false = false; false OR false = false
true AND missing = missing; true OR missing = true
missing AND missing = missing; missing OR missing = missing
false AND missing = false; false OR missing = missing
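The way missing values propagate through AND and OR can be mimicked in a short Python sketch. This is an illustration of the logic only, not SPSS code: the function names `spss_and` and `spss_or` are invented here, and Python's `None` stands in for a missing (indeterminate) result.

```python
def spss_and(p, q):
    """AND with missing values: None represents 'missing'."""
    if p is False or q is False:
        return False   # one false relation settles the outcome
    if p is None or q is None:
        return None    # otherwise a missing relation leaves it undetermined
    return True

def spss_or(p, q):
    """OR with missing values: None represents 'missing'."""
    if p is True or q is True:
        return True    # one true relation settles the outcome
    if p is None or q is None:
        return None    # otherwise a missing relation leaves it undetermined
    return False

for p, q in [(True, True), (True, False), (False, False),
             (True, None), (None, None), (False, None)]:
    print(p, q, "AND:", spss_and(p, q), "OR:", spss_or(p, q))
```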
- Data matrix
- When preparing data for statistical analysis, we structure the material in a data matrix. A data matrix has one row for each case and a fixed column for each variable. The cases are distributed over the values of each variable, so that the values are shown in the cells of the matrix.
- Maximum likelihood is a statistical method for estimating population parameters (such as the mean and variance) from sample data. It selects as estimates those parameter values that maximise the probability of obtaining the observed data.
- The arithmetical mean is a measure of the central tendency of metric variables. The arithmetical mean is the sum of all the cases’ variable values divided by the number of cases.
- The median is a measure of the central tendency for ordinal or metric variables. The median is the value that divides a sorted distribution into two equal parts, i.e. the value of the case with 50 % of the cases above it and 50 % below. For example: If you have measured the variable Height for 25 persons and sorted the values in ascending order, the median will be the height of person no. 13 in the sorted sample, i.e. the person that divides the sample into two with an equal number of cases above and below him/her.
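The example with 25 persons can be reproduced in Python. The heights below are invented for illustration; the point is only that, in a sorted list of 25 values, the median is the value of case no. 13 (index 12 when counting from zero).

```python
import statistics

# Hypothetical heights in cm for 25 persons, in arbitrary order.
heights = list(range(184, 159, -1))  # 184, 183, ..., 160
assert len(heights) == 25

sorted_heights = sorted(heights)

# Person no. 13 in the sorted sample has 12 cases below and 12 above.
median_by_position = sorted_heights[12]

# The standard library gives the same value.
median_by_library = statistics.median(heights)

print(median_by_position, median_by_library)
```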
- Metric variable
- A variable is metric if we can measure the size of the difference between any two variable values. Age measured in years is metric because the size of the difference between the ages of two persons can be measured quantitatively in years. Other examples of metric variables are length of education measured in years, and income measured in monetary units. Thus, we can use linear regression to assess the association between these two variables.
- Missing Values
SPSS acknowledges two types of missing values: System-missing and User-missing. If a case has not been (or cannot automatically be) assigned a value on a variable, that case’s value on that variable is automatically set to ‘System missing’ and will appear as a . (dot) in the data matrix. Cases with System-missing values on a variable are not used in computations which include that variable.
If a case has been assigned a value code on a variable, the user may define that code as User-missing. By default, User-missing values are treated in the same way as System-missing values.
In the ESS dataset, refusals to answer and ‘don’t know’ answers etc. have been preset as User-missing to prevent you from making unwarranted use of them in numeric calculations. If you need to use these values to create dummy variables or for other purposes, you must first redefine them as non-missing. One way to achieve this is to open the ‘Variable View’ in the data editor, find the row of the variable whose missing values you want to redefine, go right to the ‘Missing’ column, click the right-hand side of the cell, and tick ‘No missing values’ in the dialogue box that pops up. You can also use the MISSING VALUES syntax command (see SPSS’s help function for instructions). Cases with System-missing values can be assigned valid values using the ‘Recode into different variables’ feature in the ‘Transform’ menu. Be careful when you use this option, that you do not overwrite value assignments that you would have preferred to keep as they are.
Moreover, if you need to define more values as User-missing, you can use the syntax command MISSING VALUES or the relevant variable’s cell in the ‘Missing’ column in the ‘Variable View’.
- The mode is a measure of the central tendency. The mode of a sample is the value which occurs most frequently in the sample.
- Nominal variable
- Nominal variables measure whether any two observations have equal or different values but not whether one value is larger or smaller than another. Occupation and nationality are examples of such variables.
- Normal distribution
- Normal distribution is a theoretical distribution which many given empirical variable distributions resemble. If a variable has a normal distribution (i.e. resembles the theoretical normal distribution), the highest frequencies are concentrated round the variable's mean value. The distribution curve is symmetric round the mean and shaped like a bell. Approximately 2/3 of all cases will fall within 1 standard deviation to either side of the mean value. Approximately 95 % of the cases fall within 2 such deviations.
- Operationalisation is the process of converting concepts into specific observable behaviours or attitudes. For example, highest education completed could be an operationalisation of the concept academic skill.
- Ordinal variable
- Ordinal variables measure whether one observation (case / individual) has a larger or smaller value than another but not the exact size of the difference. Measurements of opinions with values such as excellent, very good, good etc. are examples of ordinal variables because we know little about the size of the difference between ‘very good’ and ‘good’ etc.
- Overall mean
- The overall mean of a variable is the mean of all participating individuals (cases / observations) irrespective of their values on other variables.
- SPSS distinguishes between pairwise and listwise analyses. In a pairwise analysis the correlations between each pair of variables are determined on the basis of all cases with valid values on those two variables. This takes place regardless of the values of these cases on other specified variables.
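The difference between the two approaches shows up in the number of cases that are retained. The following Python sketch uses an invented data matrix with three variables, where `None` marks a missing value; it only counts the cases each approach would use and does not compute any correlations.

```python
from itertools import combinations

# Hypothetical data matrix: one row per case, columns x, y, z.
names = ["x", "y", "z"]
cases = [
    (1, 2, 3),
    (2, None, 1),
    (3, 4, None),
    (4, 5, 6),
]

# Listwise: keep only cases with valid values on all specified variables.
listwise_cases = [row for row in cases if None not in row]

# Pairwise: for each pair of variables, keep the cases that are valid on
# those two variables, regardless of their values on the others.
pairwise_counts = {}
for i, j in combinations(range(len(names)), 2):
    valid = [row for row in cases
             if row[i] is not None and row[j] is not None]
    pairwise_counts[(names[i], names[j])] = len(valid)

print(len(listwise_cases))
print(pairwise_counts)
```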
- Policy cycle
- This is the technical term used to refer to the process of policy development from the identification of need, through assessment and piloting, to implementation and evaluation.
- Recode reassigns the values of existing variables or collapses ranges of existing values into new values. For example, you could collapse income into income range categories.
- Regression is a method of estimating some conditional aspect of a dependent variable’s value distribution given the values of some other variable or variables. The most common regression method is linear regression, by means of which a variable’s conditional mean is estimated as a linear function of one or more other variables. The objective is to explain or predict variations in the dependent variable by means of the independent variables.
- In psychometrics, reliability is the consistency of the scores of a measure. The most common internal consistency measure is Cronbach's alpha. Reliability does not imply validity. A reliable measure is measuring something consistently. A valid measure is measuring what it is supposed to measure. A Rolex may be a very reliable instrument for measuring the time, but if it is wrong, it does not give a valid measure of what the time really is.
- A residual is the difference between the observed and the predicted dependent variable value of a particular person (case / observation).
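Residuals can be illustrated with a small linear regression fitted by ordinary least squares. The Python sketch below uses invented data for four cases and the standard closed-form formulas for the slope and intercept of a regression with one independent variable.

```python
# Hypothetical observations for four cases (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares estimates for y = a + b*x.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Predicted value for each case, and residual = observed - predicted.
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

print(a, b)
print(residuals)
```

With a least squares line that includes an intercept, the residuals sum to zero (up to rounding).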
- Response bias
- A response bias is a systematic bias towards a certain type of response (e.g. low, or extreme responses) that masks true levels of the construct that one is attempting to measure.
- The scattergram is a graphic method of presentation, in which the different cases are plotted as points along two (three) axes defined by the two (three) variables included in the analysis.
- Select cases
- Select the cases (persons / observations) you want to use in your analysis by clicking ‘Data’ and ‘Select Cases’ on the SPSS menu bar. Next, select ‘If condition is satisfied’ and click ‘If’. A new dialogue box opens. Type an expression where you use variable names, logical operators and value codes to delineate the cases you want to retain in the analysis from those that you want to exclude.
- Structural equation modelling
- Structural equation modelling (SEM) is a very general statistical modelling technique. Factor analysis, path analysis and regression all represent special cases of SEM. SEM is a largely confirmatory, rather than exploratory, technique. In SEM, interest usually focuses on latent constructs, for example well-being, rather than on the manifest variables used to measure aspects of well-being. Measurement is recognized as difficult and error-prone. By explicitly modelling measurement error, SEM users seek to derive unbiased estimates for the relations between latent constructs. To this end, SEM allows multiple measures to be associated with a single latent construct.
- In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. ‘A statistically significant difference’ simply means there is statistical evidence that there is a difference; it does not mean that the difference is necessarily large, important, or significant in the common meaning of the word.
- Skewness is a measure that shows how the variable distribution deviates from the normal distribution. A variable with normal distribution has a bell-shaped symmetric distribution round the mean value of the variable. This means that the highest frequencies are in the vicinity of the mean value and that there is an equal number of cases on either side of the mean. Skewness measures deviation from this symmetry. Positive skewness tells us that the value of the majority of the cases is below the mean and hence there is a predominance of cases with positive extreme values. Negative skewness tells us that the majority of the cases are greater than the mean while there is a predominance of negative extreme values.
- Squared value
- A squared value (for instance a squared distance) is that value (distance) multiplied by itself.
- Square root
- A value’s square root is a number that, multiplied by itself, produces that value. Thus, a is the square root of b if a∙a = b.
- Standard deviation
- A variable’s standard deviation is the square root of its variance. Standard deviation is a measure of statistical dispersion.
- Standard error
- The standard error of a parameter (e.g. a regression coefficient) is the standard deviation of that parameter’s sampling distribution.
- Standardised values
- A standardised variable value is the value you get if you take the difference between that value and the variable’s mean value and divide that difference by the variable’s standard deviation. If this is done to all the observed values of a variable, we get a standardised variable. Standardised variables have a 0 mean value and a standard deviation of 1.
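Standardisation can be sketched in a few lines of Python. The values are invented, and the sketch uses the sample standard deviation (`statistics.stdev`); using the population standard deviation instead would be an equally valid choice, as long as it is applied consistently.

```python
import statistics

# Hypothetical observed values of a variable.
values = [2.0, 4.0, 4.0, 6.0, 9.0]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # square root of the (sample) variance

# Standardised value: (value - mean) / standard deviation.
standardised = [(v - mean) / sd for v in values]

print(standardised)
print(statistics.mean(standardised), statistics.stdev(standardised))
```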
- Statistics is the science and practice of developing human knowledge through the use of empirical data. It is based soundly on statistical theory, which is a branch of applied mathematics.
An SPSS syntax is a text command or a combination of text commands used to instruct SPSS to perform operations or calculations on a data set. Such text commands are written and stored in syntax files, which are characterised by the extension .sps.
In order to run the syntax commands that have been provided with this course pack, we suggest that you first open an SPSS syntax file. Either create a new syntax file (click ‘New’ and ‘Syntax’ on the SPSS menu bar’s ‘File’ menu) or open an old one that already contains commands that you want to combine with the new commands (click ‘Open File’ and ‘Syntax’ in the ‘File’ menu, and select the appropriate file). Then find, select and copy the relevant syntax from this course pack’s website and paste it into the open syntax file window. While doing exercises, you may have to make partial changes to the commands by editing the text. Run commands from syntax files by selecting them with the cursor (or the shift/arrow key combination) before you click the blue arrow on the syntax window’s tool bar.
If you use the menu system, you can create syntaxes by clicking ‘Paste’ instead of ‘OK’ before exiting the dialogue boxes. This causes SPSS to write the commands you have prepared to a new or to an open syntax file without executing them. Use this option to store your commands in a file so that you can run them again without having to click your way through a series of menus each time. New commands can be created from old ones by copying old syntaxes and editing the copies. This saves time.
- Tolerance is a measure of the covariance of independent variables (normally called collinearity). The tolerance value shows how much of the variance of each independent variable is shared by other independent variables. The value can range from 0 (all variance is shared by other variables) to 1 (all variance is unique to the variable in question). If the tolerance value approaches 0, the results of the analysis may be unreliable. In such cases it is also difficult to determine which of the independent variables explain the variance of the dependent variable.
- After a coefficient has been estimated, the t-statistic for that coefficient is the ratio of the coefficient to its standard error. This ratio can be tested against a t distribution to determine how probable it would be to obtain a ratio at least this far from zero if the true value of the coefficient were zero.
- T test
- The t test is used to determine whether a difference between a value estimated from a sample (e.g. a mean or a regression coefficient) and a null hypothesis value (or the difference between two such estimates) is sufficiently great for us to conclude that the difference is not due to random sampling error. This method can, for instance, help us to decide whether the observed differences in income between men and women (in a sample) are great enough for us to be able to conclude that they are also present in the population from which the sample has been drawn.
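A minimal sketch of the comparison of two group means follows. It uses one common variant, the unequal-variance (Welch) t statistic, which is an assumption made here since the glossary does not say which variant is meant. The income figures are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for the difference between two sample means (Welch variant)."""
    # Standard error of the difference, built from each group's sample variance.
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Hypothetical incomes (in thousands) for two groups in a sample.
group_a = [2, 4, 6]
group_b = [1, 3, 5]

print(round(welch_t(group_a, group_b), 3))  # 0.612
```

The resulting value is then compared with a t distribution to judge whether the difference could plausibly be due to sampling error.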
- Type I error
- In statistical hypothesis testing, a type I error involves rejecting a null hypothesis (e.g. that there is no connection) that is true. In other words, a result is found to be significant when it has in fact arisen by chance.
- Type II error
- In statistical hypothesis testing, a type II error consists of failing to reject an invalid null hypothesis (i.e. falsely accepting an invalid hypothesis). For a given sample size, as the likelihood of type II error decreases, the likelihood of type I error increases.
- Validity refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure. While reliability is concerned with the accuracy of the actual measuring instrument or procedure, validity is concerned with the study's success at measuring what the researchers set out to measure.
- Value codes and value labels
A case’s (a person’s) value on a variable must be given a code in order for it to be recognised by SPSS. Codes can be of various types, for instance dates, numbers or strings of letters. A variable’s codes must consist of numbers if you want to use it in mathematical computations. Value codes may have explanatory labels that tell us what the codes stand for. One way to access these explanations is to open the data file and keep the ‘Variable view’ of the SPSS ‘Data editor’ window open. (You can toggle between ‘Variable view’ and ‘Data view’ by clicking the buttons in the lower part of the window.) In the ‘Variable view’, each variable has its own row. Find the cell where the row of the variable you are interested in meets the column called ‘Values’. Click the right end of the cell. A dialogue box that displays codes and corresponding explanatory labels appears. These dialogue boxes can be used to assign labels to the codes of variables that you have created yourself (recommended). Codes of continuously varying variables do not have explanatory labels. The meaning of the codes of such variables must be stated in the variable label.
The value labels can also be accessed from the ‘Variables’ option in the ‘Utilities’ menu or from the variable lists that appear in many dialogue boxes. Right-click the variable label and click ‘Variable information’.
- By variables we mean characteristics of, or facts about, the cases on which the data contain information, e.g. the individual questions on a questionnaire. For instance, if the cases are countries or other geographical areas, we may have population figures, etc. There are several types of variables, and the variable type determines which methods and forms of presentation should be used. Where the individual groups or values have no obvious order or ranking, the variable is called a nominal variable (e.g. gender). The next type is called an ordinal variable. In addition to the actual classification, there is a natural principle ranking the different values in a particular order. It can obviously be claimed that the response "Very interested in politics" is evidence of a stronger political interest than "Quite interested". The values of ordinal variables have a natural order, but the intervals between the values cannot be measured. The third type is called a metric variable. These are variables that in some way measure quantities, percentages etc., using a scale based on the numerical system. The numerical values of these variables have a direct and intuitive meaning. They are not codes used as surrogates for the real responses, as in the case of nominal and ordinal variables. It follows that these variables also have the arithmetic properties of numbers. The values have a natural order, and the intervals between them can be measured. We can say that one person is three times the age of another without violating any logical or mathematical rules. It is also possible to compute the average age of a group of people.
- A variable’s variance is the sum of all the squared differences between its observed values and its overall mean value, divided by the number of observations. (Subtract 1 from the number of observations if you are computing an estimate of a population’s variance by means of sample data.) The variance is a measure of statistical dispersion: it is the mean squared deviation of the values from their mean.
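The two divisors described above can be compared in a short sketch. The data values and function names are hypothetical.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of observations."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def sample_variance(values):
    """Same sum of squares, divided by n - 1 to estimate a population's variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Hypothetical observations with mean 5 and sum of squared deviations 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(values))        # 4.0
print(round(sample_variance(values), 2))  # 4.57
```

The standard deviation is the square root of either figure.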
- The variation is a number indicating the dispersion in a distribution: how well does the central tendency represent the individual sample observations? For continuous variables, the variance and the standard deviation are the most commonly used measures of dispersion.
- Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors. Varimax rotation seeks to maximize the variances of the squared normalized factor loadings across variables for each factor. This is equivalent to maximizing the variances in the columns of the matrix of the squared normalized factor loadings. The goal of rotation is to obtain a clear pattern of loadings, i.e., the factors are somehow clearly marked by high loadings for some variables and low loadings for other variables. This general pattern is called ‘Simple Structure’.
- Weighting allows you to assign different weights to the different cases in the analysis file. In SPSS, a weight variable holds the weight that each case is given in calculations. If case A has value 4 and weight 0.5, while case B has value 6 and weight 1.5, their weighted mean is (4 ∙ 0.5 + 6 ∙ 1.5)/(0.5 + 1.5) = 5.5, whereas their unweighted mean is (4 + 6)/2 = 5. Use the ‘Weight Variable’ procedure in the ‘Data’ menu to make SPSS perform weighting. Information about how and why you may want to use weights when analysing ESS data can be found on NSD’s web pages (follow the link to the reference site). Weighting is usually used for correcting skewness in a sample that is meant to represent a particular population. It can also be used for "blowing up" sample data so that the analysis results are shown in figures that are in accordance with the size of the population.
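The weighted mean from the example above can be reproduced with a short sketch. It divides the weighted sum by the sum of the weights, which equals 2 in this example. The function name is hypothetical and this is not how SPSS is invoked.

```python
def weighted_mean(values, weights):
    """Weighted sum of the values divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [4, 6]       # case A and case B
weights = [0.5, 1.5]  # their weights

print(weighted_mean(values, weights))  # 5.5
print(sum(values) / len(values))       # 5.0 (unweighted mean)
```

When all weights are equal, the weighted and unweighted means coincide.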