
Department of Phonetics and Linguistics

THE SINGLE-LAYER PERCEPTRON AS A MODEL OF HUMAN CATEGORISATION BEHAVIOUR

Roel SMITS and Louis TEN BOSCH

Abstract
A descriptive model of human categorisation behaviour is presented whose central component is the single-layer perceptron (SLP). First, the modelling properties of the model are studied. It proves useful to distinguish an average component and a differential component in the ratio of the response probabilities for two competing classes. The differential component exclusively determines the location of the linear equal-probability boundary between the two classes. The average component effectively "scales" the contribution of the differential component. In one extreme case of scaling, the SLP-based model is shown to be equivalent to an asymptotic instance of the well-known similarity-choice model. It is also shown that the model's probability functions may have local extrema in the feature space. Connectionist models, such as our SLP-based model, generally have a large number of parameters to be estimated, which may lead to overfitting. We therefore propose a way in which the cross-validation technique known as the "leaving-one-out" method can be used to estimate the generalizability of our model after it has been fitted to human classification data.

1. Introduction
Categorisation plays an important role in everyday processes of perception and cognition, such as the recognition of spoken and written language. A number of formal models of the categorisation process have been developed, such as the similarity-choice model (SCM, Shepard, 1958; Luce, 1963), multi-dimensional scaling (MDS, Kruskal, 1964), the fuzzy logical model of perception (Oden and Massaro, 1978), multiple-exemplar models (Nosofsky, 1986), and general recognition theory (GRT, Ashby and Perrin, 1988).

During the last decade, connectionist models have become very popular, not least for the modelling of perceptual and cognitive processes (e.g. McClelland, Rumelhart and the PDP research group, 1986; Quinlan, 1991). The multi-layer perceptron (MLP) is probably the mathematically best-developed connectionist model (e.g. Lippman, 1987; Hertz, Krogh and Palmer, 1991; Haykin, 1994), and is currently widely applied as an engineering tool. Within the field of psychology, a number of formal perceptron-related models have been developed to describe category learning (e.g. Gluck and Bower, 1988; Kruschke, 1992). Essential to these models are the adaptive properties of the connection weights during a training procedure. Typically, model performance is evaluated by studying the correct classification rate as a function of the number of training epochs.

Although the perceptron has also been used as a model of the human categorisation process itself (rather than of the learning of this process), a formal set-up of the model in this context, as well as a study of its modelling properties and capacities, seems to be lacking. As a result, the perceptron is easily misused as a model of human categorisation behaviour, and it continues to meet with considerable suspicion: it is considered too powerful or unconstrained a model, one which is easily "overfitted" and which does not allow knowledge to be extracted from its parameters (e.g. Massaro, 1988).

It is the purpose of this paper to fill part of the knowledge gap mentioned above. In this paper, we will restrict ourselves to a model based on the single-layer perceptron (SLP)¹. The model we will present is intended purely as a model of human categorisation behaviour, not of the learning of this behaviour. Furthermore, we do not claim that the details of the processing within the SLP-based model embody actual psychological mechanisms. Thus, the model is simply intended as an analysis tool for categorisation data. Nevertheless, we will show that fundamental knowledge about human categorisation mechanisms can be extracted from a fitted SLP-based model by studying aspects of the model's input-output relation.

¹ A "two-layer perceptron" may seem a more appropriate name for a perceptron consisting of an input layer and an output layer. Nevertheless, in accordance with other authors (e.g. Haykin, 1996; Lippman, 1987), we have chosen to use the term single-layer perceptron, because the output layer is the only "real" layer in the sense that it consists of neurons which perform the summation and nonlinear activation transfer.

1.1 General approach
To fit a particular categorisation model to a set of experimental data in a useful way, at least the following three basic issues have to be dealt with:

1. The model parameters have to be estimated in such a way that the best possible account is given of the observed behaviour. We will call this step model estimation.
2. The performance of the model in accounting for the observed behaviour has to be evaluated. This step will be called model evaluation.
3. The model has to be interpreted in order to gain insight into the relevant psychological processes. This step will be called model interpretation.

On issue 1, the estimation of a perceptron from a set of data, a considerable amount of literature is available (e.g. Hertz, Krogh and Palmer, 1991; Haykin, 1994). We will therefore not expand on this issue here. Issue 3 is closely related to the model's properties and capacities, which we will treat first. In sections 2 and 3 we will set up the general model structure and discuss the mathematics of the SLP. On the basis of the mathematical expressions, as well as a number of examples, we will discuss the SLP's modelling properties and how to interpret a model that has been fitted to an actual data set. In section 4 we will discuss issue 2. As indicated earlier, the perceptron, which often has a great many parameters to be estimated, can easily be overfitted. We therefore present a practical method for estimating the generalizability of particular estimates of the SLP-based model. Some of the proposed methods are not specific to the SLP-based model and can also be used with other categorisation models. Finally, in section 5 the complete methodology is illustrated by a practical example dealing with the perception of stop consonants.

2. General model structure and definitions
In each trial of a categorisation experiment, a subject is presented with one of N_s stimuli and is required to assign one of N_r predefined labels to this stimulus. Generally, N_r < N_s. For the sake of simplicity, we will assume that each of the N_s stimuli is presented to the subject N_p times. Furthermore, we will assume that in each trial the categorisation of the presented stimulus does not depend on previous trials. Under this assumption, the results of the experiment can be summarised without loss of information in a stimulus-response matrix consisting of N_s rows and N_r columns. Each entry R_ij in the stimulus-response matrix denotes the number of times stimulus S_i has been labelled as belonging to category C_j. Note that $\sum_{j=1}^{N_r} R_{ij} = N_p$ for every stimulus S_i, and that $\sum_{i=1}^{N_s}\sum_{j=1}^{N_r} R_{ij} = N_s N_p$.
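As a minimal illustration of how trial-level data collapse into the stimulus-response matrix, the Python sketch below counts (stimulus, response) index pairs. The toy trial list and the function name `stimulus_response_matrix` are hypothetical; trial order is simply discarded, which is only justified under the independence assumption stated above.

```python
from collections import Counter

def stimulus_response_matrix(trials, n_stimuli, n_responses):
    """Summarise (stimulus_index, response_index) trial pairs into an
    N_s x N_r count matrix R, where R[i][j] counts how often stimulus
    S_i was labelled as category C_j. Trial order is discarded."""
    counts = Counter(trials)
    return [[counts[(i, j)] for j in range(n_responses)]
            for i in range(n_stimuli)]

# Hypothetical toy experiment: N_s = 3 stimuli, N_r = 2 labels, N_p = 4.
trials = [(0, 0), (0, 0), (0, 0), (0, 1),
          (1, 0), (1, 1), (1, 1), (1, 0),
          (2, 1), (2, 1), (2, 1), (2, 1)]
R = stimulus_response_matrix(trials, n_stimuli=3, n_responses=2)
print(R)  # [[3, 1], [2, 2], [0, 4]]

# Each row sums to N_p, and the grand total equals N_s * N_p.
N_p = 4
assert all(sum(row) == N_p for row in R)
assert sum(map(sum, R)) == 3 * N_p
```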

We will now propose a model which simulates the mapping of a set of stimuli onto a set of categorical responses. The proposed model consists of 3 steps:

extraction of stimulus features,
calculation of class probabilities on the basis of stimulus features,
actual choice of a single response class on the basis of class probabilities.

These steps can be described as a cascade, as is shown in Figure 1.

Figure 1. Schematic representation of the model for perceptual classification. 1, 2, and 3 indicate the representation stage, retrieval stage, and response selection stage, respectively.

The model fits into the general framework described by Ashby (1992), which consists of 3 stages: the representation stage, the retrieval stage and the response selection stage.

In the representation stage the stimulus S_i is transformed from the physical domain into an internal representation, which is a vector containing N_F stimulus features. Thus, in the representation stage, each stimulus is mapped to a point in an N_F-dimensional feature space. Feature vectors in this space will be denoted by F; the value of F for stimulus S_i is denoted by F_i, and the kth component of F_i, that is, the value of feature k of stimulus S_i, is denoted by F_ik. Generally, the specific choice of features in a model will be based either on knowledge of the potential perceptual relevance of various stimulus features or on knowledge of the statistics of the stimulus set. Often, in the preparation of the experimental stimuli, a number of stimulus features are explicitly manipulated in order to test their perceptual relevance. The feature extraction is assumed to be deterministic: each presentation of a particular stimulus leads to the same feature vector, so this stage is noiseless.

In the retrieval stage the feature vector of each stimulus S_i is mapped to a vector of length N_r containing the probabilities of choosing each of the response categories. The jth component of the probability vector is denoted by P_ij. The feature-to-class-probability mapping is assumed to be deterministic. Here, it is modelled by the SLP.
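The SLP itself is worked out in section 3. Purely to make the feature-to-probability mapping concrete, the sketch below assumes one logistic output unit per response class, each computing a weighted sum of the features plus a bias, with the N_r activations normalised to sum to one. This output function, the weights and the function names are illustrative assumptions, not the authors' specification.

```python
import math

def logistic(a):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-a))

def slp_class_probabilities(features, weights, biases):
    """Map a feature vector F_i (length N_F) to class probabilities P_i
    (length N_r) with a single layer of logistic output units.

    ASSUMPTION: output activations are normalised to sum to one so that
    they can serve as multinomial probabilities; the paper's exact output
    function is specified in its section 3, not here.
    """
    activations = [
        logistic(sum(w * f for w, f in zip(w_j, features)) + b_j)
        for w_j, b_j in zip(weights, biases)
    ]
    total = sum(activations)
    return [a / total for a in activations]

# Hypothetical SLP with N_F = 2 features and N_r = 2 classes.
weights = [[1.0, -1.0],   # weights of the output unit for class C_1
           [-1.0, 1.0]]   # weights of the output unit for class C_2
biases = [0.0, 0.0]
print(slp_class_probabilities([0.5, 0.5], weights, biases))  # [0.5, 0.5]
```

Because the logistic function is monotonic, two classes receive equal probability exactly where their units' net inputs coincide, which is a hyperplane in feature space; this is in line with the linear equal-probability boundary mentioned in the abstract.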

In the response selection stage the actual labelling takes place. This labelling is assumed to be probabilistic, and it is here modelled by a multinomial distribution. Suppose that after N_p presentations of stimulus S_i, a subject has generated an output vector R_i of length N_r. Each component R_ij of R_i denotes the number of times the subject has assigned stimulus S_i to class C_j. The multinomial model states that the probability of generating the response vector R_i after N_p presentations of stimulus S_i, given class probabilities P_i, equals

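$$P(\mathbf{R}_i \mid \mathbf{P}_i) = \frac{N_p!}{\prod_{j=1}^{N_r} R_{ij}!}\,\prod_{j=1}^{N_r} P_{ij}^{R_{ij}} \qquad (1)$$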
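Equation (1) is easy to evaluate numerically in the log domain, which avoids overflow of the factorials for large N_p. The sketch below uses `math.lgamma(n + 1)` for log n!; the function name `multinomial_log_prob` is hypothetical. Summing such terms over all stimuli yields a log-likelihood for the whole stimulus-response matrix, one possible criterion for model estimation, although this section does not commit to a particular criterion.

```python
import math

def multinomial_log_prob(counts, probs):
    """Log of equation (1): the probability of response vector R_i after
    N_p = sum(R_i) presentations, given class probabilities P_i."""
    n_p = sum(counts)
    log_p = math.lgamma(n_p + 1)          # log N_p!
    for r, p in zip(counts, probs):
        log_p -= math.lgamma(r + 1)       # minus log R_ij!
        if r > 0:                         # skip 0 * log(P_ij): a factor P_ij^0 = 1
            log_p += r * math.log(p)      # plus R_ij * log P_ij
    return log_p

# Stimulus labelled 3 times as C_1 and once as C_2, with P_i = (0.75, 0.25):
p = math.exp(multinomial_log_prob([3, 1], [0.75, 0.25]))
print(round(p, 4))  # 0.4219
```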

The 3-stage mapping can be summarised as follows:

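$$S_i \;\xrightarrow{\text{representation}}\; \mathbf{F}_i \;\xrightarrow{\text{retrieval (SLP)}}\; \mathbf{P}_i \;\xrightarrow{\text{response selection}}\; \mathbf{R}_i \qquad (2)$$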
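The cascade in (2) can also be run forwards to simulate an experiment. In the sketch below the representation stage is a fixed lookup table and the retrieval stage is a hypothetical stand-in function (`toy_probs`) rather than a fitted SLP; the response selection stage draws one label per presentation with `random.Random.choices`. Because the N_p draws are independent, the resulting counts follow the multinomial distribution of equation (1).

```python
import random

def simulate_responses(features, class_prob_fn, n_p, rng):
    """Simulate N_p presentations of one stimulus through the cascade:
    the deterministic representation F_i is mapped deterministically to
    P_i, and each presentation then draws one label independently."""
    probs = class_prob_fn(features)           # retrieval stage
    counts = [0] * len(probs)
    for _ in range(n_p):                      # response selection stage
        j = rng.choices(range(len(probs)), weights=probs)[0]
        counts[j] += 1
    return counts                             # one row R_i of the matrix

# Hypothetical stand-ins: fixed feature vectors (representation stage) and
# a fixed probability function in place of a fitted SLP (retrieval stage).
feature_table = {"S_1": [0.0], "S_2": [1.0]}

def toy_probs(f):
    return [1.0 - 0.8 * f[0], 0.8 * f[0]]

rng = random.Random(0)                        # fixed seed for reproducibility
R = {s: simulate_responses(f, toy_probs, n_p=20, rng=rng)
     for s, f in feature_table.items()}
print(R)
```

Running the function once per stimulus yields one simulated stimulus-response matrix; with P_i = (1, 0), stimulus S_1 is always assigned to C_1, while the counts for S_2 vary with the seed.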


Back to SHL 9 Contents
