This schedule is tentative and approximate. Changes will be announced
in class and in Oncourse. Check this schedule frequently for updates.
Because this is a seminar course with only a few students enrolled, we
will keep the progression of topics flexible and responsive to
student (and instructor!) interest and comprehension. Thus, the
schedule below will probably be filled in retrospectively as the
course progresses, serving primarily as a record and review of what
we have covered.
Week |
Dates (Spring 2004) |
Topic and Assignments |
1 |
Jan 12 - Jan 16 |
Introductory readings: A model is a (parameterized) family of
probability distributions, which in turn implies a family of sampling
distributions. (A concrete sketch of this idea appears at the end of
this week's entry.) Non-textbook items are available in
Oncourse (http://oncourse.iu.edu/); go to the "In
Touch" tab, then look in the section called "Group Spaces" and click
on "Readings".
- Myung, I. J., Pitt, M. A. & Kim, W. (2003). Model
Evaluation, Testing, and Selection. In: K. Lamberts and R. Goldstone
(Eds.), Handbook of Cognition. London: Sage.
- Morgan textbook:
Chapter 1, especially (or even exclusively) Examples 1.1 and 1.3.
Optional additional readings:
- Busemeyer, J. (in preparation). Working manuscript for textbook on
model fitting and comparison. Chapters 1 and 2.
- Navarro, D. J. & Myung, I. J. (2003). Model Evaluation and
Selection. Manuscript for encyclopedia article.
- Zucchini, W. (2000). An Introduction to Model
Selection. Journal of Mathematical Psychology, 44,
41-61.
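As a concrete illustration of the Week 1 theme, here is a minimal
Matlab sketch (the retention model and parameter values are
hypothetical, chosen only for illustration): fixing the parameters
picks out one member of the model family, and a binomial sampling
assumption then determines the entire sampling distribution of recall
counts.
  % A parameterized model implies a sampling distribution.
  % Hypothetical retention model: p(t) = w1 * exp( -w2 * t ).
  w1 = 0.9;  w2 = 0.5;       % one member of the parameterized family
  t = 2.0;                   % a retention interval
  n = 100;                   % number of recall trials at this interval
  p = w1 * exp( -w2 * t );   % predicted probability of recall
  % Binomial sampling distribution over possible recall counts k = 0..n
  % (computed via log-factorials to avoid overflow):
  k = 0:n;
  logProb = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
            + k*log(p) + (n-k)*log(1-p);
  bar( k, exp(logProb) )     % the sampling distribution implied by (w1,w2)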
|
2 |
Jan 19 - Jan 23
No classes Monday Jan. 19: MLK Jr. Day.
|
Maximum likelihood estimation. Non-textbook items are available in
Oncourse (http://oncourse.iu.edu/), under the "In Touch" tab, "Group
Spaces", "Readings", as described in Week 1.
- Myung, I. J. (2003). Tutorial on maximum likelihood
estimation. Journal of Mathematical Psychology, 47,
90-100.
|
[Figure] The left panel shows a
predicted sampling distribution from the memory retention model
p(t) = w1 * exp( -w2 * t ), with n = 100 at
each retention time; an actual sample's data (red dots connected by
red lines) rides atop the predicted sampling distribution. The
right panel shows the log-likelihood of the data for different
parameter values. Matlab programs and figures by Prof. Kruschke.
|
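A minimal sketch (not Prof. Kruschke's actual program) of how a
log-likelihood surface like the right panel above might be computed,
assuming binomial sampling at each retention time; the data values
here are hypothetical:
  % Log-likelihood surface for the retention model p(t) = w1*exp(-w2*t),
  % assuming binomial sampling. Data are hypothetical recall counts k
  % out of n = 100 trials at each retention time t.
  t = [1 2 4 8];  n = 100;  k = [55 34 13 2];
  w1grid = linspace(0.1,1.0,50);  w2grid = linspace(0.1,2.0,50);
  logL = zeros(length(w1grid),length(w2grid));
  for i = 1:length(w1grid)
    for j = 1:length(w2grid)
      p = w1grid(i) * exp( -w2grid(j) * t );    % model predictions
      p = min( max(p,1e-10), 1-1e-10 );         % guard against log(0)
      % Binomial log-likelihood, up to a constant independent of (w1,w2):
      logL(i,j) = sum( k.*log(p) + (n-k).*log(1-p) );
    end
  end
  contour( w2grid, w1grid, logL, 30 );  xlabel('w2');  ylabel('w1');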
Optional additional readings:
- Busemeyer, J. (in preparation). Working manuscript for textbook on
model fitting and comparison. Chapter 3.
|
3 |
Jan 26 - Jan 30 |
Continuation of maximum likelihood estimation. New this week:
numerical estimation/optimization techniques. Non-textbook items
are available in Oncourse (see Week 1 for navigation).
|
[Figure] Each figure above
(click to enlarge) shows a function f: R^2 -> R with the
linear approximation at a point and the Hessian approximation to the
residuals. The left figure shows a Gaussian function and the right
figure shows a quadratic. Notice in the right figure that the Hessian
approximation matches the linear approximation's residuals, to within
the limits of numerical approximation. The Matlab program that
generated these graphs was written by Prof. Kruschke. |
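For derivatives like these (and for the assignment below), one option
among several is finite differences; a minimal sketch for a function
of two variables, with a hypothetical example function and a step
size h that is a tuning choice:
  % Finite-difference gradient and Hessian of f: R^2 -> R at a point x0.
  f  = @(x) exp( -( x(1)^2 + x(2)^2 ) / 2 );   % example: Gaussian bump
  x0 = [0.5; 0.3];  h = 1e-4;                  % evaluation point, step size
  e1 = [h; 0];  e2 = [0; h];
  grad = [ f(x0+e1) - f(x0-e1) ;
           f(x0+e2) - f(x0-e2) ] / (2*h);      % central differences
  H = zeros(2,2);
  H(1,1) = ( f(x0+e1) - 2*f(x0) + f(x0-e1) ) / h^2;
  H(2,2) = ( f(x0+e2) - 2*f(x0) + f(x0-e2) ) / h^2;
  H(1,2) = ( f(x0+e1+e2) - f(x0+e1-e2) ...
           - f(x0-e1+e2) + f(x0-e1-e2) ) / (4*h^2);
  H(2,1) = H(1,2);
  % The linear approximation at x0 is f(x0) + grad'*(x-x0); the quadratic
  % term (1/2)*(x-x0)'*H*(x-x0) approximates its residual, f(x) - linear(x).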
- Morgan textbook:
Chapter 2, specifically Section 2.2, regarding an example of an
analytical solution to a maximum likelihood estimation problem.
Chapter 3: function optimization methods. (A minimal Newton-Raphson
sketch appears at the end of this week's entry.)
If time permits, we'll go back to Myung (2003) and look at his Matlab
programs for maximum likelihood estimation.
- Assignment: The goal of this assignment is for you to learn
about the linear and Hessian approximations to a likelihood
function. A secondary goal is for you to limber up your Matlab
fingers. The assignment is for you to compose Matlab programs that can
make graphs like those shown above. That is, for a given function of
two variables, plot the function, its linear approximation at a point,
its residuals, and the Hessian approximation to the residuals at that
point. Due Monday February 9.
Optional additional readings:
- Busemeyer, J. (in preparation). Working manuscript for textbook on
model fitting and comparison. Chapter 3.
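As promised above, a minimal sketch of the Newton-Raphson idea from
Morgan's Chapter 3, in one dimension for clarity (the multi-parameter
version replaces the second derivative with the Hessian matrix); the
toy binomial likelihood here is hypothetical:
  % Newton-Raphson maximization of a toy binomial log-likelihood:
  % logL(p) = k*log(p) + (n-k)*log(1-p), whose exact maximum is p = k/n.
  n = 100;  k = 34;
  d1 = @(p)  k/p - (n-k)/(1-p);        % first derivative (score)
  d2 = @(p) -k/p^2 - (n-k)/(1-p)^2;    % second derivative (curvature)
  p = 0.5;                             % starting value
  for iter = 1:20
    step = d1(p) / d2(p);              % Newton-Raphson step
    p = p - step;
    if abs(step) < 1e-10, break, end
  end
  fprintf( 'MLE p = %.4f (exact k/n = %.4f)\n', p, k/n );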
|
4 |
Feb 2 - Feb 6 |
Numerical estimation/optimization techniques. Non-textbook items
are available in Oncourse (see Week 1 for navigation).
|
[Diagram] Structure of model fitting programs in Matlab, as advised by Prof. Kruschke. |
- Morgan textbook Chapter 3: Understanding the Newton-Raphson method and Hessian matrices. Example of likelihood optimization in Matlab for Fig. 3.4.
- Myung (2003): Example of MLE fit to memory retention data, and Matlab code.
- Busemeyer, J. (in preparation). Working manuscript for textbook on
model fitting and comparison. Chapter 3 and its Appendix.
- Assignment: The goal of this assignment is for you to
get experience with model fitting in Matlab.
The file BusemeyerRetentionModelData.txt, posted in
Oncourse, contains simulated data for a retention experiment like that
described by Busemeyer in his Chapter 3. The file contains one row per
trial, with each row specifying the Subject number, Duration (in
seconds), the Trial number (for that duration for that subject), and
the Recall success (1 for successful recall, 0 for failure to
recall). As you can see by examining the file, there are 10 subjects,
5 different durations, and 50 trials per duration.
Your job is to fit the retention model (Equation 4, Page 9, Chapter 3
of Busemeyer's draft textbook) to each individual subject's data (i.e.,
using different parameter values for each subject), and to fit the
model simultaneously to all the subjects (i.e., using a single set of
parameter values to best fit all the data simultaneously).
Use maximum likelihood as your measure of fit; that is, minimize the
negative log-likelihood.
Structure your files the way
Prof. Kruschke recommends (see diagram above), putting the model
in a file separate from the file that specifies the experiment design,
data, and fit function; a minimal sketch of this structure appears at
the end of this entry. An example of this sort of structure was
described in class on Monday Feb. 9 and can be found in Oncourse under
the "In Touch" tab, "Group Spaces" link "Matlab Programs", then in the
folder "ForMyung2003". The model is specified in the file
"ExpMemModel.m", and the experiment and data etc. are specified in the
file "Murdock61_ExpMemModel.m". The log-likelihood computation also
calls the function specified in "choose.m".
Due Monday Feb. 16, by class time.
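A minimal sketch of the recommended two-file structure (file names,
data, and details here are hypothetical, not the actual ForMyung2003
code): the model sits in its own file, and a separate script
specifies the design and data and calls the optimizer.
  % ---- File RetentionModel.m: the model, separate from any data set ----
  function p = RetentionModel( w, duration )
  % Predicted recall probability at each duration, for parameter vector w.
  p = w(1) * exp( -w(2) * duration );

  % ---- File FitRetentionData.m: design, data, and fit function ----
  duration = [1 2 4 8];  n = 50;  k = [30 18 9 2];   % hypothetical data
  negLogLik = @(w) -sum( ...
      k .* log( max( RetentionModel(w,duration), 1e-10 ) ) + ...
      (n-k) .* log( max( 1 - RetentionModel(w,duration), 1e-10 ) ) );
  wBest = fminsearch( negLogLik, [0.8 0.3] )   % minimize -log-likelihood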
|
5 |
Feb 9 - Feb 13 |
Sampling distributions of maximum likelihood estimates.
|
[Figure] Sampling
distributions of maximum likelihood estimates of parameters for the
"toy" memory retention model p(t) = 1 / ( 1 + exp( -r ( g - t ) ) ).
Level contours show a bivariate normal with the same
means and covariance as the sampling distribution. Rows are for
different populations sampled from. Columns are for different sample
sizes. Click on any panel to enlarge. Matlab programs and graphs by
Prof. Kruschke. |
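A minimal sketch of how such a sampling distribution can be simulated
by Monte Carlo (shown for the two-parameter retention model
p(t) = w1*exp(-w2*t) from earlier weeks, with hypothetical values,
since its fitting code has already appeared): draw many samples from
one fixed population, fit each sample, and scatter-plot the estimates.
  % Monte Carlo sampling distribution of MLEs: simulate many samples from
  % one fixed "population" model, fit each sample, plot the estimates.
  t = [1 2 4 8];  n = 100;  wTrue = [0.9 0.5];       % hypothetical truth
  pTrue = wTrue(1) * exp( -wTrue(2) * t );
  nll = @(w,k) -sum( k .* log( max( w(1)*exp(-w(2)*t), 1e-10 ) ) + ...
        (n-k) .* log( max( 1 - w(1)*exp(-w(2)*t), 1e-10 ) ) );
  est = zeros(200,2);
  for s = 1:200
    k = sum( rand(n,length(t)) < repmat(pTrue,n,1), 1 );  % binomial data
    est(s,:) = fminsearch( @(w) nll(w,k), wTrue );        % MLE this sample
  end
  plot( est(:,1), est(:,2), '.' );
  xlabel('w1 estimate');  ylabel('w2 estimate');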
|
6 |
Feb 16 - Feb 20 |
Confidence intervals of maximum likelihood estimates.
Readings:
- Ch. 4 of Morgan textbook. (Unfortunately I'm discovering that the
Morgan textbook is unsuitable as an initial tutorial on these topics;
it might be better as a reminder or review for people already familiar
with the details. Nevertheless, give it a go and I'll do a lot of
derivations in class.)
|
[Figure] Top left: sampling
distribution of maximum likelihood estimates for large N (the same
as the top right graph in last week's figure, rescaled). The other
three panels show three specific samples' actual likelihood surfaces
and Hessian approximations. Confidence regions can be constructed from
the likelihood contours; approximate confidence regions can be
constructed from the Hessian approximations. Matlab programs and
graphs by Prof. Kruschke. |
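A minimal sketch of the Hessian-based approximation, in one dimension
so the arithmetic is transparent (a hypothetical binomial example; the
curvature of the negative log-likelihood at the MLE approximates the
reciprocal of the estimate's variance):
  % Approximate confidence interval from the curvature (Hessian) of the
  % negative log-likelihood at the MLE; toy binomial example, so the
  % answer can be checked against sqrt( p*(1-p)/n ).
  n = 100;  k = 34;  pHat = k/n;                 % MLE
  nll = @(p) -( k*log(p) + (n-k)*log(1-p) );     % negative log-likelihood
  h = 1e-5;                                      % finite-difference step
  H = ( nll(pHat+h) - 2*nll(pHat) + nll(pHat-h) ) / h^2;  % curvature at MLE
  se = sqrt( 1/H );                              % asymptotic standard error
  ci = [ pHat - 1.96*se , pHat + 1.96*se ];      % approximate 95% interval
  fprintf( 'pHat = %.3f, 95%% CI approx [%.3f, %.3f]\n', pHat, ci(1), ci(2) );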
|
7 |
Feb 23 - Feb 27 |
Readings:
- Verguts, T. and Storms, G. (2003?). Assessing the informational
value of parameter estimates in cognitive models. Unpublished
manuscript.
- Busemeyer, draft textbook, Chapter 4 and Appendix.
- Ratcliff, R. and Tuerlinckx, F. (2002). Estimating parameters of
the diffusion model: Approaches to dealing with contaminant reaction
times and parameter variability. Psychonomic Bulletin and
Review, 9(3), 438-481.
Optional Readings:
- Huber, D. E. (2003?). Computer simulations of the ROUSE model: An
analytic method and a generally applicable technique for producing
parameter confidence intervals. Unpublished manuscript.
- Wichmann, F. A. and Hill, N. J. (2001). The psychometric
function: II. Bootstrap-based confidence intervals and
sampling. Perception and Psychophysics, 63(8), 1314-1329.
Assignments:
- This assignment follows from the Verguts and Storms
manuscript. The purpose is to explore whether the parameter redundancy
(or lack of redundancy) discovered by parameter fitting can be derived
algebraically.
Show algebraically that, in ALCOVE, the specificity parameter ("c")
and the attention learning rate parameter ("lambda_alpha") exactly
trade off. That is, show that when one parameter is "twiddled" a small
amount, an exactly compensating twiddle can be made in the other
parameter. The relevant equations for the ALCOVE model can be found in
the file Kruschke1992.pdf in the Readings Group Space, especially
Eqn. 6, p. 24.
Next, show that, in RASHNL, the specificity parameter and the gain
shift rate parameter do not trade off. See the file KruschkeJ1999.pdf
in the Readings; especially Eqn. 7 on p. 1097.
This is due in class Monday March 22.
- This assignment follows up on the linear regression example in
Ratcliff and Tuerlinckx. Consider the model
y = m * x + b + N(0,1)
where m and b are parameters and N(0,1) denotes a normal distribution
with mean zero and standard deviation 1. For all the scenarios below,
a sample is generated with 20 cases at each of x = {1,3,5,7,9}; i.e.,
100 data points total per sample. (A simulation sketch appears at the
end of this assignment.)
1. Set m = 1 and b = 1 as the true population parameters. Generate
200 random samples from this population, and for each sample use
maximum likelihood estimation to determine m_est and b_est. Make a
scatter plot of the sampling distribution of m_est and b_est. Compute
the means of m_est and b_est, and compare them with the true
values. Compute the SDs of m_est and b_est, and the correlation of
m_est with b_est.
2. Repeat part 1 but with contaminated samples, as follows. Generate
a sample as before, but then retain each individual data point, as
is, with 92% probability, or replace it, with 8% probability, with a
value randomly sampled from a uniform distribution over the interval
[0,20]. Compare the new mean m_est and mean b_est with both the true
values and the means obtained from uncontaminated samples. Comment
also on the SDs and correlation, compared with uncontaminated
samples.
3. Reparameterize the model as follows:
y = (m + b) * x + (b - m) + N(0,1)
Repeat part 1; i.e., with uncontaminated samples. Now the generating
model again has the slope and intercept equal to 1; that is, (m+b) = 1
and (b-m) = 1, which implies that m = 0 and b = 1. Comment on the
correlation of m_est and b_est for the reparameterized model,
compared with the original model.
Due Monday March 29 in class.
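A minimal sketch of the sample generation and fitting for parts 1 and
2 (with Gaussian noise of SD 1, maximizing the likelihood of m and b
is equivalent to ordinary least squares, so Matlab's polyfit
suffices):
  % Sampling distribution of regression MLEs, with optional contamination.
  x = repmat( [1 3 5 7 9], 1, 20 )';         % 100 data points per sample
  mTrue = 1;  bTrue = 1;
  contaminate = false;                       % set to true for part 2
  est = zeros(200,2);
  for s = 1:200
    y = mTrue*x + bTrue + randn(size(x));    % y = m*x + b + N(0,1)
    if contaminate
      bad = rand(size(y)) < 0.08;            % each point replaced w.p. 8%
      y(bad) = 20 * rand(sum(bad),1);        % uniform over [0,20]
    end
    c = polyfit( x, y, 1 );                  % with N(0,1) noise, MLE = OLS
    est(s,:) = c;                            % c(1) = m_est, c(2) = b_est
  end
  plot( est(:,1), est(:,2), '.' );           % the sampling distribution
  fprintf( 'mean m_est = %.3f, mean b_est = %.3f\n', ...
           mean(est(:,1)), mean(est(:,2)) );
  r = corrcoef( est(:,1), est(:,2) )         % correlation of m_est, b_est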
|
8 |
Mar 1 - Mar 5 |
|
9 |
Mar 8 - Mar 12 |
|
Spring Break |
Mar 13 - Mar 21 |
|
10 |
Mar 22 - Mar 26 |
Previous weeks explored how to estimate best fitting parameter
values. We now explore how to evaluate whether the best fit is any
good. First, we consider traditional goodness-of-fit testing. Then we
consider measures of model complexity and generalizability.
Readings regarding the generalized likelihood ratio test
(G^2) for nested models:
- Busemeyer Ch. 3, pp. 39-41.
- Busemeyer Ch. 4, pp. 22-26.
- Busemeyer Ch. 5, pp. 25-29.
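In outline: for nested models, G^2 = 2 * ( lnL_general -
lnL_restricted ), referred to a chi-square distribution with degrees
of freedom equal to the difference in the number of free parameters.
A minimal sketch, assuming the two maximized log-likelihoods are
already in hand from fitting code like that of earlier weeks (the
numbers here are hypothetical):
  % Generalized likelihood ratio test for nested models.
  logLgen  = -102.3;  % maximized log-likelihood, general model (hypothetical)
  logLrest = -105.9;  % maximized log-likelihood, restricted model
  dfDiff   = 2;       % difference in number of free parameters
  G2 = 2 * ( logLgen - logLrest );          % G^2 statistic
  pval = 1 - chi2cdf( G2, dfDiff );         % chi2cdf: Statistics Toolbox
  % A small pval (e.g., < .05) means the restricted model fits
  % significantly worse than the general model.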
|
[Figures] Selected lecture
overheads regarding G^2. I hope the information here is
approximately correct; treat it as a noisy sample. |
|
11 |
Mar 29 - Apr 2 |
Cross-validation.
|
[Figures] Selected lecture
overheads regarding cross-validation. |
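A minimal sketch of the cross-validation idea (hypothetical data and
a simple linear model, purely for illustration): fit on a calibration
subset, then evaluate those fitted parameters on the held-out
validation subset.
  % Cross-validation: fit on a calibration half, score on the held-out half.
  x = rand(100,1) * 10;                       % hypothetical predictor
  y = 2*x + 1 + randn(100,1);                 % hypothetical data
  idx = randperm(100);
  calib = idx(1:50);  valid = idx(51:100);    % random split
  c = polyfit( x(calib), y(calib), 1 );       % fit the calibration set only
  pred = polyval( c, x(valid) );              % predict the held-out data
  cvError = mean( ( y(valid) - pred ).^2 );   % validation (generalization) error
  % Compare cvError across competing models; prefer the model that
  % generalizes best, not the one that merely fits the calibration set.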
|
12 |
Apr 5 - Apr 9 |
More on cross-validation.
Reading: Busemeyer and Wang (2000). Model comparisons and model
selections based on generalization criterion methodology. Journal
of Mathematical Psychology, 44, 171-189.
|
13 |
Apr 12 - Apr 16 |
Bayesian model selection.
Readings:
- Myung and Pitt (1997).
- David J. C. MacKay (1991 NIPS).
- David J. C. MacKay (199X Network article).
|
14 |
Apr 19 - Apr 23 |
Bayesian model selection.
Readings: Continuation from last week.
- Myung and Pitt (1997).
- David J. C. MacKay (1991 NIPS).
- David J. C. MacKay (199X Network article).
|
[Figure] Bayesian parameter estimation and model comparison for two "toy"
retention models: Exponential: p(x) = exp( -a * x ), and
Sigmoidal: p(x) = 1 / ( 1 + exp( b * ( x - 1 ) ) ). The left panel
shows a case of "equal" prior distributions across the two
parameters, with the result being a higher posterior probability for
the Sigmoidal model. The right panel shows a case of different ranges
for the priors of the two parameters, with the result being a higher
posterior probability for the Exponential model. These graphs are
generated by evaluating the model at many finely spaced increments of
the parameter. For multi-parameter models that are also time-consuming
to evaluate even once, generating analogous graphs or computations
could be impossible in practice. Matlab program and graphs by
Prof. Kruschke.
|
Optional Assignment: (Essentially, make the graphs shown
above.) Consider two "toy" memory retention models, which predict the
probability (p) of recall as a function of duration (x, in some
unspecified time scale):
H1, exponential: p(x) = exp( -a * x )
H2, sigmoidal: p(x) = 1 / ( 1 + exp( b * ( x - 1 ) ) )
Each model has a single parameter, a and b respectively. We assume
that for any given duration x, when a subject tries to recall an item,
the probability of success is p(x) and is independent of other trials;
i.e., the number of successful recalls follows a binomial
distribution with success probability p(x).
We suppose that we have a sample of data from four durations, with
10 items per duration. The observed proportions of recall are as
follows:
Duration          | 0.33 | 0.67 | 1.33 | 2.67 |
Proportion recall | 0.8  | 0.6  | 0.4  | 0.1  |
(A) Determine the maximum likelihood estimates of a and b for the two
models. This is merely the sort of thing we did in previous weeks; we
are maximizing p(data|a,hyp_a) and p(data|b,hyp_b). Graph the
likelihood functions. Graph the data points superimposed with the
curves of p(x) for these "best fitting" models.
(B) Suppose that we have identical "triangular" prior
probability distributions over the two parameters, such that
p(a|hyp_a) = (1/2) - abs( (1/4) * ( a - 2.0 ) ), 0 <= a <= 4.0.
p(b|hyp_b) = (1/2) - abs( (1/4) * ( b - 2.0 ) ), 0 <= b <= 4.0.
Determine the posterior probability distributions over the two
parameters, given the data above. Use Eqns. 3 and 6 of MacKay's
chapter:
p(a|data,hyp_a) = p(data|a,hyp_a) p(a|hyp_a) / p(data|hyp_a)   [Eqn. 3]
p(data|hyp_a) = Integral_a p(data|a,hyp_a) p(a|hyp_a) da   [Eqn. 6]
When integrating for Eqn. 6, use numerical approximation. That
is, finely divide the range of a into increments delta_a, compute
p(data|a,hyp_a) and p(a|hyp_a) at each point, and sum (integrate)
across points. Don't forget to include delta_a in the product you are
summing. Graph the prior and posterior distributions (superimposed).
Is the maximum of each posterior close to the maximum likelihood
estimate of the corresponding parameter?
(C) Suppose that the prior probabilities of the models are
equal; i.e., p(hyp_a) = p(hyp_b) = 0.5. Determine the posterior
probabilities of the models, given the data. Use the equations
p(hyp_a|data) = p(data|hyp_a) p(hyp_a) / p(data)
p(data) = p(data|hyp_a) p(hyp_a) + p(data|hyp_b) p(hyp_b)
and Eqn. 6, above.
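A minimal sketch of the grid approximation for part (B), shown for
hyp_a (the exponential model); the recall counts come from the table
above (8, 6, 4, 1 successes out of 10 items), and binomial
coefficients are omitted because they cancel wherever the two models
are compared on the same data:
  % Grid approximation of Eqns. 3 and 6 for the exponential model hyp_a.
  x = [0.33 0.67 1.33 2.67];  nItems = 10;  k = [8 6 4 1];  % data above
  da = 0.001;  a = da/2 : da : 4;               % finely divide 0 <= a <= 4
  priorA = (1/2) - abs( (1/4) * ( a - 2.0 ) );  % triangular prior p(a|hyp_a)
  logLik = zeros(size(a));
  for i = 1:length(a)
    p = exp( -a(i) * x );                       % model predictions
    p = min( max( p, 1e-10 ), 1-1e-10 );        % guard against log(0)
    logLik(i) = sum( k.*log(p) + (nItems-k).*log(1-p) );
  end
  lik = exp( logLik );                          % p(data|a,hyp_a)
  evidenceA = sum( lik .* priorA * da );        % Eqn. 6, numerically
  posteriorA = lik .* priorA / evidenceA;       % Eqn. 3, over the grid
  plot( a, priorA, a, posteriorA );             % prior and posterior superimposed
  % Repeating this for hyp_b gives evidenceB; then part (C) is
  % p(hyp_a|data) = evidenceA*0.5 / ( evidenceA*0.5 + evidenceB*0.5 ).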
|
15 |
Apr 26 - Apr 30 |
Reading: Pitt, Myung and Zhang (2002). Toward a method of selecting
among computational models of cognition. Psychological
Review, 109, 472-491.
|
Final Exams |
May 3 - May 8 |
(no final exam) |