

MAT4374 Computational Statistics (Summer 2008) Course Synopsis following the Textbook: EM Optimization Methods , Simulation and Monte Carlo Methods, Markov Chain Monte Carlo , Bootstrapping, Permutation Tests Professor: David McDonald Office Hours: Wednesday, 10:00  11:00, or by appointment. email:
dmdsg@uottawa.ca Tuesday, 16:00  18:00 CBE 202 (The Cube) Thursday, 16:00  18:00 CBE 202 (The Cube) Courses start May 1 and end July 31
Textbook: Givens, G.H., Hoeting, J.A. (2005). Computational Statistics. John Wiley & Sons, New Jersey. Check the website for this book http://www.stat.colostate.edu/computationalstatistics/ to get datasets, some R code and information. Computing: The computer package S plus will be used. It is available in the Cube. You can buy a student copy for $20 at MRC133. Syllabus: Chapter 6: Simulation Chapter 7: Markov Chain Monte Carlo Chapter 8: More MCMC Chapter 9: The Bootstrap Grades: There will be a small number of homeworks on background material (Markov chains and so forth) worth 10% plus a term test on basic material (Markov chains and so forth) 25% (on June 26th). There won't be a final exam but each student must make a presentation. The presentation will be on one of the following:
1) Antithetic sampling 6.3.2  May 27 2) Control variates 6.3.3  June 3 3) RaoBlackwellization 6.3.4 June 10 4) Importance Sampling 6.2  simulation of random variables June 17 5) Importance Sampling 6.3 network simulation July 1 6) Perfect sampling July 8 7) Perfect sampling II July 15 The presenter will consult with the professor at least two weeks before the presentation. The student will present the material in class together with a written report in pdf format which will attached to this page one week before the presentation. The student will illustrate the algorithm by solving a practical problem. The solution must be implemented in S+ (from the web if possible). The software and instructions are part of the report.The student will also prepare 15 minute written quiz everybody will write after the presentation. The presenter will grade the quiz from 1 to 10 and return the grades to the professor. The presenter will also prepare a homework problem to be solved using the algorithm and software presented. Each student will turn this in for grading one week later. The presenter will provide a solution to the homework problem. The professor will post the solution to the quiz and the homework and grade each homework from 1 to 10. The quiz grades will count for 5%. The presentation homework grades will count for 15%. The quiz and the homework problem should be cleared with the professor two weeks before the presentation. Along with the homework, each student will submit a one page written critique on the presentation giving a grade from 1 to 100 based on the following: 1) clarity of the presentation 2) ease of use of the software provided 3) how much the presentation helped in solving the homework problem 4) how much was learned. NO TWO PRESENTATIONS SHOULD RECEIVE THE SAME GRADE! The professor will use these grades and the results of the quiz and homework plus his own assessment to assign a grade on 40% to the presentation. The professor will also assign a grade from 1 to 10 to each critique based on insightful comments and the concordance of the critique with the professor's assessment and that of the other students; i.e. a decisive, accurate critique is better than a limip, vague critique. The total grade for the critiques will be 5%. May 1  We start with the basic concepts of random number generation and simulation. HW1 due May 15 : a) Solution We have just received a secret message from a friend. Each of us have a memory stick with 100 gigabytes of random data. We use one megabyte of data per day as a one time pad; i.e. we never reuse the same random numbers. The start of todays one time pad is 646F7070 and we have received the data 0C0A1C00 all written in hexadecimal. What is the message in ASCII? We know our friend has taken the original ASCII word and converted it to hex. Then he xored the hex with 646F7070. We can get the message back by xoring the coded messge with 646F7070. What is the message? Note that this code is totally unbreakable if the one time pad was created from truly random numbers because a one becomes a zero or a one with probability 1/2. Note that we could send a megabyte a day for the next 100 years without getting a new pad. b) A circuit board has 4 components A B arranged as an array on the board. C D The circuit board is 5 cm by 5 cm. The specifications for all components is 2cm wide and long but there is a standard deviation of .3 centimeters in each dimension. The distribution of the length and width is normal. If the components don't fit onto the board then the board is defective. What proportion of the boards are defective? Solve by a simulation in S+. Generate a large number of boards. For each board check if components overlap using the data > transform ifelse function. For instance if I create a column AW  widths for component A and a column BW  widths for component B then the expression ifelse(AW+BW<5,0,1) will encode a zero if the widths of A and B fit across the board. Create columns which check the different dimensions and then create a column of zeros and ones for all those that have a dimension that doesn't fit. The proportion that don't fit is your estimate for the proportion of defectives. c) Use the rejection method to generate random variables with density (5/12)(1+(x1)^4) on [0,2]. Try to be efficient. Write the code in Splus. HW2 Due May 22 Solution a) Redo c) from HW1 a bit better using squeeze sampling. b) Consider a probability
transition kernel $K$ on three states: 0,1,2. Review Markov chains and generate chains with Splus. For instance the code below generates the chain analyzed in class b1 < 1/3 Week
three: We started MCMC. The next homework is due
June 6th.
The task is to write the Splus code to implement the baseball
example given in class. Please review the theory on the baseball
example is given in the attached
pdf We are starting the bootstrap. See the page http://www.insightful.com/hesterberg/bootstrap/#Reference.shortCourse and the book http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf Here is the Splus program for the presentation on antithetic sampling: X1 < rep(0,50000) X2 < rep(0,50000) ans < rep(0,100000) i < 1 h < function(x){(3*x+17)/2^x} while(i <= 50000) { x < rnorm(1,0,1) ans[i] < h(x) X1[i] < h(x) ans[i+50000] < h(x) X2[i] < h(x) i < i+1 } mean(ans) cor(X1,X2) Here is Mr. Mei's Splus program for the presentation on importance sampling. Here is Mr. Tong's Splus program from the presentation on importance sampling. For the perfect sampling project: The steady state given for the matrix K= .5 .3 .2 .4 .2 .4 .3 .5 .2 was WRONG. It should have been (.4151,3108,.2641) Below is the code for perfect sampling . Mr. Mei made some corrections and it now seems correct: hold< matrix(0,1000,1) k<1 my<function(state,pass) { if({{state==1} && {0 <= pass}} && {pass<=0.5}){return(1); stop} if({{state==1} && {0.5 < pass}} && {pass<=0.8}){return(2); stop} if({{state==1} && {0.8< pass}} && {pass<=1}){return(3); stop} if({{state==2} && {0<= pass}} && {pass<=0.4}){return(1); stop} if({{state==2} && {0.4< pass}} && {pass<=0.6}){return(2); stop} if({{state==2} && {0.6< pass}} && {pass<=1}){return(3); stop} if({{state==3} && {0<=pass}} && {pass<=0.3}) {return(1); stop} if({{state==3} && {0.3< pass}} && {pass<=0.8}){return(2); stop} if({{state==3} && {0.8< pass}} && {pass<=1}){return(3); stop} } while(k<10001) { X<matrix(0,1000,3) tm < 1 X[1,1]<1 X[1,2]<2 X[1,3]<3 while(X[1,1]!=X[1,2] X[1,1]!=X[1,3]  X[1,2]!=X[1,3]) { hold[tm]< runif(1,0,1) tm< tm+1 X[tm,1] <1 X[tm,2] <2 X[tm,3] <3 for(i in (1:tm1)) { X[tmi,1]< my(X[tmi+1,1],hold[tmi]) X[tmi,2]< my(X[tmi+1,2],hold[tmi]) X[tmi,3]< my(X[tmi+1,3],hold[tmi]) } } res[k,1]<X[1,2] k<k+1 } guiPlot( PlotType = "Histogram", DataSet = "res", AxisType = "Linear") 