Last updated 8-9-2007
Time-Series Prediction
There have been several competitions in this area in the past. Take the data from any of them, apply any ANN architecture you like, and try to beat the winner!
Leuven competition.
Santa Fe competition.
Or, take any time series that interests you, and try to predict it.
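Whatever series you pick, a common first step is to turn it into (window, next value) training pairs and to measure a trivial baseline that your network should beat. The sketch below is one way to do that, not a prescribed method: the function names, the window length `order`, the 75/25 split, and the toy series are all assumptions made for illustration.

```python
def make_windows(series, order):
    """Turn a 1-D series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for t in range(order, len(series)):
        inputs.append(series[t - order:t])  # the `order` most recent values
        targets.append(series[t])           # the value to be predicted
    return inputs, targets


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def persistence_baseline(inputs):
    """Predict that the next value equals the last observed value."""
    return [window[-1] for window in inputs]


# Toy periodic series (hypothetical data standing in for a competition series).
series = [0.0, 1.0, 0.0, -1.0] * 10
inputs, targets = make_windows(series, order=3)

# Hold out the last quarter of the pairs; keep time order, do not shuffle.
split = int(0.75 * len(inputs))
train_inputs, train_targets = inputs[:split], targets[:split]
test_inputs, test_targets = inputs[split:], targets[split:]

baseline_mse = mean_squared_error(persistence_baseline(test_inputs), test_targets)
print(f"persistence baseline MSE on held-out data: {baseline_mse:.3f}")
```

An ANN trained on `train_inputs` and `train_targets` can then be scored with the same `mean_squared_error` on the held-out pairs and compared against the baseline.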
Pattern Recognition
Select any pattern recognition application that has data available. For example, NIST has a large database for handwritten character recognition. The MNIST database is a manageable subset of it.
Speech recognition requires modelling the time structure of signals, which calls for additional tools such as Hidden Markov Models that are outside the scope of this course. However, short-utterance or phoneme recognition could be feasible.
Network intrusion detection can be treated as a classification task. This was the topic of the 1999 KDD Cup.
The UCI Machine Learning Repository has many small (and some larger) databases.
Three sets of challenges based on underlying pattern recognition problems:
· Feature Selection Challenge. In the past, published feature selection results have mostly been obtained on different data sets or with different data splits, which makes them hard to compare. We formatted a number of data sets for the purpose of benchmarking feature selection algorithms in a controlled manner. The data sets were chosen to span a wide variety of domains. We chose data sets that had sufficiently many examples to create a test set large enough to obtain statistically significant results. The input variables are continuous or binary, sparse or dense. All problems are two-class classification problems. The similarity of the tasks will allow participants to enter results on all data sets to test the genericity of the algorithms. The challenge in feature selection is to find algorithms that significantly outperform methods using all features, using as a benchmark ALL five data sets formatted for that purpose.
· Performance Prediction Challenge. This project is dedicated to stimulating research and revealing the state of the art in "model selection" by organizing a competition followed by a workshop. Model selection is a problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters. Many predictive models have been proposed to perform such tasks, including linear models, neural networks, trees, and kernel methods. Finding methods to optimally select the models that will perform best on new test data is the object of this project. The competition will help identify accurate methods of model assessment, which may include variants of the well-known cross-validation methods and novel techniques based on learning-theoretic performance bounds. Such methods are of great practical importance in pilot studies, for which it is essential to know precisely how well desired specifications are met.
· Brain Activity Interpretation Competition. The purpose of this competition is to challenge multiple groups to use state-of-the-art techniques to infer subjective experience from a rigorously collected set of fMRI data associated with passive viewing of videos using a quantitative metric of success. Ideally, the analysis of brain activation patterns can characterize what an observer experienced while watching the video (e.g., was there a face present in the video? who was it? was someone speaking? was the event pleasant? how attentive was the viewer?). To advance the methodology and assess the state of the science, DARPA is funding collection of the data through a grant to the
Regression
Here is an example of a competition. The task is a regression problem where the goal is to estimate the return from a direct mailing in order to maximize donation profits. This is a huge task!
Smaller regression tasks are given here, together with results obtained using various methods. An appropriate project could be to take a few datasets and compare several ANN methods for regression on them.
Implementation of the Tree-Structured Self-Organizing Map
None of the free SOM packages (SOM Toolbox, SOM_PAK) has this implemented. Code it in any language you feel comfortable with. Two papers on this topic are here, and here. The instructor can give more guidance.
Data exploration using Self-Organizing Maps
Acquire any largish dataset that interests you, and analyze the relationships in the data using SOMs. The SOM Toolbox for Matlab is applicable up to largish databases; large databases probably require something like SOM_PAK.
Examples:
Get a database of movies, convert the data into a suitable numerical form, and train a map where close neurons encode similar movies. Study what kind of movie clusters form on the map. Study the feasibility of using this as the basis of an example-based movie search mechanism (and sell it to Blockbuster!).
Get a database of weather information from cities around the globe, convert the data, train a SOM, and study whether similarity in climate corresponds to geography at all. Find your next vacation destination!
Other interesting datasets: Details of all
Lots of data sets are available in StatLib.
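For small experiments before moving to a full package, the basic online SOM update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SOM Toolbox or SOM_PAK interface: the names `train_som` and `best_matching_unit`, the linear decay schedules, the Gaussian neighbourhood, and the two-cluster toy data are all hypothetical choices.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM with an online update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per map unit, initialised from random data samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):         # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_dist2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull the unit towards the sample
            step += 1
    return weights


# Toy data: two well-separated 2-D clusters (hypothetical stand-in for real records).
data_rng = random.Random(1)
cluster_a = [[data_rng.gauss(0.0, 0.1), data_rng.gauss(0.0, 0.1)] for _ in range(30)]
cluster_b = [[data_rng.gauss(5.0, 0.1), data_rng.gauss(5.0, 0.1)] for _ in range(30)]
som = train_som(cluster_a + cluster_b, rows=1, cols=4)

units_a = {best_matching_unit(som, x) for x in cluster_a}
units_b = {best_matching_unit(som, x) for x in cluster_b}
print("units hit by cluster A:", sorted(units_a))
print("units hit by cluster B:", sorted(units_b))
```

With real data, each record (a movie, a city) would first be converted to a numeric vector, and features would usually be scaled to comparable ranges so that no single feature dominates the distance.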
Reinforcement learning
Simulate playing a simple game, such as extended tic-tac-toe, and learn the policy.
Simulate an agent in a "gridworld" where there are locations that give rewards, and demonstrate learning the optimal policy. Example: Create a small maze and simulate a rat learning its way out by using exploration and reinforcement learning.
There are also lots of simple control problems that could be simulated. Simulate a task that interests you, and implement a neural controller for it.
Sutton's on-line book might contain good ideas: http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html
Likewise for the Reinforcement Learning Repository (RLR): http://www-anw.cs.umass.edu/rlr/
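As a starting point for the gridworld idea above, here is a minimal sketch of tabular Q-learning on an empty 4 x 4 grid. Everything specific in it is an assumption made for illustration: the grid size, the reward values, the learning parameters, and the function names. A course project would typically add walls and replace the table with a neural network.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, size, goal):
    """Move within a size x size grid; bumping into the border leaves the state unchanged."""
    r = min(max(state[0] + action[0], 0), size - 1)
    c = min(max(state[1] + action[1], 0), size - 1)
    next_state = (r, c)
    reward = 1.0 if next_state == goal else -0.01  # small cost for every non-goal move
    return next_state, reward, next_state == goal


def q_learning(size=4, goal=(3, 3), episodes=2000, alpha=0.5, gamma=0.9,
               epsilon=0.2, max_steps=200, seed=0):
    """Learn a table of action values with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))            # explore
            else:
                best = max(q[state])                       # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a], size, goal)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, size=4, goal=(3, 3), start=(0, 0), max_steps=50):
    """Follow the highest-valued action from start until the goal or the step limit."""
    path, state = [start], start
    for _ in range(max_steps):
        if state == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, _ = step(state, ACTIONS[a], size, goal)
        path.append(state)
    return path


q = q_learning()
path = greedy_path(q)
print("greedy path:", path)
```

After training, the greedy path should normally lead from the start corner to the goal; on this empty grid the shortest possible route takes six moves, which gives a simple check on whether the learned policy is optimal.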
Other possibilities
The project does not need to be practical; it can be
· theoretical,
· a literature review: a deeper study of some sub-area that interests you, or
· an improvement or modification of an existing method (such as "How to regularize LVQ?").