EEE511: Artificial Neural Computation Systems

Final project topic possibilities

Last updated 8-9-2007


Time-Series Prediction

There have been several competitions in this area in the past. Take the data of any of those, apply any ANN architecture that you desire, and try to beat the winner!
Leuven competition.
Santa Fe competition.
Or, take any time series that interests you, and try to predict it.
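
Whichever series you pick, a common first step is to turn it into input/target pairs with a sliding window, and to measure a trivial baseline before training any network. The following is only a minimal sketch of that step. The function names, the window length, and the toy series are illustrative assumptions, not part of any competition's rules.

```python
def make_windows(series, order):
    """Turn a 1-D series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for t in range(order, len(series)):
        inputs.append(series[t - order:t])   # the 'order' most recent values
        targets.append(series[t])            # the value to be predicted
    return inputs, targets


def persistence_mse(inputs, targets):
    """Mean squared error of the naive 'next value = last value' predictor."""
    errors = [(x[-1] - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


# Hypothetical toy series; replace it with the competition data.
series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
X, y = make_windows(series, order=3)
baseline = persistence_mse(X, y)
print(len(X), "training pairs, persistence MSE =", baseline)
```

The pairs in X and y can be fed to any ANN architecture; a network that cannot beat the persistence baseline on held-out data has not learned anything useful about the series.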


Pattern Recognition

Select any pattern recognition application that has data available. For example, NIST has a large database for handwritten character recognition; the MNIST database is a manageable subset of it.

Speech recognition requires modelling the time structure of signals, which calls for additional tools, such as Hidden Markov Models, that are outside the scope of this course. However, short-utterance or phoneme recognition could be feasible.

Network intruder detection can be posed as a classification task. This was the topic of the 1999 KDD Cup.

The UCI Machine Learning Repository has many smaller (and some larger) databases.

Three sets of challenges based on underlying pattern recognition problems:

 

·         Feature Selection Challenge. Results published in the field of feature selection have been in the past, for the most part, on different data sets or have used different data splits. This makes them hard to compare. We formatted a number of datasets for the purpose of benchmarking feature selection algorithms in a controlled manner. The data sets were chosen to span a wide variety of domains. We chose data sets that had sufficiently many examples to create a large enough test set to obtain statistically significant results. The input variables are continuous or binary, sparse or dense. All problems are two-class classification problems. The similarity of the tasks will allow participants to enter results on all data sets to test the genericity of the algorithms. The challenge in feature selection is to find algorithms that significantly outperform methods using all features, using as benchmark ALL five datasets formatted for that purpose.

 

·         Performance Prediction Challenge. This project is dedicated to stimulating research and revealing the state of the art in "model selection" by organizing a competition followed by a workshop. Model selection is a problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters. Many predictive models have been proposed to perform such tasks, including linear models, neural networks, trees, and kernel methods. Finding methods to optimally select the models that will perform best on new test data is the object of this project. The competition will help identify accurate methods of model assessment, which may include variants of the well-known cross-validation methods and novel techniques based on learning-theoretic performance bounds. Such methods are of great practical importance in pilot studies, for which it is essential to know precisely how well desired specifications are met.

 

·         Brain Activity Interpretation Competition. The purpose of this competition is to challenge multiple groups to use state-of-the-art techniques to infer subjective experience from a rigorously collected set of fMRI data associated with passive viewing of videos using a quantitative metric of success. Ideally, the analysis of brain activation patterns can characterize what an observer experienced while watching the video (e.g., was there a face present in the video? who was it? was someone speaking? was the event pleasant? how attentive was the viewer?). To advance the methodology and assess the state of the science, DARPA is funding collection of the data through a grant to the University of Pittsburgh, and has provided a competitive prize for the groups showing the most effective methodologies.
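
The cross-validation idea mentioned under the Performance Prediction Challenge can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the helper names, the two toy models (a majority-class predictor and a 1-nearest-neighbour rule on scalar inputs), and the toy data are all hypothetical, and the folds are taken in order, so real data should be shuffled first.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) lists; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_error(fit, predict, X, y, n_folds):
    """Average misclassification rate over the held-out folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(X), n_folds):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)


# Two hypothetical candidate models for scalar inputs.
def fit_majority(X, y):
    return max(set(y), key=y.count)          # most frequent training label

def predict_majority(model, x):
    return model

def fit_1nn(X, y):
    return list(zip(X, y))                   # simply memorise the training set

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Toy data (an assumption); real data should be shuffled before splitting.
X = [1, 2, 3, 4, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = {"majority": (fit_majority, predict_majority),
              "1-nn": (fit_1nn, predict_1nn)}
errors = {name: cross_val_error(fit, pred, X, y, n_folds=4)
          for name, (fit, pred) in candidates.items()}
best = min(errors, key=errors.get)           # model with the lowest CV error
print(errors, "-> selected:", best)
```

The same loop works for comparing network sizes or regularization settings: wrap each configuration in a fit/predict pair and select the one with the lowest cross-validated error.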


Regression

Here is an example of a competition. The task is a regression problem where the goal is to estimate the return from a direct mailing in order to maximize donation profits. This is a huge task!

Smaller regression tasks are given here, together with results using various different methods. An appropriate project could be to take a few datasets and compare several ANN regression methods on them.


Implementation of the Tree-Structured Self-Organizing Map

None of the free SOM packages (SOM Toolbox, som_pak) have this implemented. Code it in any language you feel comfortable with. Two papers on this topic are here, and here. The instructor can give more guidance.


Data exploration using Self-Organizing Maps

Acquire any largish dataset that interests you, and analyze the relationships in the data using SOMs. The SOM Toolbox for Matlab can handle datasets up to a moderately large size; very large databases probably require something like som_pak.
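
For orientation, the core SOM training loop is short enough to write out. The sketch below is a plain one-dimensional SOM in pure Python, not the SOM Toolbox or som_pak implementation and not the tree-structured variant. The decay schedules, parameter defaults, and toy data are assumptions chosen only for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w_j - x_j) ** 2 for w_j, x_j in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_units, n_epochs, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D SOM (units arranged on a line) with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else n_units / 2.0
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):         # random presentation order
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood width
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the map move more than distant ones.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


# Hypothetical toy data: two small clusters in the unit square.
data = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],
        [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]]
weights = train_som(data, n_units=4, n_epochs=50)
for x in data:
    print(x, "-> unit", best_matching_unit(weights, x))
```

For a real project the inputs should be scaled to comparable ranges first, and a two-dimensional map is the usual choice; both of the packages named above provide that.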

Examples:

Get a database of movies, convert the data into a suitable numerical form, and train a map where close neurons encode similar movies. Study what kind of movie clusters form on the map. Study the feasibility of using this as the basis of an example-based movie search mechanism (and sell it to Blockbuster!).

Get a database of weather information from cities around the globe, convert the data, train an SOM, and study whether similar climate corresponds to geography at all. Find your next vacation destination!

Other interesting datasets: Details of all US colleges, Sports statistics of any type, Computer network intruder detection, Semiconductor fabrication fault analysis (available from the instructor).

Lots of data sets are available in StatLib.


Reinforcement learning

Simulate playing a simple game, such as extended tic-tac-toe, and learn the policy.

Simulate an agent in a "gridworld" where there are locations that give rewards, and demonstrate learning the optimal policy. Example: Create a small maze and simulate a rat learning its way out by using exploration and reinforcement learning.
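
As a starting point for the maze example, here is a minimal sketch of tabular Q-learning on a tiny gridworld. The 3x3 layout, the reward of -1 per step, and all parameter values are assumptions made for illustration; a project would replace the table with a neural function approximator or use a larger maze.

```python
import random

ROWS, COLS = 3, 3
START, GOAL = (0, 0), (2, 2)
WALLS = {(1, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Deterministic move; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        r, c = state
    next_state = (r, c)
    return next_state, -1.0, next_state == GOAL  # reward of -1 for every step


def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(n_episodes):
        state, done, n_steps = START, False, 0
        while not done and n_steps < 200:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))        # explore
            else:
                a = q[state].index(max(q[state]))      # exploit current estimate
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrap from the best next action unless the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            n_steps += 1
    return q


def greedy_path(q, max_steps=20):
    """Follow the greedy policy from START until GOAL or max_steps is reached."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        a = q[state].index(max(q[state]))
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


q_table = q_learning()
path = greedy_path(q_table)
print("greedy path after training:", path)
```

Because every step costs -1, the return is highest along the shortest route, so the learned greedy policy should lead the agent out of the maze without detours.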

Then there are lots of simple control problems that could be simulated. Simulate a task of your interest, and implement a neural controller for it.

Sutton's on-line book might contain good ideas: http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html

Likewise for the RLR: http://www-anw.cs.umass.edu/rlr/


Other possibilities

 

The project does not need to be practical; it can be

·         theoretical,

·         a literature review – a deeper study of some subarea that interests you, or

·         improvement or modification of an existing method (such as "How to regularize LVQ?").