Cornell's Factory Sciences Research Project
Informal Update - June 1, 1998
Contact: Lee Schruben (Lee@orie.cornell.edu)

The spring term is (almost) over here and we are looking forward to a productive summer of research. Detailed descriptions of our current research plans are included. The next few weeks are ideal for comments and suggestions. In particular, if we are not working on projects on your hot list or if we are working on projects of low interest, please let me know.
 

Attachment contents:

1. Status of the Virtual Center

2. Confirm SRC Mentor Mailing List

3. Activities Last Term

4. Summer Research Plans
        4.1. Predicting Tool Failures
        4.2. Improving Simulation Algorithms
        4.3. Research on Demand Modeling and Dynamic Capacity Management
                4.3.1. Demand Modeling
                4.3.2. The Equipment Plan and the Manpower Plan
        4.4. Fab Performance Interpretations
        4.5. Hierarchical Cluster Tool/Fab Modeling


1. Status of the Virtual Center

The Factory Sciences Virtual Center is off to a very good start. All of the PIs at the different universities have good relationships, and enthusiasm is high since the ASU meeting last March. Cornell has given copies of our Cluster Tool Performance Simulator to ASU and Maryland for their research. John Fowler (ASU) and I are finishing up a joint paper on frequency domain sensitivity analysis of semiconductor fab simulations with two other co-authors. John has also arranged for all of us to come to Phoenix in June for a workshop with Motorola and a training course at Intel.


2. SRC Mentors:

The following is the list of SRC project mentors Gene sent out. If you know of persons not on this list who might be interested in our work, please let me know.
 
Edward Rietman  Lucent 
Sean Cunningham  Intel 
Larry Farey  LSI 
Sarah Hood  IBM  
Steven Brown  IBM/Micrus 


3. Activities Last Term

Cornell's factory sciences research group hosted seminars on semiconductor manufacturing by Sarah Hood (IBM), Veronica Czitrom (Lucent), Mark Hanson (Lucent), and Sam Wood (Stanford). Dr. Czitrom is scheduled to be a visiting faculty member at Cornell for the Spring term. She will teach a course on statistical practice in semiconductor manufacturing (experimental design and quality). Dr. Rietman has provided us with an extensive data set and will be visiting Cornell next week. Steve Brown sponsored a Master of Engineering project with us this spring that looked at alternatives for product expediting. The students developed an excellent high-level material flow simulator. DEC has given us a copy of Factory Explorer to use in our research.

David Juran (Columbia Univ.) and I have submitted our paper "Modeling the Worker" for publication. This paper presents alternative approaches to "people modeling" appropriate for the highly skilled workforce in semiconductor manufacturing.

Cornell faculty visited production sites at IBM, Intel, and HP. We appreciate all the time people spent with us as well as the data and insights they have provided.


4. Summer Research Plans

4.1. Predicting Tool Failures

 Profs. Samorodnitsky, Resnick, Ruppert, and Schruben
 
Machine down-times have a major impact on production throughput. With complex and expensive tools now in use, a large proportion of down-times is due to hard failures as opposed to planned maintenance. Furthermore, many different causes of hard failures are now being identified. Often, a machine that was down goes back online after the apparent cause of the problem has been fixed, only to fail again shortly afterward for another reason. In such cases, what was viewed as the root of the problem the first time was only a symptom. If one could identify the real problem the first time around, multiple down-times would be avoided.
 
One of our goals is to investigate statistical correlations in the failure patterns of a given machine and between several machines. Modern time series techniques can help predict failures in a highly correlated environment. Another goal is to statistically relate readings of physical parameters of a machine over large time lags to failure patterns so that failures can be predicted well in advance. A major step in this process is the collection of high-quality data from the actual manufacturing process. We will experiment with statistical methods from process control, time series analysis, extreme value theory, and diffusion approximations to seek effective techniques for identifying and predicting failures.

The end goal of this research will be a methodology for automatically tracking both physical parameters of machines and their failures. If successful, the technology can then be deployed in software that will flag likely failures well before they happen. One can then carry out preventive maintenance to reduce the overall down-time significantly.


4.2. Improving Simulation Algorithms

 Professors Samorodnitsky and Schruben

A fab is a complicated system of tool groups. Re-entrant queueing, priorities, mixtures of batch and single mode operations, and the need to revisit specific tools further complicate the situation. Commercially available simulation packages simulate the detailed flow of wafers through the entire system. Consequently, simulation runs are lengthy and cumbersome, limiting the amount of experience that can be gained.

We will study simulation methods that can potentially execute much faster than conventional simulations by concentrating on bottleneck or critical groups of tools. Bottleneck tools will still be simulated in great detail. Non-bottleneck groups of tools will not receive detailed simulation. Rather, we will estimate (either statistically or using high-level queueing models) the distribution of the time it takes a wafer to move through these non-critical steps. Instead of following a wafer leaving a critical group of tools and entering a non-critical group of tools, we will draw a random time from the estimated distribution (depending, perhaps, on the present state of the system) and place that wafer into the next critical group of tools on its schedule after that many units of time. Since bottleneck groups of tools mostly determine fab performance, eliminating detailed simulation of non-critical groups of tools should increase simulation speed dramatically without sacrificing reliability.
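
The wafer-routing step described above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions that are not part of the research plan: each critical tool group is a single server with a fixed processing time, wafers are served in arrival order, and the time spent in non-critical steps is drawn from an exponential distribution. All names and numbers (`simulate_critical_only`, `service_time`, `delay_sampler`, the route) are hypothetical.

```python
import heapq
import random


def simulate_critical_only(num_wafers, route, service_time, delay_sampler, seed=0):
    """Simulate only the critical tool groups on each wafer's route.

    route:         ordered list of critical tool-group names; a name may
                   repeat to mimic re-entrant flow.
    service_time:  dict mapping tool-group name -> fixed processing time
                   (an assumption made to keep the sketch short).
    delay_sampler: function(rng) -> random time spent in the non-critical
                   steps between two consecutive critical visits.
    Returns a list with the cycle time of each wafer.
    """
    rng = random.Random(seed)
    free_at = {group: 0.0 for group in service_time}  # next time each group is idle
    release = [float(w) for w in range(num_wafers)]   # one wafer released per time unit
    events = [(release[w], w, 0) for w in range(num_wafers)]  # (arrival, wafer, step)
    heapq.heapify(events)
    cycle_times = [0.0] * num_wafers

    while events:
        arrival, wafer, step = heapq.heappop(events)
        group = route[step]
        start = max(arrival, free_at[group])          # wait if the group is busy
        done = start + service_time[group]
        free_at[group] = done
        if step + 1 < len(route):
            # Skip the non-critical steps: draw their total duration at random
            # and schedule the wafer's arrival at its next critical group.
            heapq.heappush(events, (done + delay_sampler(rng), wafer, step + 1))
        else:
            cycle_times[wafer] = done - release[wafer]
    return cycle_times


if __name__ == "__main__":
    times = simulate_critical_only(
        num_wafers=100,
        route=["litho", "etch", "litho"],
        service_time={"litho": 0.4, "etch": 0.3},
        delay_sampler=lambda rng: rng.expovariate(1.0 / 5.0),  # mean delay of 5 (made up)
    )
    print("mean cycle time:", sum(times) / len(times))
```

In this sketch the delay distribution is fixed. A state-dependent version would pass the current system state to `delay_sampler` as well.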

We plan to study, both analytically and empirically, the effects of using summary representations of parts of the simulated production flow process. The final methods should result in efficient floor simulation algorithms.
 
A secondary goal is to study the accuracy of simulation methods and the reliability of the assumptions that go into standard simulation modeling by comparing the assumptions with raw data from the shop floor. This will allow us to begin determining model sensitivities to these assumptions and decide if standard simulation packages produce reliable output. In particular, we will examine the effect on simulation reliability of ignoring dependencies and of making speculative and unconfirmed distributional assumptions.


4.3. Research on Demand Modeling and Dynamic Capacity Management

Almost every aspect of the business environment in which today's semiconductor manufacturers operate is highly dynamic and unpredictable. The market, the products, the technology used to manufacture the products, and the skills that operators, engineers, and technicians need are all in a continual state of flux. Tools and training are both extremely expensive. Competition is often fierce. Financial success or failure is often determined by the ability of a company to deploy the right combination of resources at the right time, and to utilize them fully. Our goal is to substantially improve the ability of today's semiconductor manufacturers to accomplish this, in a highly unpredictable, dynamic environment.

By "Demand Modeling and Dynamic Capacity Management" we mean the ability to understand and to model the demand for semiconductors over, say, the next year, and to effectively utilize that information to manage machine tools and human resources.  This research has three key, inter-connected elements: Demand Modeling, the Equipment Plan, and the Manpower Plan.


4.3.1. Demand Modeling

 Professors Roundy and Schruben

The market for semiconductors is unpredictable and highly dynamic. The people who perform marketing and forecasting functions for semiconductor manufacturers understand a lot about the uncertainties in the marketplace and the different ways in which market demand might evolve over time. However, the numerical forecast data that they typically provide to other groups in the corporation do not fully capture this knowledge. One of our goals is to develop a model of the manner in which demand and forecasts of demand evolve over time. This model will form the basis for a tool that will allow forecasters to quantify their understanding of uncertainty more completely and more accurately. This enhanced, quantified view of the uncertainty that exists in demand forecasts can be used by other organizations within the company to support a variety of strategic, tactical, and operational decisions.

Suppose, for example, that two different chips are both used primarily in multi-media personal computers. Forecasters will understand that an unexpected increase in demand for one of the semiconductors will almost certainly be accompanied by an unexpected increase in the demand for the other. On the other hand, suppose that a new generation of memory chips is coming on line and that demand for the new chips does not grow as rapidly as anticipated. A forecaster will know that the demand for the previous generation of memory chip will remain strong for a longer period of time than was previously believed, i.e., an unexpected decrease in the projected demand for one product is accompanied by an unexpected increase in the demand for another.
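
One simple way to express relationships like these numerically is to treat the forecast errors of two products as jointly normal with a chosen correlation. The sketch below is only an illustration of that idea, not the demand model we intend to develop. The function names, standard deviations, and correlation values are all hypothetical.

```python
import math
import random


def sample_forecast_errors(sigma_a, sigma_b, rho, n, seed=0):
    """Draw n pairs of forecast errors for products A and B.

    The errors are bivariate normal with standard deviations sigma_a and
    sigma_b and correlation rho (an assumed model, chosen for simplicity).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        err_a = sigma_a * z1
        # Mixing z1 into B's error induces the desired correlation with A.
        err_b = sigma_b * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pairs.append((err_a, err_b))
    return pairs


def sample_correlation(pairs):
    """Sample correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Two chips used in the same PCs: surprises tend to move together.
    together = sample_forecast_errors(100.0, 80.0, rho=0.8, n=5000)
    # Old and new memory generations: a shortfall in one props up the other.
    opposite = sample_forecast_errors(100.0, 80.0, rho=-0.6, n=5000)
    print(sample_correlation(together), sample_correlation(opposite))
```

A positive `rho` corresponds to the multi-media example and a negative `rho` to the memory-generation example.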

Relationships of this type are widely understood, but in most cases, forecasters do not have a modeling tool that is capable of adequately expressing and quantifying them.  We intend to create an easy-to-use method for quantifying and communicating interactions of the type described above, as well as other types of uncertainties.  A strong understanding of the evolving demand for semiconductors is needed to develop plans which will allow for maximal utilization of expensive tools and for the best possible utilization of critical human resources.


4.3.2. The Equipment Plan and the Manpower Plan

 Professors Roundy and Schruben

A major determinant of the financial success of many semiconductor companies is the effectiveness with which they utilize their equipment. The cost of equipment is one of the largest categories of costs that semiconductor companies face. The speed with which technology is evolving limits the useful economic life of much of the equipment, so companies need to gain returns on very large investments over very short periods of time.

The cost of manpower, including engineers, technicians and operators, is usually lower than the cost of the equipment, but it is still a substantial cost.  Competition for qualified people can be intense, and training takes time. There have been many cases in which a very expensive tool has been idle or underutilized because trained personnel were not available.

The information contained in the equipment plan and the information contained in the manpower plan are largely separate, and the two plans are typically developed sequentially. However, they need to be consistent with each other. The equipment plan is clearly a major input into the development of the manpower plan. When the availability of trained personnel constrains the tool plan, the development of the two plans should be tightly linked.

Our approach to this problem is to use the demand model described above as a basis for determining the need for tools and for human resources.  This is combined with cost data, information on ramp times for new tools, and other pertinent information.  Using this information we intend to formulate optimization algorithms which will determine the best schedules for bringing tools into a new or an existing fab, and a plan for hiring and training new employees, and/or re-training current employees.

The role of the optimization algorithms is to make complex tradeoffs in a cost-effective manner. New hires usually take longer to train than current employees.  The time required to re-train an experienced employee depends on how similar the new tool is to the tools the employee is already trained on. While being trained on a new tool, an employee may not be available for other activities.  Because the number of operators, technicians, and engineers in a fab is usually well into the hundreds, these tradeoffs can become very complex.
 

Summary

The approach we are proposing is innovative. Elements of it have appeared in the literature, and some related concepts have found their way into industrial practice. However, it is in many ways a substantial advance in the state of the art, and it will require a substantial amount of research and development.

We believe this approach is far superior to current industrial practice. When key capacity management decisions are made, they will be based on the best currently available understanding of the trends and uncertainties in the market. Costs will be minimized, and the optimization algorithms will resolve complex tradeoffs and allocate scarce resources in the best possible manner.
 


4.4. Fab Performance Interpretations

 Professors Roundy, Ruppert, and Schruben

While we were visiting semiconductor manufacturing sites, we encountered two important problems that may be addressed by the same model.

The first problem is to help the industrial engineers (this may not be their correct title). These people spend about 25% of their time in meetings, 25% of their time on miscellaneous administrative duties, and half of their time determining the capacity of the tools on the shop floor. Each industrial engineer is assigned to one tool group (steppers, etc.). As we understand it, the tool capacity estimation problem is difficult because the data that come back from the shop floor are affected by many factors, some of which originate with the tool group in question and some of which originate with other tool groups. For example, changes in maintenance schedules affect the capacity of a tool group. If one tool group has a capacity problem, another tool group has less work to do and, consequently, less maintenance time.

The second problem is interpreting fab performance statistics. When critical measures improve or get worse, the cause may be the tool group in question or events that took place at other tool groups, possibly at other times. There is a need to unravel cause and effect, so that credit can be given where credit is due, and so that resources can be allocated to the root causes of important problems.

It seems to us that a single model can address both of these concerns. Input to the model would consist of important fab performance data associated with the different tool groups (throughput, queue times, tool downtimes of different types, tool idle times, and, to the extent it can be measured, quality). The model would determine the root cause of observed fluctuations. It would assist in determining the capacity of a given tool group in two different ways. First, it would identify time periods during the last month in which the throughput or the maintenance time of a given tool group was affected by other tool groups or other extraneous factors. Second, it would determine when a fundamental change in the capabilities or performance of a given tool group has occurred.
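
As a rough illustration of the second function only (detecting a sustained change in a tool group's performance), the sketch below applies a one-sided CUSUM chart, a standard statistical process control technique, to a throughput series. It is a stand-in chosen for brevity, not the model proposed here, and it makes no attempt at root-cause attribution. The function name, the parameters `target`, `slack`, and `threshold`, and the data are hypothetical.

```python
def first_downward_shift(values, target, slack, threshold):
    """Return the index at which a one-sided CUSUM signals a drop, or None.

    values:    observed throughput per period.
    target:    throughput expected when the tool group is performing normally.
    slack:     shortfall per period that is tolerated as ordinary noise.
    threshold: accumulated excess shortfall that triggers a signal.
    """
    cusum = 0.0
    for i, x in enumerate(values):
        # Accumulate shortfalls beyond the slack; never let the sum go negative.
        cusum = max(0.0, cusum + (target - x) - slack)
        if cusum > threshold:
            return i
    return None


if __name__ == "__main__":
    # Made-up data: 20 normal periods followed by a sustained 10% drop.
    throughput = [100.0] * 20 + [90.0] * 20
    print(first_downward_shift(throughput, target=100.0, slack=2.0, threshold=15.0))
```

A short dip caused by another tool group tends to be absorbed by the slack and threshold, while a sustained drop accumulates until it is flagged.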


4.5. Hierarchical Cluster Tool/Fab Modeling

 Professors Ruppert and Schruben

Using adaptive spline techniques, we hope to model cluster tool throughput and cycle times for a rich set of configurations, providing formulas with sufficient resolution to be used within factory-level simulations. The Cornell Cluster Tool Performance Simulator is our target model at the tool level, and Factory Explorer (by WWK from DEC) is our target factory simulator.