Investigators
John Fowler, Matt Carlyle, George Runger, Esma Gel (Arizona State Univ.)
Scott Mason (Univ. of Arkansas)
Oliver Rose (Univ. of Würzburg)
Lars Mönch (Tech. Univ. of Ilmenau)
Roland Sturm (Fraunhofer Institute)
Industrial Liaisons
Manuel Aybar, Texas Instruments Incorporated
Sidal Bilgin, LSI Logic Corporation
Wayne F. Carriker, Intel Corporation
Paul A. Flores, Intel Corporation
James D. Heskin, Texas Instruments Incorporated
Sarah J. Hood, IBM Corporation
Mani Janakiram, Intel Corporation
Joanna Shear, Intel Corporation
Raja S. Sunkara, National Semiconductor Corporation
Scheduling of Semiconductor Wafer Fabs: An Explanation
In a typical wafer fab, there often are dozens of process flows. Each process flow contains 200-500 processing steps and involves more than one hundred machines. These machines are expensive, ranging in price from a couple of hundred thousand dollars to over ten million dollars per tool. The economic necessity to reduce capital spending dictates that such expensive machines be shared by all lots requiring the particular processing operation provided by the machine, even though they may be at different stages of their manufacturing cycle. This results in a re-entrant manufacturing environment that differs in several ways from both traditional flow shops and job shops (see Figure 1). The main consequence of the re-entrant flow nature is that wafers at different stages in their manufacturing cycle have to compete with each other for the same machines. The manner in which this competition is resolved has a clear impact on plant performance measures.

Furthermore, the nature and duration of the various operations in a semiconductor flow differ significantly. Some operations require 15 minutes or less to process a lot, while others may require over twelve hours. Many of these long operations involve batch processes. In reality, it is not uncommon for one third of the fab operations to be batch operations. Batch machines tend to off-load multiple lots (1 to 12) onto tools that are capable of processing only one lot at a time. This leads to the formation of long queues in front of these serial tools and ultimately a non-linear flow of products in the factory. The probabilistic occurrence of long tool failures results in large variability in the time a job spends in process. High variability in cycle times prevents accurate prediction of production cycle times, resulting in longer lead-time commitments. There are some machines, such as implanters, that require significant sequence-dependent setups. If not scheduled well, these tools can become bottlenecks. Finally, some processing steps require an auxiliary resource, such as a reticle in photolithography, in order to process the job. Some of these auxiliary resources are quite expensive, so only a very limited number of them are purchased. Therefore, the challenge is to ensure that the machine and the auxiliary resource are available at the same time.
In order to understand the tools that are currently being utilized in the semiconductor wafer fabrication facility, a survey instrument was created and sent to each of the FORCe member companies. The survey was designed to ask specific questions regarding the types of scheduling methodologies currently implemented, the limitations of these methodologies, and the needs for future generation scheduling systems. In total, 16 respondents from 14 companies participated in this survey, representing fabs from Europe, Asia, and North America.
From this survey, we have found that many dispatching systems are in place and are mature installations (installed for more than 5 years). These systems are considered “satisfactory” in that benefits are being received, but the majority of respondents believe that more benefits are possible. Specifically, respondents indicated that better scheduling/dispatching rules, test environments, and reporting tools are needed.
Compared to the results of a 1994 SEMATECH survey entitled Measurement and Improvement of Manufacturing Capacity, cycle time and on-time delivery have gained significant importance in the fab. Respondents indicated that these performance metrics are most affected by bottleneck machine breakdowns and jobs going on hold, which were also the two most frequently occurring events reported. Thus, scheduling/rescheduling methodologies that incorporate these events are needed.
With regard to the frequency of rescheduling, respondents had mixed opinions. While most respondents favored rescheduling either at every shift or within a shift, many cited management challenges (operator stability/morale, staging for setup) as well as technical challenges such as hardware support. However, over 35% of respondents would like to reschedule after every job movement.
The primary goal of our proposed research effort is to develop solution methodologies for minimizing total weighted tardiness (TWT) for wafer fabrication facilities. We have taken the first steps towards this goal. In Mason et al. (Journal of Scheduling, to appear), we extended the classical job shop work of Pinedo and Singer (Naval Research Logistics, 1999) and developed a disjunctive graph formulation of the Jm | rj, sjk, B, recrc | TWT problem. Unlike previous job shop scheduling efforts, our disjunctive graph formulation accounts for batching tools, tool groups (identical machines operating in parallel), sequence-dependent setup times, and recirculating product flow.
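For reference, the TWT objective is the sum over all lots j of w_j * max(0, C_j - d_j), where C_j is the completion time, d_j the due date, and w_j the priority weight of lot j. The following is a minimal Python sketch of that calculation only; the `Lot` class and the three lots are hypothetical illustration data, not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    name: str
    weight: float      # priority weight w_j (hypothetical values below)
    due: float         # due date d_j
    completion: float  # completion time C_j produced by some schedule


def total_weighted_tardiness(lots):
    """Return the sum of w_j * max(0, C_j - d_j) over all lots."""
    return sum(l.weight * max(0.0, l.completion - l.due) for l in lots)


lots = [
    Lot("A", weight=2.0, due=10.0, completion=12.0),  # 2 time units late
    Lot("B", weight=1.0, due=15.0, completion=14.0),  # early, contributes 0
    Lot("C", weight=3.0, due=8.0, completion=9.0),    # 1 time unit late
]
print(total_weighted_tardiness(lots))  # 2*2 + 0 + 3*1 = 7.0
```

Early completion earns no credit under this measure, which is why lot B contributes nothing.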
Our Project Plan : 2002 Annual Report
The project consists of four main tasks. The first develops the overall approach of scheduling and rescheduling the entire factory while the second focuses on scheduling individual tool groups. The third task seeks to develop methods to determine when conditions in the factory have rendered the schedule obsolete. Finally, the fourth task is focused on testing our approach. Each of the tasks is briefly described below.
Task 1: Scheduling and Rescheduling Methodologies
Anticipated Primary Result : We will develop a shifting-bottleneck-based approach for scheduling in a wafer fab, including a means for simplifying fab models by “black-boxing” non-critical areas of the fab. We will develop the main scheduling approach, a rescheduling approach that improves sub-optimal schedules in real time, and enhanced features such as order release.
Background : The scheduling and rescheduling approaches developed in this task will form the centerpiece of our decision support methodology. The first step in this process will be to define and implement a basic scheduling approach based on the shifting bottleneck heuristic. This methodological development will follow the outline for future research in both Dr. Mason's and Dr. Russ Dabbas' dissertations. It will also involve developing a means by which non-critical areas (toolgroups, sets of toolgroups, or possibly entire bays of tools) in a fab can be treated as simple delays, as opposed to requiring detailed models and analysis. This will drastically simplify the models used, without impacting their accuracy, and will allow the solution of scheduling problems for larger fabs. We will also consider model simplification strategies to handle issues such as the complexities of batch operations. Other less-studied elements of wafer fab scheduling that we will consider include automated material handling systems, including interbay and intrabay transit systems; the coordination of order release policies with our general scheduling approach; and modifications to the shifting bottleneck strategy that explicitly take into account the structure of any special subproblem solution procedures (SSPs), such as those for reticle management, that have been employed (see Task 2).
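To make the graph-based view concrete: once a processing sequence has been fixed on each machine, the disjunctive graph becomes a directed graph in which the earliest start time of every operation is a longest path from the source. The Python sketch below evaluates such a fixed selection. It is a simplified illustration that assumes single machines with no batching, setups, or release dates; the function name, operation names, and processing times are all hypothetical.

```python
from collections import defaultdict, deque


def earliest_starts(proc_time, arcs):
    """Longest-path start times for operations under a fixed arc selection.

    proc_time: {operation: processing time}
    arcs: iterable of (u, v), meaning u must finish before v starts
          (route arcs plus the chosen machine-sequence arcs).
    Raises ValueError if the selected arcs contain a cycle.
    """
    succ = defaultdict(list)
    indeg = {op: 0 for op in proc_time}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    start = {op: 0.0 for op in proc_time}
    ready = deque(op for op, d in indeg.items() if d == 0)
    visited = 0
    while ready:  # process operations in topological order
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            start[v] = max(start[v], start[u] + proc_time[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(proc_time):
        raise ValueError("machine sequences create a cycle (infeasible selection)")
    return start


# Two lots (A, B), each visiting machine 1 and then machine 2.
proc_time = {"A1": 3, "A2": 2, "B1": 2, "B2": 4}
route_arcs = [("A1", "A2"), ("B1", "B2")]
machine_arcs = [("A1", "B1"), ("A2", "B2")]  # A precedes B on both machines
start = earliest_starts(proc_time, route_arcs + machine_arcs)
completion = {op: start[op] + proc_time[op] for op in proc_time}
print(completion["A2"], completion["B2"])  # 5.0 9.0
```

A shifting bottleneck procedure would call an evaluation of this kind repeatedly, each time after fixing the sequence on one more toolgroup. The cycle check matters because independently scheduled toolgroups can produce sequences that are jointly infeasible.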
Description : The first year of this task will be concerned with implementing the overall shifting bottleneck approach for scheduling, and its preparation for integration into our testing environment described in Task 4. Once the main procedure has been implemented (iteratively choosing critical toolgroups and scheduling the work on those groups), the remaining effort in this year will be the implementation of rescheduling. This will involve structuring the approach so that it can be started with an initial (suboptimal) solution, or possibly a partial solution, and can proceed towards a better solution. This ability to dynamically reschedule will enable our methodology to be used in real time, and compete with (or complement) dispatching policies. Year 2 will involve the development of more advanced techniques, especially in the area of building the scheduling models. The most important of these will be the development of techniques to identify non-critical areas of the fab. The goal would be to model these areas as single nodes with simple delays (as opposed to building detailed operational models that track jobs through many sub-stages) in order to drastically reduce the complexity of the resulting models. This stage of Task 1 will also investigate the incorporation of automated material handling systems into the scheduling models, which we anticipate will involve some important modifications. Year 3 will be primarily concerned with implementing a module that can determine optimal order release policies. This functionality will be supplied by appending an artificial start node to the disjunctive graph representation for each product. This will allow the scheduling algorithm (or possibly a suitable order release SSP, as an additional Task 2 exercise) to help determine the appropriate lot releases over the scheduling model horizon.
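The idea of modeling non-critical areas as single delay nodes can be sketched as a preprocessing step on a product route. The Python example below merges consecutive non-critical steps into one delay node. It is only an illustration: the function name, the "DELAY" label, the route, and the choice of critical toolgroups are assumptions, and in practice the delay for a non-critical step would also need an estimate of its queueing time.

```python
def collapse_route(route, critical):
    """Merge consecutive non-critical steps into single delay nodes.

    route: list of (toolgroup, expected_time) in processing order
    critical: set of toolgroup names that keep a detailed model
    Returns a list of (node_name, time); merged nodes are named "DELAY".
    """
    simplified = []
    for toolgroup, time in route:
        if toolgroup in critical:
            simplified.append((toolgroup, time))
        elif simplified and simplified[-1][0] == "DELAY":
            # Extend the delay node that is already open.
            simplified[-1] = ("DELAY", simplified[-1][1] + time)
        else:
            simplified.append(("DELAY", time))
    return simplified


# Hypothetical route fragment; times in minutes.
route = [("clean", 30), ("litho", 20), ("inspect", 10),
         ("etch", 45), ("implant", 25), ("clean", 30)]
print(collapse_route(route, critical={"litho", "implant"}))
# [('DELAY', 30), ('litho', 20), ('DELAY', 55), ('implant', 25), ('DELAY', 30)]
```

Here six route steps shrink to five nodes, of which only two need a detailed scheduling model. On a route with several hundred steps and few critical toolgroups the reduction would be far larger.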
Task 2: Subproblem Solution Procedures
Anticipated Primary Result : This task will develop subproblem solution procedures (SSPs) for specific toolgroup types and topologies, for incorporation into the shifting bottleneck approach for fab-level scheduling in Task 1. Important instances include clustertools and photolithography steppers with their associated reticle management issues. We will also consider alternate subproblem objectives.
Background : The development of appropriate SSPs will be organized by toolgroup type, beginning with an analysis of the specific characteristics of photolithography steppers and reticle management systems, and of clustertools in various areas of a wafer fab. We will also investigate other similar opportunities for future developments of specialized SSPs. Finally, we will explore the usefulness of other performance measures in SSPs based on how they affect the overall solution of the fab scheduling problem, as well as how they encourage robustness in schedules for the individual toolgroups.
Description : Reticle management, and, more generally, auxiliary resource management, will be modeled by adapting a network flow model developed by the PIs under previous SRC funding to serve as a subproblem solution procedure for photolithography equipment. The resulting disjunctive graph representation will simultaneously schedule jobs and resources (reticles), and provide reasonable estimates of completion times through these resource constrained toolgroups. Clustertools provide a completely different modeling challenge than standard equipment, and therefore require special SSPs to handle them correctly. We will extend models of clustertools developed by Dr. Rose and Mr. Duemmler so that they may be incorporated into our scheduling approach. Other performance measures besides total lateness may be appropriate for solving SSPs, especially for the special cases developed previously. For instance, certain machines may need to be scheduled so that their schedules are insensitive to changes in the operating parameters. Many performance measures may more adequately characterize this desire. We will develop SSPs for standard toolgroups as well as the specialized toolgroups mentioned in previous years for this task. These will be incorporated into the scheduling system, allowing for a wide variety of approaches.
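The requirement that a stepper and its reticle be available at the same time can be illustrated with a simple greedy list scheduler. This Python sketch is not the network flow model mentioned above. It is a deliberately naive stand-in that assumes identical steppers, one copy of each reticle, no setup or reticle transfer times, and a job list already in priority order; all names and data are hypothetical.

```python
def schedule_with_reticles(jobs, n_steppers):
    """Greedy list scheduling with an auxiliary resource.

    A job starts only when both its stepper and its reticle are free.
    jobs: list of (name, reticle, proc_time), already in priority order
    Returns {name: (stepper_index, start, end)}.
    """
    stepper_free = [0.0] * n_steppers   # time each stepper becomes idle
    reticle_free = {}                   # time each reticle becomes idle
    plan = {}
    for name, reticle, proc in jobs:
        # Pick the stepper that frees up first (ties go to the lowest index).
        k = min(range(n_steppers), key=lambda i: stepper_free[i])
        start = max(stepper_free[k], reticle_free.get(reticle, 0.0))
        end = start + proc
        stepper_free[k] = end
        reticle_free[reticle] = end
        plan[name] = (k, start, end)
    return plan


jobs = [("lot1", "R1", 4.0), ("lot2", "R1", 3.0), ("lot3", "R2", 2.0)]
plan = schedule_with_reticles(jobs, n_steppers=2)
print(plan["lot2"])  # (1, 4.0, 7.0): stepper 1 sits idle until reticle R1 returns
print(plan["lot3"])  # (0, 4.0, 6.0)
```

In this example stepper 1 is reserved for lot2 and waits four time units for reticle R1, while lot3 could have used it immediately. Swapping lot2 and lot3 in the list lets lot3 finish at time 2.0 instead of 6.0 without delaying lot2, which is the kind of gain a joint job and resource schedule is meant to capture.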
Task 3: Statistical Operations Control
Anticipated Primary Result : We will develop decision rules that trigger, at an opportunistic time, rescheduling of an operation or operations. These rules will enable an adaptive rescheduling strategy that is based on the current disagreement between actual and predicted schedule performance weighed against the potential improvements gained from rescheduling.
Background : Peak performance is obtained when an optimal schedule is generated for the current status of an operation (including the status of each piece of equipment, orders, inventory, etc.). If an operation's performance is significantly different from the schedule, the status that was the basis for the schedule has changed. Consequently, rescheduling may be recommended to realign the model with this new status. In practice, small departures between the schedule and actual process performance are expected; however, large departures indicate a process change, and that rescheduling may be beneficial. A set of rules that monitor these discrepancies in real time can be implemented to trigger a rescheduling event at opportunistic times. A dramatic change such as a tool failure is an easy rule to implement, but we expect more subtle process conditions, such as WIP changes and deviation from internal due dates, to signal a need for a new schedule. Furthermore, we will consider responses for which moderate departures might trigger modifications to dispatching, while large departures will signal that a new schedule is recommended. This adaptive rescheduling will utilize resources effectively by allowing the process to run normally while on target with quick corrections for significant process changes.
Description : Statistical process control (SPC) approaches monitor process data in real time to detect statistically significant changes that indicate the need for corrective action. We will develop similar approaches for statistical operations control (SOC) that benefit from the perspective that some natural variation from a schedule is expected, but that significant variation should flag the need for corrective action such as rescheduling. Monitoring the information from several variables simultaneously is typically much more effective than monitoring any single variable. Therefore, triggers for SOC will be based on the simultaneous monitoring of many variables such as WIP levels, internal due-date performance, throughput, etc. To develop our initial rules, a set of simulation runs will be used to compare fab performance with and without rescheduling in various scenarios. The status vectors can then be analyzed to characterize the conditions that benefit the most from a reschedule. We will then explore rule induction, decision trees, instance-based learning, neural networks, and more traditional statistical methods as means for developing robust triggers for rescheduling. We will extend our triggering mechanisms by using an alternative approach to characterize the departures expected in the status vector through a joint distribution on its components. Because a closed-form analysis is probably impractical in this case, we will employ simulation to characterize these departures. Experience with multivariate SPC will be used as a guide to generate an appropriate distance metric, and the statistical distributions will then be integrated into our triggers to define a signal of a significant departure. Enhancements and modifications for good integration with rescheduling algorithms are expected. These will consider the effort involved in rescheduling an operation and refine the role of a more graduated response that dispatches appropriately.
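One possible form of such a trigger, borrowed from multivariate SPC, is a squared Mahalanobis distance (a Hotelling-style T^2 statistic) between the current status vector and a baseline estimated from simulation runs. The pure-Python sketch below illustrates the idea with a graduated response. The two-component status vector (WIP level, fraction of lots behind their internal due dates), the baseline data, and the thresholds `warn` and `act` are all hypothetical assumptions, not values from the project.

```python
def mean_and_cov(samples):
    """Sample mean vector and covariance matrix of equal-length status vectors."""
    n, p = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(p)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def departure(status, mean, cov):
    """Squared Mahalanobis distance of a status vector from the baseline."""
    d = [s - mu for s, mu in zip(status, mean)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


def soc_action(t2, warn, act):
    """Graduated response: small, moderate, or large departure."""
    if t2 >= act:
        return "reschedule"
    if t2 >= warn:
        return "adjust dispatching"
    return "continue"


# Hypothetical baseline: (WIP level, fraction of lots behind internal due date).
baseline = [(100, 0.10), (104, 0.12), (96, 0.09), (102, 0.10), (98, 0.09)]
mean, cov = mean_and_cov(baseline)
for status in [(101, 0.10), (106, 0.13), (100, 0.20)]:
    print(status, soc_action(departure(status, mean, cov), warn=4.0, act=10.0))
# (101, 0.1) continue
# (106, 0.13) adjust dispatching
# (100, 0.2) reschedule
```

The third status vector has a perfectly ordinary WIP level, yet it triggers a reschedule because its due-date performance is far outside what the baseline allows at that WIP level. A chart on WIP alone would miss it.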
Task 4: Testing, Comparison, and Implementation Issues
Anticipated Primary Result : We will develop a testing and development framework and an implementation of our scheduling, statistical operations control, and rescheduling methodologies. This will consist of a simulation package linked to implementations of our methodologies, and we will provide documentation for integration of these methods with MES systems.
Background : Our primary testing and validation methodology will involve embedding our scheduling approaches into a simulation environment that captures fundamental components of wafer fabrication. This environment will simulate the information arriving at an MES, and the scheduling and rescheduling routines will use this data to determine optimal or near-optimal schedules for the work. In this framework, we will be able to provide statistical estimates of the performance of our various approaches, and verify that we are improving over current practice. Our goal is to link optimization software (such as the commercial optimization package CPLEX) to our custom code through callable libraries. This package can in turn be embedded into a simulation model for testing. We have successfully integrated such a system using SLAM, CPLEX, and Fortran subroutines for a small scheduling problem. This task will entail investigating and choosing appropriate software for this approach, testing that environment on simple dispatching rules, and then implementing and further developing our approaches in this environment for testing purposes. Finally, we will incorporate the SOC signaling methodology and fully integrate it with a rescheduling approach, and provide a guide for implementing our approach and integrating it with MES systems.
Description : We will investigate which simulation and optimization packages can be most easily combined to provide the functionality we need for our various approaches. Any standard programming language can be used to implement our methods, and so the choice will center on the ability to simulate the data available in a typical MES. To create a functional system, we will first implement various dispatching rules and test them in our environment to verify that we are getting the expected performance. The next step will be to implement our scheduling approaches, including the special SSPs for various equipment types. To validate our methods we will compare them both statistically (in terms of long-term expected performance) and on an instance-by-instance basis with the dispatching rules implemented in year one. We will also start implementing the SOC procedures, and during testing we will have them raise a flag when rescheduling is indicated. We will then check those individual cases to verify that rescheduling is indeed appropriate, in terms of potential performance gains and the costs of rescheduling. The final phase of this project will include further development of the various SSPs, the auxiliary resource management (e.g., reticles) approaches, the SOC signaling mechanism, and, most importantly, the rescheduling methodology into a complete decision support system (DSS). This DSS will be thoroughly tested on long time-horizon simulation runs that model various situations that may arise in the wafer fab context, including unexpected downtimes, changes in auxiliary resource availability, product mix changes, etc. We will also provide a full report detailing how the various components of our methodology can be integrated with an MES system to provide real-time operational support in wafer fabs.
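The first testing step, implementing dispatching rules and comparing them on the same instance, can be illustrated with a toy single-machine simulation. The Python sketch below is far simpler than the simulation environment described above and is not based on it. The `simulate` function, the two rules (first-in-first-out and earliest due date), and the three lots are hypothetical.

```python
def simulate(lots, rule):
    """Single-machine dispatch simulation; returns total weighted tardiness.

    lots: list of dicts with keys release, proc, due, weight
    rule: key function; the waiting lot with the smallest key runs next
    """
    pending = sorted(lots, key=lambda l: l["release"])
    clock, twt = 0.0, 0.0
    while pending:
        waiting = [l for l in pending if l["release"] <= clock]
        if not waiting:
            # Machine idles until the next lot arrives.
            clock = pending[0]["release"]
            continue
        lot = min(waiting, key=rule)
        pending.remove(lot)
        clock += lot["proc"]
        twt += lot["weight"] * max(0.0, clock - lot["due"])
    return twt


def fifo(lot):
    return lot["release"]   # first in, first out


def edd(lot):
    return lot["due"]       # earliest due date


lots = [
    dict(release=0, proc=5, due=20, weight=1),
    dict(release=1, proc=3, due=25, weight=1),
    dict(release=2, proc=2, due=8, weight=2),
]
print(simulate(lots, fifo))  # 4.0
print(simulate(lots, edd))   # 0.0
```

Running each rule on the same instance gives the instance-by-instance comparison. Repeating the run over many randomly generated instances and averaging would give the statistical comparison of long-term expected performance.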
Conclusions
We provided an overview of a three-year research project aimed at developing a new approach to scheduling wafer fabrication facilities. We are currently working with fourteen semiconductor manufacturing companies and two software vendors to assure that the overall scheduling system is feasible and implementable in real-life systems.