"And Then There Were None," by Harvey M. Wagner

University of North Carolina
Chapel Hill, North Carolina 27514

A half century ago, several scholars, some of whom subsequently received Nobel prizes in economics, developed inventory models primarily in response to the needs of American military service branches and a few large corporations. Significant government waste was attributed to mismanagement of weapon systems assets. There were readiness-debilitating shortages in the presence of available supplies of weapon components that were crisscrossing the globe. Big corporations also recognized a potential for profit improvement from getting more bang from a buck of inventory investment. Given that management information systems 50 years ago employed punched card processing in most organizations, it was impossible for a typical company to adopt the new mathematical inventory models.

Fast forward to year 2002. Today we are frustrated when we repeatedly find a favorite brand out of stock at the local grocery or drug store. We are dismayed at experiencing third-world customer service levels only a few blocks from home. We surmise that the stock shortages are rooted in poor management and not in poor systems. This essay focuses on why we continue to find empty shelves where we do business despite a half century of impressive research in inventory modeling, augmented by high-priced multiplatform supply-chain management software. We observe companies that have poor customer service despite excessive inventories. Good inventory modeling and advanced inventory control systems are supposed to eliminate such things.

Our story unfolds as follows: The central theme of this reflection is inventory theory in the service of practice, not theory for the sake of theory. First, I will review several formative research findings published between 1950 and 1965. I will make a case for why these research contributions remain relevant today. The references at the end of the article are publications that appear no later than 1965, with only a few exceptions.
Next I will segue to an arguable proposition (with all deference to Ron Howard) that nothing is as impractical as a good theory of inventory. I will explain why what has been truly good inventory research for more than 35 years has not done much to advance the practice of industrial inventory control. Finally, I will suggest that developments in information technology renew the opportunity to improve practice provided that particular avenues of inventory systems research are pursued. Silver et al. (1998) published their third edition of an extraordinary text dealing with inventory management. It cites and thematically organizes the findings of more than 1,600 researchers—its topical coverage is encyclopedic (after 1965). In my opinion, it reaches a superlative level of achievement, notwithstanding any of my comments below, and it is readily accessible. Consequently, I am going to refer to it occasionally to back up my propositions; I will use the abbreviation SPP3 whenever I refer to the book.

Subject classifications: Forecasting: inventory system effectiveness. Inventory/production: impact of forecasts on. Professional: comments on.
Area of review: Anniversary Issue (Special).
Operations Research, Vol. 50, No. 1, January–February 2002, pp. 217–226. © 2002 INFORMS. 0030-364X/02/5001-0217 $05.00; 1526-5463 electronic ISSN.

1. IN SEARCH OF FULFILLMENT

The earliest publications on inventory modeling date back to the 1920s; the lot size (square root EOQ) model is the most notable example, with its commercial context of stocks held in businesses. Before 1950, macroeconomists also wrote about fluctuations in inventory levels in the context of classic business cycles. In the early 1950s, a few influential research contributions emerged from this primordial condition. Arrow et al. (1951) analyzed probabilistic inventory models. Simon (1952) wrote about servo theory applied to production. Dvoretzky et al. (1952a, 1952b) discussed inventory models and statistical processes. All of these contributions were published in Econometrica. Whitin (1953) published a book that was devoted solely to inventory themes, and Bellman, at the RAND Corporation, wrote a monograph on dynamic programming in 1953; see Bellman (1957). The United States Air Force, Navy, and Army funded research efforts aimed at improving the performance of logistics systems. The research impetus driving inventory modeling was in full force by 1954. These scholarly endeavors encompassed probability modeling (especially renewal and queuing processes), feedback systems, statistical decision theory, microeconomics, and multi-period optimization. All of these themes are lively today and continue to influence advanced inventory research.

In the early 1950s, it was difficult to obtain historical demand data for individual items (even for weapons systems components stocked at military bases), and computing capacity to automate replenishment formulas was extremely limited. There was little opportunity to empirically test emergent inventory theory. There were, however, fundamental insights from these early research publications. The new literature revealed the appropriate form (architecture) of replenishment rules, the tractability of discrete vis-à-vis continuous time modeling, the interdependence between the reorder point and reorder quantity, the convenience of particular demand distributions (Poisson, exponential, and normal), and the usefulness of the principle of optimality.

Since there was little available empirical demand data at the level of a stock keeping unit (SKU), assuming a Poisson or exponential distribution versus a normal distribution was a matter of mathematical elegance versus flexibility in being able to specify both a mean and standard deviation. Square-root-type formulas using continuous time models were computationally more practical than renewal recursions. And mathematically derived replenishment policies that explicitly accounted for imprecision associated with observational data seemed too elegant, if not esoteric, to implement.

Early 1950s research also shed light on technical details regarding felicitous model formulations. It was helpful to assume that the economic impact of customer service level is included in the model’s objective function to be optimized (rather than to express it as a side condition), unmet demand is fully backlogged, lead time is knowable and deterministic, demand is iid for an individual SKU (which does not deteriorate or become obsolete), the time horizon is a single period or unbounded, and if the latter, all parameter values are stationary. Even what seem to be slight departures from these assumptions were known to create serious analytic challenges.

The aforementioned stochastic inventory modeling research was complemented by deterministic multiperiod multiproduct linear-programming formulations. In the early 1950s, however, performing linear-programming optimization (by the simplex method) meant using punch-card computers. By necessity these models were toy-sized. Only later did it become feasible to think about deterministic multiperiod linear-programming models as a way to consider multiproduct time-phased production and inventory decisions.

An important legacy of the early 1950s is an insight about the architecture of inventory control solutions. Optimizing dynamic stochastic models implies finding a strategy or policy, in other words, a rule in which all future decisions are contingent on future states of the system. These states are determined in part by the future outcomes of random events. Consequently, future decisions are described probabilistically. In contrast, optimizing deterministic linear-programming models implies using an algorithm that yields numeric values for all future decisions; therefore all of these values are knowable at the outset of the planning horizon. These subtleties are taken for granted today. Inventory research is patently indebted, even now, to the path-breaking modeling of the early 1950s.

2. POINTS OF LIGHT

For a decade beginning in the mid-1950s, the core ideas above were pursued vigorously. With the notable exception of Brown (1959), publications assumed that either future demand values are given or that their underlying distribution is completely known. Most researchers continue to make these assumptions. I will summarize a few of the pivotal themes from this ten-year span.

Wagner and Whitin (1957, 1958), and Manne (1958) extended the classic deterministic lot-size model with stationary demand to accommodate known demand that fluctuates from period to period. The dynamic lot-size approach was eventually recognized to be a deterministic version of a renewal model, and equivalent to finding a shortest route in an acyclic network. Nevertheless, even sophisticated texts like SPP3 explain it in a complicated way, despite the fact that the dynamic lot-size model is a much simpler acyclic network than a typical critical path, which is standard material in OM textbooks. The network characterization relies on the concave property of the objective function. Bowman (1956) and Johnson (1957) explored alternative convex objective function formulations, and established the consequent optimality of a myopic decision process. The idea is closely related to the notion of a greedy solution process (optimize incrementally and never revise a prior decision). The concept of a finite planning horizon determined by the data emerged from the lot-size modeling above. Conditions were discovered that ensured an optimal solution for a finite horizon remained optimal for an extended horizon; further, these conditions did not require full knowledge about the longer horizon.

Complementary to renewal theory research, Morse (1958) examined steady-state stochastic replenishment systems using queuing theory. He viewed inventory items as analogous to servers in a queuing system, server busy time as tantamount to replenishment lead time, and a waiting line as comparable to customer demand backlog.

The mathematical form of optimal multiperiod decision rules in the presence of stochastic demand was explored in different ways. Holt et al. (1956, 1960) demonstrated that under certain assumptions, an optimal rule is linear in the parameters of the demand distribution. These assumptions, however, did not turn out to be sufficiently appealing to motivate much further research. It was known by the mid-1950s that the form of an optimal policy is sensitive to whether the objective function contains a setup cost and a smooth cost function associated with less than perfect service, the delivery lead time is greater than one period, and unfilled demand is fully backlogged. The noteworthy research achievements mentioned next address these technical challenges.

Using reasonable assumptions, Scarf (1960) established the optimality of (s, S) policies: When inventory on hand plus inventory due in less backlog falls below s, order enough to bring it up to S; else, do not order. (It was known that under alternative plausible model assumptions, this simple policy was not optimal.) Roberts (1962) provided a way to compute approximately optimal policies. Veinott and Wagner (1965), and subsequently other researchers, investigated effective computational methods for obtaining optimal policies, and later, approximately optimal policies. Veinott (1965) showed for an important class of situations that a myopic replenishment policy is optimal, and thereby established that an unbounded horizon model can be decoupled so as to yield near-term optimal decisions using only near-term parameter values. Further, the work established a sound basis for what has become an important decision rule in practice: Replenish what you sell. This is an elementary example of a so-called pull system. Clark and Scarf (1960) formulated a seminal and tractable model for inventory replenishment in an environment comprised of several echelons that hold inventory and where final demand is uncertain.

Operations research textbooks at that time did not make sharp distinctions among different inventory management settings.
Inventory scholars realized by then, however, that inventory theory approximates reality well when a stocked item is ordered from an outside vendor, but not so well when an item is manufactured by the enterprise as a direct result of a replenishment decision. Today this underlying distinction is evident and reflected in finite capacity scheduling models. Expository habits are hard to break, nevertheless, and even SPP3 discusses EOQ in terms that allow for both interpretations of purchasing from an outside vendor and manufacturing the SKU from within the enterprise. By the early 1960s, it was clear that linear programming is a conceptually well-suited alternative for addressing a combination of multi-item, multilocation, multiperiod issues in a capacity-constrained environment, even given its limitations. In contrast, it was hard to imagine that anything practical would result from an extension of stochastic inventory theory models in a pull context to multi-item situations (that is, to something beyond applying single-item analysis to each SKU individually). The reason is that it would be a heroic task to use historic data to establish multivariate demand distributions. The fundamental distinction between multi-item and single-item models corresponds closely to the split in practice between push and pull inventory control systems. In a multi-item manufacturing environment, pull systems, which aggregate orders for all items that are requested, may lead to infeasible production schedules as well as periods with excess capacity; thus push systems tend to be the rule in practice. In a stock-replenishment environment, push systems, which keep capacity fully utilized, may lead to overstocking as well as excessive obsolescence; thus pull systems tend to be the rule in practice, although transportation constraints sometimes countervail.
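As a minimal sketch of the (s, S) rule stated earlier in this section (order up to S when stock status falls below s; otherwise do not order), the Python fragment below computes the order quantity at a single review. The function name, argument names, and numeric values are illustrative assumptions; they are not taken from Scarf (1960), and nothing here addresses how s and S should be chosen.

```python
def s_S_order_quantity(on_hand, due_in, backlog, s, S):
    """Order quantity at one review under an (s, S) rule.

    Stock status is inventory on hand plus inventory due in, less backlog.
    All names and values are hypothetical illustrations.
    """
    stock_status = on_hand + due_in - backlog
    if stock_status < s:
        # Below the reorder point: order enough to raise stock status to S.
        return S - stock_status
    # At or above the reorder point: do not order.
    return 0


# Illustrative reviews with made-up numbers (s = 4, S = 10).
for on_hand, due_in, backlog in [(5, 0, 2), (5, 0, 0), (0, 3, 1)]:
    qty = s_S_order_quantity(on_hand, due_in, backlog, s=4, S=10)
    print(on_hand, due_in, backlog, "->", qty)
```

The rule depends only on the current stock status, which is why it is a policy in the sense used above: the same two numbers govern every future review.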
Forrester (1958, 1961) published an article and later a book on what he called industrial dynamics, and thereby created a computational engine that exemplifies economists’ traditional business-cycle logic. The approach views a SKU’s inventory level as a time series mathematically created by the difference between cumulative production and cumulative demand, layered on a base of safety stock. The industrial dynamics model facilitates visualizing the imposition of boundary conditions on the time series (and their slopes), along with the imposition of servomechanisms (feedback). Despite industrial dynamics’ broad sweep, it does not adequately address probabilistic uncertainty, which ultimately has limited its contribution to inventory control systems (in particular, how to set safety stocks). Wagner (1962) published a research monograph under the Operations Research Society’s sponsorship that introduced statistical issues of importance to senior management. The research theme focuses on how corporate management can use aggregate economic measures to ascertain whether lower echelons of a supply chain are adhering to automated inventory replenishment logic. Recognizing statistical uncertainty is essential in assessing the aggregate measures. By the mid-1960s, inventory modeling was technically sophisticated. The easy (and some not-so-easy) wins were already won. Most of the restrictive assumptions used in the prior 10 years had been relaxed at least in exploratory research. In the decades to follow, the technical horizons continued to expand, although the managerial scope of the theory remained much the same. Inventory theory analyses rarely merged with strategic management deliberations. Analytic inventory models usually take as given what strategic thinking views as choices. For example, product line breadth, a strategic choice but implicitly given in inventory modeling, influences demand for stocked items.
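To illustrate the remark earlier in this section that the dynamic lot-size model is equivalent to finding a shortest route in an acyclic network, here is a minimal Python sketch. It assumes a fixed setup cost per order, a linear holding cost per unit per period, positive known demands, and no backlogging; these cost assumptions, the function name, and the data are illustrative choices and are not taken from the cited papers. Node j stands for "demand of the first j periods is covered", and an arc from i to j means one order placed in period i covers periods i through j-1.

```python
def dynamic_lot_size(demand, setup_cost, holding_cost):
    """Shortest-route solution of a deterministic dynamic lot-size problem.

    Assumes (hypothetically) a fixed setup cost per order and a linear
    holding cost per unit carried per period, with positive demands.
    Returns (minimum total cost, list of (order_period, order_quantity)).
    """
    n = len(demand)
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest way to cover periods 0..j-1
    pred = [0] * (n + 1)               # predecessor node on the shortest route
    for j in range(1, n + 1):
        for i in range(j):
            # Arc i -> j: order in period i the demand of periods i..j-1.
            holding = sum(holding_cost * (t - i) * demand[t] for t in range(i, j))
            cost = best[i] + setup_cost + holding
            if cost < best[j]:
                best[j], pred[j] = cost, i
    # Walk the predecessors back from node n to recover the order plan.
    orders = []
    j = n
    while j > 0:
        i = pred[j]
        orders.append((i, sum(demand[i:j])))
        j = i
    orders.reverse()
    return best[n], orders


total_cost, plan = dynamic_lot_size([20, 50, 10, 50], setup_cost=100, holding_cost=1)
print(total_cost, plan)
```

Each order in the resulting plan covers a whole number of consecutive periods, which is what makes the arc structure sufficient under these assumptions.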

3. SHOW ME THE DATA

By the early 1950s, the theory and practice of inventory control faced a fundamental issue: The specification of demand and lead-time processes cannot be done with much precision. It was presumed, however, that once it was possible to obtain timely historic data on a continuing basis, sophisticated replenishment formulas could be easily applied by using appropriate statistical methods.

Brown’s Statistical Forecasting for Inventory Control (1959) was groundbreaking. He suggested specific statistical methods to use, particularly the approach that he named exponential smoothing. (In 1956, Brown presented his ideas at an ORSA conference, and in 1957, Holt wrote an Office of Naval Research report discussing exponential weighted moving averages.) In his book, Brown illustrates the calculations with hand-drawn worksheets. The book appears to encompass both manual systems and automated calculations; clearly, this was a transitional moment with respect to computing power. Brown points out that exponential smoothing does not require keeping long files of prior demands, viewed as an advantage at that time but irrelevant, if not legally impossible, today.

Brown expected his readers to know the fundamentals of inventory control. It is not easy, however, to spot any fully developed mathematical replenishment models in Brown’s book. But Brown does set forth a stock-replenishment point of view by posing two central questions in automated inventory control: (a) Is it now time to replenish inventory? (b) What should be the order quantity? His approach to these questions is rooted in the requirements of practice and not in the underpinnings of inventory theory. This is evident from the way the questions are posed: the operative word in (a) is now. We review the implied architecture below.

To make the replenishment-order decision (a) above, assume that replenishment review occurs at the start of each period. Practical considerations arise when we attempt to ascertain a SKU’s current stock-status level, defined as inventory on hand plus inventory due in, less backlog. Frequently, the stock-status level is hard to calculate, given the organization of corporate MIS systems; typically the components of stock status are in different transaction systems. We determine whether the stock-status level gives enough service protection for demand over lead time plus a review period, taking into account uncertainty in demand and perhaps lead time. We order now if the stock-status level does not provide enough service protection; else, we wait for another period and test again.

One of two situations arises. If bona fide probability distributions for the uncertain quantities are available, the above determination about service can be made using a mathematical inventory model. We directly apply a formula to calculate s (reorder point). But if there is only historic data about these quantities, then we use the data to forecast demand over lead time plus a review period, and prospectively take account of accompanying forecast error so as to hedge against demand and lead-time uncertainties. The hedge often is called safety stock, and how to calculate it is the challenge.

A similar dichotomy occurs in answering the replenishment-quantity decision (b) above. If the relevant probability distributions are known, we calculate S (the order-up-to point), and order enough to bring the stock-status level to S. But if there is only historic data, the alternative is to forecast what will be sold over a span of time that is expected to elapse until the next replenishment, possibly adjusting the amount upward if the stock-status level is exceptionally low.

A statistical approach is a far cry from operations research logic, such as that underlying a classic (s, S) policy. You can spot the difference immediately by observing that a statistical approach produces two numbers based on history as of now, whereas an OR model produces a reorder policy that is applicable (possibly) over an unbounded horizon. An OR model can be implemented in practice by using repeatedly updated statistical estimates of the parameter values in an assumed demand probability distribution. In the discussion that follows, I designate the approach of using point forecasts and observed variation in forecast error by PFErr. I designate the alternative approach of using an OR inventory model populated with statistically estimated parameters of an assumed demand distribution by OREst. (You will find it helpful to jot down the definitions of these two ad hoc abbreviations.) What we want to explore further is whether one of the two approaches is significantly more effective than the other in practice.

It is unfortunate that the term forecast has been identified with the data analysis suggested by Brown and others. To a manager, a forecast refers to the demand value that actually will be observed; since it is hard to perfectly anticipate future sales, a point forecast is not to be believed (unless one has a crystal ball). The output of exponential smoothing and similar calculations is an estimate of mean future demand, which is a concept; the mean itself is never actually observed. Likewise, the term forecast error is interpreted ex post by managers as a mistake, possibly a misjudgment, whereas it is only one observation from a distribution of forecast errors. The distribution purportedly reveals inherent uncertainty about future demand and can be used to determine a value for safety stock. Academic readers may find this niggling over terminology only mildly amusing, but the misunderstanding of this terminology in common use has been the downfall of many practitioners. Managers do not grasp what they are going to get.

Brown’s point of view is reflected in today’s commercial supply-chain software packages: Historical data are utilized to make point forecasts of future demand. Further, uncertainty in future demand is formulated by an estimate of the standard deviation of forecast error; lead-time uncertainty is finessed by some other approximation (I will explain further below). Practitioners bought into the statistical point of view right away, whereas most inventory theorists gave it short shrift.

The chapter organization in the book SPP3 lends support for the preceding discussion. Most of SPP3 is devoted to mathematical models, that is, the results of some 1,600 theorists; the cited work provides decision-support tools in supply chains. With few exceptions, these models assume that the demand and lead-time uncertainties are described completely by known (or postulated) probability distributions with precisely specified parameter values. SPP3 makes it quite clear in a single long and comprehensive chapter devoted to forecasting that a supply chain is often comprised of nonstationary uncertainties. Nowhere is there an explicit consideration of the impact of statistical noise (from having only finite data from a nonstationary environment) on the performance of the mathematical models. Yet dealing with this impact is one of the major challenges facing a practitioner.
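The two numbers that a statistical (PFErr-style) approach produces from history, a point forecast and a measure of forecast-error variation, can be sketched with simple exponential smoothing. This is a minimal illustration only: the smoothing constant, the starting level, the use of root mean square error as the variation measure, and the demand figures are all assumptions made here, not values or choices taken from Brown (1959).

```python
import math


def smooth_and_measure(demands, alpha, initial_level):
    """Simple exponential smoothing with one-step-ahead forecast errors.

    Returns (estimate of mean future demand, root mean square forecast error).
    alpha and initial_level are hypothetical tuning inputs.
    """
    level = initial_level
    errors = []
    for d in demands:
        # The forecast for this period is the level held before observing d.
        errors.append(d - level)
        # Update: weighted average of the new observation and the old level.
        level = alpha * d + (1 - alpha) * level
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return level, rmse


level, rmse = smooth_and_measure([100, 120, 80, 110], alpha=0.2, initial_level=100)
print(round(level, 2), round(rmse, 2))
```

Note that only the current level is carried forward, which reflects the point that exponential smoothing does not require keeping long files of prior demands. The returned level is an estimate of mean demand, not a prediction of the value that will actually be observed.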

4. IT’S THE DATA’S FAULT

With the passage of 50 years, one thing is true by now. There is no reason to complain anymore about scarcity of data. All businesses in the United States have loads of sales data. Usually this data is accessible for at least two years back, and often further back than that (although the earlier data may not be online). The agony that is felt today reflects that the demand data are dirty. This is not quite the same as what statisticians call missing observations and errors of measurement, although dirty data may include both. Here are some typical examples of data challenges.
• Demand is not level across a year, so a twelve-week running average of demand fluctuates considerably.
• A SKU is new and being phased in, or is old, obsolete, uncompetitive and being phased out; in either case, the span of historical data is limited.
• A SKU was new last year and introduced without enough supply to meet demand, so last year’s sales understate potential demand.
• A supplier failed to ship orders and consequently the SKU was out of stock, so historic sales understate demand for a span and overstate demand in the weeks immediately after.
• A SKU received special promotions (maybe under a different SKU number, and with a different set of promotion dates, depending on the physical location of demand), so weekly sales data show spikes and subsequent valleys, and these differ by location.
• Several times last year, demand was unusually large due to the unexpected orders of a few large customers, so the mean and standard deviation of historic demand may not be appropriate for inventory control.
• The SKU’s product specifications changed (color, flavor, package size, quality, etc.), so there is discontinuity in the pattern of historic demand.
• Customers’ returns are netted out of a current week’s sales, so raw sales may overstate demand in one time period and understate it later.
• Some holidays fall on different days and weeks from one year to the next, so sales data have to be adjusted from one year to the next to obtain comparability.
• Weather impacts demand, so historic sales reflect special conditions that may not recur.
• A competitor introduced a new product that altered market share, so historic demand declined; the size of the reduction varies, reflecting the number of weeks since the new product introduction.
• A new SKU was introduced and thereby cannibalized the sales of several other SKUs.

This list is long enough to convey why the phrase garbage-in-garbage-out is often used in discussions about demand forecasting and the distribution of forecast errors, and why automated replenishment systems commonly under-perform relative to managers’ expectations.

The preceding examples give rise to distinct classes of data problems. Some of the issues cited are solvable by suturing data streams; this involves identifying holidays, promotional weeks, and other special events. But other data issues are fundamental and reflect business practices; an important example is when a SKU’s life span is less than two years. Some companies have invested in computer systems that assist in cleaning up dirty data. (These are almost never off-the-shelf software packages; the system requires considerable hands-on guidance.) Consider a company that has an effective MIS system so that the cleaned sales data reasonably state what the relevant historic demand is for each SKU. This setting makes it easier to compare alternative architectures for automated inventory control. Note in passing that systems for cleaning historic data rarely clean previous forecasts. Hence, automated replenishment logic based on the distribution of historical forecast error is problematic when data cleaning is implemented. This is another reason to be circumspect about adopting an automated replenishment architecture that relies on forecast errors.
It is unlikely that in practice there is much difference between the effectiveness of PFErr and OREst in answering the question "What should be the order quantity?" Here is the reason. Suppose that the real situation is so simple that the stationary EOQ formula applies. Then early research established that EOQ is often a good approximation to the optimal value of S − s in an (s, S) model. Further, the actual value of the objective function is insensitive to an order quantity that differs considerably from EOQ. In most real situations, however, there are other factors that influence the order quantity, such as pack size, discount pricing, transportation minimums and maximums, etc. The classic lot-size model is formulated to characterize the order decision as a quantity of goods. An equivalent alternative is to characterize the decision as a time interval over which the imminent order should last, such as weeks of coverage. This perspective yields an EOI (economic order interval). Point forecasts of future demand over EOI can be effective and integrated equally well into a PFErr or OREst system. Note that when the EOI is a single period, the order-quantity decision comes down to how close the replenishment quantity should be to what was sold last period. An analytic challenge with respect to order quantity is recognizing in advance when to make the final order for a SKU, and then choosing how much. It always is clearer in retrospect which order is responsible for unsalable leftover stock.

Our comparison of PFErr and OREst systems rests primarily then on how to answer the question "Is it now time to replenish inventory?" Given the current stock-status level, the question is whether postponing a reorder for at least another period gives acceptable customer service over the interval from now until an order placed next period would be delivered. This is a prospective assessment and essentially uses probabilistic thinking.
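The insensitivity of cost to the order quantity noted above can be illustrated under the textbook EOQ assumptions (constant demand rate, fixed setup cost per order, linear holding cost). Under those assumptions, which are supplied here for illustration and are not spelled out in the essay, the ratio of the setup-plus-holding cost at quantity Q to the cost at EOQ is (Q/EOQ + EOQ/Q)/2. The parameter values below are made up.

```python
import math


def eoq(demand_rate, setup_cost, holding_cost):
    """Classic square-root EOQ, assuming constant demand and linear holding cost."""
    return math.sqrt(2 * demand_rate * setup_cost / holding_cost)


def cost_ratio(q, q_star):
    """Setup-plus-holding cost at q relative to the cost at q_star = EOQ."""
    return 0.5 * (q / q_star + q_star / q)


q_star = eoq(demand_rate=1200, setup_cost=100, holding_cost=6)
for q in (0.5 * q_star, q_star, 1.5 * q_star, 2 * q_star):
    print(round(q), round(cost_ratio(q, q_star), 3))
```

With these figures, ordering 50% more than EOQ raises the cost by roughly 8%, and even doubling or halving the quantity raises it by 25%, which is consistent with the flat-cost observation in the text.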
PFErr and OREst differ in detail, if not concept, as to how to calculate the expected service level associated with not ordering now. PFErr is implemented in today’s supply-chain software systems as follows.
• First, the stock-status level (or some analogous measure of inventory) is converted into an ordinate of a standardized normal distribution by subtracting the point forecast for demand over lead time plus a review period and dividing by the variability of historic forecast error (usually the standard deviation of past forecast errors, properly scaled using a square-root calculation to cover an interval of lead time plus a review period). A serious complication arises when lead time is uncertain. We discuss this issue later.
• Second, a standardized unit normal loss function is referenced to get the expected lost demand associated with the ordinate.
• Third, this standardized expected loss is scaled up to original demand units to provide expected lost demand over lead time plus a review period.
• Finally, expected service is measured as one minus the ratio of expected lost demand to the comparable point forecast of demand.

An alternative articulation of this process works backward from a target service level to an implied critical ordinate of the standardized unit normal loss function, and finally to a reorder level. Looked at this way, all SKUs that have the same target service level have the same critical ordinate. That common ordinate value is a consequence of assuming that forecast errors for a SKU are independently and normally distributed, so that a single inverse function applies. Missing from this algorithm is a correction that may be required due to forecast bias. Although mean forecast error need not be close to zero, mean square error is rarely used in practice.

If lead time is uncertain, the above calculations for variability become more complicated (and often more difficult to implement because historic data for estimating lead-time variation is limited). If expected lead time is two weeks, but it turns out that an order does not arrive until four weeks, some customer demand in the third and fourth weeks may not be satisfied from inventory on hand. Whereas the demand distribution for a given lead time may be viewed as concentrated around a single modal value, adding uncertainty in lead time can deform the one period demand distribution into a composite distribution with several modal values corresponding to a discrete number of weeks of lead time. If targeted service level is high, then safety stock has to cover the rightmost possible values for demand. The composite distribution need not be well approximated by a normal distribution.

You find in practice two different approaches to accommodating uncertain lead time. One assumes that lead time is fixed at a value safely above its mean (such as the 90th percentile). This approach impacts the forecast demand over lead time, but does not affect the value expressing demand forecast uncertainty. The other uses a standard formula essentially composed of statistical estimates of mean demand, mean lead time, variance of forecast error, and variance of lead time. Typically the statistical estimate for the variance of lead time is seriously in error.

Commercially available supply-chain software that purports to do these calculations sometimes documents the algorithms. But often the software architecture makes it practically impossible for users to verify the underlying logic by testing numeric examples. In simple terms, these systems operate in real time and do not easily accommodate what-if explorations. Consequently, when unintuitive forecasts and resupply recommendations occur, the user must take the results on faith.

Suppose that the PFErr system is technically sound. The final step above produces a fraction; for illustrative purposes, assume that the number is .93, which represents 93% service over the relevant time slice. If .93 is below management’s targeted service level for this time span, a replenishment occurs. Otherwise it does not. In any case, the calculations are repeated with updated forecasts and stock status at the start of the next period. Practitioners naively assume that .93 (in our illustration) is an accurate evaluation of expected service. Rarely, if ever, does such a value give a sufficiently accurate evaluation, and typically it overstates service performance noticeably. No wonder many companies are dissatisfied with the service performance of automatic replenishment software. They expect to get high service at an acceptable level of inventory investment, and actual service falls short of expectation. To conserve space, I omit a review of all the underlying assumptions in PFErr that in practice are seriously violated and contribute to the illusion that a replenishment system will actually obtain a service level close to what is targeted. The main point in this discussion is that PFErr does not deliver what it promises.

OREst has comparable problems. When OREst employs historic data to estimate the mean and standard deviation of a postulated demand distribution, the calculated service measure may well be an overestimate due to the postulated distribution being inappropriate and the parameter estimates from data being inaccurate. The underlying mathematical reason is that the statistical error in an OR inventory model impacts service asymmetrically and nonlinearly; in other words, the improvement in service from an overestimate of a SKU’s mean or variability of demand is not of the same magnitude as the degradation in service from an underestimate by the same amount. In realistic settings the amount of historic demand information is not sufficient to provide much confidence that the shape of the demand distribution (which is unknown) is well represented with respect to that part of the assumed distribution that is most critical, the right tail.

The above discussion brings us to a realization about our objective of answering the question: Is PFErr or OREst significantly more effective in practice? We must posit assumptions about the demand process that gives rise to the historic data. We seem to have a chicken-or-the-egg conundrum. This logic puzzle is a familiar one to statisticians, who examine analogous issues in data-driven analyses. When is it prudent to assume an underlying normal distribution when making a decision based on limited data? Determining the appropriate value for safety stock is a serious challenge for practitioners because in today’s business environment, management wants its supply chain to perform at an extremely high level of customer service. At such high service levels, this determination is the hardest to make, because the accuracy of data-driven estimates is poor.
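The four PFErr steps described in this section can be illustrated with a minimal Python sketch. It is not the logic of any particular software package. The function and parameter names are hypothetical, and the treatment of uncertain lead time uses the textbook compound-variance expression (horizon times forecast-error variance, plus squared mean demand times lead-time variance), which is an assumption about what the "standard formula" looks like rather than something the text spells out.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def unit_normal_loss(z):
    """Expected amount by which a standard normal variable exceeds z."""
    return normal_pdf(z) - z * (1.0 - normal_cdf(z))


def horizon_std(sigma_error, mean_demand, mean_lead_time,
                var_lead_time=0.0, review_period=1.0):
    """Standard deviation of demand over lead time plus a review period.

    Assumption: per-period forecast errors are independent, so their variance
    scales with the horizon (the square-root rule); lead-time uncertainty adds
    mean_demand**2 * var_lead_time.
    """
    horizon = mean_lead_time + review_period
    return math.sqrt(horizon * sigma_error ** 2
                     + mean_demand ** 2 * var_lead_time)


def pferr_expected_service(stock_status, forecast_per_period, sigma_error,
                           mean_lead_time, var_lead_time=0.0,
                           review_period=1.0):
    """Expected service if no order is placed now (illustrative only)."""
    horizon = mean_lead_time + review_period
    point_forecast = forecast_per_period * horizon
    sigma = horizon_std(sigma_error, forecast_per_period, mean_lead_time,
                        var_lead_time, review_period)
    z = (stock_status - point_forecast) / sigma   # step 1: standardized ordinate
    loss = unit_normal_loss(z)                    # step 2: unit normal loss
    expected_lost = sigma * loss                  # step 3: back to demand units
    return 1.0 - expected_lost / point_forecast   # step 4: expected service


# Hypothetical SKU: forecast 100 units/week, error std 30, two-week lead time.
print(pferr_expected_service(350, 100, 30, 2))
print(pferr_expected_service(350, 100, 30, 2, var_lead_time=1.0))
```

Comparing the two calls shows how strongly the computed service depends on the lead-time variance estimate, which is the quantity the text identifies as typically in serious error.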

5. SIMULATE TO CALIBRATE
There are enlightened practitioners who do have a way of dealing with this issue. I designate this process as model calibration. These practitioners apply what is called retrospective simulation to a collection (possibly a stratified random sample) of SKUs. They utilize historic data rather than estimated distributions in a simulation of SKU replenishment, where the target service level is made identical for all SKUs in the collection. This yields a simulated measure of service, which is compared to the target service level that is used in PFErr or implied by the OREst formula. As an illustration, a data-driven simulation that targets .98 service overall may result in obtaining .93 service overall from the retrospective simulations. In a subsequent step, practitioners adjust the targeted service amount to get the desired actual service. It is now feasible to perform this calibration process using spreadsheet software on a fast PC. The consequence of avoiding an assumed probability distribution for demand, such as a normal distribution, becomes clear once a retrospective simulation is performed. A mathematical distribution like the normal has a positive probability of exceeding any given value for the ordinate. Hence, no matter what value is used for safety stock, there is a chance that some demand goes unfilled. In contrast, in a retrospective simulation using finite historic data, there is a value of safety stock for each particular SKU that gives 100% service. As a consequence, as the target service level is raised in the calibration process, more and more SKUs reach 100% service. But other SKUs may not be close to the desired service level, and therefore the overall service level may not be sufficiently high. Raising the target level further provides no improvement for those SKUs already at 100% service, but adds to their inventory.
The irony is that all unfilled demand in a retrospective simulation was actually filled historically, since these data represent actual sales, not potential demand. Consider what happens when calibration becomes an integral part of the systems design process. The calibration mechanics enable a practitioner to ascertain trade-offs between simulated service overall, inventory investment overall, as well as other ancillary measures. Hence, using calibration, it is possible to make direct comparisons of alternative replenishment approaches, such as PFErr and OREst, by looking at an aggregation of results from simulating a multitude of SKUs. The calibration process is a cross-sectional variation on what statisticians call bootstrap analysis. The same replenishment architecture (for example, PFErr) and target service level are used for a sample of SKUs, where each SKU has its own historic demand. Calibration is best practice in system design, but it is not standard practice by any means, and not easily accomplished with today’s supply-chain software. For that reason, the typical PFErr implementation of an automatic inventory replenishment system is fraught with worse-than-expected outcomes, and consequent overrides of the underlying replenishment logic.
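As a rough illustration of the calibration idea in this section, the Python sketch below replays each SKU's demand history under one common nominal target, measures the overall simulated service, and then searches a grid for the lowest nominal target that delivers the desired simulated service. Everything here is an assumption made for illustration: the function names, the moving-average forecast, the order-up-to rule with a normal safety factor, the lost-sales treatment of unfilled demand, the fixed lead time, and the starting stock. It does not reproduce any specific PFErr or OREst implementation.

```python
from statistics import NormalDist, mean, pstdev


def simulate_sku(demand, nominal_target, lead_time=2, warmup=8):
    """Replay one SKU's history; return (units filled, units demanded).

    Assumes lead_time >= 1, a one-period review cycle, and lost sales.
    """
    z = NormalDist().inv_cdf(nominal_target)    # safety factor for the target
    horizon = lead_time + 1                     # lead time plus review period
    on_hand = mean(demand[:warmup]) * horizon   # hypothetical starting stock
    pipeline = [0.0] * lead_time                # orders not yet received
    filled = total = 0.0
    for t in range(warmup, len(demand)):
        on_hand += pipeline.pop(0)              # receive the oldest open order
        history = demand[t - warmup:t]          # only data known before period t
        order_up_to = (mean(history) * horizon
                       + z * pstdev(history) * horizon ** 0.5)
        position = on_hand + sum(pipeline)
        pipeline.append(max(0.0, order_up_to - position))
        shipped = min(on_hand, demand[t])       # unfilled demand is lost
        on_hand -= shipped
        filled += shipped
        total += demand[t]
    return filled, total


def overall_service(skus, nominal_target, **kwargs):
    """Fraction of all demand filled across a dict of SKU demand histories."""
    results = [simulate_sku(d, nominal_target, **kwargs) for d in skus.values()]
    filled = sum(f for f, _ in results)
    total = sum(t for _, t in results)
    return filled / total if total else 1.0


def calibrate(skus, desired_service, grid, **kwargs):
    """Lowest nominal target in grid whose simulated service meets the goal."""
    for target in sorted(grid):
        if overall_service(skus, target, **kwargs) >= desired_service:
            return target
    return None                                 # goal not reachable on this grid
```

A call such as `calibrate(skus, 0.98, [0.95, 0.98, 0.99, 0.995, 0.999])` returns the nominal setting to use when 98% simulated service is wanted, or `None` when no grid value achieves it. Running `overall_service` for two different replenishment rules over the same SKU sample gives the kind of direct comparison the text describes.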

6. RESEARCH OPPORTUNITIES
Unlike areas of OR where current research is technically (mathematically) far ahead of practice, current inventory research investigates issues of central importance to practice. Therefore, to the extent that such research reveals important insights about supply-chain architecture, the research expands knowledge in a way that deepens understanding of how these processes behave. Further, the enhanced understanding builds on the established base of prior knowledge. Recognizing this, it also must be said that incremental mathematical research is not likely to enhance practice. The reason is that mathematical inventory research is blind to all the data issues discussed above and far removed from the entrenched software that now drives supply-chain systems. The references at the end of this article contain a few examples of research efforts that have dealt with statistical issues. For the most part, these studies assume that demand occurs in a stationary environment, and that lead time is a known value. I am aware of other relevant articles with less restrictive assumptions, but I have not included them in the references. It seems promising to explore a new research avenue that takes as its starting point a finite history of demand data for a family of SKUs. Unlike the publications of 40 to 50 years ago, the new research should assume plentiful, recognizably dirty, demand data. Lead-time uncertainty needs to be examined from a data perspective as well. The objective should be stocking and replenishment logic that is driven by such data, and operating in the context of a supply chain. The overall research objective is to find sufficiently robust approaches that produce reliable trade-off assessments between service and other aspects of inventory management. The theory should rest on the analysis of a family of SKUs, although the replenishment strategy itself may be single-item focused.
The family of SKUs should be defined in a way that its members are recognizable by reference, at least, to historic demand over a finite history. The theory should encompass data-oriented tests as to when the actual replenishment rules may be reliably applied. The theory should provide measures of accuracy with respect to estimates of future systemwide behavior comparable to measures of estimation error commonplace in standard statistics. The theory should recognize that most demand environments change over time, usually gradually but sometimes abruptly. It should assume that on occasion human intervention is called for, and the theory should assist in signaling when such occasions are warranted. Finally, it should take notice of rules of thumb that may, at first glance, seem too simple (such as replenishment rules based on weeks of supply, or kth largest prior week of demand), but that may function well in a nonstationary environment that contains both demand and lead-time uncertainties.

PERSONAL RECOLLECTIONS
My career interest in inventory and statistical processes was fostered in the beginning by eight scholars, some being mentors, some colleagues, a few both, and all good friends. The pivotal circumstances occurred in four places: Stanford University, RAND Corporation, Massachusetts Institute of Technology, and McKinsey & Co. Kenneth Arrow and Gerald Lieberman at Stanford wisely counseled me as an undergraduate during the early 1950s about exciting career opportunities in the emerging field of operations research. They opened my eyes about studying economics and mathematical statistics. Until then I never realized that applied mathematics could provide a livelihood. Jerry guided me in learning about important business applications of statistical analyses. Ken assisted me in spending summers at RAND, starting in 1953, and thereby first sparked my interest in inventory modeling. In 1957, I became a junior colleague of Jerry’s in the Department of Industrial Engineering at Stanford. I had an office in Serra House, in proximity to Ken and his preeminent co-authors. They were actively collaborating on inventory theory, and writing papers that comprised several Stanford University Press research monographs. In the early 1960s, Arthur (Pete) Veinott, Jr. also joined the faculty in Stanford’s Department of Industrial Engineering. Pete and I shared a deep interest in inventory and production algorithms, resulting in a dialog that lasted for years. Along the way, we developed a practical algorithm to compute (s, S) policies for discrete probability distributions. Pete’s broad perspective on the issues central to inventory and production modeling was inspiring. In 1955–1957, while doing doctoral studies in mathematical economics at MIT, I was fortunate to be assigned as a research assistant to Thomson Whitin, then on the faculty of the Sloan School of Management. Tom was preparing the second edition of his monograph on inventory processes, and posed the core problem that led to our collaboration on the dynamic economic lot-size model. Robert Solow, in guiding my doctoral studies, counseled me as to the importance of broadly examining alternative analytic perspectives in researching a scholarly area. I selected for my MIT doctoral thesis topic an applied statistics problem growing out of the research program of RAND’s Logistics Department. The central theme of the thesis was the design of top-level managerial systems for controlling lower echelons in a supply chain. The research applied statistical principles in fashioning a control system comprised of (s, S) inventory policies. Both Ken and Bob gave generously of their time in challenging my thinking about these issues. In 1962 the thesis was published in the Operations Research Society’s research monograph series. I never had the opportunity to collaborate with William Cooper at Carnegie. But Bill’s dedication to doctoral students was already legendary before the 1960s, and Bill provided a role model for me early on: I am forever grateful. Robert Fetter, who was on the Sloan School faculty when I was studying at MIT, was interested in practical applications of operations research, and we had many lengthy and wide-ranging conversations about advancements in operations research. It was Bob who introduced me to McKinsey & Co. while I was on the Stanford faculty. Then in 1967, Bob became a close colleague again when I joined him on the faculty at Yale. My working relationship with McKinsey & Company began in 1960 in San Francisco. Shortly thereafter, David Hertz became a partner at McKinsey in New York. I first met David when he served as editor of the Operations Research Society research monograph series. David kindly tutored me on the skills that are required to take university research and make it relevant to business. Over four decades, I have had innumerable opportunities to participate with McKinsey client service teams in designing and implementing what are now called supply-chain logistics systems. In the early 1960s it already was evident that software packages developed by leading computer companies such as IBM were going to be critically important in getting businesses to adopt operations research inventory models. Whereas commercial applications of mathematical programming algorithms seemed, from the 1960s onward, to develop in concert with advances in computing technology, inventory modeling did not fare so well. This article takes a retrospective look as to why.

ACKNOWLEDGMENTS
I am grateful for suggestions made by my colleagues Jack Evans, Geraldo Ferrer, Kyle Catanni, Ann Marucheck, Jayashanker Swaminathan, and Clay Whybark, and the editors Lawrence Wein and Frederic Murphy.

REFERENCES
Arrow, K. J., T. Harris, J. Marschak. 1951. Optimal inventory policy. Econometrica 19 250–272.
Arrow, K. J., S. Karlin, H. Scarf. 1958. Studies in the Mathematical Theory of Inventory and Production. Stanford University Press, Stanford, CA.
Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.
Bellman, R., I. Glicksberg, O. Gross. 1955. On the optimal inventory equation. Management Sci. 2 83–104.
Blazer, D. 1983. Testing the cost effectiveness of an inventory filtering rule using empirical data. Ph.D. thesis, School of Business Administration, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Bowman, E. H. 1956. Production scheduling by the transportation method of linear programming. Oper. Res. 100–103.
Brown, R. G. 1959. Statistical Forecasting for Inventory Control. McGraw Hill, New York.
Clark, A. J., H. E. Scarf. 1960. Optimal policies for a multiechelon inventory problem. Management Sci. 6 475–490.
Dvoretzky, A., J. Kiefer, J. Wolfowitz. 1952a. The inventory problem. I: Case of known distributions of demand. Econometrica 20 187–222.
Dvoretzky, A., J. Kiefer, J. Wolfowitz. 1952b. The inventory problem. II: Case of unknown distributions of demand. Econometrica 20 450–466.
Dvoretzky, A., J. Kiefer, J. Wolfowitz. 1953. On the optimal character of (s, S) policy in inventory theory. Econometrica 20 586–596.
Ehrhardt, R. 1979. The power approximation for computing (s, S) inventory policies. Management Sci. 25 777–786.
Ehrhardt, R., C. Mosier. 1979. A revision of the power approximation for computing (s, S) inventory policies. Management Sci. 30 618–622.
Ehrhardt, R., H. M. Wagner. 1982. Inventory models and practice. H. J.
Greenberg, F. H. Murphy, S. H. Shaw, eds. Advanced Techniques in the Practice of Operations Research. North Holland, New York, 251–332.
Ehrhardt, R., C. R. Schulz, H. M. Wagner. 1980. (s, S) policies for a wholesale inventory system. L. Schwarz, ed. Multi-Level Production/Inventory Systems: Theory and Practice. North Holland Publishing, New York, 141–161.
Estey, A. S., R. Kaufman. 1975. Multi-item inventory system policies using statistical estimates: Negative binomial demand. Working paper, Yale University, New Haven, CT.
Forrester, J. W. 1958. Industrial dynamics: A major breakthrough for decision makers. Understanding the forces causing industrial fluctuations, growth, and decline. Harvard Bus. Rev. 36 (July–August) 37–66.
Forrester, J. W. 1961. Industrial Dynamics. The MIT Press, Cambridge, MA.
Hayes, R. H. 1969. Statistical estimation problems in inventory control. Management Sci. 15 686–701.
Hayes, R. H. 1971. Efficiency of simple order statistic estimates when losses are piecewise linear. J. Amer. Statist. Assoc. 66 127–135.
Holt, C. C., F. Modigliani, J. F. Muth. 1956. Derivation of a linear decision rule for production and employment. Management Sci. 2 159–177.
Holt, C. C., F. Modigliani, J. F. Muth, H. A. Simon. 1960. Planning, Production, Inventories, and Work Force. Prentice Hall, Englewood Cliffs, NJ.
Jacobs, R. A., H. M. Wagner. 1989a. Reducing inventory system costs by using robust demand estimators. Management Sci. 35 771–787.
Jacobs, R. A., H. M. Wagner. 1989b. Lowering inventory system costs by using regression-derived estimators of demand variability. Decision Sci. 20 558–574.
Johnson, S. M. 1957. Sequential production planning over time at minimum cost. Management Sci. 3 435–437.
Karlin, S. 1960. Dynamic inventory policy with varying stochastic demands. Management Sci. 6 231–258.
Kaufman, R. L. 1977. (s, S) inventory policies in a nonstationary demand environment. Working paper, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Kaufman, R. L., J. Klincewicz. 1976. Multi-item inventory system policies using statistical estimates: Sporadic demand. Working paper, Yale University, New Haven, CT.
Klincewicz, J. G. 1976a. Biased variance estimators for statistical inventory policies. Yale University, New Haven, CT.
Klincewicz, J. G. 1976b. Inventory control using statistical estimates: The power approximation and sporadic demands. Yale University, New Haven, CT.
Klincewicz, J. G. 1976c. The power approximation: Control of multi-item inventory systems with constant standard-deviation-to-mean ratio for demand. Yale University, New Haven, CT.
Laderman, J., S. B. Littauer, L. Weiss. 1953. The inventory problem. J. Amer. Statist. Assoc. 48 717–732.
MacCormick, A. 1978. Predicting the cost performance of inventory control systems by retrospective simulation. Naval Res. Logist. Quart. 25 605–620.
MacCormick, A., A. S. Estey, R. L. Kaufman. 1977. Inventory control with statistical demand information. Working paper, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Manne, A. S. 1958. Programming of economic lot sizes. Management Sci. 4 115–135.
Modigliani, F., F. Hohn. 1955. Production planning over time and the nature of the expectation and planning horizon. Econometrica 23 46–66.
Morse, P. 1958. Queues, Inventories, and Maintenance. John Wiley & Sons, New York.
Peterson, D. K. 1987. The (s, S) inventory model under low demand. Ph.D. thesis, School of Business Administration, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Roberts, D. M. 1962. Approximations to optimal policies in a dynamic inventory model. K. J. Arrow, S. Karlin, H. E. Scarf, eds. Studies in Applied Probability and Management Science. Stanford University Press, Stanford, CA, 196–202.
Scarf, H. 1959. Bayes solutions of the statistical inventory problem. Ann. Math. Statist. 30 490–508.
Scarf, H. 1960. The optimality of (S, s) policies in the dynamic inventory problem. K. Arrow, S. Karlin, P. Suppes, eds. Mathematical Methods in the Social Sciences. Stanford University Press, Stanford, CA.
Schulz, C. R. 1980. Wholesale warehouse inventory control with statistical demand information. Ph.D. thesis, Operations Research and Systems Analysis, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Schulz, C. R., R. Ehrhardt, A. MacCormick. 1977. Forecasting operating characteristics of (s, S) inventory systems. Working paper, School of Business Administration, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Silver, E. A., D. F. Pyke, R. Peterson. 1998. Inventory Management and Production Planning and Scheduling, 3rd ed. John Wiley & Sons, New York.
Simon, H. A. 1952. On the application of servomechanism theory in the study of production control. Econometrica 20 247–268.
Simon, H. A. 1956. Dynamic programming under uncertainty with a quadratic criterion function. Econometrica 24 74–81.
Veinott, A., Jr. 1963. Optimal stockage policies with nonstationary stochastic demands. H. Scarf, D. Gilford, M. Shelly, eds. Multistage Inventory Models and Techniques, Chapter 4. Stanford University Press, Stanford, CA.
Veinott, A., Jr. 1965. Optimal policy for a multi-product, dynamic, nonstationary inventory problem. Management Sci. 12 206–222.
Veinott, A., Jr. 1966. The status of mathematical inventory theory. Management Sci. 12 745–777.
Veinott, A., Jr., H. M. Wagner. 1965. Computing optimal (s, S) inventory policies. Management Sci. 11 525–552.
Wagner, H. M. 1962. Statistical Management of Inventory Systems. John Wiley & Sons, New York.
Wagner, H. M. 1969. Principles of Operations Research. Prentice Hall, Englewood Cliffs, NJ.
Wagner, H. M., T. M. Whitin. 1957. Dynamic problems in the theory of the firm. The Theory of Inventory Management. Princeton University Press, Princeton, NJ, 299–327.
Wagner, H. M., T. M. Whitin. 1958. Dynamic version of the economic lot size model. Management Sci. 5 89–96.