International Journal of Performability Engineering, 2011, No 5

Dependability and Performability

Performance and Dependability Modeling of Dynamic Systems

Editorial

Editorial

KRISHNA B. MISRA

2011, 7(5): 401. doi:10.23940/ijpe.11.5.p401.mag

Abstract

The performance of components, subsystems, systems, and services has always been a concern of system designers and developers. Various performance attributes have been developed and defined historically as requirements varied over time. Initially, in the first half of the 20th century, engineers were satisfied if the quality of a component was good. Later, since quality did not ensure performance over time, the performance criterion of reliability was introduced. The post-Second World War period saw a trade-off between reliability and maintainability, and availability became an important attribute of maintained components and systems. With the release of Prof. Rasmussen’s WASH 1400 report in 1975, safety (in probabilistic terms) became a design parameter. Terms like survivability and dependability also figured in the literature. Later on, John Meyer and W. H. Sanders (“Specification and Construction of Performability Models”, which appeared in the Second International Workshop on Performability Modeling of Computer and Communication Systems, June 28-30, 1993) introduced a term, “performability”, dwelling upon the distinction between “object system or simply system” and “environment” resources; they claim it is distinct from the terms already existing at that time for performance evaluation. As far as the definition of dependability is concerned, John Meyer agrees that it is the system's ability to perform with respect to some agreed specification of desired performance, where a failure of the system occurs if the performance no longer complies with this specification. The attributes of dependability are defined according to the nature of failure occurrences and/or their consequences, and include reliability (continuity of failure-free service), availability (readiness to serve), safety (avoidance of catastrophic failures), and security (prevention of failures due to unauthorized access and/or handling of information).
He felt, however, that in order to address the problem of degradable performance, a measure of performability is needed to address issues of both performance and dependability simultaneously. In fact, performability and performance are used interchangeably.

It is interesting to note that The Institute for Telecommunications Services, a part of the U.S. Department of Commerce, provides an extensive glossary of telecommunications terms in Federal Standard 1037C [16]. In this glossary, survivability is defined as follows: A property of a system, subsystem, equipment, process, or procedure that provides a defined degree of assurance that the named entity will continue to function during and after a natural or man-made disturbance. For a given application, survivability must be qualified by specifying the range of conditions over which the entity will survive, the minimum acceptable level or post-disturbance functionality, and the maximum acceptable outage duration. John C. Knight and Kelvin S. Sullivan, in their report (CS-TR-33-00, 2000, Department of Computer Science, University of Virginia), have presented a definition of survivability and related it to the area of dependability and the technology of fault tolerance. They claimed that the specialized requirements of critical information systems require a new facet of dependability and that survivability as they have defined it is different from reliability, availability, safety, and so on. Similarly, M. S. Deutch and R. R. Willis in their book (Software Quality Engineering: A Total Technical and Management Approach, Englewood Cliffs, NJ: Prentice-Hall, 1988) define survivability in the context of software engineering as the degree to which essential functions are still available even though some part of the system is down. It appears that this definition is more relevant for the performance evaluation of fault-tolerant computer systems.

The genesis of the trouble with various performance attributes can be traced mainly to follow-up work in the area of reliability, as there was no concerted effort to standardize the definitions of the various terms in use in relation to performance evaluation.

If one takes the Webster definition of performability, it is the ability (expressed in terms of probability, just as in the case of reliability) to perform under given conditions. Therefore, based on the key terms “perform” and “ability”, performability can be interpreted as performance under given conditions, which may include reliability, safety, risk, human performance, or even computer and communication system performance under given conditions. One must not forget that the given conditions could be normal conditions, fault conditions, conditions existing with failures, abnormal environmental conditions, or extreme environmental conditions. It is in this more general context that the term “performability” is used in the journal, which not only takes into consideration the entire gamut of performance attributes but also includes the sustainability aspect of product and system performance from a 21st-century perspective. In other words, it represents holistic performance criteria. However, at present the sustainability aspect of performance is not quantifiable in probabilistic terms, although it might soon become possible. At that time, it will be possible to aggregate all attributes in some way to define an overall design criterion in probabilistic terms.

Guest Editorial: Performance and Dependability Modeling of Dynamic Systems

SALVATORE DISTEFANO, ANTONIO PULIAFITO, and KISHOR S. TRIVEDI

2011, 7(5): 402-404. doi:10.23940/ijpe.11.5.p402.mag

Abstract

System modeling is a crucial part of designing modern systems, as it allows their behavior to be predicted before deployment. Performance and dependability are typical attributes of interest, and their evaluation is considered an art, usually supported by powerful software tools and sophisticated mathematical methods. Service-based architectures, Cloud computing, and wireless and networked systems are only some of the types of systems that very often need to be evaluated together, looking for performance and dependability indices and their relationship to SLAs and QoS parameters. The aim of this special issue is to present novel ideas, methods, algorithms, and software tools for in-depth studies of dynamic aspects of dependable fault-tolerant systems.

The papers selected for this special issue on dynamic systems dependability and performance assessment cover some of the important problems that are currently being addressed by the engineering community. Both theoretical and practical aspects are dealt with, and empirical/statistical as well as simulation/analytical techniques are covered. All papers submitted for the special issue were reviewed by experts in the field. Based on the referees’ comments, and after verifying that each paper fits the theme of the special issue, the eight best papers were selected; we believe they reflect the current research trends on the topic well.

The first paper, A Note on Spare Parts and Logistic Optimization with Monte Carlo based System Models, by A. Dubi, S. Khoroshevsky and A. Doron, presents a novel approach to the problem of spare parts allocation optimization. The proposed approach is based on a hybrid Monte Carlo optimization method that removes some drawbacks of the Monte Carlo technique and guarantees the optimal solution.

The second paper, MDWNsolver: A Framework to Design and Solve Markov Decision Petri Nets, by M. Beccuti, G. Franceschinis and S. Haddad, proposes a new tool, named MDWNsolver, for solving two extensions of the Petri net formalism for high level specification of Markov Decision Processes (MDP): the Markov Decision Petri Net and the Markov Decision Well-formed Net. In order to reduce the complexity of the analysis, the solution engine uses efficient algorithms that take advantage of system symmetries.

The third paper, Markov Modeling Approach for Survivability Analysis of Cellular Networks, by V. Jindal, S. Dharmaraja and K. S. Trivedi, deals with survivability in cellular networks, using Markov chains and stochastic reward nets to compute measures such as call blocking probabilities and excess delay due to failures.

The fourth paper, Stochastic Petri nets with Low Variation Matrix Exponentially Distributed Firing Time, by P. Buchholz, A. Horvath and M. Telek, proposes a new approach for the solution of stochastic Petri nets based on matrix exponential distributions. In particular the authors demonstrate that all kinds of such distributions can be used like phase type distributions in stochastic Petri nets evaluation.

The fifth paper, Optimal Design of Heterogeneous Series-Parallel Systems with Common-Cause Failures, by P. Boddu, L. Xing and F. Azadivar, proposes a technique for evaluating and optimizing the reliability of systems by combining cold and hot standby redundancy to achieve a balance between fast recovery and power conservation in the optimal system design. The proposed method has no limitation on the type of time-to-failure distributions for the system components, and it uses a genetic algorithm to obtain the optimal design configuration satisfying some system-level cost constraints.

The sixth paper, Qualitative and Quantitative Modeling of Reliability for Intelligent Water Distribution Networks, by J. Lin and S. Sedigh, proposes an agent-based qualitative model based on UML and a Markov decision process to evaluate the reliability of water distribution networks, a special case of cyber-physical systems. Such models successfully capture the dependence between cyber components and physical components and evaluate the intelligent water distribution networks as a whole. The reliability results are also compared with purely physical systems to support the advantage of cyber infrastructures.

The seventh paper, PARSY: Performance Aware Reconfiguration of Software Systems, by M. Marzolla and R. Mirandola, presents a framework for runtime performance aware reconfiguration of component-based software systems. The basic underlying idea is to selectively degrade and upgrade system components to guarantee that the overall response time does not exceed a predefined threshold. The main assumption is that the user is willing to accept a degraded service within a maximum response time. A monitoring component is used to trigger a reconfiguration whenever the measured response time exceeds the threshold, and a QN model is solved to estimate, at run-time, the response time of various reconfiguration scenarios. The obtained results are used to feed the optimization model whose solution gives the system configuration that maximizes the total utility, while keeping the response time below the threshold.

Lastly, the eighth paper, Dynamic Aspects and Behaviors in System Performance and Reliability Evaluation, by S. Distefano, A. Puliafito and K. S. Trivedi, deals with dynamic reliability aspects and behaviors in component-based systems. Such aspects and behaviors are identified and characterized, and then some possible modeling and solution techniques are discussed. The main goal of this paper is to highlight the importance of adequately evaluating system reliability while taking into account dynamic-dependent aspects and behaviors.

We would like to thank all the authors for the patience and cooperation exhibited and we are grateful to all the referees who gave their valuable time to review the papers promptly. We do hope that this special issue will be received enthusiastically by academia, researchers and engineers involved in performance and reliability engineering.

Last but not least, we would like to thank Prof. Krishna B. Misra, Editor-in-Chief of the International Journal of Performability Engineering, for enthusiastically supporting us in this endeavor.

Salvatore Distefano is currently a post-doctoral researcher at the University of Messina, also collaborating with the Politecnico di Milano. His research interests include performance evaluation, parallel and distributed computing, software engineering, and reliability techniques. During his research activity, he has contributed to the development of several tools such as WebSPN, ArgoPerformance and GS3. He has been involved in several national and international research projects. He is the author of more than 70 scientific publications in international journals and conferences. He is an Associate Editor of the International Journal of Performability Engineering.

Antonio Puliafito is a full professor of computer engineering at the University of Messina, Italy. His interests include parallel and distributed systems, networking, wireless, and GRID and Cloud computing. He was a referee for the European Community for the projects of the Fourth, Fifth, Sixth, and Seventh Framework Program. He has contributed to the development of the software tools WebSPN, MAP, and ArgoPerformance. He is a coauthor (with R. Sahner and K.S. Trivedi) of the text Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Kluwer Academic Publishers, New York, USA, 1996.

Kishor S. Trivedi holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University, Durham, NC. He has served as a Principal Investigator on various AFOSR, ARO, Burroughs, DARPA, Draper Lab, IBM, DEC, Alcatel, Telcordia, Motorola, NASA, NIH, ONR, NSWC, Boeing, Union Switch and Signals, NSF, and SPC funded projects and as a consultant to industry and research laboratories. He was an Editor of the IEEE Transactions on Computers from 1983 to 1987. He is on the editorial board of the IEEE Transactions on Dependable and Secure Systems. He is also on the Editorial Board of the International Journal of Performability Engineering. He is a co-designer of the HARP, SAVE, SHARPE, SPNP, and SREPT modeling packages. He is the author of a well-known text entitled Probability and Statistics with Reliability, Queuing and Computer Science Applications, John Wiley & Sons, New York, USA, 2001. He has published two other books: Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Kluwer Academic Publishers, New York, USA, 1996, and Queuing Networks and Markov Chains, John Wiley, New York, USA, 2006. His interests include stochastic processes, Petri nets, queuing networks, performance/reliability/performability/survivability analysis, and software rejuvenation. He has published over 350 articles and lectured extensively on these topics. He has supervised 39 Ph.D. dissertations. He is a Fellow of the Institute of Electrical and Electronics Engineers. He is a Golden Core Member of the IEEE Computer Society.

Original articles

A Note on Spare Parts and Logistic Optimization with Monte Carlo based System Models

ARIE DUBI, STANISLAV KHOROSHEVSKY, and AVINOAM DORON

2011, 7(5): 405-416. doi:10.23940/ijpe.11.5.p405.mag

Abstract

due to strong impact on the system performance and significant amount of resources invested in procurement and management of the inventory each year. These systems almost always involve complex operational aspects which require the use of the Monte Carlo (MC) method in order to model and analyze them. However, while the MC method enables realistic and reliable model analysis, it may not be sufficient for performing spare parts allocation optimization since it requires substantial computational effort. A novel approach to this problem is presented in this paper. It is based on a new theorem, referred to as the “logistic optimization theorem”, and a hybrid MC/analytical approach, and it enables the construction of a new algorithm for a rigorous and practical optimization mechanism. The new method, which is explained in detail, is verified and validated using a worked-out example with a detailed comparison to the results achieved by other commonly used optimization methods.
Received on November 22, 2010 and revised on May 22, 2011
References: 30

MDWNsolver: A Framework to Design and Solve Markov Decision Petri Nets

MARCO BECCUTI, GIULIANA FRANCESCHINIS, and SERGE HADDAD

2011, 7(5): 417-428. doi:10.23940/ijpe.11.5.p417.mag

Abstract

MDWNsolver is a framework for system modeling and optimization of performability measures based on Markov Decision Petri Net (MDPN) and Markov Decision Well-formed Net (MDWN) formalisms, two Petri Net extensions for high level specification of Markov Decision Processes (MDP). It is integrated in the GreatSPN suite which provides a GUI to design MDPN/MDWN models. From the analysis point of view, MDWNsolver uses efficient algorithms that take advantage of system symmetries, thus reducing the analysis complexity. In this paper the MDWNsolver framework features and architecture are presented, and some application examples are discussed.
Received on December 02, 2010 and revised on March 22, 2011
References: 16

Markov Modeling Approach for Survivability Analysis of Cellular Networks

VANEETA JINDAL, S DHARMARAJA, and KISHOR S TRIVEDI

2011, 7(5): 429-440. doi:10.23940/ijpe.11.5.p429.mag

Abstract

Survivability is the capability of a system to fulfill its mission in a timely manner in the presence of failures, attacks and accidents. In this paper, quantitative assessment of survivability of cellular networks is conducted by developing an analytical model using Markov chains. A stochastic reward net model is then developed for the automated generation of CTMC and hence survivability metrics in terms of call blocking probabilities and excess delay due to failures are computed. Finally, numerical results are presented for the illustration of the proposed model.
Received on November 24, 2010 and revised on April 12, 2011
References: 16
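
As a minimal illustration of how a call blocking probability can be obtained from a Markov chain, the sketch below treats a single cell as an M/M/c/c birth-death chain and evaluates the Erlang B formula. This is a textbook simplification, not the model of the paper, whose details are not given in the abstract; the function names `erlang_b` and `blocking_with_failures` and all parameters are assumptions made for illustration.

```python
def erlang_b(offered_load: float, channels: int) -> float:
    """Blocking probability of an M/M/c/c loss system (Erlang B).

    offered_load is the arrival rate divided by the per-call service
    rate (in Erlangs); channels is the number of available channels.
    Uses the numerically stable recursion
    B(0) = 1, B(n) = a * B(n-1) / (n + a * B(n-1)).
    """
    if offered_load < 0 or channels < 0:
        raise ValueError("offered_load and channels must be non-negative")
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def blocking_with_failures(offered_load: float, channels: int,
                           failed_channels: int) -> float:
    """Hypothetical helper: blocking after some channels are lost to a failure."""
    return erlang_b(offered_load, max(channels - failed_channels, 0))
```

Comparing `erlang_b(a, c)` with `blocking_with_failures(a, c, k)` gives a rough sense of how much a failure of `k` channels degrades service, which is the kind of question a survivability metric is meant to answer.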

Stochastic Petri Nets with Low Variation Matrix Exponentially Distributed Firing Time

P. BUCHHOLZ, A. HORVÁTH, and M. TELEK

2011, 7(5): 441-454. doi:10.23940/ijpe.11.5.p441.mag

Abstract

Matrix exponential (ME) distributions with low squared coefficient of variation (scv) are such that the density function becomes zero at some points in (0,∞). For such distributions there is no equivalent finite dimensional PH representation, which inhibits the application of existing methodologies for the numerical analysis of stochastic Petri nets (SPNs) with this kind of ME distributed firing time. To overcome the limitations of existing methodologies we apply the flow interpretation of ME distributions and study the transient and the stationary behaviour of stochastic Petri nets with ME distributed firing times via ordinary differential and linear equations, respectively. The main result of this study is a theory stating that all kinds of ME distributions can be used like phase type (PH) distributions in stochastic Petri nets and the numerical computation of transient or stationary measures is possible with methods similar to those used for Markov models.
Received on November 21, 2010 and revised on May 18, 2011
References: 11

Optimal Design of Heterogeneous Series-Parallel Systems with Common-Cause Failures

PRASHANTHI BODDU and LIUDONG XING

2011, 7(5): 455-466. doi:10.23940/ijpe.11.5.p455.mag

Abstract

The rapid advancements in science and technology intensify the need for the optimal design of modern systems, aiming to achieve the maximum system reliability while meeting some resource constraints (e.g., cost, weight, etc.). In this paper, we consider the system reliability optimization for series-parallel systems subject to common-cause failures (CCF). Unlike traditional approaches that consider either cold or hot standby redundancy for a parallel subsystem, our approach considers a combination of cold and hot standby redundancy within one subsystem to achieve a balance between fast recovery and power conservation in the optimal system design. A new problem formulation to incorporate CCF is proposed. For problem formulation, an analytical method combined with an existing combinatorial method is proposed to generate the objective function that evaluates the reliability of a subsystem with both hot and cold standby units and subject to CCF. The method has no limitation on the type of time-to-failure distributions for the system components. An optimization solution methodology based on a genetic algorithm is presented for obtaining an optimal design configuration with maximum system reliability while satisfying some system-level cost constraint. The proposed methodology is tested on several example data sets and the corresponding reliability optimization results are presented.
Received on December 15, 2010 and revised on May 19, 2011
References: 26
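
To make the hot versus cold standby distinction concrete, here is a minimal sketch using standard textbook formulas. It deliberately ignores common-cause failures, assumes perfect switching, and restricts cold standby to identical exponential units, so it is far simpler than the method described in the abstract. All function names are assumptions made for illustration.

```python
import math


def hot_standby_reliability(unit_reliabilities):
    """Active (hot) redundancy: the subsystem works if at least one unit works."""
    prob_all_fail = 1.0
    for r in unit_reliabilities:
        prob_all_fail *= 1.0 - r
    return 1.0 - prob_all_fail


def cold_standby_reliability(failure_rate, mission_time, units):
    """Cold standby of identical exponential units with perfect switching.

    The subsystem lifetime is Erlang distributed, so
    R(t) = exp(-lt) * sum_{k=0}^{units-1} (lt)^k / k!  with lt = rate * time.
    """
    lt = failure_rate * mission_time
    return math.exp(-lt) * sum(lt ** k / math.factorial(k) for k in range(units))


def series_reliability(subsystem_reliabilities):
    """A series system works only if every subsystem works."""
    return math.prod(subsystem_reliabilities)
```

Under these idealized assumptions, two cold standby units outlast two hot standby units with the same failure rate, because the spare does not age while idle. In practice the cold spare takes time to start, which is the trade-off between fast recovery and power conservation that the abstract refers to.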

Reliability Modeling for Intelligent Water Distribution Networks

JING LIN and SAHRA SEDIGH

2011, 7(5): 467-478. doi:10.23940/ijpe.11.5.p467.mag

Abstract

In cyber-physical systems, communications, computing, and information about the system's requirements and operational state are used to streamline and fortify physical operations. Techniques exist for assessment, modeling, and simulation of physical and cyber infrastructures, respectively, but such isolated analysis is incapable of fully capturing the interdependencies that occur when they intertwine to create a CPS. Our research seeks to develop models that capture these interdependencies, while accurately reflecting the operation and attributes of the cyber and physical infrastructures. In this paper, we use intelligent water distribution networks (WDNs) as a model problem. We propose an agent-based qualitative model for a WDN, as well as a Markov decision process model, which is quantitatively analyzed using SHARPE. Our models facilitate comparison of the reliability of purely physical systems to their intelligent counterparts – an important task in light of the increasing reliance of critical infrastructure systems on computing and communication.
Received on December 02, 2010 and revised on April 17, 2011
References: 08

PARSY: Performance Aware Reconfiguration of Software Systems

MORENO MARZOLLA and RAFFAELA MIRANDOLA

2011, 7(5): 479-492. doi:10.23940/ijpe.11.5.p479.mag

Abstract

Dynamic reconfiguration of component-based systems is recognized as a viable way to meet quality requirements in a mutable operating environment. In this paper we consider the problem of maintaining the overall response time of a component-based system below a given threshold, as the system is subject to variable workload. We assume that the system components can be dynamically reconfigured to provide a degraded service with lower response time. Each component operating at one of the available quality levels is assigned a utility, with higher quality levels associated with higher utilities. The main contribution of this paper is an on-line algorithm for performance-aware reconfiguration of degradable software systems called PARSY (Performance Aware Reconfiguration of software SYstems). PARSY tunes individual components in order to maximize the system utility under the constraint of keeping the system response time below a specific threshold. PARSY drives the runtime dynamic reconfiguration step with the help of a Queuing Network performance model. Numerical experiments are used to illustrate the effectiveness of the proposed approach.
Received on December 12, 2010 and revised on March 09, 2011
References: 24
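
The idea of trading utility against response time can be sketched with a toy example. The code below is not the PARSY algorithm: it models a single component as an M/M/1 queue (an assumption; the abstract only says a Queuing Network model is used) and picks the highest-utility quality level whose mean response time stays within the threshold. The function names and the `(name, service_rate, utility)` tuples are hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue; infinite if the queue is unstable."""
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def choose_quality_level(arrival_rate, levels, threshold):
    """Pick the highest-utility level whose response time stays within threshold.

    levels is a list of (name, service_rate, utility) tuples (hypothetical data).
    Returns the chosen tuple, or None if no level meets the threshold.
    """
    feasible = [
        level for level in levels
        if mm1_response_time(arrival_rate, level[1]) <= threshold
    ]
    return max(feasible, key=lambda level: level[2], default=None)
```

For example, with a full-quality level served at rate 10 and a degraded level served at rate 20, a rising arrival rate eventually pushes the full-quality response time over the threshold, and the selection falls back to the degraded level.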

Dynamic Aspects and Behaviors in System Reliability Evaluation

SALVATORE DISTEFANO, ANTONIO PULIAFITO, and KISHOR S. TRIVEDI

2011, 7(5): 493-498. doi:10.23940/ijpe.11.5.p493.mag

Abstract

Reliability is one of the key attributes of dependability and quality of service. Techniques and tools for reliability assessment are therefore required in order to evaluate and predict system behavior. In many contexts, merely taking the static system structure into account is not adequate, and it is necessary to take into account dynamic aspects and behaviors. The focus of the paper is first to identify and characterize such dynamic aspects in reliability, and then to provide an overview of the techniques that can adequately evaluate them.
Received on December 01, 2010 and revised on May 15, 2011
References: 19

Online ISSN 2993-8341
Print ISSN 0973-1318