Inproceedings,

Automatic Failure Diagnosis Support in Distributed Large-Scale Software Systems based on Timing Behavior Anomaly Correlation

N. Marwede, M. Rohr, A. van Hoorn, and W. Hasselbring.
Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR '09), pages 47--57. IEEE, (March 2009)
DOI: 10.1109/CSMR.2009.15

Abstract

Manual failure diagnosis in large-scale software systems is time-consuming and error-prone. Automatic failure diagnosis support mechanisms can potentially narrow down, or even localize faults within a very short time which both helps to preserve system availability. A large class of automatic failure diagnosis approaches consists of two steps: 1) computation of component anomaly scores; 2) global correlation of the anomaly scores for fault localization. In this paper, we present an architecture-centric approach for the second step. In our approach, component anomaly scores are correlated based on architectural dependency graphs of the software system and a rule set to address error propagation. Moreover, the results are graphically visualized in order to support fault localization and to enhance maintainability. The visualization combines architectural diagrams automatically derived from monitoring data with failure diagnosis results. In a case study, the approach is applied to a distributed sample Web application which is subject to fault injection.

BibTeX key: MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation
entry type: inproceedings
booktitle: Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR~'09)
year: 2009
month: mar
pages: 47--57
publisher: IEEE
location: March 24--27, 2009, Kaiserslautern, Germany
xeditor: Andreas Winter and Rudolf Ferenc and Jens Knodel
file: MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-cameraReadysubmission-stamped-finalPageNumbers.pdf:avanhoorn/MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-cameraReadysubmission-stamped-finalPageNumbers.pdf:PDF
isbn: 978-0-7695-3589-0
DOI: 10.1109/CSMR.2009.15

Cite this publication

@inproceedings{MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation,
  abstract = {Manual failure diagnosis in large-scale software systems is time-consuming and error-prone. Automatic failure diagnosis support mechanisms can potentially narrow down, or even localize faults within a very short time which both helps to preserve system availability. A large class of automatic failure diagnosis approaches consists of two steps: 1) computation of component anomaly scores; 2) global correlation of the anomaly scores for fault localization. In this paper, we present an architecture-centric approach for the second step. In our approach, component anomaly scores are correlated based on architectural dependency graphs of the software system and a rule set to address error propagation. Moreover, the results are graphically visualized in order to support fault localization and to enhance maintainability. The visualization combines architectural diagrams automatically derived from monitoring data with failure diagnosis results. In a case study, the approach is applied to a distributed sample Web application which is subject to fault injection.},
  added-at = {2018-02-14T17:55:46.000+0100},
  author = {Marwede, Nina S. and Rohr, Matthias and van Hoorn, Andr\'{e} and Hasselbring, Wilhelm},
  biburl = {https://puma.ub.uni-stuttgart.de/bibtex/209930a7771dc84417bbc02e4b3d7424c/andrevanhoorn},
  booktitle = {Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR~'09)},
  doi = {10.1109/CSMR.2009.15},
  file = {MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-cameraReadysubmission-stamped-finalPageNumbers.pdf:avanhoorn/MarwedeRohrHoornHasselbring2009AutomaticFailureDiagnosisInDistributedLargeScaleSoftwareSystemsBasedOnTimingBehaviorAnomalyCorrelation-cameraReadysubmission-stamped-finalPageNumbers.pdf:PDF},
  interhash = {4288de3dbbfc8f80a166b227d81f4af7},
  intrahash = {09930a7771dc84417bbc02e4b3d7424c},
  isbn = {978-0-7695-3589-0},
  keywords = {anomaly component dependability, dependency detection, diagnosis, failure fault faults, graphs localization, software},
  location = {March 24--27, 2009, Kaiserslautern, Germany},
  month = mar,
  pages = {47--57},
  publisher = {IEEE},
  timestamp = {2020-02-27T22:31:36.000+0100},
  title = {Automatic Failure Diagnosis Support in Distributed Large-Scale Software Systems based on Timing Behavior Anomaly Correlation},
  xeditor = {Andreas Winter and Rudolf Ferenc and Jens Knodel},
  year = 2009
}
