Citybreak Online

Incident report


Incident manager
Johanna Nordquist

Systems affected
CBIS API, Citybreak online, WebX, Sales agent

2025-01-14 17:18 – Update about delays with product updates
2025-01-15 09:21 – Showstopper raised
2025-01-15 16:57 – Showstopper set as Resolved

Executive Summary
Our Elasticsearch cluster had filled up with uncancellable tasks, which left it unable to process new tasks or to clear out the existing ones effectively. Mitigating this required a full restart of the cluster. After the restart, the cluster returned to a healthy state and resumed processing tasks in a timely manner.

Description of the issue
Customers using CBIS started contacting CST on 10/1, reporting that new products were not being published. On 14/1, more customers reported that many products were not published, and we also started receiving reports of issues with bookable products. On 15/1, all customers using Stays were affected.

Actions taken

  • Removed a data node from the cluster, disabled the service and restarted the node.
  • Repeated this for all data nodes, then performed the same steps on the master nodes last.
  • Once the last master node had recovered, the Elasticsearch nodes were rejoined in the reverse order of their removal.
  • Cluster status and health were verified continuously with the Elasticsearch API.
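
The health verification in the last step can be illustrated with a minimal sketch against the Elasticsearch cluster health endpoint (GET /_cluster/health). This is not the script used during the incident. The node address, the expected node count and the function names are assumptions made for illustration; only the endpoint and the response fields (status, number_of_nodes, unassigned_shards) come from the Elasticsearch API.

```python
import json
from urllib.request import urlopen

# Assumed address of one cluster node; not taken from the incident itself.
ES_URL = "http://localhost:9200"


def fetch_cluster_health(base_url=ES_URL, timeout=10):
    """Return the parsed JSON body of GET /_cluster/health."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def health_problems(health, expected_nodes):
    """List the reasons why the cluster is not yet considered recovered.

    `health` is the parsed /_cluster/health response. `expected_nodes` is
    the number of nodes that should have rejoined (a hypothetical input).
    An empty list means no problem was found by these checks.
    """
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"status is {status}")
    joined = health.get("number_of_nodes", 0)
    if joined < expected_nodes:
        problems.append(f"only {joined} of {expected_nodes} nodes joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems
```

During a restart of this kind, something like health_problems(fetch_cluster_health(), expected_nodes=N) could be polled after each node rejoins, moving on to the next node only once the list comes back empty.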

Root Cause Analysis
A large volume of uncancellable tasks overwhelmed the processing capacity of Elasticsearch, leaving it unable to process new tasks.

Preventive Measures
Monitoring of the Elasticsearch cluster's health has been improved, with a focus on the number and running time of ongoing tasks.
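
As an illustration of what such task monitoring might look like, the sketch below summarizes a response from the Elasticsearch task management endpoint (GET /_tasks?detailed=true). It does not describe the monitoring actually deployed. The thresholds (max_age_seconds, max_total) and the function names are hypothetical; the response layout (nodes, tasks, cancellable, running_time_in_nanos) follows the Elasticsearch tasks API.

```python
def summarize_tasks(tasks_response, max_age_seconds=300):
    """Summarize a parsed GET /_tasks?detailed=true response.

    Counts all tasks, counts the uncancellable ones, and collects tasks
    that have been running longer than `max_age_seconds` (an assumed
    threshold, to be tuned per cluster).
    """
    total = 0
    uncancellable = 0
    long_running = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            total += 1
            if not task.get("cancellable", False):
                uncancellable += 1
            # The API reports running time in nanoseconds.
            age_seconds = task.get("running_time_in_nanos", 0) / 1e9
            if age_seconds > max_age_seconds:
                long_running.append((task_id, task.get("action"), age_seconds))
    return {
        "total": total,
        "uncancellable": uncancellable,
        "long_running": long_running,
    }


def should_alert(summary, max_total=1000):
    """Alert when too many tasks are queued or any task is long-running.

    `max_total` is a hypothetical limit, not a value from this incident.
    """
    return summary["total"] > max_total or bool(summary["long_running"])
```

Tracking both the count and the running time matters here because a backlog of uncancellable tasks, as in this incident, shows up as a growing total and as individual tasks that never finish.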

Visit Group |  Kungsgatan 34-36 |  411 19 Göteborg |  Sweden |  +46 (0)31 38 06 000