At 18:35 (CET, GMT+01), a large number of connections directed at ROUTER1 were opened.
This caused the routing engine to become overloaded, and at 18:38 ROUTER1 started losing its internal BGP sessions.
Because the router failed only partially rather than completely, it kept attracting traffic from both the internet and the internal network, and that traffic was then blackholed.
This affected traffic over the following connections:
- DECIX (peering)
- GlobalAXS (uplink)
- NLIX (peering)
The following connections were not affected:
- AMSIX (peering)
- Interoute (uplink)
- KPN Eurorings (uplink)
At 18:45, we logged into the routers and started diagnosing.
Due to the unusual circumstances, it took until 19:05 before we realized what was happening and started implementing counter-measures.
Around 19:10 we were able to stop ROUTER1 from attracting internal traffic, allowing the network to stabilize.
Around 19:21 we were able to stop ROUTER1 from attracting internet traffic, ending the disruption to traffic.
Around 19:28 we implemented measures to stop the connections directed at ROUTER1 from overloading the router.
In the meantime, we have implemented a long-term solution to prevent this issue from happening again.
Around 20:00 it was clear that the problems were solved, and we re-enabled both internal and internet traffic on ROUTER1.
We apologize for the inconvenience caused.
If you have any further questions or comments regarding this interruption, please contact us.