Issues with some shards impacting a few customers
Incident Report for VividCortex
Resolved
We left this incident open this long out of an abundance of caution while we replaced some of our replicas. In hindsight, we should have marked it resolved days ago, as no customers have been impacted by it since March 1. We're sorry if this caused any unnecessary concern.
Posted Mar 03, 2017 - 15:58 UTC
Update
All primary systems continue to perform at full capacity and there have been no further reports of delays or gaps in data related to this incident. One backup replica is still being replaced and we expect that to be complete in the morning.
Posted Mar 01, 2017 - 23:05 UTC
Monitoring
At this point all primary systems have been restored to full capacity and no customers should be experiencing delays or gaps in data. We will continue to monitor this morning as we restore a couple of backup replicas.

You may notice that agents continue to run well after the host services they monitor have been shut down. A cleanup feature normally terminates such agents after 1 hour; however, as currently implemented, agents can also be terminated prematurely during an interruption of certain services on our end. The agent cleanup period has temporarily been extended to 3 days, and we plan to keep it at 1 day going forward while we evaluate alternatives.
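As a rough illustration of the timeout described above, here is a minimal sketch (in Go) of cleanup logic of this kind. The names, values, and structure are hypothetical and illustrative only; they are not taken from our actual agent or cleanup code.

    // Hypothetical sketch only: names and structure are illustrative, not the
    // actual agent cleanup implementation.
    package main

    import (
        "fmt"
        "time"
    )

    // cleanupAfter is how long an agent may go without its host reporting
    // before the cleanup feature terminates it. Normally 1 hour; temporarily
    // raised to 72 hours (3 days) during this incident, with 24 hours planned
    // going forward.
    var cleanupAfter = 72 * time.Hour

    // shouldTerminate reports whether an agent whose host was last seen at
    // lastSeen should be cleaned up at time now. During an interruption of
    // services on our end, lastSeen stops advancing even for healthy hosts,
    // which is why a short window can terminate agents prematurely.
    func shouldTerminate(lastSeen, now time.Time) bool {
        return now.Sub(lastSeen) > cleanupAfter
    }

    func main() {
        lastSeen := time.Now().Add(-2 * time.Hour)
        // With the 72-hour window this prints "false"; with the old 1-hour
        // window it would have printed "true".
        fmt.Println(shouldTerminate(lastSeen, time.Now()))
    }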

Again, thank you for your patience and let us know if you have any questions or concerns.
Posted Mar 01, 2017 - 15:54 UTC
Update
Ongoing problems with AWS infrastructure today have contributed to our difficulty in fully correcting this issue. We continue to believe that this is impacting only a few customers. If you are one of those affected, we are sorry and we appreciate your patience.
Posted Feb 28, 2017 - 23:14 UTC
Update
We're continuing to rebuild one of our shards and have added additional processing capacity. Some customers may continue to experience delays in event notification and query sample processing. We have also implemented a change to host registration which will prevent agents from shutting down prematurely.
Posted Feb 28, 2017 - 17:08 UTC
Update
We're actively working on rebuilding some shards. Only a small fraction of our customers may be affected, and the impact is mostly visible in events and samples.
Posted Feb 27, 2017 - 14:47 UTC
Identified
Degraded EBS volumes have affected several shards and their failover copies. We're now rebuilding clean copies.
Posted Feb 27, 2017 - 12:18 UTC
Investigating
We're having issues with some of our shards. This may impact some customer environments, mostly in terms of event and sample availability. We're investigating and will update as soon as we have more information.
Posted Feb 27, 2017 - 11:37 UTC