This issue is resolved and the application is performing as expected. A hardware failure on a node serving an internal API caused timeouts when processing incoming requests. The node became unhealthy, but our load balancer incorrectly continued to report it as healthy. We'll be improving our service health checks to detect this scenario and fail over more quickly.
Clients may see a brief gap in their data for the affected period. We are backfilling the missing data, and the gap should close within an hour.
May 30, 2018 - 18:09 UTC
The web app is back up. We're still diagnosing the root cause of the 500 errors, but we're confident that the service is now performing as expected.
May 30, 2018 - 17:45 UTC
We're investigating an issue causing the app to be unavailable. Customers may experience high error rates, slow response times, or complete unavailability. We're working quickly to diagnose the issue and will post another update as soon as more information is available.