# FAQ
# Content Encoding
We enforce gzip content compression on the data received by the endpoints. When the "content-encoding" header is set to "gzip", the content body is compressed into a gzip stream and set as the request body. Upon receiving the request on our endpoints, we must therefore decompress the stream from the request body. Below is an example of how we decompress on our own endpoints.
```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Confirm the request body is gzip-compressed before decompressing
string encodingType = Request.Headers["content-encoding"];
Stream decompressionStream = new GZipStream(Request.Body, CompressionMode.Decompress);

// Copy the decompressed stream into a new stream
MemoryStream outputStream = new MemoryStream();
decompressionStream.CopyTo(outputStream);

// Write the stream to a byte array
byte[] decompressedData = outputStream.ToArray();

// Decode the byte array to a string
string decompressedBody = Encoding.UTF8.GetString(decompressedData);

// Close the streams to do the final buffer flushing
decompressionStream.Close();
outputStream.Close();
```
# Heartbeat
Once started, webhooks will send out a heartbeat event every 5 minutes. This serves to inform clients that the webhook is operational and to verify the status of the receiving endpoint, ensuring there are no network or system issues.
The heartbeat event will look like the example below:

```json
{
  "subscriptionId": "2fc686a1-123b-400f-86d6-356fcd39372d",
  "sessionId": "079d1e3a-ab87-4f87-8e22-4f4d15d68806",
  "attemptNumber": 1,
  "records": []
}
```
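The empty records array is what distinguishes a heartbeat from a batch of data. Below is a minimal sketch of how a receiving endpoint might detect one, assuming the decompressed body is deserialized with System.Text.Json; the WebhookEvent class is only an illustration mirroring the example above, not part of the product API.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Shape of the incoming payload, mirroring the heartbeat example above.
public class WebhookEvent
{
    [JsonPropertyName("subscriptionId")] public Guid SubscriptionId { get; set; }
    [JsonPropertyName("sessionId")] public Guid SessionId { get; set; }
    [JsonPropertyName("attemptNumber")] public int AttemptNumber { get; set; }
    [JsonPropertyName("records")] public JsonElement[] Records { get; set; } = Array.Empty<JsonElement>();
}

public static class HeartbeatHandler
{
    // A heartbeat carries no data: its records array is empty.
    public static bool IsHeartbeat(string decompressedBody)
    {
        WebhookEvent? payload = JsonSerializer.Deserialize<WebhookEvent>(decompressedBody);
        return payload != null && payload.Records.Length == 0;
    }
}
```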
# Retry Policies
If the configured endpoint for a webhook returns an HTTP response code from the list below, we will retry with backoff:
- RequestTimeout
- InternalServerError
- ServiceUnavailable
- GatewayTimeout
- BadGateway
The retry process with backoff is as follows:
- The first retry will occur after 30 seconds. If that attempt fails again with a code from the list, we increase the delay before each subsequent retry by an additional 30 seconds, so the following retries occur after 60 seconds, 90 seconds, 120 seconds, and so forth.
- The number of retries is defined by your tier (1, 10, 20, and 50 respectively for the Starter, Small, Medium, and Large tiers).
- Example: the Small tier allows 10 retries, so its retry process continues for approximately 27 minutes in total. The Large tier allows 50 retries, so its retry process continues for roughly 10 hours before we fail the webhook (see the sketch below).
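For illustration, the following sketch reproduces that schedule. The RetryBackoff class, its method names, and the tier strings are illustrative only, not part of any product API; the arithmetic follows the description above, where the delay before the n-th retry is 30 × n seconds.

```csharp
using System;

public static class RetryBackoff
{
    // Maximum retry counts per tier, as listed above.
    public static int MaxRetries(string tier) => tier switch
    {
        "Starter" => 1,
        "Small"   => 10,
        "Medium"  => 20,
        "Large"   => 50,
        _ => throw new ArgumentException($"Unknown tier: {tier}")
    };

    // Delay before the n-th retry (1-based): 30 s, 60 s, 90 s, ...
    public static TimeSpan DelayBeforeRetry(int attempt) => TimeSpan.FromSeconds(30 * attempt);

    // Total time spent retrying before the webhook is failed:
    // 30 s + 60 s + ... + 30 s * retries = 30 s * retries * (retries + 1) / 2.
    public static TimeSpan TotalRetryWindow(string tier)
    {
        int retries = MaxRetries(tier);
        return TimeSpan.FromSeconds(30.0 * retries * (retries + 1) / 2);
    }
}

// Example: TotalRetryWindow("Small") => 1650 s  (~27.5 minutes)
//          TotalRetryWindow("Large") => 38250 s (~10.6 hours)
```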
For other responses such as BadRequest, we will fail the webhook immediately, as such errors will generally not be resolved by a retry. Clients should resolve the error on their end and restart the webhook once it is fixed.
# Record Batching
Events can be grouped into batches based on throughput. A batch can contain different events from the same category (e.g., TrainingAssigned and TrainingCompleted). The maximum number of records in a batch depends on the customer's selected tier; for instance, the limit is 100 records for the Small tier, while for the largest tier it is 1000 records.
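As a rough illustration of consuming such a batch, the sketch below iterates over the records array and branches on the event type. The per-record eventType field name is hypothetical; consult the event reference documentation for the actual record schema.

```csharp
using System.Text.Json;

public static class BatchProcessor
{
    // A batch can mix event types from the same category; the "eventType"
    // field name here is hypothetical and used only for illustration.
    public static void Process(JsonElement[] records)
    {
        foreach (JsonElement record in records)
        {
            if (!record.TryGetProperty("eventType", out JsonElement eventType))
            {
                continue;
            }

            switch (eventType.GetString())
            {
                case "TrainingAssigned":
                    // Handle a training assignment record.
                    break;
                case "TrainingCompleted":
                    // Handle a training completion record.
                    break;
                default:
                    // Other event types from the same category.
                    break;
            }
        }
    }
}
```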
# Alert Types
Alerts are a webhook event category that allows end users to set up webhooks in order to better monitor and be alerted about their other webhooks.
# Types of Alerts
# Webhook failure has occurred
When the webhook moves into a failed state, a Failed alert event will be sent. Generally this occurs due to issues with the client's endpoint that need to be acted upon. In the UI, a warning icon will be displayed describing the error (e.g. HTTP status code: Unauthorized). In this example, the client endpoint assigned to the webhook is returning an Unauthorized error, and the client should take action to resolve it before restarting the webhook manually.
# Webhook is running with reduced capacity
Based upon the client's tiering, a client's webhooks may have up to 4 sessions to process their throughput. If one or more of these sessions have not been running for 30 minutes, a ReducedCapacity alert will be triggered. This indicates a partial failure, and the webhook may end up in a fully failed state. The client may want to pre-emptively stop and restart the webhook if the issue lies with the client's endpoint.
# Average record lag is over 10 seconds
If the records' average lag, calculated over a span of 30 minutes, is more than 10 seconds, a LaggingOver10Seconds alert will be triggered. This can be caused by a variety of factors but is typically seen at the start of a new webhook when there are many backlogged events to process. If the client sees this constantly, they may want to look into upgrading their tier, as higher tiers can handle larger throughput.
# Average record lag is over 10 minutes
If the records' average lag, calculated over a span of 30 minutes, is more than 10 minutes, a LaggingOver10Minutes alert will be triggered. This can be caused by a variety of factors but is typically seen at the start of a new webhook when there are many backlogged events to process. If the client sees this constantly, they may want to look into upgrading their tier, as higher tiers can handle larger throughput.
# Average record lag is over 1 hour
If the records' average lag, calculated over a span of 30 minutes, is more than 1 hour, a LaggingOver1Hour alert will be triggered. This can be caused by a variety of factors but is typically seen at the start of a new webhook when there are many backlogged events to process. If the client sees this constantly, they may want to look into upgrading their tier, as higher tiers can handle larger throughput.
# Average record lag is over 1 day
If the records' average lag, calculated over a span of 30 minutes, is more than 1 day, a LaggingOver1Day alert will be triggered. This can be caused by a variety of factors but is typically seen at the start of a new webhook when there are many backlogged events to process. If the client sees this constantly, they may want to look into upgrading their tier, as higher tiers can handle larger throughput.
# No heartbeat detected
Webhooks produce a heartbeat event every 5 minutes in order to inform consumers that the webhook is running. When the last recorded heartbeat update is more than 30 minutes before the current time, the NoHeartbeat alert will be triggered. This will occur repeatedly until action is taken (such as a restart). When the NoHeartbeat alert is triggered, Cornerstone automatically restarts the webhook.
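As a rough sketch of reacting to these alerts, the handler below switches on the alert type names listed above. How the type appears in the actual payload is covered by the Alerts Events Reference; the Handle method and its string parameter are purely illustrative.

```csharp
using System;

public static class AlertHandler
{
    // Switches on the alert type names described above; how the type appears
    // in the payload is defined in the Alerts Events Reference.
    public static void Handle(string alertType)
    {
        switch (alertType)
        {
            case "Failed":
                // Endpoint-side issue: fix the endpoint, then restart the webhook manually.
                break;
            case "ReducedCapacity":
                // One or more sessions are down: consider a pre-emptive stop and restart.
                break;
            case "LaggingOver10Seconds":
            case "LaggingOver10Minutes":
            case "LaggingOver1Hour":
            case "LaggingOver1Day":
                // Sustained lag: often a startup backlog; consider a tier upgrade if constant.
                break;
            case "NoHeartbeat":
                // No heartbeat for over 30 minutes; Cornerstone restarts the webhook automatically.
                break;
            default:
                Console.WriteLine($"Unrecognized alert type: {alertType}");
                break;
        }
    }
}
```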
Example payloads for all of the events above can be found in the Alerts Events Reference.
# Status Types
In the Webhooks system, we provide the ability to create an external webhook to consume the webhook status stream. This allows clients to monitor the status of their webhooks via a webhook.
# Types of Statuses
# Starting
This status is produced when the webhook has just been started from a stopped status. It is generally seen when someone creates and starts a webhook from the UI.
# Running
This status is produced when the webhook has been running and a heartbeat event has occurred. This will generally occur every 5 minutes from the start of a webhook, though there may be times when the event is missed (generally around the start of a webhook and during the 1-hour session renewal).
# Stopped
This status is produced when the webhook is put into a stopped state. Generally this will be when the webhook has been stopped manually.
# Failing
This status is produced when the webhook is put into a failing state. This generally occurs when the webhook has been trying to send events to an endpoint that is in turn returning a retriable error. See Retry Policies.
# Failed
This status is produced when the webhook is put into a failed state. This generally occurs when there is a client error on the client endpoint (most common) or an error has occurred in the CSOD webhooks system.
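A small sketch of consuming the status stream is shown below; the enum simply mirrors the status names on this page, and the handler is illustrative rather than part of the product API.

```csharp
public enum WebhookStatus
{
    // Status names as described above.
    Starting,
    Running,
    Stopped,
    Failing,
    Failed
}

public static class StatusMonitor
{
    // Reacts to a status event from the status stream.
    public static void OnStatus(WebhookStatus status)
    {
        switch (status)
        {
            case WebhookStatus.Failing:
                // The endpoint is returning retriable errors; see Retry Policies.
                break;
            case WebhookStatus.Failed:
                // Retries were exhausted or a non-retriable error occurred; fix and restart.
                break;
            default:
                // Starting, Running, and Stopped require no action.
                break;
        }
    }
}
```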
Example payloads can be found in the Status Events Reference.