IBX-5501

Fastly shielding: Possible race condition if edge PoP receives soft purge request before shield


    Description

      There is a possible race condition if an edge PoP receives a soft purge request before the shield does.

       

      Ibexa DXP has implemented a workaround for this in the .vcl, but unfortunately it is flawed:

      sub vcl_fetch {
      (...)
        /* Preventing race condition where edge might receive purge requests before shield
           https://developer.fastly.com/learning/concepts/purging/#race-conditions
           Workaround for surrogate key purges:
        */
        if (req.backend.is_shield && req.is_background_fetch) {
          // We assume shield will receive purges no later than 5 seconds after any edge
          set beresp.ttl = 5s;
          // Disable SWR, ensuring the edge will not do a background fetch before the 5s has passed
          set beresp.stale_while_revalidate = 0s;
        } else {
          /* Set stale_if_error and stale_while_revalidate (customize these values) */
          set beresp.stale_if_error = 86400s;
          set beresp.stale_while_revalidate = 300s;
        }
      (...)
      }

       

      The idea is:
      On every response to a background fetch, the edge sets a caching TTL of 5 seconds and disables stale-while-revalidate.
      Once that TTL expires (after 5 seconds), the edge requests the object again from the shielding PoP. This time a background fetch will not be used, as stale-while-revalidate was previously disabled for this object, and the original TTL from the origin response will apply.

      In theory, this prevents the race condition even if the shield receives the purge request up to 5 seconds later than the edge (Fastly claims a purge request takes no longer than 2 seconds to propagate across their network).
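      To make the intended timeline concrete, here is a minimal Python sketch of the reasoning above (the timestamps are illustrative assumptions, not measured values):

      # Sketch of the intended workaround timeline; all numbers are
      # illustrative assumptions, not measurements.
      EDGE_PURGE_AT = 0.0    # soft purge reaches the edge PoP
      SHIELD_PURGE_AT = 2.0  # the same purge reaches the shield up to ~2s later
      WORKAROUND_TTL = 5.0   # TTL the edge assigns to background-fetch responses

      # A request right after the edge purge triggers a background fetch from
      # the shield, which may still hold the stale object -- the race.
      first_fetch_at = 0.5
      assert first_fetch_at < SHIELD_PURGE_AT   # shield can still be stale here

      # The (possibly stale) response is cached at the edge for only 5s with
      # SWR disabled, so the next fetch is synchronous and happens after the
      # shield has been purged.
      second_fetch_at = first_fetch_at + WORKAROUND_TTL
      assert second_fetch_at > SHIELD_PURGE_AT  # shield is fresh again by now

      print(f"stale copy can only be served between t={first_fetch_at}s and t={second_fetch_at}s")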

       

      However, according to Fastly Support, the workaround is flawed due to how Varnish/Fastly maintains its cache:

      It happens because responses from origins are stored in a list rather than in some sort of map from the request key to the response.
      Varnish chooses which of the responses associated with a request hash to deliver based on the most recent response that is suitable for the request: that is, the Vary response header allows the request to use the response, and the various TTL timers allow the object to be delivered.
      Response objects are stored in a list because each response object can technically contain a different Vary header, so each response needs to be checked for suitability.
      A newly generated object is inserted at the front of this list and behaves as expected for 5 seconds. Once those 5 seconds pass, Varnish sees the expired object at the front of the list and skips it. After skipping the expired object, it finds that there is still an object it can deliver with a stale-while-revalidate timer, and uses that. Once it has decided to use that object, it is moved to the front of the list so that it is the first one found the next time a matching request arrives.
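      The described lookup can be modelled in a short Python sketch (a simplified model of the behaviour Fastly describes, not Varnish source code; the class and function names are invented for the sketch):

      class CacheObject:
          def __init__(self, body, ttl, swr, created):
              self.body = body
              self.ttl = ttl          # plain caching TTL in seconds
              self.swr = swr          # stale-while-revalidate window in seconds
              self.created = created  # creation time, in relative seconds

          def usable(self, now):
              # Deliverable while fresh or within its SWR window
              return now < self.created + self.ttl + self.swr

      def lookup(objects, now):
          """Walk the list front-to-back, deliver the first usable object,
          and promote it to the front (the behaviour described above)."""
          for obj in objects:
              if obj.usable(now):
                  objects.remove(obj)
                  objects.insert(0, obj)  # found first on the next request
                  return obj
          return None

      # Old object: TTL long expired, but a long SWR window remains.
      old = CacheObject("stale body", ttl=60, swr=300, created=-120)
      # Workaround object: fresh for 5 seconds, SWR disabled.
      new = CacheObject("5s body", ttl=5, swr=0, created=0)
      objects = [new, old]  # newest inserted at the front

      print(lookup(objects, now=1).body)  # "5s body": still fresh
      print(lookup(objects, now=6).body)  # "stale body": 5s object skipped,
                                          # stale object served and promoted

      At that point the edge serves the older stale object and may again revalidate it in the background, re-opening the race the 5s guard was meant to close.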
      

       

      So the solution seems to be to replace the 5s TTL logic with code that prevents the shield from ever delivering stale content to an edge PoP:

      sub vcl_recv {
      (...)
        // visits_this_service > 0 means the request has already passed through
        // this service on another Fastly PoP, i.e. this node is acting as the shield
        if (fastly.ff.visits_this_service > 0) {
          set req.max_stale_while_revalidate = 0s;
        }
      (...)
      }

      Resource: https://developer.fastly.com/learning/concepts/stale/#shielding-considerations
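      Continuing the simplified list-lookup model from above (an assumption-level sketch, not Fastly's implementation), capping the usable SWR window at 0s makes the shield treat the stale object as a miss in this scenario and fetch from origin instead:

      def usable(created, ttl, swr, now, max_swr):
          """True if the object may still be delivered, with SWR capped at max_swr."""
          return now < created + ttl + min(swr, max_swr)

      # Stale object: TTL expired 60 seconds ago, 300s SWR window remaining.
      created, ttl, swr = -120, 60, 300

      print(usable(created, ttl, swr, now=0, max_swr=300))  # True:  stale served to edge
      print(usable(created, ttl, swr, now=0, max_swr=0))    # False: shield goes to origin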

      Edit: Setting req.max_stale_while_revalidate = 0s in vcl_recv is already implemented in our public VCL and is known to not prevent all use cases where the race condition can happen.

       

            People

              Unassigned Unassigned
              vidar.langseid@ibexa.co Vidar Langseid
              Votes:
              4 Vote for this issue
              Watchers:
              5 Start watching this issue
