Skip to content

Iterate Retrospective: Brainshelf Resurface

Scenario

Brainshelf shipped Resurface . a morning email digest resurfacing saved content . in Sprint 8. Within 48 hours of launch, a timezone offset bug caused the digest to send at 5:00 AM local time for users in UTC+1 and UTC+2 timezones (primarily UK and Central Europe) instead of the configured 7:30 AM. Approximately 340 affected users [fictional] received an early-morning email before the bug was detected and patched. The sprint also had two meaningful wins: on-time delivery of the first major user-facing feature in six months, and strong opt-in rate. The team held this retrospective three days after the bug was resolved.

Source Notes:

  • The Start/Stop/Continue retrospective format is described in Agile Alliance’s published retrospective practices guide (agilealliance.org) as the recommended starting format for small, fast-moving teams due to its low facilitation overhead.
  • Liberating Structures’ “1-2-4-All” activity (liberatingstructures.com) was used for the initial input-gathering phase to ensure quieter team members contributed before group discussion, consistent with published retrospective facilitation research.

Prompt

/retrospective
sprint 8 retro . resurface launch sprint. need to write this up.
what happened: we shipped resurface on time! first big feature in 6 months.
but we also had a timezone bug that sent the morning digest at 5am to UK/Europe
users (~340 people [fictional]). caught it 6 hours after launch, patched within 2
hours of catching it, but some users were not happy.
format: start/stop/continue. small team (7 people in the room).
2 wins, the timezone thing, and a few process issues that came up.
previous retro action items were: (1) set up staging environment email testing .
DONE, (2) add integration test for notification delivery . NOT DONE (this
contributed to the bug), (3) weekly eng/pm sync . IN PROGRESS.

Output

Retrospective: Sprint 8 . Resurface Launch

Overview

Period Covered: Sprint 8 (Feb 3 - Feb 16, 2026) Date Held: February 19, 2026 Facilitator: PM Duration: 60 minutes Format Used: Start/Stop/Continue with 1-2-4-All input gathering

Attendees

  • Priya M., Product Manager
  • Dan K., Product Designer
  • Alex R., Backend Engineer (Resurface feature lead)
  • Jess T., Backend Engineer
  • Sam W., Frontend Engineer
  • Chloe B., Data Analyst
  • Jordan L., Growth (part-time; joined for the retro only)

Context

Sprint 8 was the Resurface launch sprint. Resurface is the first major user-facing feature shipped in six months and delivered on schedule. Within 48 hours of launch, a timezone offset calculation bug caused the morning digest to send at 5:00 AM local time to approximately 340 users [fictional] in UTC+1 and UTC+2 timezones (UK and Central Europe). The bug was detected via user complaints on social media and direct support emails 6 hours post-launch and patched within 2 additional hours. Total exposure window: approximately 8 hours. A public apology was posted in the app’s community forum and via email to affected users.


Previous Retrospective Review

Action Items from Last Retro (Sprint 7)

ActionOwnerStatusNotes
Set up staging environment for email template testingAlex R.CompleteStaging email environment live; used successfully in Sprint 8 for template QA
Add integration test for notification delivery pipeline (timezone, send time)Alex R.Not StartedThis item was deprioritized due to Resurface sprint scope. Directly contributed to the timezone bug.
Weekly engineering / PM sync (30 min, Tuesdays)Priya M.In ProgressRunning for 3 of 4 weeks; canceled once due to conflicting external call

What Went Well

Team Highlights

  • Resurface shipped on time after six months without a major feature launch . the team stayed in scope and executed the sprint plan without unplanned cuts.
  • Email opt-in rate in the first 72 hours exceeded the 15% hypothesis threshold: 19% of eligible users opted in [fictional], validating the demand assumption before the bug was even fixed.
  • Bug response was fast once identified: root cause isolated in under 30 minutes, patch deployed in 2 hours, affected-user communication sent within the same hour as the patch. For a 20-person company, the incident response was professional.

Process Wins

  • The staging email environment (completed from Sprint 7 retro) caught two template rendering bugs before launch that would otherwise have shipped to users; the staging investment paid back immediately.
  • The decision to require opt-in before the A/B experiment enrollment (not just after) meant the timezone incident affected opted-in users only . a smaller and more forgiving audience than the full user base.

Individual Shoutouts

  • Alex R. for staying online through the incident, owning the root cause analysis, and writing a clear postmortem within 24 hours of the fix . making this retro easier to run.
  • Chloe B. for surfacing the anomalous send-time pattern in the analytics dashboard before user complaints reached support volume . the monitoring caught the bug faster than the support queue did.

What to Improve

Challenges Faced

  • The timezone offset bug: the notification scheduling code calculated send time using server UTC offset rather than the per-user stored timezone, causing sends to arrive at the wrong local hour for users outside the team’s default timezone (US Pacific). No integration test existed for this calculation.
  • No pre-launch notification dry-run capability: the team had no way to send a test digest to a list of internal testers across representative timezones before the production push. The staging email environment tests templates but does not simulate the full scheduling pipeline.
  • Incident detection relied on social media monitoring and user support emails rather than a proactive alert. The data anomaly (unusually high bounce rate in UTC+1 accounts at 5 AM) was detectable but had no alert threshold configured.

Process Pain Points

  • The Sprint 7 action item for notification delivery integration tests was deprioritized into Sprint 8 and then into the backlog without a recorded decision. The gap between “deferred” and “abandoned” was not visible until the bug surfaced it.
  • The weekly PM/eng sync is the right cadence but scheduling it on Tuesdays conflicts with a recurring external partner call for the growth function. Sprint 8 had one canceled sync that caused an alignment gap on the opt-in UX copy change.

Themes Identified

ThemeItemsVotes
Test coverage gaps on time-sensitive featuresTimezone bug; no integration test for scheduling pipeline; no dry-run tool5
Deferred action items losing visibilityNotification test deprioritized without a recorded decision; no aging alert on backlog items4
Reactive vs. proactive incident detectionBug surfaced via social media and support emails, not monitoring alerts3

Discussion Notes

Timezone Bug Root Cause

What was discussed: Alex R. walked through the postmortem. The scheduling code stored send times in UTC and converted to local time using the server’s timezone configuration rather than the per-user timezone field in the user table. The field existed and was populated for 94% of users [fictional] but was not referenced in the scheduling path. The bug was introduced in a refactor three sprints ago when the scheduling module was rewritten for Resurface; the original transactional email code handled timezone correctly.

Root cause identified: Missing test coverage for the timezone conversion path in the scheduling module. The refactor changed the code path; no integration test caught the regression.

Proposed solution: Write an integration test suite for the notification scheduling pipeline that covers UTC, UTC+1, UTC+2, UTC-5, UTC-8, and UTC+9 test users, and add a pre-send dry-run mode that simulates full scheduling (including timezone conversion) against a seed user list before any production push.


Deprioritized Action Items Losing Visibility

What was discussed: The notification integration test was on the Sprint 7 retro list. Dan K. pointed out that when action items are added to the backlog they lose the “came from a retro” context . there’s no way to distinguish a retro-sourced item from a normal feature request. Three sprint planning sessions passed without anyone noting the item was aging.

Root cause identified: Retro action items are written into the team’s linear backlog without a label or priority designation that distinguishes them from other work. In planning, they compete on equal footing with feature work and lose.

Proposed solution: Tag all retro action items with a retro-action label and a target sprint. If the item slips past its target sprint, it surfaces at the next retro automatically as an aging item . not a separate tracking system, just a label-based filter on the backlog review.


Action Items

PriorityActionOwnerDue DateStatus
1Write integration test suite for notification scheduling pipeline (timezone conversion, per-user timezone field, UTC send calculation)Alex R.Mar 1, 2026Not Started
2Build pre-send dry-run mode: simulate full scheduling pipeline against a seed list of internal test users before any production notification pushAlex R. + Sam W.Mar 15, 2026Not Started
3Configure proactive alert for email send anomalies: alert if send-time distribution for a given digest deviates by more than 1 hour from expected per-user local timeChloe B.Mar 1, 2026Not Started

Action Item Details

Action 1: Notification Scheduling Integration Tests

  • What: Write tests covering the full timezone conversion path in the scheduling module, asserting that send_at (UTC) matches the expected local send time for users in at least 6 representative timezones. Include regression test for the specific bug: UTC+1 user, configured 7:30 AM send time, must produce a UTC timestamp of 06:30, not 07:30.
  • Why: Closes the test coverage gap that caused the Sprint 8 timezone incident.
  • Success criteria: CI pipeline includes notification scheduling tests; no merge to main can break the timezone assertion without a failing test.

Action 2: Pre-Send Dry-Run Mode

  • What: Add a dry-run flag to the notification dispatch job. When dry_run=true, the job runs the full scheduling and template-rendering pipeline but routes output to a seed list of internal test users (5 - 10 people across US, UK, and CET timezones) rather than production. Output includes a preview of send times in each recipient’s local timezone for human review.
  • Why: Provides a final human-visible check before any notification push reaches production users.
  • Success criteria: Dry-run mode is used for all future notification feature launches; no production push goes out without a dry-run sign-off from the PM.

Parking Lot

  • In-app resurface cards (vs. email-only): Several users in the community forum asked for an in-app card alternative to the email digest. Deferred . the A/B experiment on email format needs to complete before adding a third surface variant. Will revisit post-experiment.
  • Notification quiet hours settings: Two users specifically complained about the 5 AM send in the UK and mentioned they would want quiet hours controls even after the bug is fixed. This is a legitimate product gap, not just a bug consequence. Add to the Resurface v1.1 refinement backlog.

Team Health Indicators

IndicatorThis RetroLast RetroTrend
Team morale (1 - 5)3.8 [fictional]3.4 [fictional]
Process satisfaction (1 - 5)3.2 [fictional]3.5 [fictional]
Collaboration (1 - 5)4.1 [fictional]3.9 [fictional]

Morale up: the team is energized from shipping. Process satisfaction down: the timezone incident and the deferred-test miss left the team feeling the QA process is fragile.

Recurring Themes

  • Test coverage gaps on new pipeline code have appeared in two consecutive retros (Sprint 7: missing test on email template rendering; Sprint 8: missing test on scheduling timezone conversion). This is a pattern, not an isolated incident . the root cause is scope pressure on test writing during launch sprints.

Facilitator Notes

  • The 1-2-4-All format worked well for a 7-person team; individual writing time before group discussion prevented the two most vocal engineers from dominating the “what to improve” section.
  • The timezone incident dominated the discussion; spent 25 of 60 minutes on it. For future retros when a specific incident is top of mind, timeboxing the incident discussion to 15 minutes and moving to the broader themes prevents other process issues from being crowded out.
  • Action item 1 and 3 have the same due date (Mar 1); check in at the Tuesday sync to confirm Alex R. and Chloe B. are not blocked.

Next Retrospective

Scheduled: March 19, 2026 (end of Sprint 9) Focus areas: Follow up on all three action items; review early Resurface A/B experiment data if available; revisit test coverage pattern if Sprint 9 surfaces another coverage gap.


Retrospective documented by Priya M. on February 19, 2026.