empayre/fleet


mirror of https://github.com/empayre/fleet.git synced 2024-11-06 17:05:18 +00:00

Michal Nicpon 6697c57b5d

add on-call script (#4781)

2022-03-28 10:00:33 -06:00


Engineering

Release process

This section outlines the release process at Fleet.

The current release cadence is once every 3 weeks and concentrated around Wednesdays.
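As a rough illustration of the three-week cadence, the sketch below lists upcoming target dates from a previous release date. The function name and the example date are hypothetical, and real release days may shift around the Wednesday target.

```python
from datetime import date, timedelta


def next_release_targets(last_release: date, count: int = 3) -> list[date]:
    """Return the next `count` target dates, spaced three weeks apart."""
    return [last_release + timedelta(weeks=3 * i) for i in range(1, count + 1)]


# Hypothetical example: a release that shipped on Wednesday, 2022-03-23.
targets = next_release_targets(date(2022, 3, 23))
for target in targets:
    print(target.isoformat(), target.strftime("%A"))
```

Because each step adds a whole number of weeks, a Wednesday start keeps every target on a Wednesday.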

Release freeze period

To ensure quality releases, Fleet has a freeze period for testing prior to each release. Once the freeze period starts, new feature work will not be merged.

Release blocking bugs are exempt from the freeze period and are defined by the same rules as patch releases, which include:

Regressions
Security concerns
Issues with features targeted for current release

Non-release blocking bugs may include known issues that were not targeted for the current release, or newly documented behaviors that reproduce in older stable versions. These may be addressed during a release period by mutual agreement between Product and Engineering teams.

Release day

Documentation on completing the release process can be found here.

On-call rotation

This section outlines the on-call rotation at Fleet.

The on-call engineer is responsible for responding to technical Slack comments, Slack threads, and GitHub issues raised by customers and the community which cannot be handled by the Customer Success team.

Goals

At Fleet, our primary quality objectives are customer service and defect reduction. We measure these with Key Performance Indicators such as customer response time and the number of bugs resolved per cycle. The on-call rotation also aims to:

Become familiar with and stay abreast of what our community wants and the problems they're having.
Make people feel heard and understood.
Celebrate contributions.
Triage bugs, identify community feature requests, community pull requests, and community questions.

How?

No matter what, folks who post a new comment in Slack or issue in GitHub get a response from the on-call engineer within 1 business day. The response doesn't need to include an immediate answer.
The on-call engineer can discuss any items that require assistance at the end of the daily standup. They are also requested to attend the "Customer experience standup" where they can bring questions and stay abreast of what's happening with our customers.
If you do not understand the question or comment raised, request more details to best understand the next steps.
If an appropriate response is outside your scope, please post to #help-oncall, a confidential Slack channel in the Fleet Slack workspace.
If things get heated, remember to stay positive and helpful. If you aren't sure how best to respond in a positive way, or if you see behavior that violates the Fleet code of conduct, get help.
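The one-business-day response target above can be sketched as a small deadline calculation. This is a minimal sketch under assumptions that are not part of the handbook: business days are Monday through Friday, holidays are ignored, and a post made on a weekend is treated as arriving at the start of the following Monday. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def response_deadline(posted_at: datetime) -> datetime:
    """Return the latest time a first response is due (one business day later)."""
    start = posted_at
    if start.weekday() >= 5:
        # Assumption: weekend posts count as arriving at 00:00 on Monday.
        days_to_monday = 7 - start.weekday()
        start = (start + timedelta(days=days_to_monday)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
    deadline = start + timedelta(days=1)
    # Skip Saturday (5) and Sunday (6); holidays are not modeled.
    while deadline.weekday() >= 5:
        deadline += timedelta(days=1)
    return deadline


# A question posted on Friday afternoon is due by Monday afternoon.
print(response_deadline(datetime(2022, 3, 25, 15, 0)))
```

The response itself doesn't need to contain an answer, so the deadline only tracks when the first acknowledgment is due.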

Requesting more details

Typically, the questions, bug reports, and feature requests raised by members of the community will be missing helpful context, recreation steps, or motivations respectively.

❓ For questions that you don't immediately know the answer to, it's helpful to ask follow-up questions to receive additional context.

Let's say a community member asks the question "How do I do X in Fleet?" A follow-up question could be "What are you attempting to accomplish by doing X?"
This way, you have additional details when the primary question is brought to the Roundup meeting. In addition, the community member receives a response and feels heard.

🦟 For bug reports, it's helpful to ask for recreation steps so you're later able to verify the bug exists.

Let's say a community member submits a bug report. An example follow-up question could be "Can you please walk me through how you encountered this issue so that I can attempt to recreate it?"
This way, you now have steps that verify whether the bug exists in Fleet or if the issue is specific to the community member's environment. If the latter, you now have additional information for further investigation and question-asking.

💡 For feature requests, it's helpful to ask follow-up questions in an attempt to understand the "Why?" or underlying motivation behind the request.

Let's say a community member submits the feature request "I want the ability to do X in Fleet." A follow-up question could be "If you were able to do X in Fleet, what's the next action you would take?" or "Why do you want to do X in Fleet?"
Both of these questions provide helpful context on the underlying motivation behind the feature request when it is brought to the Roundup meeting. In addition, the community member receives a response and feels heard.

Feature requests

If the feature is requested by a customer, the on-call engineer is requested to create a feature request issue and follow up with the customer by linking them to this issue. This way, the customer can add additional comments or feedback to the newly filed feature request issue.

If the feature is requested by anyone other than a customer (e.g., a user in the #fleet Slack channel), the on-call engineer is requested to point the user to the feature request GitHub issue template and kindly ask the user to create a feature request.

Closing issues

It is often a good idea to let the original poster (OP) close their issue themselves since they are usually well equipped to decide whether the issue is resolved. In some cases, circling back with the OP can be impractical, and for the sake of speed, issues might get closed.

Keep in mind that this can feel jarring to the OP. The effect is worse if issues are closed automatically by a bot. (See balderdashy/sails#3423 and balderdashy/sails#4057 for examples of this.)

To provide another way of tracking status without closing issues altogether, consider using the green labels that begin with "+". To explore them, type + in GitHub's label picker.

Version support

In order to provide the most accurate and efficient support, Fleet will only target fixes based on the latest released version. Fixes in current versions will not be backported to older releases.

Community version supported for bug fixes: Latest version only

Community support for support/troubleshooting: Current major version

Premium version supported for bug fixes: Latest version only

Premium support for support/troubleshooting: All versions
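The support matrix above can be expressed as a small lookup. The sketch below is illustrative only: the tier and request names, the function names, and the dotted version format (for example "4.12.0") are assumptions and not part of any Fleet tooling.

```python
def major(version: str) -> int:
    """Extract the major version from a string such as '4.12.0' or 'v4.12.0'."""
    return int(version.lstrip("v").split(".")[0])


def is_covered(tier: str, kind: str, version: str, latest: str) -> bool:
    """Check whether a request falls under the version support policy.

    tier: "community" or "premium" (hypothetical labels)
    kind: "bug_fix" or "troubleshooting" (hypothetical labels)
    """
    if kind == "bug_fix":
        # Both tiers: fixes target the latest released version only.
        return version == latest
    if kind == "troubleshooting":
        if tier == "premium":
            return True  # Premium: all versions.
        return major(version) == major(latest)  # Community: current major.
    raise ValueError(f"unknown request kind: {kind}")


print(is_covered("community", "troubleshooting", "4.3.0", "4.12.0"))
```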

Sources

There are four sources that the on-call engineer should monitor for activity:

Customer Slack channels - Found under the "Connections" section in Slack. These channels are usually titled "at-insert-customer-name-here"
Community chatroom - https://osquery.slack.com, #fleet channel
Reported bugs - GitHub issues with the "bug" and ":reproduce" labels. Please remove the ":reproduce" label after you've followed up in the issue.
Pull requests opened by the community - GitHub open pull requests
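For the reported-bugs source, one way to list open issues carrying both labels is GitHub's REST endpoint for repository issues, which accepts a comma-separated `labels` filter. The sketch below only builds the request URL and filters a decoded response; it makes no network call. The repository name in the example is an assumption, and this is not a description of what the scripts/on-call script does.

```python
from urllib.parse import urlencode


def reproduce_bugs_url(repo: str) -> str:
    """Build the REST API URL for open issues labeled "bug" and ":reproduce"."""
    query = urlencode({"labels": "bug,:reproduce", "state": "open"})
    return f"https://api.github.com/repos/{repo}/issues?{query}"


def only_issues(items: list[dict]) -> list[dict]:
    """Drop pull requests, which the issues endpoint returns alongside issues."""
    return [item for item in items if "pull_request" not in item]


# Assumption: the upstream repository is "fleetdm/fleet".
url = reproduce_bugs_url("fleetdm/fleet")
print(url)
```

The returned URL can be fetched with any HTTP client; the JSON array in the response can then be passed through `only_issues`.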

Tools

There is a script located in scripts/on-call for use during the on-call rotation. It has only been tested on macOS and Linux. Its use is completely optional, but it contains several useful commands for checking issues and PRs that may require attention. You will need to install the following tools in order to use it:

Resources

There are several locations in Fleet's public and internal documentation that can be helpful when answering questions raised by the community:

The frequently asked questions (FAQ) documents in each section found in the /docs folder. These documents are the Using Fleet FAQ, Deploying FAQ, and Contributing FAQ.
The Internal FAQ document.

Handoff

Every week, the on-call engineer changes. Here are some tips for making this handoff go smoothly:

The new on-call engineer should change the @oncall alias in Slack to point to them. In the search box, type "people" and select "People & user groups". Switch to the "User groups" tab. Click @oncall. In the right sidebar, click "Edit Members". Remove the former on-call, and add yourself.
Hand off newer conversations. For newer threads, the former on-call can unsubscribe from the thread, and the new on-call should subscribe. The former on-call should explicitly share each of these threads, and the new on-call can select "Get notified about new replies" in the "..." menu. The former on-call can select "Turn off notifications for replies" in that same menu. It can be helpful for the former on-call to remain available for any conversations they were deeply involved in, so use your judgment on which threads to hand off.

Slack channels

These are the Slack channels the core engineering team maintains. If the channel has a directly responsible individual (DRI), they will be specified. These people are responsible for keeping up with all new messages, even if they aren't mentioned.

#g-core-engineering - DRI: Zach Wasserman
#help-oncall - DRI: Zach Wasserman
#help-golang - DRI: Zach Wasserman
#help-qa - DRI: Reed Haynes
#help-frontend - DRI: Luke Heath
#_pov-environments - DRI: Ben Edwards

Who should have these channels unmuted? Members of this group. Everyone else is encouraged to mute them.
