mirror of https://github.com/empayre/fleet.git synced 2024-11-06 17:05:18 +00:00

Add agile principles and populate why scrum (#10318 )

2023-03-08 15:57:52 -06:00


Why this way?

At Fleet, we rarely label ideas as drafts or theories. Everything is always in draft and subject to change in future iterations.

To increase clarity and encourage teams to make decisions quickly, leaders and DRIs sometimes need to explicitly mention when they are voicing an opinion or a decision. When an opinion is voiced, there's space for near-term debate. When a decision is voiced, team commitment is required.

Any past decision is open to questioning in a future iteration, as long as you act in accordance with it until it is changed. When you want to reopen a conversation about a past decision, communicate with the DRI (directly responsible individual) who can change the decision instead of someone who can't. Show your argument is informed by previous conversations, and assume the original decision was made with the best intent.

Here are some of Fleet's decisions about the best way to work, and the reasoning for them.

Why open source?

Fleet's source code, website, documentation, company handbook, and internal tools are public and accessible to everyone, including engineers, executives, and end users. (Even paid features are source-available.)

Meanwhile, the company behind Fleet is built on the open-core business model. Openness is one of our core values, and everything we do is public by default. Even the company handbook is open to the world.

Is open-source collaboration really worth all that? Is it any good?

Here are some of the reasons we build in the open:

Transparency. You are not dealing with a black box. Anyone can read the code and confirm it does what it's supposed to. When it comes to security and device management, great power should come with great openness.
Modifiability. You are not stuck. Anybody can make changes to the code at any time. You can build on existing ideas or start something brand new. Every contribution benefits the project as a whole. Plugins and configuration settings you need may already exist. If not, you can add them.
Community. You are not alone. Open source contributors are real people who love solving real problems and sharing solutions. As we gain experience and our careers grow, so does the community. As we learn, we get better at helping each other, which makes it easier to get started with the project, which drives even more adoption, and so on.
Less waste. You are not redundant. Contributing back to open source benefits everybody: Instead of other organizations and individuals wasting time rediscovering bug fixes and reinventing the same new features in a vacuum, everybody can just upgrade to the latest version of Fleet and take advantage of all those improvements automatically.
Perspective. You are not siloed. Anyone can contribute. That means startups, enterprises, and humans all over the world push fixes, add features, and influence the roadmap. Diversity of thought accelerates the cycle time for stability and innovation. Instead of waiting months to discover rare edge cases, or last-minute gaps in "enterprise-readiness", or that the cool new networking protocol your CISO wants to use isn't supported yet, you get to take advantage of the investment from the last contributor who had the same problem. It's like seeing around corners.
Sustainability. You are not the only contributor. Open-source software is public and highly visible. Mistakes are more obvious, which activates the community to discover (and fix) vulnerabilities and bugs more quickly. Open-source projects like osquery and Fleet have an incentive to be proactive and thoughtful about responsible disclosure, code reviews, strict semantic versioning, release notes, documentation, and other secure development best practices. For example, anybody in the community can suggest and review changes, but only maintainers with appropriate subject matter expertise can merge them.
Accessibility. You are smart and cool enough. Open source isn't just the Free Software movement anymore. Today, there are many other reasons and opportunities to contribute, even if you don't yet know how to write code. (For example, try clicking "Edit this page" to make an improvement to this page of Fleet's handbook.) Since 2020, Fleet has given visibility into over 1.65 million servers and workstations at Fortune 1000 companies like Comcast, Twilio, Uber, Atlassian, and Wayfair. But did you know that during that time, Fleet inspired one 9-year-old kid to learn coding, when almost no one else believed she could do it?
More timeless. You are not doomed to disappear forever when you change jobs. Why should your code? In most jobs, most of the work you do becomes inaccessible when you quit. But open source is forever.

Why handbook-first strategy?

The Fleet handbook provides team members with up-to-date information about how to do things in the company.

At Fleet, we make changes to the handbook first. That means, before any change to how we run the business is "live" or "official", it is first changed in the relevant handbook pages and issue templates.

Making changes to the handbook first encourages a culture of self-reliance, which is essential for daily asynchronous work as part of an all-remote team. It keeps everyone in sync across the all-remote team in different timezones, avoids miscommunications, and ensures the right people have reviewed every change.

The Fleet handbook is inspired by the GitLab team handbook. It shares the same advantages and will probably undergo a similar evolution.

To contribute to the handbook, click "Edit this page" and make your edits in Markdown.

Why the emphasis on training?

Investing in people and providing generous, prioritized training, especially up front, helps contributors understand what is going on at Fleet. By making training a prerequisite at Fleet, we can:

help team members make better decisions at work and feel confident in them.
create a culture of helping others, which results in team members feeling more comfortable even if they aren’t familiar with the osquery, security, startup, or IT space.

Here are a few examples of how Fleet prioritizes training:

the first 3 days at the company for every new team member are reserved for working on the tasks and training in their onboarding issue.
during the first 2 weeks at the company, every new fleetie joins a daily 1:1 meeting with their manager to check in and see how they're doing, and if they have any questions or blockers. If the manager is not available for this meeting, the CEO (pending availability) or Charlie will join this short daily meeting with them instead.
in their first few days, every new fleetie joins:
- a hands-on contributor experience training session with Charlie where they share their screen, check the configuration of their tools, complete any remaining setup, and discuss best practices.
- a short sightseeing tour with Charlie and (pending availability) Fleet's CEO to show them around and welcome them to the company.

Why direct responsibility?

Like Apple and GitLab, Fleet uses the concept of directly responsible individuals (DRIs) to know who is responsible for what.

A DRI is a person who is singularly responsible for a given aspect of the open-source project, the product, or the company. A DRI is responsible for making decisions, accomplishing goals, and getting any resources necessary to make a given area of Fleet successful.

For example, every department maintains its own dedicated handbook page with a single DRI. That page is kept up to date with accurate, current information, including the group's kanban board, Slack channels, and recurring tasks ("rituals").

DRIs help us collaborate efficiently by knowing exactly who is responsible and can make decisions about the work they're doing. This saves time by eliminating a requirement for consensus decisions or political presenteeism, enables faster decision-making, and ensures a single individual is aware of what to do next.

Reporting structure

In addition to Fleet's organizational chart, the company also organizes cross-functional product groups to allow for faster collaboration and fewer roundtrips.

Reviewers

Fleet aims to make picking the right reviewer for your change as easy and automatic as possible. In many cases, you won't need to select a particular reviewer for your pull request. (It will just happen automatically.)

To find the right person to review a given piece of content or source code path, consider:

The CODEOWNERS files of the fleetdm/fleet and fleetdm/confidential repositories.
The name="maintainedBy" tags at the very bottom of the raw markdown source for every handbook page and individual article.
The job titles and reporting structure indicated by the company's organizational chart and the roles in our cross-functional product groups.

In some cases, multiple subject-matter experts can merge changes to files even though there is a dedicated DRI configured as the "CODEOWNER". For examples of this, see the auto-approval flows configured as sails.config.custom.githubRepoDRIByPath and sails.config.custom.confidentialGithubRepoDRIByPath in website/config/custom.js.
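As a rough illustration of how a path-based DRI lookup can work, here is a minimal Python sketch. It is not Fleet's implementation: the mapping, the paths, the usernames, and the `find_dri` helper are all hypothetical, and the longest-prefix matching rule is an assumption made for this example rather than a description of what website/config/custom.js does.

```python
from typing import Dict, Optional

# Hypothetical path-prefix -> GitHub username mapping. These paths and
# usernames are invented for illustration; they are not Fleet's real config.
DRI_BY_PATH: Dict[str, str] = {
    "handbook": "example-handbook-dri",
    "handbook/engineering": "example-engineering-dri",
    "docs": "example-docs-dri",
}


def find_dri(changed_path: str, dri_by_path: Dict[str, str]) -> Optional[str]:
    """Return the DRI for the most specific configured prefix of changed_path."""
    parts = changed_path.strip("/").split("/")
    # Try the full path first, then walk up one directory at a time.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in dri_by_path:
            return dri_by_path[prefix]
    # No configured prefix covers this path.
    return None
```

With this sketch, a change under `handbook/engineering/` would resolve to the more specific entry, while any other handbook path would fall back to the `handbook` entry.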

Why do we use a wireframe-first approach?

Wireframing (or "drafting," as we often refer to it at Fleet) provides a clear overview of page layout, information architecture, user flow, and functionality. The wireframe-first approach extends beyond what users see on their screens. Wireframe-first is also excellent for drafting APIs, config settings, CLI options, and even business processes.

Here's why we use a wireframe-first approach at Fleet.

We create a wireframe for every change we make and favor small, iterative changes to deliver value quickly.
We can think through the functionality and user experience more deeply by wireframing before committing any code. As a result, our coding decisions are clearer, and our code is cleaner and easier to maintain.
Content hierarchy, messaging, error states, interactions, URLs, API parameters, and API response data are all considered during the wireframing process (often with several rounds of review). This initial quality assurance means engineers can focus on their code and confidently catch any potential edge-cases or issues along the way.
Wireframing is accessible to people who understand our users but are not necessarily code-literate. So anyone can contribute a suggestion (at any level of fidelity). At the very least, you'll need a napkin and a pen, although we prefer to use Figma.
Wireframes can be shown to customers and other users in the community for feedback.
Designing from the "outside, in" gives us the opportunity to obsess over details in the interaction design. An undefined "what" exposes the results to the chaos of unplanned extra work and context shifting for engineers. This way, every engineer doesn't have to personally spend the time to get and stay up to speed with:
- the latest reactions from users
- all of the motivations and discussions from the previous rounds of wireframe revisions that were thrown away
- how the UI has evolved in previous releases to better serve the people using and building it
Wireframing is important for both maintaining the quality of our work and outlining what work needs to be done.
With Figma, thanks to its powerful component and auto-layout features, we can create high-fidelity wireframes - fast. We can iterate quickly with less extra work and less risk of the sunk-cost fallacy.
But wireframes don't have to be high fidelity. It is OK to communicate ideas for changes using ugly, marked-up screenshots or a photo of a piece of paper. Fleet's drafting process helps turn these rough wireframes into product changes that can be implemented quickly with minimal UX and technical debt.
Wireframes created to describe individual changes are disposable and may have slight stylistic inconsistencies. Fleet's user interface styleguide in Figma is the source of truth for overarching design decisions like spacing, typography, and colors.

Got a question about creating wireframes or the drafting process? Mention Noah Talerman or Luke Heath in #help-product.

Why do we use one repo?

At Fleet, we keep everything in one repo (fleetdm/fleet). Here's why:

One repo is easier to manage. It has less surface area for keeping content up to date and reduces the risk of things getting lost and forgotten.
Our work is more visible and accessible to the community when all project pieces are available in one repo.
One repo pools GitHub stars and more accurately reflects Fleet’s presence.
One repo means one set of automations and labels to manage, resulting in a consistent GitHub experience that is easier to keep organized.

The only exception (fleetdm/confidential) is when we're working on something confidential since GitHub does not allow confidential issues inside public repos.

Tip: Did you know that you can search through issues from both repos at the same time? In addition to the built-in search in the handbook on fleetdm.com, you can also search for any content from the handbook, documentation, or issue templates from either repo using GitHub search.

Why not continuously generate REST API reference docs from javadoc-style code comments?

Here are a few of the drawbacks that we have experienced when generating docs via tools like Swagger or OpenAPI, and some of the advantages of doing it by hand with Markdown.

Markdown gives us more control over how the docs are compiled, what annotations we can include, and how we present the information to the end-user.
Markdown is more accessible. Anyone can edit Fleet's docs directly from our website without needing coding experience.
A single Markdown file reduces the amount of surface area to manage that comes from spreading code comments across multiple files throughout the codebase (see "Why do we use one repo?").
Autogenerated docs can become just as outdated as handmade docs, except that, since they are siloed, they require more skill to edit.
When docs live at separate repo paths from source code, we are able to automate approval processes that allow contributors to make small improvements and notes, directly from the website. This leads to more contributions, since it lowers the barrier of entry for becoming a contributor.
Autogenerated docs are typically hosted on a subdomain. This means we have less control over a user's journey through our website and lose the SEO benefits of self-hosted documentation.
Autogenerating docs from code comments is not always the best way to make sure reference docs accurately reflect the API.
As the Fleet REST API, documentation, and tools mature, a more declarative format such as OpenAPI might become the source of truth, but only after investing in a format and processes to make it continually accurate as well as visible, accessible, and modifiable for all contributors.

Why group Slack channels?

Groups (g-*) are organized around goals. Connecting people with the same goals helps them produce better results by fostering freer communication. Some groups align with teams in the org chart. Other groups, such as product groups, are cross-functional, with some group members who do not report to the same manager.

Every group at Fleet maintains its own Slack channel, which all group members join and keep unmuted. Everyone else at Fleet is encouraged to mute these channels, using them only as needed. Each channel has a directly responsible individual responsible for keeping up with all new messages, even if they aren't explicitly mentioned (@).

Why organize work in team-based kanban boards?

It's helpful to have a consistent framework for how every team works, plans, and requests things from each other. Fleet's kanban boards are that framework, and they cover three goals:

Intake: Give people from anywhere in the world the ability to request something from a particular team (i.e., add it to their backlog).
Planning: Give the team's manager and other team members a way to plan the next three-week iteration of what the team is working on in a world (the board) where the team has ownership and feels confident making changes.
Shared to-do list: What should I work on next? Who needs help? What important work is blocked? Is that bug fix merged yet? When will it be released? When will that new feature ship? What did I do yesterday?

Why agile?

Releasing software iteratively gets changes and improvements into the hands of users faster and generally results in software that works. This makes contributors fitter, happier, and more productive. We apply the twelve principles of agile to our development process:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.
Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
Business people and developers must work together daily throughout the project.
Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
Working software is the primary measure of progress.
Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
Continuous attention to technical excellence and good design enhances agility.
Simplicity--the art of maximizing the amount of work not done--is essential.
The best architectures, requirements, and designs emerge from self-organizing teams.
At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

See the agile manifesto for more information.

Why scrum?

Scrum is an agile framework for software development that helps teams deliver high quality software faster. It emphasizes teamwork, collaboration, and continuous improvement to achieve business objectives. Here are some of the key reasons why we use scrum at Fleet:

Improved collaboration and communication: Scrum emphasizes teamwork and collaboration, which leads to better communication between team members and stakeholders. This helps ensure that everyone is aligned and working towards the same goals.
Flexibility and adaptability: Scrum allows teams to respond quickly to changing requirements and market conditions. By working in short sprints, teams can continuously adapt to new information and feedback, and adjust their approach as needed.
Continuous improvement: Scrum encourages teams to reflect on their processes and identify areas for improvement. The regular sprint retrospective meetings provide a forum for the team to discuss what went well and what could be improved, and to make changes to their processes accordingly.
Faster delivery of working software: Scrum helps teams deliver working software faster by breaking down the development process into manageable chunks that can be completed within a sprint. Stakeholders can see progress and provide feedback more quickly, which helps ensure the final product meets their needs.
Higher quality software: Scrum includes regular testing and quality assurance activities, which help ensure that the software being developed is of high quality and meets the required standards.

Why lean software development?

Lean software development is an iterative and incremental approach to software development that aims to eliminate waste and deliver value to customers quickly. It is based on the principles of lean manufacturing and emphasizes continuous improvement, collaboration, and customer focus.

Lean development can be summarized by its seven principles:

Eliminate waste: Eliminate anything that doesn't add value to the customer, such as unnecessary features, extra processing, and waiting times.
Amplify learning: Share knowledge and expertise across the team to continuously improve the process and increase efficiency.
Decide as late as possible: Defer major decisions and commitments until the last possible moment to enable more informed and optimal decisions.
Deliver as fast as possible: Deliver value to customers as quickly as possible to ensure their needs are met and to receive feedback for continuous improvement.
Empower the team: Respect and empower the team, including customers, stakeholders, and developers, by providing a supportive environment and clear communication.
Build integrity in: Build quality into the software by continuously testing, reviewing, and improving the code throughout the development process.
Optimize the whole: Optimize the entire process and focus on the system's overall performance rather than just individual parts to ensure the most efficient and effective use of resources.

Why a three-week cadence?

The Fleet product is released every three weeks. By syncing the whole company to this schedule, we can:

keep all team members (especially those who aren't directly involved with the core product) aware of the current version of Fleet and when the next release is shipping.
align project planning and milestones across all teams, which helps us schedule our content calendar and manage company-wide goals.
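To make the cadence concrete, the following Python sketch computes release dates three weeks apart. The starting date used in the usage note is a made-up example, not an actual Fleet release date, and both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Three-week release cadence, as described in the handbook.
CADENCE = timedelta(weeks=3)


def upcoming_releases(first_release: date, count: int) -> List[date]:
    """Return `count` release dates, starting at first_release, 21 days apart."""
    return [first_release + i * CADENCE for i in range(count)]


def next_release(first_release: date, today: date) -> date:
    """Return the first release date that falls on or after `today`."""
    if today <= first_release:
        return first_release
    elapsed_days = (today - first_release).days
    # Ceiling division: number of whole cycles needed to reach or pass today.
    cycles = -(-elapsed_days // CADENCE.days)
    return first_release + cycles * CADENCE
```

For instance, assuming a hypothetical first release on 2023-03-06, `upcoming_releases(date(2023, 3, 6), 3)` yields 2023-03-06, 2023-03-27, and 2023-04-17.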

Why spend so much energy responding to every potential production incident?

At Fleet, we consider every 5xx error, timeout, or errored scheduled job a P1 incident. We create an outage issue for it, no matter the environment, as soon as the issue is detected, even before we understand it. We always determine impact quickly, reach out to affected users to acknowledge their problem, and determine the root cause. Why?

It helps us learn.
You never know whether an error like this is a real issue until you take a close look. Even if you think it probably isn't.
It incentivizes us to fix the root cause sooner.
It keeps the number of errors low.
It ensures the team understands exactly what errors are happening.
It helps us fix bugs sooner, preventing them from stacking and bleeding into one another and making fixes harder.
It gets everyone on the same page about what an issue is.
It prevents information about bugs and problems from getting stuck. Every outage is visible.
It allows us to reach out to affected users ASAP and acknowledge their challenge, showing them that Fleet takes quality and stability seriously.

What is a P1?

Every 5xx error, timeout, or failed scheduled job is a P1.

That means:

It gets a postmortem issue created within the production issue response time SLA, even before we know the impact, the root cause, or even what the error message says.
It gets a close look right away, even if we think it might not matter. If there is any chance of it affecting even one user, we keep digging.
- This includes a situation where a user has to wait longer than 5 seconds during signup on fleetdm.com (or any time we breach an agreed-upon response time guarantee).
- This includes when a scheduled job fails and we aren't sure yet whether any real users are affected.
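The P1 rules above could be expressed as a simple check like the Python sketch below. The `Event` record and its field names are hypothetical and exist only for this example; Fleet's actual alerting and tooling may look nothing like this. Breaches of other agreed-upon response time guarantees are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

# From the handbook: waiting longer than 5 seconds during signup counts.
SIGNUP_WAIT_LIMIT_SECONDS = 5.0


@dataclass
class Event:
    """A hypothetical production event record, invented for this sketch."""
    http_status: Optional[int] = None            # set for HTTP responses
    timed_out: bool = False                      # the request or job timed out
    scheduled_job_failed: bool = False           # a scheduled job errored
    signup_wait_seconds: Optional[float] = None  # set for signup requests


def is_p1(event: Event) -> bool:
    """Return True if the event matches any of the P1 conditions."""
    # Every 5xx error is a P1.
    if event.http_status is not None and 500 <= event.http_status <= 599:
        return True
    # Every timeout or failed scheduled job is a P1, even before impact is known.
    if event.timed_out or event.scheduled_job_failed:
        return True
    # A signup wait longer than the limit is also a P1.
    if (
        event.signup_wait_seconds is not None
        and event.signup_wait_seconds > SIGNUP_WAIT_LIMIT_SECONDS
    ):
        return True
    return False
```

Note that the check never asks whether a real user was affected: under the rules above, an event is treated as a P1 first and its impact is investigated afterward.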
