Copy-edit and Markdown lint, remove old comment about CI

Mike Myers 2020-09-22 18:41:55 -07:00 committed by Teddy Reed
parent 523a256691
commit a85b02f660


@ -1,4 +1,6 @@
Performance is a core feature of osquery's visibility capability. However, the tool is very powerful and there are opportunities to ruin the performance guarantees with ill-formed queries.
# Performance safety
High-performance visibility is a core feature of osquery. However, user-written queries are very powerful, and an ill-formed query can ruin the performance guarantees of osquery.
This guide provides an overview and tutorial for assuring performance of the osquery scheduled queries, as well as performance-centric development practices/enforcements.
@ -43,7 +45,7 @@ Each query provides useful information and will run every minute. But what sort
For this we can use `./tools/analysis/profile.py` to profile the queries by running them for a configured number of rounds and reporting the pre-defined performance category of each. A higher category result means higher impact. High-impact queries should be avoided, but if the information is valuable, consider running them less often.
```
```bash
$ sudo -E python ./tools/analysis/profile.py --config osquery.conf
Profiling query: SELECT * FROM kernel_extensions WHERE name NOT LIKE 'com.apple.%' AND name != '__kernel__';
D:0 C:0 M:0 F:0 U:1 non_apple_kexts (1/1): duration: 0.519426107407 cpu_time: 0.096729864 memory: 6447104 fds: 5 utilization: 9.5
@ -61,9 +63,10 @@ Profiling query: SELECT DISTINCT process.name, listening.port, listening.protoco
The results (utilization=2) suggest running `processes_binding_to_ports` less often.
To estimate how often these should run you should evaluate what a differential in the information means from your visibility requirement's perspective (how meaningful is a change vs. how often you check for the change). Then weigh that value against the performance impact of running the query.
To estimate how often these should run, you should evaluate what a differential in the information means from your visibility requirement perspective (how meaningful is a change vs. how often you check for the change). Then weigh the value of that information against the performance impact of running the query.
### Understanding the output from profile.py
The osquery `profile.py` script uses `utils.py` in `tools/tests/`, which uses Python's `psutil` library to collect process stats for osqueryi as it runs the given queries.
The script returns 5 stats:
@ -92,7 +95,7 @@ Duration is calculated by taking the subtracting `start_time` - 2 from the curre
The numbers next to the stats in the script output (categories) are determined by the `RANGES` dictionary in `profile.py`:
```
```python
KB = 1024 * 1024
RANGES = {
"colors": (utils.blue, utils.green, utils.yellow, utils.red),
@ -106,42 +109,37 @@ RANGES = {
The script compares the value of each stat against the tuple stored under that stat's key in `RANGES`. The category that appears in the script output is the index of the first tuple entry that the value is less than. If the value is greater than all values in the tuple, then the length of the tuple is what appears in the script output. For example, if `cpu_time` for a query is 0.2, then you'll see `C:0` in the script output. If `cpu_time` is 11, then you'll see `C:3` in the script output.
Queries that fail to execute (for example, due to a non-existent table) will return the highest category result '3' and the value '-1' for all statistics.
Queries that fail to execute (for example, due to a non-existent table) will return the highest category result `3` and the value `-1` for all statistics.
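
The comparison described above can be sketched as follows. This is a minimal illustration, not the code from `profile.py`: the `categorize` helper and the `EXAMPLE_RANGES` thresholds are hypothetical, chosen only so that the two `cpu_time` examples above produce the same categories. It also assumes that a value equal to a threshold falls through to the next category.

```python
# Hypothetical thresholds for illustration only. The real values live in
# the RANGES dictionary in profile.py and may differ.
EXAMPLE_RANGES = {
    "cpu_time": (0.5, 2.0, 10.0),
}


def categorize(value, thresholds):
    """Return the index of the first threshold that value is less than.

    If value is not less than any threshold, return len(thresholds),
    which is the highest category.
    """
    for index, threshold in enumerate(thresholds):
        if value < threshold:
            return index
    return len(thresholds)


if __name__ == "__main__":
    for cpu_time in (0.2, 11):
        category = categorize(cpu_time, EXAMPLE_RANGES["cpu_time"])
        print("C:%d" % category)
```

With a three-value tuple there are four possible categories (0 through 3), which matches the four entries in the `colors` tuple shown in `RANGES`.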
## Continuous Build
The continuous integration for osquery is currently under development. The previous CI solution was unreliably failing builds due to network and memory issues.
Each build on the Continuous Integration server will run on each of the supported operating system platform/versions and include the following phases:
The build will run on each of the supported operating system platforms/versions and include the following phases:
* Build and run `make test`
* Attempt to detect memory leaks using `./tools/analysis/profile.py --leaks`
* Run a performance measurement using `./tools/analysis/profile.py`
* Check performance against the latest release tag and commit to master
* Build docs and API spec on release tag or commit to master
- Build and run tests
- Attempt to detect memory leaks using `./tools/analysis/profile.py --leaks`
- Run a performance measurement using `./tools/analysis/profile.py`
- Check performance against the latest release tag and commit to master
- Build docs and API spec on release tag or commit to master
## Virtual table denylist
Performance-impacting virtual tables are most likely the result of missing features/tooling in osquery. Because of their dependencies on core optimizations, there is no harm in including the table generation code in master as long as the table is denylisted when a non-developer builds the tool suite.
If you are developing latent tables that would be denylisted, please make sure you are relying on a feature with a clear issue and traction. Then add your table name (as it appears in the `.table` spec) to [`specs/denylist`](https://github.com/osquery/osquery/blob/master/specs/denylist) and adopt:
If you are developing latent tables that would be denylisted, please make sure you are relying on a feature with a clear issue and traction. Then add your table name (as it appears in the `.table` spec) to [`specs/denylist`](https://github.com/osquery/osquery/blob/master/specs/denylist) and define the following in your build step:
```bash
DISABLE_DENYLIST=1 make
```
$ DISABLE_DENYLIST=1 make
```
For your build iteration.
## Deployment profiling
Before deploying an osquery config use:
Before deploying an osquery config, use:
```
```sh
./tools/analysis/profile.py --config /path/to/osquery.conf --count 1 --rounds 4
```
To estimate the amount of CPU/memory load the system will incur for each query.
to estimate the amount of CPU/memory load that the system will incur for each query.
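
When a config contains many queries, it can help to filter the profiler output down to the queries worth a second look. The sketch below is one possible approach and is not part of the osquery tooling: `SUMMARY_RE` and `high_impact_queries` are hypothetical names, and the regular expression assumes the summary line layout shown in the sample output earlier in this guide, captured as plain text without terminal color codes.

```python
import re

# Matches a summary line such as:
#   D:0 C:0 M:0 F:0 U:1 non_apple_kexts (1/1): duration: ...
# The layout is inferred from the sample output in this guide.
SUMMARY_RE = re.compile(
    r"^\s*D:(\d+)\s+C:(\d+)\s+M:(\d+)\s+F:(\d+)\s+U:(\d+)\s+(\S+)\s+\(\d+/\d+\):"
)


def high_impact_queries(output, threshold=2):
    """Map query name to its worst category, keeping only names whose
    worst category is at or above threshold."""
    flagged = {}
    for line in output.splitlines():
        match = SUMMARY_RE.match(line)
        if match is None:
            continue  # Skip "Profiling query: ..." and other lines.
        categories = [int(group) for group in match.groups()[:5]]
        name = match.group(6)
        worst = max(categories)
        if worst >= threshold:
            # Keep the worst category seen across all rounds.
            flagged[name] = max(worst, flagged.get(name, 0))
    return flagged


if __name__ == "__main__":
    sample = (
        "Profiling query: SELECT * FROM kernel_extensions;\n"
        " D:0 C:0 M:0 F:0 U:1 non_apple_kexts (1/1): duration: 0.519426107407 "
        "cpu_time: 0.096729864 memory: 6447104 fds: 5 utilization: 9.5\n"
    )
    print(high_impact_queries(sample, threshold=1))
```

The default threshold of 2 is an arbitrary choice for this sketch. Pick the cutoff that matches how much impact you are willing to accept for the information each query provides.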
## Wishlist