watcher: Set default memory limit to 200M (#3086)

Teddy Reed 2017-03-18 16:38:47 -07:00 committed by GitHub
parent 9715fdbd84
commit 43eddc0bf3
2 changed files with 11 additions and 7 deletions


@@ -71,12 +71,17 @@ Path to the daemon pidfile mutex. The file is used to prevent multiple osqueryd
Disable userland watchdog process. **osqueryd** uses a watchdog process to monitor the memory and CPU utilization of threads executing the query schedule. If any performance limit is violated the "worker" process will be restarted.
`--watchdog_level=1`
`--watchdog_level=0`
Performance limit level (0=loose, 1=normal, 2=restrictive, 3=debug). The default watchdog process uses a "level" to configure performance limits.
The higher the level the more strict the limits become. The "debug" level disables the performance limits completely.
Performance limit level (0=normal, 1=restrictive, -1=disabled). The watchdog process uses a "level" to configure performance limits.
The watchdog "profiles" can be overridden for Memory and CPU Utilization.
The level limits are as follows:
Memory: default 200M, restrictive 100M
CPU: default 25% (for 9 seconds), restrictive 18% (for 9 seconds)
The normal level allows for 10 restarts if the limits are violated. The restrictive allows for only 4, then the service will be disabled. For both there is a linear backoff of 5 seconds, doubling each retry.
It is better to set the level to disabled `-1` compared to disabling the watchdog outright as the worker/watcher concept is used for extensions autoloading too. The watchdog "profiles" can be overridden for Memory and CPU Utilization.
`--watchdog_memory_limit=0`
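
The level-to-limit mapping described in the documentation text above can be sketched as follows. This is a minimal illustration, not the osquery implementation: the field names of `LimitDefinition`, the table name `kSketchLimits`, and the helper `sketchLimit` are assumptions introduced here, and only the numeric values are taken from the new side of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-ins for the types touched by this commit.
enum class WatchdogLimitType { MEMORY_LIMIT, RESPAWN_LIMIT, RESPAWN_DELAY };

// Assumed layout: one column per watchdog level.
struct LimitDefinition {
  std::size_t normal;      // --watchdog_level=0
  std::size_t restrictive; // --watchdog_level=1
  std::size_t disabled;    // --watchdog_level=-1
};

using WatchdogLimitMap = std::map<WatchdogLimitType, LimitDefinition>;

// Values copied from the new side of the diff.
const WatchdogLimitMap kSketchLimits = {
    {WatchdogLimitType::MEMORY_LIMIT, {200, 100, 10000}},
    {WatchdogLimitType::RESPAWN_LIMIT, {10, 4, 1000}},
    {WatchdogLimitType::RESPAWN_DELAY, {5, 5, 1}},
};

// Select the column matching a watchdog level.
// Assumption: any level other than 1 or -1 falls back to "normal".
std::size_t sketchLimit(WatchdogLimitType type, int level) {
  auto it = kSketchLimits.find(type);
  assert(it != kSketchLimits.end());
  const LimitDefinition& def = it->second;
  if (level == -1) {
    return def.disabled;
  }
  if (level == 1) {
    return def.restrictive;
  }
  return def.normal;
}
```

Under these assumptions, `sketchLimit(WatchdogLimitType::MEMORY_LIMIT, 0)` yields the new 200 MB default, and level `-1` selects the third column, which is large enough that the limit is effectively never reached.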


@@ -50,11 +50,11 @@ using WatchdogLimitMap = std::map<WatchdogLimitType, LimitDefinition>;
const WatchdogLimitMap kWatchdogLimits = {
// Maximum MB worker can privately allocate.
{WatchdogLimitType::MEMORY_LIMIT, {100, 50, 1000}},
{WatchdogLimitType::MEMORY_LIMIT, {200, 100, 10000}},
// User or system CPU worker can utilize for LATENCY_LIMIT seconds.
{WatchdogLimitType::UTILIZATION_LIMIT, {90, 80, 1000}},
// Number of seconds the worker should run, else consider the exit fatal.
{WatchdogLimitType::RESPAWN_LIMIT, {20, 20, 1000}},
{WatchdogLimitType::RESPAWN_LIMIT, {10, 4, 1000}},
// If the worker respawns too quickly, backoff on creating additional.
{WatchdogLimitType::RESPAWN_DELAY, {5, 5, 1}},
// Seconds of tolerable UTILIZATION_LIMIT sustained latency.
@@ -504,7 +504,6 @@ void WatcherRunner::createExtension(const std::string& extension) {
Watcher::resetExtensionCounters(extension, getUnixTime());
VLOG(1) << "Created and monitoring extension child (" << ext_process->pid()
<< "): " << extension;
}
void WatcherWatcherRunner::start() {