Merge pull request #1072 from theopolis/arirubinstein-master

First iteration of FIM documentation
2024-11-07 18:08:53 +00:00 · 2015-04-29 13:38:46 -07:00 · 2015-04-29 13:38:46 -07:00 · b1bd02c754
commit b1bd02c754
parent 0def8ec8a6 a69a4b1903
10 changed files with 104 additions and 22 deletions
--- a/docs/wiki/deployment/anomaly-detection.md
+++ b/docs/wiki/deployment/anomaly-detection.md
@@ -25,7 +25,7 @@ We can use osquery's log aggregation capabilities to easily pinpoint when the at

 ## Looking at the logs

-Using the [log aggregation guide](deployment/log-aggregation), you will receive log lines like the following in your datastore (ElasticSearch, Splunk, etc):
+Using the [log aggregation guide](log-aggregation), you will receive log lines like the following in your datastore (Elasticsearch, Splunk, etc.):

 ```json
 {
--- a/docs/wiki/deployment/configuration.md
+++ b/docs/wiki/deployment/configuration.md
@@ -3,7 +3,7 @@ An osquery deployment consists of:
 * Installing the tools for [OS X](../installation/install-osx) or [Linux](../installation/install-linux)
 * Reviewing the [osqueryd](../introduction/using-osqueryd) introduction
 * Configuring and starting the osqueryd service (this page)
-* Managing and [collecting](deployment/log-aggregation) the query results
+* Managing and [collecting](log-aggregation) the query results

 In the future, osquery tools may allow for **ad-hoc** or distributed queries
 that are not part of the configured query schedule and return results
@@ -61,7 +61,7 @@ This config tells osqueryd to schedule two queries, **macosx_kextstat** and **fo
 * the schedule keys must be unique
 * the "interval" specifies query frequency, in seconds

-The first query will document changes to an OS X host's kernel extensions, with a query interval of 10 seconds. Consider using osquery's [performance tooling](deployment/performance-safety) to understand the performance impact for each query.
+The first query will document changes to an OS X host's kernel extensions, with a query interval of 10 seconds. Consider using osquery's [performance tooling](performance-safety) to understand the performance impact for each query.

 The results of your query are cached on disk via [RocksDB](http://rocksdb.org/). On first query run, all of the results are stored in RocksDB. On subsequent runs, only result-set changes are logged to RocksDB.

--- a/docs/wiki/deployment/file-integrity-monitoring.md
+++ b/docs/wiki/deployment/file-integrity-monitoring.md
@@ -0,0 +1,82 @@
+As of osquery version 1.4.2, file integrity monitoring is supported on Linux
+and Darwin (OS X) platforms.  This module reads a list of files and directories
+to monitor from the osquery config and records changes to those files, along
+with their hashes, in the `file_events` table.
+
+To get started with FIM (file integrity monitoring), you must first identify
+which files and directories you wish to monitor.
+Following the [wildcard rules](../development/wildcard-rules/), you can specify
+a directory or filename filter to limit the selection of files to monitor.
+
+For example, you may want to monitor `/etc` along with other files on a Linux
+system.  After you identify the files and directories you wish to monitor,
+add them to a new `file_paths` section in the config.
+
+### Example config
+
+```json
+{
+  "schedule": {
+    "crontab": {
+      "query": "select * from crontab;",
+      "interval": 300
+    },
+    "file_events": {
+      "query": "select * from file_events;",
+      "interval": 300
+    }
+  },
+  "file_paths": {
+    "homes": [
+      "/root/%%",
+      "/home/%/%%"
+    ],
+    "etc": [
+      "/etc/%%"
+    ],
+    "tmp": [
+      "/tmp/%%"
+    ]
+  }
+}
+```
+
+### Sample output
+
+As file changes happen, events will appear in the `file_events` table.  For
+each file change event, the MD5, SHA1, and SHA256 hashes of the file are
+calculated if possible.  A sample event looks like this:
+
+```json
+{
+  "action":"ATTRIBUTES_MODIFIED",
+  "category":"homes",
+  "md5":"bf3c734e1e161d739d5bf436572c32bf",
+  "sha1":"9773cf934440b7f121344c253a25ae6eac3e3182",
+  "sha256":"d0d3bf53d6ae228122136f11414baabcdc3d52a7db9736dd256ad81229c8bfac",
+  "target_path":"\/root\/.ssh\/authorized_keys",
+  "time":"1429208712",
+  "transaction_id":"0"
+}
+```
+
+### Tuning inotify limits
+
+On Linux, osquery uses inotify to subscribe to file changes at the kernel
+level for performance.  This limits the number of files that can be monitored,
+since each inotify watch consumes kernel memory (which is non-swappable).
+Raising the inotify limits increases the number of files that can be watched,
+at the cost of additional kernel memory.
+
+#### Example sysctl.conf modifications
+
+```
+#/proc/sys/fs/inotify/max_user_watches = 8192
+fs.inotify.max_user_watches = 524288
+
+#/proc/sys/fs/inotify/max_user_instances = 128
+fs.inotify.max_user_instances = 256
+
+#/proc/sys/fs/inotify/max_queued_events = 16384
+fs.inotify.max_queued_events = 32768
+```
--- a/docs/wiki/deployment/logging.md
+++ b/docs/wiki/deployment/logging.md
@@ -17,7 +17,7 @@ lrwxr-xr-x   1 root  wheel    77 Sep 30 17:37 osqueryd.INFO -> osqueryd.INFO.201

 ### Status logs

-Status logs are generated by the [glog logging framework](https://code.google.com/p/google-glog/). The default **filesystem** logger plugin writes these logs to disk the same way glog would. Logging plugins may intercept these status logs and write them to system or otherwise.
+Status logs are generated by the [glog logging framework](https://github.com/google/glog/). The default **filesystem** logger plugin writes these logs to disk the same way glog would. Logging plugins may intercept these status logs and write them to the system log or elsewhere.

 As the above directory listing reveals,
 *osqueryd.INFO* is a symlink to the most recent execution's INFO log.
@@ -103,7 +103,7 @@ Example output of `SELECT name, path, pid FROM processes;` (whitespace added for
 }
 ```

-Most of the time the **Event format** is the most appropriate. The next section in the deployment guide describes [log aggregation](deployment/log-aggregation) methods. The aggregation methods describe collecting, searching, and alerting on the results from a query schedule.
+Most of the time the **Event format** is the most appropriate. The next section in the deployment guide describes [log aggregation](log-aggregation) methods. The aggregation methods describe collecting, searching, and alerting on the results from a query schedule.

 ## Unique host identification

--- a/docs/wiki/development/building.md
+++ b/docs/wiki/development/building.md
@@ -73,7 +73,7 @@ $ ls -la ./build/linux/osquery/

 Building osquery on OS X or Linux requires a significant number of dependencies, which
 are not needed when deploying. It does not make sense to install osquery on
-your build hosts. See the [Custom Packages](installation/custom-packages) guide
+your build hosts. See the [Custom Packages](../installation/custom-packages) guide
 for generating pkgs, debs or rpms.

 ## Notes and FAQ
@@ -94,7 +94,7 @@ You must run `make deps` to make sure you are pulling in the most-recent depende

 ## Build Performance

-Generating a virtual table should NOT impact system performance. This is easier said than done as some tables may _seem_ inherently latent such as `SELECT * from suid_bin;` if your expectation is a complete filesystem traversal looking for binaries with suid permissions. Please read the osquery features and guide on [performance safety](../deployment/performance-safety.md). 
+Generating a virtual table should NOT impact system performance. This is easier said than done as some tables may _seem_ inherently latent such as `SELECT * from suid_bin;` if your expectation is a complete filesystem traversal looking for binaries with suid permissions. Please read the osquery features and guide on [performance safety](../deployment/performance-safety.md).

 Some quick features include:

@@ -102,4 +102,3 @@ Some quick features include:
 * Blacklisting performance-impacting virtual tables.
 * Scheduled query optimization and profilling.
 * Query implementation isolation options.
-
--- a/docs/wiki/development/pubsub-framework.md
+++ b/docs/wiki/development/pubsub-framework.md
@@ -1,24 +1,24 @@
-Most of osquery's virtual tables are generated when an SQL statement requests data. For example, the [time](https://github.com/facebook/osquery/blob/master/osquery/tables/utility/time.cpp) gets the current time and returns it as a single row. So whenever a call selects data from time, e.g., `SELECT * from time;` the current time of the call will return. 
+Most of osquery's virtual tables are generated when an SQL statement requests data. For example, the [time](https://github.com/facebook/osquery/blob/master/osquery/tables/utility/time.cpp) table gets the current time and returns it as a single row. So whenever a call selects data from time, e.g., `SELECT * from time;`, the current time of the call is returned.

-From an operating systems perspective, query-time synchronous data retrieval is lossy. Consider the [processes](https://github.com/facebook/osquery/blob/master/osquery/tables/system/linux/processes.cpp) table: if a process like `ps` runs for a fraction of a moment there's no way `SELECT * from processes;` will ever include the details. 
+From an operating system's perspective, query-time synchronous data retrieval is lossy. Consider the [processes](https://github.com/facebook/osquery/blob/master/osquery/tables/system/linux/processes.cpp) table: if a process like `ps` runs for only a brief moment, it is unlikely that `SELECT * from processes;` will ever include its details.

-To solve for this osquery exposes a [pubsub framework](https://github.com/facebook/osquery/tree/master/osquery/events) for aggregating operating system information asynchronously at event time, storing related event details in the osquery backing store, and performing a lookup and report at query time. This reporting pipeline is much more complicated than typical query-time virtual table generation. The time of event, storage history, and applicable (final) virtual table data information must be carefully considered. As events occur the rows returned by a query will compound, as such selecting from an event-based virtual table generator should always include a time range. 
+To solve this, osquery exposes a [pubsub framework](https://github.com/facebook/osquery/tree/master/osquery/events) for aggregating operating system information asynchronously at event time, storing related event details in the osquery backing store, and performing a lookup and report at query time. This reporting pipeline is much more complicated than typical query-time virtual table generation. The time of event, storage history, and applicable (final) virtual table data information must be carefully considered. As events occur, the rows returned by a query will compound; as such, queries against an event-based virtual table should always include a time range.

 ## Architecture

-An osquery event publisher is a combination of a threaded run loop and event storage abstraction. The publisher loops on some selected resource or uses operating system APIs to register callbacks. The loop or callback introspects on the event and sends it to every appropriate subscriber. An osquery event subscriber may instruct a publisher, save published data, and must react to a query by returning appropriate data. 
+An osquery event publisher is a combination of a threaded run loop and event storage abstraction. The publisher loops on some selected resource or uses operating system APIs to register callbacks. The loop or callback introspects on the event and sends it to every appropriate subscriber. An osquery event subscriber may instruct a publisher, save published data, and must react to a query by returning appropriate data.

 The pubsub runflow is exposed as a publisher `setUp()`, a series of `addSubscription(const SubscriptionRef)` by subscribers, a publisher `configure()`, and finally a new thread scheduled with the publisher's `run()` static method as the entrypoint. For every event the publisher receives it will loop through every `Subscription` and call `fire(const EventContextRef, EventTime)` to send the event to the subscriber.  

 ## Example: inotify

-Filesystem events are the simplest example, let's consider Linux's inotify framework. [osquery/events/linux/inotify.cpp](https://github.com/facebook/osquery/blob/master/osquery/events/linux/inotify.cpp) is exposed as an osquery publisher. 
+Filesystem events are the simplest example, so let's consider Linux's inotify framework. [osquery/events/linux/inotify.cpp](https://github.com/facebook/osquery/blob/master/osquery/events/linux/inotify.cpp) is exposed as an osquery publisher.

 There's an array of yet-to-be-implemented uses of the inotify publisher, but a simple example includes querying for every change to "/etc/passwd". The [osquery/tables/events/linux/passwd_changes.cpp](https://github.com/facebook/osquery/blob/master/osquery/tables/events/linux/passwd_changes.cpp) table uses a pubsub subscription and implements a subscriber.

 ## Event Subscribers

-Let's continue to use the inotify event publisher as an example. And let's implement a table that reports new files created in "/etc/`" The first thing we need is a [table spec](development/creating-tables):
+Let's continue to use the inotify event publisher as an example, and implement a table that reports new files created in "/etc/". The first thing we need is a [table spec](creating-tables):

 ```python
 table_name("new_etc_files")
@@ -51,7 +51,7 @@ Let's implement `NewETCFilesEventSubscriber::init()` to add the subscription:

 ```cpp
 void NewETCFilesEventSubscriber::init() {
-  // We templated our subscriber to create an inotify publisher-specific 
+  // We templated our subscriber to create an inotify publisher-specific
  // subscription context.
  auto sc = createSubscriptionContext();
  sc->path = "/etc";
@@ -88,4 +88,4 @@ Status NewETCFilesEventSubscriber::Callback(const INotifyEventContextRef ec) {
 }
 ```

-Simple. Notice that `ec->time_string` provides a string-formatted time to remove casting too.
+Simple. Notice that `ec->time_string` provides a string-formatted time to remove casting too.
--- a/docs/wiki/development/unit-tests.md
+++ b/docs/wiki/development/unit-tests.md
@@ -6,7 +6,7 @@ All commits to osquery should be well unit-tested. Having tests is useful for ma

 This guide is going to take you through the process of creating and building a new unit test in the osquery project.

-Ensure that you can properly build the code by running `make` at the root of the osquery repository. If your build fails, refer to the ["building the code"](https://github.com/facebook/osquery/wiki/building-the-code) guide.
+Ensure that you can properly build the code by running `make` at the root of the osquery repository. If your build fails, refer to the ["building the code"](building) guide.

 Before you modify osquery code (or any code for that matter), make sure that you can successfully execute all tests. Run `make test` to run all tests.

@@ -40,7 +40,7 @@ The above code is very simple. If you're unfamiliar with the syntax/concepts of

 ## Building a test

-Whatever component of osquery you're working on has it's own "CMakeLists.txt" file. For example, the _tables_ component (folder) has it's own "CMakeLists.txt"`" file at [osquery/tables/CMakeLists.txt](https://github.com/facebook/osquery/blob/master/osquery/tables/CMakeLists.txt). The file that we're going to be modifying today is [osquery/examples/CMakeLists.txt](https://github.com/facebook/osquery/tree/master/osquery/examples/CMakeLists.txt). Edit that file to include the following contents:
+Whatever component of osquery you're working on has its own "CMakeLists.txt" file. For example, the _tables_ component (folder) has its own "CMakeLists.txt" file at [osquery/tables/CMakeLists.txt](https://github.com/facebook/osquery/blob/master/osquery/tables/CMakeLists.txt). The file that we're going to be modifying today is [osquery/CMakeLists.txt](https://github.com/facebook/osquery/tree/master/osquery/CMakeLists.txt). Edit that file to include the following contents:

 ```CMake
 ADD_OSQUERY_TEST(example_test example_test.cpp)
--- a/docs/wiki/installation/custom-packages.md
+++ b/docs/wiki/installation/custom-packages.md
@@ -4,11 +4,11 @@ We support building custom deployment packages (pkg/deb/rpm) for less common use
 - Proprietary modifications to "core" features that aren't simple additional plugins
 - Custom dependency modifications (patched versions of glog, thrift, etc)

-The first step to creating custom packages is having [built](development/building) and tested osquery. This means reading the development guides and in most cases having a dedicated "build host".
+The first step to creating custom packages is having [built](../development/building) and tested osquery. This means reading the development guides and in most cases having a dedicated "build host".

 ## Linux

-In your cloned osquery repository, once you have [built the code](development/building) (hopefully a tagged release):
+In your cloned osquery repository, once you have [built the code](../development/building) (hopefully a tagged release):

 ```sh
 $ make packages
@@ -18,7 +18,7 @@ This will use CMake and *fpm*, installed as an osquery build dependency, to gene

 ## OS X

-In your cloned osquery repository, once you have [built the code](development/building) (hopefully a tagged release):
+In your cloned osquery repository, once you have [built the code](../development/building) (hopefully a tagged release):

 ```sh
 $ make packages
--- a/docs/wiki/introduction/using-osqueryi.md
+++ b/docs/wiki/introduction/using-osqueryi.md
@@ -104,4 +104,4 @@ $
 ```

 The shell does not keep much state or connect to a osqueryd daemon.
-If you would like to run queries and log changes to the output or log operating system events consider deploying a query **schedule** using [osqueryd](introduction/using-osqueryd).
+If you would like to run queries and log changes to the output, or log operating system events, consider deploying a query **schedule** using [osqueryd](using-osqueryd).
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -21,6 +21,7 @@ pages:
 - ['deployment/performance-safety.md', 'Deployment', 'Performance Safety']
 - ['deployment/anomaly-detection.md', 'Deployment', 'Anomaly Detection']
 - ['deployment/kernel-linux.md', 'Deployment', 'Kernel Extensions']
+- ['deployment/file-integrity-monitoring.md', 'Deployment', 'File Integrity Monitoring']
 - ['deployment/yara.md', 'Deployment', 'YARA Scanning']

 - ['development/building.md', 'Development', 'Building osquery']