Proposal for improving vulnerability detection MacOS
Improving vulnerability detection for MacOS
6001 identified some problems with our current approach to vulnerability detection on MacOS:
- The version reported by software does not fit the standard format. For example, Zoom reports the version as 5.8.3 (2240).
- The app name includes extra terms that don't appear in the title. For example, `zoom.us` is treated as `zoom us` (2 terms) and does not match the titles commonly used for Zoom, e.g. "Zoom 4.6.9 for macOS" or "Zoom Meetings 5.8.0 for macOS".
- Sometimes the CPE dictionary is incomplete. For example, CVE-2021-24043 should have a matching CPE `cpe:2.3:a:whatsapp:whatsapp:2.2145.0:*:*:*:desktop:*:*:*`, but it is absent. Also note that it would not match on Windows because `target_sw` is empty, but we try to match on `windows*`. Removing the `target_sw` would lead to many false positives.
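
To make the first problem concrete, here is a minimal sketch of one way to split a reported version such as `5.8.3 (2240)` into a dotted version and a trailing build number. The function name `split_version` and the regular expression are assumptions for illustration only; this proposal does not prescribe a normalization scheme, and other apps may report versions in shapes this pattern does not cover.

```python
import re
from typing import Optional, Tuple

# Assumed shape: a dotted numeric version, optionally followed by a
# parenthesized build identifier, e.g. "5.8.3 (2240)".
_VERSION_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\s*(?:\((\w+)\))?\s*$")


def split_version(reported: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (version, build) or None when the string has another shape."""
    match = _VERSION_RE.match(reported)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_version("5.8.3 (2240)"))
    print(split_version("5.8.3"))
```

Returning `None` for unrecognized shapes, rather than guessing, keeps unexpected formats visible so they can be handled case by case.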
Our current approach to CPE binding consists of matching the software name against the CPE title along with the software version. Instead, I propose we try to match the software vendor and name parts against the CPE vendor and product parts (standardizing the values when needed), and then we can programmatically look at the version (and the rest of the CPE parts) to determine what CVEs match a given CPE. In other words, instead of looking at CPEs as just strings, we should be looking at them as sets:
So this:
cpe:2.3:a:microsoft:edge:79.0.309.68:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:edge:80.0.361.48:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:edge:80.0.361.50:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:edge:80.0.361.50:*:*:*:*:windows:*:*
Can be visualized as this:
flowchart TD
id1((vendor: microsoft)) --> id2((product: edge))
id2((product: edge)) --> id3((version: 79.0.309.68))
id2((product: edge)) --> id4((version: 80.0.361.48))
id2((product: edge)) --> id5((version: 80.0.361.50))
id3((version: 79.0.309.68)) --> id6((cve_1))
id3((version: 79.0.309.68)) --> id7((cve_2))
id4((version: 80.0.361.48)) --> id8((cve_3))
id5((version: 80.0.361.50)) --> id9((cve_4))
id5((version: 80.0.361.50)) --> id10((target_sw: windows))
id10((target_sw: windows)) --> id11((cve_5))
So having version `80.0.361.50` of `Edge` installed on macOS should only return `cve_4`, but having the same program on Windows should return both `cve_4` and `cve_5`.
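
The Edge example above can be sketched as a small matcher. This is only an illustration under stated assumptions: `parse_cpe`, `matching_cves` and `CVE_INDEX` are hypothetical names, the CVE ids are the placeholders from the diagram, the platform string `macos` is assumed, and the naive `split(":")` ignores escaped colons that real CPE 2.3 strings may contain.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CPE:
    vendor: str
    product: str
    version: str
    target_sw: str


def parse_cpe(cpe: str) -> CPE:
    # cpe:2.3:part:vendor:product:version:update:edition:language:
    #         sw_edition:target_sw:target_hw:other
    fields = cpe.split(":")  # assumption: no escaped ":" in any field
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return CPE(fields[3], fields[4], fields[5], fields[10])


# Placeholder data mirroring the diagram; not real NVD content.
CVE_INDEX: Dict[str, List[str]] = {
    "cpe:2.3:a:microsoft:edge:79.0.309.68:*:*:*:*:*:*:*": ["cve_1", "cve_2"],
    "cpe:2.3:a:microsoft:edge:80.0.361.48:*:*:*:*:*:*:*": ["cve_3"],
    "cpe:2.3:a:microsoft:edge:80.0.361.50:*:*:*:*:*:*:*": ["cve_4"],
    "cpe:2.3:a:microsoft:edge:80.0.361.50:*:*:*:*:windows:*:*": ["cve_5"],
}


def matching_cves(vendor: str, product: str, version: str, platform: str,
                  index: Dict[str, List[str]]) -> List[str]:
    """Collect CVEs whose CPE matches the software on the given platform."""
    found = set()
    for cpe_string, cves in index.items():
        cpe = parse_cpe(cpe_string)
        if (cpe.vendor, cpe.product, cpe.version) != (vendor, product, version):
            continue
        # "*" means the CPE applies to any target software platform.
        if cpe.target_sw not in ("*", platform):
            continue
        found.update(cves)
    return sorted(found)


if __name__ == "__main__":
    print(matching_cves("microsoft", "edge", "80.0.361.50", "macos", CVE_INDEX))
    print(matching_cves("microsoft", "edge", "80.0.361.50", "windows", CVE_INDEX))
```

The sketch compares versions by exact equality only; version ranges and the remaining CPE attributes would need the same kind of per-field comparison.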
So basically our vulnerability detection problem can be broken down into two sub-problems:
- Binding the software `vendor` and `name` attributes to known CPE `vendor` and `product` attributes (a.k.a. the binding problem).
- Once we have the `vendor` and `product`, we will need to match that along with the version and other characteristics (like language, platform, etc.) to one or more target CPEs contained in the NVD dataset (a.k.a. the matching problem).
Binding the vendor portion
For binding the `vendor`, we can use the `bundle_identifier`. Using this as a guideline, we can extract a 'pseudo vendor id', filter out any top-level domain names (since the `bundle_identifier` is assumed to be in reverse-DNS format) and finally transform the resulting value if necessary.
flowchart LR
bundle_identifier --> vendor_id
vendor_id --> remove_top_lv_domains
remove_top_lv_domains --> map_values
Using the data in here, the following vendor translations were required (this list is not exhaustive):
bundle_identifier | extracted vendor | mapped vendor |
---|---|---|
com.postmanlabs.mac | postmanlabs | getpostman |
com.tinyspeck.slackmacgap | tinyspeck | slack |
com.getdropbox.dropbox | getdropbox | dropbox |
ru.keepcoder.Telegram | keepcoder | telegram |
org.virtualbox.app.VirtualBox | virtualbox | oracle |
org.virtualbox.app.VirtualBox | Cisco-Systems | cisco |
net.kovidgoyal.calibre | kovidgoyal | calibre-ebook |
We will need to host and maintain some kind of metadata like this somewhere.
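
A minimal sketch of the vendor pipeline described above, assuming a small hand-maintained set of top-level domains and a translation map built from the table. The names `bind_vendor`, `TOP_LEVEL_DOMAINS` and `VENDOR_TRANSLATIONS` are hypothetical, the domain list is deliberately incomplete, and lowercasing the extracted value before the lookup is an assumption.

```python
from typing import Optional

# Assumption: a short, non-exhaustive list of top-level domains to strip.
TOP_LEVEL_DOMAINS = {"com", "org", "net", "ru", "io", "us", "co", "uk", "de"}

# Extracted vendor -> CPE vendor, taken from the table above (keys lowercased).
VENDOR_TRANSLATIONS = {
    "postmanlabs": "getpostman",
    "tinyspeck": "slack",
    "getdropbox": "dropbox",
    "keepcoder": "telegram",
    "virtualbox": "oracle",
    "cisco-systems": "cisco",
    "kovidgoyal": "calibre-ebook",
}


def bind_vendor(bundle_identifier: str) -> Optional[str]:
    """Derive a CPE vendor candidate from a reverse-DNS bundle identifier."""
    parts = [part for part in bundle_identifier.split(".") if part]
    # Drop leading top-level domain labels such as "com" or "org".
    while parts and parts[0].lower() in TOP_LEVEL_DOMAINS:
        parts.pop(0)
    if not parts:
        return None
    vendor_id = parts[0].lower()
    # Fall back to the extracted value when no translation is known.
    return VENDOR_TRANSLATIONS.get(vendor_id, vendor_id)


if __name__ == "__main__":
    for identifier in ("com.postmanlabs.mac", "ru.keepcoder.Telegram",
                       "org.virtualbox.app.VirtualBox"):
        print(identifier, "->", bind_vendor(identifier))
```

Keeping the translations in a plain mapping makes it straightforward to move them into whatever hosted metadata we end up maintaining.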
Binding the product portion
For binding the `product`, we can use both the `bundle_executable` and the `bundle_name` (sometimes we get matches with the `bundle_executable`, sometimes with the `bundle_name`). The data processing pipeline would look something like this:
flowchart LR
bundle_executable --> map_values
map_values --> to_lower
to_lower --> replace_spaces
Again, like with the vendor portion, some translation was required. When testing this approach the following translations were used:
vendor | bundle name/ executable | translation |
---|---|---|
oracle | VirtualBox | vm_virtualbox |
agilebits | 1Password 7 | 1password |
zoom | zoom.us | zoom |
microsoft | Microsoft AutoUpdate | autoupdate |
microsoft | Microsoft Edge | edge |
microsoft | Code | visual_studio_code |
osquery | osqueryd | osquery |
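
A minimal sketch of the product pipeline (map values, lowercase, replace spaces), using a subset of the rows above. The names `normalize_product`, `product_candidates` and `PRODUCT_TRANSLATIONS` are hypothetical, and replacing spaces with underscores is an assumption based on the translated values in the table, such as `vm_virtualbox` and `visual_studio_code`.

```python
from typing import Dict, List, Optional, Tuple

# (CPE vendor, bundle name or executable) -> CPE product; subset of the table.
PRODUCT_TRANSLATIONS: Dict[Tuple[str, str], str] = {
    ("oracle", "VirtualBox"): "vm_virtualbox",
    ("agilebits", "1Password 7"): "1password",
    ("zoom", "zoom.us"): "zoom",
    ("microsoft", "Microsoft AutoUpdate"): "autoupdate",
    ("microsoft", "Microsoft Edge"): "edge",
    ("microsoft", "Code"): "visual_studio_code",
}


def normalize_product(vendor: str, raw_name: str) -> str:
    """Apply map_values -> to_lower -> replace_spaces to one raw name."""
    mapped = PRODUCT_TRANSLATIONS.get((vendor, raw_name), raw_name)
    # Assumption: spaces become underscores, as in CPE product names.
    return mapped.lower().replace(" ", "_")


def product_candidates(vendor: str,
                       bundle_executable: Optional[str],
                       bundle_name: Optional[str]) -> List[str]:
    """Return de-duplicated product candidates, executable first."""
    candidates: List[str] = []
    for raw_name in (bundle_executable, bundle_name):
        if not raw_name:
            continue
        candidate = normalize_product(vendor, raw_name)
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    print(product_candidates("microsoft", "Microsoft Edge", "Microsoft Edge"))
    print(product_candidates("zoom", "zoom.us", "Zoom"))
```

Returning both candidates reflects the observation that sometimes the `bundle_executable` matches and sometimes the `bundle_name` does; the caller can try each against the CPE product values.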
Reference
To test this approach I used this data as input (the apps sheet). The not_found and found sheets contain the apps that were not found and found in the NVD dataset, respectively. I checked that all entries in the not_found sheet did have entries in the NVD dataset.
For extracting the vendor and product portions from the NVD dataset I used the following script.
To determine matches/mismatches I used the following script.