The `file.managed` state, which the `archive.extracted` state uses to
download the source archive, was recently modified to clear the file
from the minion cache. This caused unnecessary re-downloading on
subsequent runs, which slows these states down considerably when
dealing with large archives.
This commit makes the following changes to improve this:
1. The fileclient now accepts a `source_hash` argument. When it is
passed, the client's `get_url` function skips downloading http(s) and
ftp files that are already cached and whose hash matches the passed
hash. This argument has also been added to the `cp.get_url` and
`cp.cache_file` functions.
2. `file.source_list` no longer tries to download the file when the
source is an http(s) or ftp URL.
3. Where `cp.cache_file` is used, we pass the `source_hash` if it is
available.
4. A `cache_source` argument has been added to the `file.managed` state,
defaulting to `True`. This is now used to control whether or not the
source file is cleared from the minion cache when the state
completes.
5. Two new states (`file.cached` and `file.not_cached`) have been added
to manage files in the minion cache.
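The cache check described in item 1 can be sketched in Python as
follows. This is a minimal illustration, not Salt's fileclient code:
the `file_hash` helper, the injectable `fetch` callable, and the
SHA-256 default are assumptions made to keep the example
self-contained.

```python
import hashlib
import os
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Schemes for which a matching cached copy lets us skip the download.
SKIPPABLE_SCHEMES = {"http", "https", "ftp"}


def file_hash(path: str, algo: str = "sha256") -> str:
    """Return the hex digest of the file at ``path``, read in chunks."""
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_url(
    url: str,
    dest: str,
    source_hash: Optional[str] = None,
    algo: str = "sha256",
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> str:
    """Download ``url`` to ``dest`` unless a cached copy matches ``source_hash``.

    Hypothetical sketch: ``fetch`` is injectable so the logic can be
    exercised without network access.
    """
    scheme = urllib.parse.urlparse(url).scheme
    if (
        source_hash
        and scheme in SKIPPABLE_SCHEMES
        and os.path.isfile(dest)
        and file_hash(dest, algo) == source_hash.lower()
    ):
        # Cache hit: the file is already present and its hash matches.
        return dest
    # Cache miss, hash mismatch, no hash given, or a non-skippable scheme.
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    fetch(url, dest)
    return dest
```

Passing the hash is what makes the skip safe: without it the client
cannot tell whether the cached copy is stale, so it must download again.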
In addition, the `archive.extracted` state has been modified in the
following ways:
1. For consistency with `file.managed`, a `cache_source` argument has
been added. This also deprecates `keep`. If `keep` is used,
`cache_source` assumes its value, and a warning is added to the state
return to let the user know to update their SLS.
2. The variable name `cached_source` (used internally in the
`archive.extracted` state) has been renamed to `cached` to reduce
confusion with the new `cache_source` argument.
3. The new `file.cached` and `file.not_cached` states are now used to
manage the source tarball instead of `file.managed`. This reduces disk
usage and removes unnecessary complexity from the state, since we no
longer keep a copy of the archive in a separate location within the
cachedir. We now use only the copy downloaded by `cp.cache_file`
within the `file.cached` state. This change also required a new home
for the hash files tracked by the `source_hash_update` argument: a
subdirectory of the minion cachedir called `archive_hash`.
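A minimal sketch of the `keep`/`cache_source` fallback from item 1 and
the `archive_hash` location from item 3 follows. The function names
`resolve_cache_source` and `archive_hash_path`, the warning text, the
`warnings` list in the return dict, and the `<basename>.hash` file
naming are all hypothetical; the text above only specifies that
`cache_source` takes the value of `keep` with a warning, and that hash
files live in an `archive_hash` subdirectory of the minion cachedir.

```python
import os
from typing import Any, Dict, Optional


def resolve_cache_source(
    cache_source: bool = True,
    keep: Optional[bool] = None,
    ret: Optional[Dict[str, Any]] = None,
) -> bool:
    """Return the effective cache_source, honoring the deprecated ``keep``."""
    if keep is None:
        return cache_source
    # Deprecated argument used: it wins, and the user is told to update the SLS.
    if ret is not None:
        ret.setdefault("warnings", []).append(
            "The 'keep' argument is deprecated; use 'cache_source' instead."
        )
    return keep


def archive_hash_path(cachedir: str, cached: str) -> str:
    """Hypothetical location of the source_hash_update tracking file."""
    return os.path.join(cachedir, "archive_hash", os.path.basename(cached) + ".hash")
```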
This fixes a corner case in which someone uses the `name` config param
for a given gitfs/git_pillar remote and then changes the URL for that
remote (for instance, from https to ssh). We have never enforced the
fetch URL in the git config for a given remote's cachedir, because the
cachedir is typically determined by hashing the URL (or the branch and
URL, for git_pillar). In those cases, changing the URL changes the
cachedir path, so a new repo is initialized and the correct URL is
added to the git config as part of the initialization. However, when
the `name` param is used, the cachedir path stays the same no matter
what the URL is. As a result, when the URL is changed in the
gitfs/git_pillar config, it is never updated in the git config file
for that cachedir.
With this change, the new GitConfigParser is used to examine the fetch
URL and update it if necessary.
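The fetch URL check can be illustrated with Python's standard
`configparser`, swapped in here for the GitConfigParser mentioned
above. This is a hypothetical sketch (`ensure_fetch_url` is not a Salt
function) that rewrites the config only when the URL differs. Unlike a
git-aware parser, `configparser` keeps only the last value of repeated
keys such as multiple `fetch` refspecs, and drops comments when
writing, so treat it as illustrative only.

```python
import configparser
import io
from typing import Tuple


def ensure_fetch_url(config_text: str, remote: str, url: str) -> Tuple[str, bool]:
    """Set remote.<remote>.url to ``url``; return (new_text, changed)."""
    # Only "=" is a delimiter so scp-style URLs (git@host:repo) parse intact;
    # strict=False tolerates repeated keys, keeping only the last value.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.read_string(config_text)
    section = 'remote "{}"'.format(remote)
    if not parser.has_section(section):
        parser.add_section(section)
    if parser.get(section, "url", fallback=None) == url:
        # Already correct: leave the config untouched.
        return config_text, False
    parser.set(section, "url", url)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue(), True
```

Checking before writing keeps the common case (URL unchanged) free of
any rewrite of the repo's config file.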