if the awx_capacity module runs on an isolated node with missing
libraries (e.g., psutil) or bad permissions, then the runner status will
be "failed"
in this scenario, we *still* want to react by recording capacity=0
a status of "error" makes more sense, because "failed" generally points
to an issue with the playbook itself, while "error" is more generally
used for reporting issues internal to Tower
see: https://github.com/ansible/awx/issues/4909
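
a minimal sketch of the status handling described above; the helper name
and its return shape are hypothetical and not taken from the AWX source:

```python
def capacity_from_runner_status(runner_status, reported_capacity):
    """Return the (status, capacity) pair to record for a health check.

    Hypothetical helper for illustration only.
    """
    if runner_status == "failed":
        # The capacity module itself could not run (missing psutil, bad
        # permissions, ...), so record zero capacity and report an
        # internal error rather than a playbook failure.
        return "error", 0
    return runner_status, reported_capacity
```

with this mapping, a node whose capacity check cannot run is still
recorded, just with no capacity available.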
this attempts to surface the underlying runner exception for tracebacks
like this one:
FileNotFoundError: [Errno 2] No such file or directory:
'/tmp/awx_41_93gtgv25/artifacts/41/status'
some users won't want to use the RSA key we auto-generate for
isolated node SSH access, but will instead want to manage SSH
authentication by hand outside of Tower
see: https://github.com/ansible/ansible-tower/issues/7380
it's not unusual for the secret data we pass into the `run_isolated.yml`
playbook to be quite long, mainly because it can contain RSA key
data; by passing this value into the ansible-playbook process using
`vars_prompt`, we're limited by pexpect's tty line limit (which looks
like it caps out around 4k). Because of this, large payloads are
being truncated and causing job run failures.
this changes the implementation to use a named pipe instead, which
doesn't have the same limitation
see: #7183
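
a rough sketch of the named pipe approach, assuming a POSIX system; the
function names and the `env` file name are illustrative, and the reader
here stands in for the playbook process:

```python
import os
import tempfile
import threading


def write_secrets_to_fifo(path, payload):
    """Create a FIFO at `path` and feed `payload` to the first reader."""
    os.mkfifo(path, 0o600)  # readable and writable by the owner only

    def _writer():
        # open() blocks until a reader opens the other end of the pipe
        with open(path, "w") as fifo:
            fifo.write(payload)

    thread = threading.Thread(target=_writer, daemon=True)
    thread.start()
    return thread


def read_secrets_from_fifo(path):
    """Read the whole payload, then remove the FIFO from disk."""
    with open(path) as fifo:
        data = fifo.read()  # returns once the writer closes its end
    os.remove(path)
    return data


def roundtrip(payload):
    """Send `payload` through a temporary FIFO and return what arrives."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "env")
    writer = write_secrets_to_fifo(fifo_path, payload)
    received = read_secrets_from_fifo(fifo_path)
    writer.join()
    os.rmdir(workdir)
    return received
```

because the data travels through a pipe rather than a tty prompt, a
payload well past 4k (for example `roundtrip("x" * 16384)`) should come
back intact.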
* use tower-expect command to determine job status when running
the isolated heartbeat playbook
* grok JSON output of playbook to obtain result information
* run playbook against multiple isolated hosts at the same time
(addresses scalability concerns)
Previously we were running the playbook on a host-by-host
basis, but this changes it to pass in the list of all
isolated instances the machine is responsible for.
Using the `json` Ansible stdout callback, we are able to
parse the output for information on each host.
* set a more reasonable default `AWX_ISOLATED_CHECK_INTERVAL`
* make manual cancellation work for high values of
`AWX_ISOLATED_CHECK_INTERVAL`
* remove the `/tmp/ansible_tower/jobs/` symlink directory
see: #6616
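
a small sketch of how per-host results might be pulled out of the `json`
callback output for a multi-host run; it assumes stdout contains only the
JSON document, with the usual `plays` -> `tasks` -> `hosts` nesting, and
the function name is hypothetical:

```python
import json


def results_by_host(playbook_stdout):
    """Map each host name to the list of its task results."""
    data = json.loads(playbook_stdout)
    per_host = {}
    for play in data.get("plays", []):
        for task in play.get("tasks", []):
            # each task carries one result dict per targeted host
            for host, result in task.get("hosts", {}).items():
                per_host.setdefault(host, []).append(result)
    return per_host
```

a single playbook run against every isolated instance can then be split
back into one result list per host.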
instead of launching isolated tasks via `systemctl`, treat
`awx.main.isolated.run` as an executable that knows how to daemonize
additionally, add `setup.py isolated_build` for isolated Tower source
distribution