Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CERTTF-388] [CERTTF-443] Implement masking on the agent #404

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

boukeas
Copy link
Contributor

@boukeas boukeas commented Nov 13, 2024

Description

As described in CR080 - Handling secrets and sensitive information in Testflinger, one of the tasks involved in supporting secrets in Testflinger is implementing functionality for masking sensitive information in the output generated by the Testflinger agent.

This PR:

  • Introduces a standalone generic Masker class for masking sensitive information in text. What is considered sensitive is specified through a list of regular expressions. Note that the Masker currently replaces sensitive information with a hash of a specified length, which means that the same masked pieces of information will appear in the output with the same hash.
  • Introduces the MaskingCommandRunner class, derived from the regular CommandRunner. Their only difference lies in the post_output method, where the masking variant of the runner employs a Masker to the output before posting it. The Masker to be used is actually provided to the runner upon its construction, so the runner itself is not aware of the specifics, i.e. it is not aware of the kind of information that is considered sensitive.
  • Extends the TestflingerJob class to check if "sensitive patterns" are specified in the agent's configuration file or if secrets are included in the job data. In any of these cases, it employs a MaskingCommandRunner instead of a regular CommandRunner, to make sure that sensitive information is masked in the output. In the case of secrets, it also injects them into the corresponding environment variables of the runner (CERTTF-443).

Resolved issues

Resolves CERTTF-388 and CERTTF-443.

Some (reasonable) assumptions have been made regarding the way the server will pass secrets onto the agent along with other job data. This is separate work to be undertaken within CERTTF-441 and possibly CERTTF-444, so some minor modifications might be necessary when these tasks are also completed.

Documentation

The newly introduced sensitive_patterns in the agent configuration file has been added to the relevant documentation.

Tests

Tests have been included for all the newly introduced classes and functionality, including integration-style tests that use the fake_connector device connector to run test_cmds that include sensitive information and/or secrets.

Also, this job was submitted to a local Testflinger deployment:

job_queue: myqueue
test_data:
  test_cmds: |
    echo Device IP is: $DEVICE_IP

The testflinger agent had these lines added to its configuration file, specifying IP addresses as sensitive information (note the necessary escapes).

sensitive_patterns:
   - "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"

This was the output of the job, where the device IP has been masked.

************************************************
* Starting testflinger test phase on agent-007 *
************************************************
2024-11-13 15:12:56,409 agent-007 INFO: DEVICE CONNECTOR: BEGIN testrun
2024-11-13 15:12:56,412 agent-007 INFO: DEVICE CONNECTOR: END testrun
Device IP is: **408822**

@boukeas boukeas marked this pull request as ready for review November 13, 2024 18:25
@boukeas boukeas requested review from plars and val500 November 13, 2024 18:26
Comment on lines +49 to +52
# retrieve sensitive patterns from the agent configuration
sensitive_patterns = self.client.config.get("sensitive_patterns", [])
# retrieve secrets from the job
secrets_dict = self.job_data.get("secrets")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is self-explanatory, no need for comments here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say that in this particular case the comments were deliberate: I mean to emphasize how the sources of these patterns (sensitive vs. secrets) are different, i.e. that the former is static and comes from the agent configuration whereas the latter is dynamic and differs for every job. I believe one can easily miss that by skimming over the code.

Comment on lines +55 to +62
# inject secrets into the runner environment
environment = {
**self.client.config,
**secrets_dict,
}
# add secrets to the sensitive variables
sensitive_patterns.extend(secrets_dict.values())
# secrets and (possibly) sensitive information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

}

log_data = self.run_testcmds(job_data, client, tmp_path)
print(log_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you supposed to print log data to stdout, not stderr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test, so neither stdout nor stderr are printed when the test is successful. They are however both captured and printed (separately) when the test fails, so this print is meant as a diagnostic, i.e. you only see it when the test fails and might help you understand what went wrong. I'd happily change this to print to stderr if you feel it makes any difference.

@plars
Copy link
Contributor

plars commented Dec 12, 2024

@boukeas I have a really stupid question to ask first... when/why do we want to use regexp to specify patterns that need to be masked, and why would they be in the agent config? I guess I just assumed that the only masking would be secrets that were sent to us from the server - which would be literal string matching in the output and not defined in the agent config. The specific example cited here is masking the IP address, but I don't think we ever want to mask that information do we? That could make interpreting the job really hard in many cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants