[CERTTF-457] feat: Github action for retrieving data from multi-device jobs #434

Open: boukeas wants to merge 11 commits into main from CERTTF-457-multi-device-polling


@boukeas boukeas commented Dec 20, 2024

Description

This PR introduces a Testflinger GitHub action that polls a multi-device job, given its job ID as input, and returns the job IDs and machine IPs of its child jobs as a JSON string.

Resolved issues

Resolves CERTTF-457.

Tests

The workflow below was used to submit a multi-device job without polling, use the poll-multi action to retrieve the data for the child jobs, and then display that data and confirm it's valid JSON.

name: Temporary workflow for testing multi-device polling action
on:
  workflow_dispatch:

jobs:
  submit-and-poll:
    name: Submit multi-device job and poll it for its data
    runs-on: [self-hosted, testflinger]
    steps:

    - name: Submit job to Testflinger
      id: submit
      uses: canonical/testflinger/.github/actions/submit@main
      with:
        poll: false
        job: |
          job_queue: multi-1
          name: multi-device-job
          output_timeout: 43200
          provision_data:
            jobs:
            - job_queue: 202407-34216
              name: dell-xps-13-9350-0cc9-c34216
              reserve_data:
                ssh_keys:
                - lp:boukeas
                timeout: 600
            - job_queue: 202008-28168
              name: hp-eliteone800-g627-all-in-one-pc-c28168
              reserve_data:
                ssh_keys:
                - lp:boukeas
                timeout: 600

    - name: Retrieve multi-device job data
      id: poll
      uses: canonical/testflinger/.github/actions/poll-multi@CERTTF-457-multi-device-polling
      with:
        job-id: ${{ steps.submit.outputs.id }}
        sentinel-phase: reserve
    
    - name: Verify
      shell: bash
      env:
        JOBS: ${{ steps.poll.outputs.jobs }}
      run: |
        echo "$JOBS" | jq .

Here is a successful test run. This is the output of the action:

{
  "7fcc82b1-1ab8-4fcc-9e3f-b0f9afbcec49": "10.102.182.56",
  "52fa5098-eb7d-4084-b9b0-b830214c089f": "10.102.161.241"
}
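
A downstream step could consume this output along the following lines. This is only a minimal sketch: it assumes the output is always a JSON object mapping child job IDs (UUIDs) to IP addresses, as in the run above, and the helper name `parse_jobs_output` is hypothetical rather than part of the action.

```python
import ipaddress
import json
import uuid


def parse_jobs_output(raw: str) -> dict[str, str]:
    """Parse the `jobs` output and check that it maps job UUIDs to IPs."""
    jobs = json.loads(raw)
    if not isinstance(jobs, dict):
        raise ValueError("expected a JSON object mapping job IDs to IPs")
    for job_id, ip in jobs.items():
        uuid.UUID(job_id)         # raises ValueError if not a valid UUID
        ipaddress.ip_address(ip)  # raises ValueError if not a valid IP
    return jobs


# Sample taken from the test run above.
sample = """
{
  "7fcc82b1-1ab8-4fcc-9e3f-b0f9afbcec49": "10.102.182.56",
  "52fa5098-eb7d-4084-b9b0-b830214c089f": "10.102.161.241"
}
"""

jobs = parse_jobs_output(sample)
for job_id, ip in jobs.items():
    print(f"{job_id} -> {ip}")
```

In a workflow, the raw string would come from an environment variable such as `JOBS` in the Verify step, rather than from an inline sample.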

A syntactically incorrect version of the job was also submitted, to test that the action detects provisioning (i.e. allocation) failures for multi-device jobs.

@marosg42

Hi George, sorry about the late response.
Currently, we generate a file like this for each agent:

job_queue: {{ job_queue }}
name: {{ job_name }}
provision_data:
    distro: {{ distro_series }}
    disks:
      - id: disk0
        disk: 0
        type: disk
        ptable: gpt
        name: disk0
   ...
reserve_data:
    ssh_keys:
      - lp:oil-ci-bot
    timeout: 21600
  • Where do I specify distro_series? I assume it goes at the individual job level.
  • Should the disks definition go in the main job's provision_data, or does it have to go in each individual job?
  • Is job_queue: multi-1 in your example a real multi-device queue, or is it just a descriptive string?
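
For generating one entry per agent, the child entries could be assembled programmatically. The sketch below only mirrors the structure of the example job in the PR description (a parent `job_queue` plus a `provision_data.jobs` list whose entries carry their own `reserve_data`). It does not settle where `distro_series` or `disks` belong, and the helper name `build_multi_job` is hypothetical.

```python
import json


def build_multi_job(parent_queue, name, children, ssh_keys, timeout):
    """Build a multi-device job dict with one child entry per (queue, name)."""
    return {
        "job_queue": parent_queue,
        "name": name,
        "provision_data": {
            "jobs": [
                {
                    "job_queue": child_queue,
                    "name": child_name,
                    "reserve_data": {
                        "ssh_keys": list(ssh_keys),
                        "timeout": timeout,
                    },
                }
                for child_queue, child_name in children
            ]
        },
    }


# Values copied from the example workflow in the PR description.
job = build_multi_job(
    parent_queue="multi-1",
    name="multi-device-job",
    children=[
        ("202407-34216", "dell-xps-13-9350-0cc9-c34216"),
        ("202008-28168", "hp-eliteone800-g627-all-in-one-pc-c28168"),
    ],
    ssh_keys=["lp:boukeas"],
    timeout=600,
)

# JSON output can be read by YAML 1.2 parsers; whether the submit action
# accepts it unchanged is an assumption, not something confirmed here.
print(json.dumps(job, indent=2))
```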

@marosg42

I think I am using the agent and queue terminology incorrectly. Let me run some tests and I will try to come back with better comments.

@marosg42

@boukeas One important issue: the job reports that it is all done before the devices are reserved.

Job output:

{"35541dff-73d0-4a4f-b0d0-fc9a1cde2f56":"10.241.3.17","2c5e3c4b-886b-40c1-8e07-81795dddfb8a":"10.241.3.25","45ea63f1-551c-4bd1-a52f-cf7ea9237340":"10.241.3.26"}
{
  "35541dff-73d0-4a4f-b0d0-fc9a1cde2f56": "10.241.3.17",
  "2c5e3c4b-886b-40c1-8e07-81795dddfb8a": "10.241.3.25",
  "45ea63f1-551c-4bd1-a52f-cf7ea9237340": "10.241.3.26"
}

Status of the jobs in Testflinger:

45ea63f1-551c-4bd1-a52f-cf7ea9237340	elvey	allocated	2025-01-20 15:02:17
2c5e3c4b-886b-40c1-8e07-81795dddfb8a	eevee	allocated	2025-01-20 15:02:17
35541dff-73d0-4a4f-b0d0-fc9a1cde2f56	ditto	allocated	2025-01-20 15:02:17
d1757fcc-2539-47d5-8464-09d667736cfb	multi-1	complete	2025-01-20 15:02:00
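
The mismatch can be checked mechanically from the two listings above. The following is a rough sketch, assuming the status listing is tab-separated with the job ID in the first column and the state in the third, and assuming a job in the reserve phase would be reported with the state `reserve` (matching the `sentinel-phase` input). Both function names are hypothetical.

```python
def parse_status_listing(listing: str) -> dict[str, str]:
    """Map job ID -> state from 'id<TAB>name<TAB>state<TAB>timestamp' lines."""
    states = {}
    for line in listing.strip().splitlines():
        job_id, _name, state, _timestamp = line.split("\t")
        states[job_id] = state
    return states


def children_not_in(states, child_ids, expected_state):
    """Return the child job IDs whose state differs from expected_state."""
    return [j for j in child_ids if states.get(j) != expected_state]


# Status listing copied from this comment.
listing = (
    "45ea63f1-551c-4bd1-a52f-cf7ea9237340\telvey\tallocated\t2025-01-20 15:02:17\n"
    "2c5e3c4b-886b-40c1-8e07-81795dddfb8a\teevee\tallocated\t2025-01-20 15:02:17\n"
    "35541dff-73d0-4a4f-b0d0-fc9a1cde2f56\tditto\tallocated\t2025-01-20 15:02:17\n"
    "d1757fcc-2539-47d5-8464-09d667736cfb\tmulti-1\tcomplete\t2025-01-20 15:02:00\n"
)

# Child job IDs, in the order they appear in the action's output.
child_ids = [
    "35541dff-73d0-4a4f-b0d0-fc9a1cde2f56",
    "2c5e3c4b-886b-40c1-8e07-81795dddfb8a",
    "45ea63f1-551c-4bd1-a52f-cf7ea9237340",
]

states = parse_status_listing(listing)
pending = children_not_in(states, child_ids, "reserve")
print(f"{len(pending)} of {len(child_ids)} child jobs are not in 'reserve'")
```

With the data from this comment, all three child jobs are still `allocated` while the parent is already `complete`.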

@boukeas boukeas force-pushed the CERTTF-457-multi-device-polling branch from a9a7fab to 98e8f6b Compare January 22, 2025 11:15
@boukeas boukeas marked this pull request as ready for review January 22, 2025 12:16
@boukeas boukeas requested a review from marosg42 January 22, 2025 12:17
2 participants