Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

myriad running in ci when nothing has changed #180

Open
joprice opened this issue Jan 7, 2025 · 10 comments
Open

myriad running in ci when nothing has changed #180

joprice opened this issue Jan 7, 2025 · 10 comments

Comments

@joprice
Copy link

joprice commented Jan 7, 2025

I'm seeing my myriad generators run in CI and as far as I can tell they're the most time consuming step for the build as I'm using them quite heavily and haven't spent time optimizing them.

I see that there's a cache file written to the build folder to control whether they run locally https://github.com/MoiraeSoftware/myriad/blob/3c9818faabf9d508c10c28d5ecd26e66fafb48a1/src/Myriad.Sdk/build/Myriad.Sdk.targets#L130.

Ideally, in CI, the task wouldn't run unless the inputs changed, and then if it does run, it's due to a change not being checked in, a later build step in my github actions would catch the repo modification.

I haven't checked how the hash is calculated, whether it will be platform-independent. Is there any downside to modifying my gitignore and checking these in?

@7sharp9
Copy link
Collaborator

7sharp9 commented Jan 7, 2025

Its this bit: https://github.com/MoiraeSoftware/myriad/blob/3c9818faabf9d508c10c28d5ecd26e66fafb48a1/src/Myriad.Sdk/build/Myriad.Sdk.targets#L140-L142

And the section below thats creates the hash. Its supposed to combine the config file, executable (myriad), codegen identity elements, as well as the extra bits below to determine if it should run. Ive sometimes had issues where it refuses to run and Ive had to force a run, theres an msbuild prop to force it i think. Ive never relly had an issue with it runing al the time though. I use myraid to gen code then check it in, as having the code missing normally means broken code checked in if theres a generation issue. It is possible there is a bug in the cache but It was somewhat copied and modified from MS internal msbuild code so there might just be a minor issue, or just a CI gotcha...

@joprice
Copy link
Author

joprice commented Jan 7, 2025

Yea I experienced something along those lines when iterating on my custom generators. I ended up touching input files in the consuming project to make sure the generator gets re-run when the generator itself changes. So perhaps some other dependency artifact needs to be included in the hash.

But I'll still need to figure out if that hash differs in a CI versus local env. If it includes the myriad binary, and I'm on osx and testing on ubuntu, I would assume the hash would differ, unless the MyriadSdk_Generator_Exe refers to a platform-independent dll. I'll check what hashes I end up getting for the same commit in CI and local.

On the performance side at the root of why I'm looking into it, I wonder if my generators are just slow because parsing the same files multiple times and I'd assume that parsing and generating the trees would dominate the runtime. I currently have separate generators for each attribute. In some cases, I need this because for example, I have one for json decoders and one for npgsql decoders and I want to be able to selectively disable them for fable vs non-fable projects. I have another that generates similar functionality as https://github.com/janestreet/ppx_fields_conv that would run for either type of project so could benefit from a single call to the parser. Do you typically merge them into a single generator and have it work with multiple attributes or have each generator "own" a single attribute?

@7sharp9
Copy link
Collaborator

7sharp9 commented Jan 8, 2025

Well as things get bigger I would have a generator that catalogued the bits its interested in, then generate all at once. The fileds plugin is supposed to be a subset of ppx_fields. I just never got round to adding all the functions. Plus I think there were a few generated functions that I didnt understand the use.

@joprice
Copy link
Author

joprice commented Jan 8, 2025

I'll look into some optimizations and the caching when I get some more free time.

On the fields ppx functionality, there's a lot in there that I haven't found a use for, but I've just done a few that are useful to me like fold

  let model: Tuples = { a = 1, 2; b = [| 3, 4; 5, 6 |] }
  let len =
    Tuples.fold
      (a = (fun acc (a, b) -> acc + a + b),
       b = (fun acc values -> acc + (values |> Seq.sumBy (fun (b, c) -> b + c))))
      0
      model
  Expect.equal len 21 "len"

(also want to do one that passes the Field type with field name and getter, which is defined, but don't pass in this simple case)

and a polymorphic make

  let model: Tuples = { a = 1, 2; b = [| 3, 4; 5, 6 |] }
  let map = Map [ "a", model.a :> obj; "b", model.b ]
  Tuples.make (
      { new FieldFn<Tuples> with
          member _.eval f = map[f.name] |> unbox
      }
    )

I plan on open sourcing it eventually but the repo is a mess right now and there's a few hardcoded assumptions I'd want to generalize.

@joprice
Copy link
Author

joprice commented Jan 20, 2025

I finally got around to checking this out and found that the hashes are completely different in ci than locally, even when providing a runtime to match the remote build's environment. I wonder if the inputs could only take into account the logical identity of the myriad generator, instead of the binary itself.

@7sharp9
Copy link
Collaborator

7sharp9 commented Jan 21, 2025

What do you mean by logical identity? Could do a version match, or FQN

@joprice
Copy link
Author

joprice commented Jan 21, 2025

Yea something that stands for the behavior of the inputs, not based on the binary layout of the binary, for instance assuming that the myriad binary produced for different architectures would behave the same across them. So version makes sense when there's a lockfile ensuring the integrity of it.

@7sharp9
Copy link
Collaborator

7sharp9 commented Jan 21, 2025

GetAssemblyIdentity might be able to do this. Then hash the result of that instead.

@joprice
Copy link
Author

joprice commented Jan 21, 2025

I'm not familiar with it, but I checked the doc and it seems to fit the bill.

As an aside, another potentially related issue I've noticed is that when using dotnet watch, I sometimes see a series of logs like

File changed: ...
Started
Exited
File changed: ...
Started
...

Where each file is myriad generated file gets updated in succession. I wonder if both this and the CI system could benefit from not overwriting the file when there's no changes (at the cost of a potentially large string comparison). The issue in my project (and probably othres) is that generated file tend to be early in the DAG, so invalidate many others, causing a waterfall of recompilation, which is especially noticeable in the case of dotnet watch

@7sharp9
Copy link
Collaborator

7sharp9 commented Jan 21, 2025

I've not used Myriad in a watch situation so I cant comment on that part. The complexity of the file writing is made a bit worse by the fact its possible to write in-file as well as an independent seperate file. I think technically the file would change as the myriad generated part would be removed, and then changed again when it is re-written. I would have to re-vist the code though as I just cant remember how that bit works right now :-/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants