
feat: added support for audio timestamp understanding to Google Vertex #4061

Merged: 9 commits, Dec 17, 2024


timconnorz
Contributor

Changes to make the Google Vertex provider support audio timestamp understanding (`audioTimestamp`):

  • Updated the google-cloud/vertex package to the latest version (v1.9.2), which is required for..
  • Added `audioTimestamp` to `GoogleVertexSettings`; it is passed to the SDK's `GenerationConfig`

@shaper
Contributor

shaper commented Dec 11, 2024

Hi there, thank you for the contribution! As you may have seen, we recently shipped a 2.0 update to the google-vertex provider:

https://x.com/aisdk/status/1866044262409765270

As part of this, we moved to the Vertex AI Gemini REST API instead of the google-cloud/vertex package. Adding this via REST is likely straightforward.

Looking briefly at the example on the page you linked, it seems the submitted audio would be handled as a file attachment, which we already support, so I am not sure we need the `cachedContent` setting. We would need a way to add `"generationConfig": { "audioTimestamp": true }`. I think this would require using `experimental_providerMetadata` to tag the message containing the file, and then, in message conversion or just outside of it, adding the setting to the request as needed. @lgrammel may have further thoughts.
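For reference, a minimal sketch of the request body described above, assuming the Vertex AI Gemini REST `generateContent` shape: the audio travels as an ordinary file part (`inlineData` with a MIME type and base64 data) and the flag sits in the top-level `generationConfig`. The simplified types and the `buildAudioTimestampRequest` helper are hypothetical illustrations, not the provider's actual code.

```typescript
// Hypothetical, simplified types modelled on the Vertex AI Gemini REST request body.
// Only the fields needed for this illustration are included.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  inlineData: { mimeType: string; data: string }; // data is base64-encoded bytes
}
type Part = TextPart | InlineDataPart;

interface GenerateContentRequest {
  contents: { role: "user" | "model"; parts: Part[] }[];
  generationConfig?: { audioTimestamp?: boolean; temperature?: number };
}

// Hypothetical helper: sends the audio as a regular file part and
// enables timestamp understanding through generationConfig.
function buildAudioTimestampRequest(
  prompt: string,
  audioBase64: string,
  mimeType: string,
): GenerateContentRequest {
  return {
    contents: [
      {
        role: "user",
        parts: [{ text: prompt }, { inlineData: { mimeType, data: audioBase64 } }],
      },
    ],
    generationConfig: { audioTimestamp: true },
  };
}

const body = buildAudioTimestampRequest(
  "Transcribe this clip with timestamps.",
  "UklGRg==", // placeholder base64, not real audio
  "audio/wav",
);
console.log(JSON.stringify(body.generationConfig)); // {"audioTimestamp":true}
```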

We would need unit tests for the new logic, demo scripts in examples/ai-core/src/{generate,stream}Text with a sample audio snippet, and new generate/stream test cases in the examples/ai-core/src/e2e/google-vertex.test.ts file.

If this sounds like a lot, we can put it in our feature request queue; please file an issue, or link to one if it already exists.

@timconnorz
Contributor Author

timconnorz commented Dec 11, 2024

@shaper I've updated the PR; it now takes only two edits to support this! You can use it by passing the `audioTimestamp` param in the model settings. This settings object is also where you configure other output-affecting parameters like `structuredOutputs`, `safetySettings`, etc., so I figured it made sense for it to live here.


@lgrammel
Collaborator

LGTM. We would need an example under examples/ai-core showing how this works, a changeset (patch release), and updated docs for Vertex.

@timconnorz
Contributor Author

@lgrammel I've added an example, updated the docs, and added a changeset file. Let me know if this is satisfactory! Thanks for your guidance 😎

@colinyoung

Would love to see this one get in!

Contributor

@shaper shaper left a comment


Hi, just a few small things, thanks for the continued work! Would like to help land this with you.

```ts
 * Optional. Enables timestamp understanding for audio-only files.
 * This is a preview feature.
 *
 * Available for the following models:
```
Contributor


Instead of listing the supported models here, is there a page in the Vertex docs we can link to?

```diff
@@ -109,6 +109,7 @@ export class GoogleGenerativeAILanguageModel implements LanguageModelV1 {
         this.supportsStructuredOutputs
           ? convertJSONSchemaToOpenAPISchema(responseFormat.schema)
           : undefined,
+      audioTimestamp: this.settings.audioTimestamp,
```
Contributor


Most of the time `this.settings.audioTimestamp` won't be defined, but as written, this adds `audioTimestamp: undefined` to every request.

Can we alter this to something like the below to avoid that?

```ts
...(this.settings.audioTimestamp && { audioTimestamp: this.settings.audioTimestamp }),
```
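A minimal sketch of the difference the suggestion makes, using a hypothetical `ModelSettings` type and two hypothetical helpers (`generationConfigDirect`, `generationConfigSpread`); `temperature` is only a stand-in for the other config fields. The direct form always creates the key, while the conditional spread leaves it out when the setting is unset.

```typescript
// Hypothetical settings shape: only the field under discussion is modelled.
interface ModelSettings {
  audioTimestamp?: boolean;
}

// As originally written: the key is always present, even when the setting is unset.
function generationConfigDirect(settings: ModelSettings) {
  return {
    temperature: 0.2,
    audioTimestamp: settings.audioTimestamp,
  };
}

// As suggested in review: the key is only added when the setting is truthy.
// Spreading a falsy value (undefined or false) into an object literal adds nothing.
function generationConfigSpread(settings: ModelSettings) {
  return {
    temperature: 0.2,
    ...(settings.audioTimestamp && { audioTimestamp: settings.audioTimestamp }),
  };
}

const unset: ModelSettings = {};
console.log(Object.keys(generationConfigDirect(unset))); // [ 'temperature', 'audioTimestamp' ]
console.log(Object.keys(generationConfigSpread(unset))); // [ 'temperature' ]
console.log(generationConfigSpread({ audioTimestamp: true })); // { temperature: 0.2, audioTimestamp: true }
```

With the spread form, `audioTimestamp: false` is omitted as well, which matches the documented default of false. Note that `JSON.stringify` already drops properties whose value is `undefined`, so the difference is mainly visible on the in-memory object (for example, in tests that compare request objects strictly) rather than in the serialized body.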

Optional. Enables timestamp understanding for audio files. Defaults to false.

This is useful for generating transcripts with accurate timestamps.
Only available for `gemini-1.5-pro-002` and `gemini-1.5-flash-002`.
Contributor


Same comment as elsewhere: is there a Vertex docs page we can link to rather than listing the models here? That would simplify maintenance.

@timconnorz timconnorz requested a review from shaper December 17, 2024 14:49
@shaper shaper merged commit db31e74 into vercel:main Dec 17, 2024
8 of 9 checks passed
@shaper
Contributor

shaper commented Dec 17, 2024

Thanks again!

@timconnorz
Contributor Author

Thanks for your help @shaper! How does the release process work? Any idea when this would be rolled out?

@shaper
Contributor

shaper commented Dec 17, 2024

> Thanks for your help @shaper! How does the release process work? Any idea when this would be rolled out?

It is automated and triggered by one of our team members. I will publish this today and follow up when it's live; it should be within an hour or so.

@shaper
Contributor

shaper commented Dec 17, 2024

Ah, @lgrammel already did it, it should be live in the versions noted here: #4118
