feat: added support for audio timestamp understanding to Google Vertex #4061

timconnorz · 2024-12-11T03:55:30Z

Changes to have the Google Vertex provider support audioTimestamp understanding

Updated the google-cloud/vertex package to latest (v1.9.2) which is required for..
Added audioTimestamp to GoogleVertexSettings, which is passed to the GenerationConfig of the sdk

shaper · 2024-12-11T05:37:55Z

Hi there, thank you for the contribution! As you may have seen, we recently shipped a 2.0 update to the google-vertex provider:

https://x.com/aisdk/status/1866044262409765270

As part of this we moved to using the Vertex AI Gemini REST API instead of the google-cloud/vertex package. It is likely pretty straightforward to add it using REST instead.

Looking briefly at the example on the page you linked, the submitted audio would be handled as a file attachment, which we already support, so I am not sure we need the cachedContent setting. We would need a way to add "generationConfig": { "audioTimestamp": true } to the request. I think this would require using experimental_providerMetadata to tag the message with the file, and then adding it to the request during message conversion or just outside of it. @lgrammel may have further thoughts.

We would need unit tests for new logic, demo scripts in examples/ai-core/src/{generate,stream}Text with a sample audio snippet, and added test cases similarly for generate/stream in the examples/ai-core/src/e2e/google-vertex.test.ts file.

If this sounds like a lot, we can put it in our feature request queue; please file an issue, or link to one if it already exists.
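
For illustration, here is a minimal sketch of what a REST request body carrying the generationConfig flag described above might look like. Only `generationConfig.audioTimestamp` comes from this thread; the `RequestBody` type, the `buildAudioRequest` helper, the `fileData` part shape, the MIME type, and the bucket URI are assumptions made for the example, not the provider's actual code.

```typescript
// Hypothetical request shapes for illustration only.
// Only generationConfig.audioTimestamp is taken from the discussion above.
interface FilePart {
  fileData: { mimeType: string; fileUri: string };
}

interface TextPart {
  text: string;
}

interface RequestBody {
  contents: Array<{ role: 'user'; parts: Array<FilePart | TextPart> }>;
  generationConfig?: { audioTimestamp?: boolean };
}

// Hypothetical helper: builds a body with one audio file attachment and a prompt.
// generationConfig is attached only when audioTimestamp is truthy.
function buildAudioRequest(
  fileUri: string,
  prompt: string,
  audioTimestamp?: boolean,
): RequestBody {
  return {
    contents: [
      {
        role: 'user',
        parts: [
          { fileData: { mimeType: 'audio/mpeg', fileUri } }, // assumed MIME type
          { text: prompt },
        ],
      },
    ],
    ...(audioTimestamp && { generationConfig: { audioTimestamp } }),
  };
}

// Example usage with a made-up bucket path.
const body = buildAudioRequest(
  'gs://example-bucket/sample.mp3',
  'Transcribe this audio with timestamps.',
  true,
);
console.log(JSON.stringify(body, null, 2));
```

In this sketch the audio travels as an ordinary file part, so the flag is the only addition to the request.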

timconnorz · 2024-12-11T18:17:00Z

@shaper I've updated the PR, it's only two edits to support this now! You can use it by passing the audioTimestamp param to the model settings. This settings object is also where you configure other output-affecting parameters like structuredOutputs, safetySettings, etc., so I figured it made sense for it to live here.

lgrammel · 2024-12-12T08:54:16Z

Lgtm. We would need an example under examples/ai-core to see how this works, a changeset (patch release), and docs updated for vertex.

timconnorz · 2024-12-13T19:07:03Z

@lgrammel I've added an example, updated the docs, and added a changeset file. let me know if this is satisfactory! thanks for your guidance 😎

colinyoung · 2024-12-16T18:20:31Z

Would love to see this one get in!

shaper

Hi, just a few small things, thanks for the continued work! Would like to help land this with you.

shaper · 2024-12-17T06:31:43Z

packages/google/src/google-generative-ai-settings.ts

+   * Optional. Enables timestamp understanding for audio-only files.
+   * This is a preview feature.
+   *
+   * Available for the following models:


Instead of listing the supported models here, is there a page we can link to in vertex docs?

shaper · 2024-12-17T06:34:47Z

packages/google/src/google-generative-ai-language-model.ts

@@ -109,6 +109,7 @@ export class GoogleGenerativeAILanguageModel implements LanguageModelV1 {
        this.supportsStructuredOutputs
          ? convertJSONSchemaToOpenAPISchema(responseFormat.schema)
          : undefined,
+      audioTimestamp: this.settings.audioTimestamp,


Most of the time this.settings.audioTimestamp won't be defined, but as written this will then add audioTimestamp: undefined to every request.

Can we alter this to something like the below to avoid that?

...(this.settings.audioTimestamp && { audioTimestamp: this.settings.audioTimestamp }),
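
To make the difference concrete, here is a small standalone sketch comparing direct assignment with the conditional spread suggested above. The `Settings` type, the `temperature` field, and both helper functions are hypothetical stand-ins, not code from this PR.

```typescript
// Hypothetical, trimmed-down settings type; the real settings object has more fields.
interface Settings {
  audioTimestamp?: boolean;
}

// Direct assignment: the key is always present, even when its value is undefined.
function withDirectAssignment(settings: Settings): Record<string, unknown> {
  return {
    temperature: 0, // placeholder for the other config fields
    audioTimestamp: settings.audioTimestamp,
  };
}

// Conditional spread: the key is added only when the setting is truthy.
function withConditionalSpread(settings: Settings): Record<string, unknown> {
  return {
    temperature: 0, // placeholder for the other config fields
    ...(settings.audioTimestamp && { audioTimestamp: settings.audioTimestamp }),
  };
}

console.log(Object.keys(withDirectAssignment({})));
console.log(Object.keys(withConditionalSpread({})));
console.log(Object.keys(withConditionalSpread({ audioTimestamp: true })));
```

Note that with `&&` an explicit `false` is omitted as well, which is harmless here if the setting defaults to false, as the docs text in this PR states.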

shaper · 2024-12-17T06:37:34Z

content/providers/01-ai-sdk-providers/11-google-vertex.mdx

+  Optional. Enables timestamp understanding for audio files. Defaults to false.
+
+  This is useful for generating transcripts with accurate timestamps.
+  Only available for `gemini-1.5-pro-002` and `gemini-1.5-flash-002`.


Same comment as on the settings file: is there a Vertex doc page we can link to rather than specifying the models here, just to simplify maintenance?

shaper · 2024-12-17T18:24:01Z

Thanks again!

timconnorz · 2024-12-17T18:31:15Z

Thanks for your help @shaper! How does the release process work? Any idea when this would be rolled out?

shaper · 2024-12-17T19:16:50Z

> Thanks for your help @shaper! How does the release process work? Any idea when this would be rolled out?

It is automated and triggered by one of our team members. I will publish this today and follow up when it's live, should be within an hour or so.

shaper · 2024-12-17T19:27:33Z

Ah, @lgrammel already did it, it should be live in the versions noted here: #4118

vercel bot had a problem deploying to Preview December 11, 2024 03:56 Failure

support audioTimestamp

d0afd31

timconnorz force-pushed the vertex-audiotimestamp-support branch from d27dc5f to d0afd31 Compare December 11, 2024 18:12

vercel bot deployed to Preview December 11, 2024 18:14 View deployment

Merge branch 'main' into vertex-audiotimestamp-support

22667cb

vercel bot deployed to Preview December 12, 2024 08:54 View deployment

timconnorz and others added 3 commits December 13, 2024 12:07

Merge branch 'main' into vertex-audiotimestamp-support

f2bdb1f

added examples and updated docs

abdd59b

changeset

72b15c1

lgrammel and others added 2 commits December 16, 2024 19:46

Merge branch 'main' into vertex-audiotimestamp-support

96711f6

prettier fix

6a24509

shaper reviewed Dec 17, 2024

address comments

c82444b

timconnorz requested a review from shaper December 17, 2024 14:49

prettier fix

6cabd73

shaper approved these changes Dec 17, 2024

shaper merged commit db31e74 into vercel:main Dec 17, 2024
8 of 9 checks passed
