Add new article on how to choose embedder type #3058

Open · wants to merge 5 commits into base: main
Conversation

guimachiavelli
Member

Closes #3040

@meili-bot
Collaborator

How to see the preview of this PR?

⚠️ Private link, only accessible to Meilisearch employees.

Go to this URL: https://website-git-deploy-preview-mei-16-meili.vercel.app/docs/branch:3040-choose-embedder-guide

Credentials to access the page are in the company's password manager as "Docs deploy preview".

@guimachiavelli marked this pull request as ready for review January 21, 2025 15:12
learn/ai_powered_search/choose_an_embedder.mdx (outdated review thread, resolved)

In these cases, you will have to supply your own embedder.

## Only choose Hugging Face when self-hosting small static datasets
Contributor


This was initially advice for Cloud users with HF embedders, because we were generating the embeddings locally (running on our Cloud infra). That is no longer the case: we removed the option on the Cloud and replaced it with Hugging Face Inference Endpoints via the REST embedder option.

Self-hosted users can still use Hugging Face as an embedder option, since they can tweak their infrastructure to fit their specific needs.

Contributor


We can either remove this section, or point users towards how to set up a HF embedder using the REST option (for Cloud) and the API reference (for self-hosted).

Member Author

@guimachiavelli Jan 23, 2025


I think there may be something a bit confusing about how we handle Hugging Face right now.

We have a huggingFace embedder source, but it works very differently from all other built-in embedders in that it must run locally. At the same time, if I understand correctly, there are HF's Inference Endpoints, which instead work more or less exactly like OpenAI and Ollama. So we end up treating Hugging Face as a monolithic choice because it has its own dedicated source, which may not be ideal for users who must use Hugging Face but don't want to run it locally.

Member Author


This may be my ignorance speaking, but it almost seems as if our official Hugging Face source offers the worse of the two available options for existing HF users.

Contributor

@macraig Jan 23, 2025


Yes, I see how it gets a bit confusing. I'll try to describe the options clearly and we can go from there.

Options available on Cloud:

  • OpenAI
  • Custom (user-provided)
  • REST: used to set up HF Inference Endpoints, Ollama, Cloudflare, or any embedder running on a third-party service that supports the REST format

Options available for self-hosted:

  • OpenAI
  • user-provided
  • REST: set up any embedder running on a third-party service that supports REST
  • huggingFace: run a Hugging Face model of choice locally
  • ollama: similar to REST, but specific to Ollama's configuration
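To make the distinction concrete, here is a sketch of the two ways to reach Hugging Face, using the embedder settings format. The endpoint URL, model name, dimensions, and the `request`/`response` mappings are illustrative assumptions; check the embedder settings reference for the exact shape expected by your endpoint:

```json
{
  "hf-local": {
    "source": "huggingFace",
    "model": "BAAI/bge-base-en-v1.5"
  },
  "hf-endpoint": {
    "source": "rest",
    "url": "https://<your-endpoint>.endpoints.huggingface.cloud",
    "apiKey": "<HF_TOKEN>",
    "dimensions": 768,
    "request": { "inputs": ["{{text}}"] },
    "response": ["{{embedding}}"]
  }
}
```

The first variant runs the model locally alongside Meilisearch (self-hosted only); the second delegates embedding generation to a Hugging Face Inference Endpoint over HTTP, which is the option available on Cloud.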


OpenAI returns relevant search results across different subjects and datasets. It is suited for the majority of applications and Meilisearch actively supports and improves OpenAI functionality with every new release.

In the majority of cases, and especially if this is your first time working with LLMs and AI-powered search, choose OpenAI.
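Configuring the OpenAI embedder mostly amounts to pasting an API key. A minimal sketch of the embedder settings object, assuming the documented `openAi` source (the model name and document template below are illustrative placeholders):

```json
{
  "default": {
    "source": "openAi",
    "apiKey": "<OPENAI_API_KEY>",
    "model": "text-embedding-3-small",
    "documentTemplate": "A document titled '{{doc.title}}'"
  }
}
```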
Contributor


(Feel free to ignore this one, as it might be just me.) I wonder if this sounds a bit too biased towards OpenAI, making it seem like it's our provider of choice over the others. Maybe we can phrase it around "ease of configuration" or "easiest for beginners", since it only requires pasting an OpenAI key.

Member Author


I'm working off @dureuill's comments on #3040 for this. It is very biased, and I do feel a bit uncomfortable with pushing a third-party service so hard when documenting such a strategic feature.

At the same time, it seems this really is the best guidance we can give from a technical perspective: if you don't have any strong preference or aren't already working with another service, choose OpenAI.


If you are already using a specific model from a compatible embedder, choose Meilisearch's REST embedder. This ensures you continue building upon tooling and workflows already in place with minimal configuration necessary.
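As a sketch, a `rest` embedder configuration pointing at an existing embedding service might look like this. The URL, dimensions, and the `request`/`response` mappings are assumptions about your service's payload shape; `{{text}}` and `{{embedding}}` are Meilisearch's placeholders for the input text and the returned vector:

```json
{
  "default": {
    "source": "rest",
    "url": "https://embeddings.example.com/v1/embed",
    "apiKey": "<API_KEY>",
    "dimensions": 512,
    "request": { "input": "{{text}}" },
    "response": { "embedding": "{{embedding}}" }
  }
}
```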

## If dealing with non-textual content, choose the user-provided embedder
Contributor


Suggested change:
− ## If dealing with non-textual content, choose the user-provided embedder
+ ## If dealing with non-textual content, choose the Custom (user-provided) embedder

Member Author


I don't want parentheses in the heading, but I'll include "custom embedder" somewhere in the section body.


## If dealing with non-textual content, choose the user-provided embedder

Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents. For example, Meilisearch's built-in embedder sources cannot search using an image instead of text. They also cannot use text to search for images without attached textual metadata.
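A sketch of the user-provided workflow: declare a `userProvided` embedder with the vector dimensions, then attach externally generated vectors (for example, from a CLIP-style image model) to each document under the `_vectors` field. The embedder name and dimensions below are illustrative:

```json
{
  "image": {
    "source": "userProvided",
    "dimensions": 512
  }
}
```

Each document would then carry its own pre-computed embedding (a real vector must contain exactly `dimensions` values; the three shown here are placeholders):

```json
{
  "id": 1,
  "title": "sunset.jpg",
  "_vectors": { "image": [0.018, -0.407, 0.056] }
}
```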
Contributor


pinging @dureuill as I'm not 100% sure about this one

Contributor


Yes, that is correct. We may want to specify that, by supplying embeddings generated with their own embedder, users can indeed achieve these use cases.

Development

Successfully merging this pull request may close these issues.

Create new “How to choose an embedder” guide
4 participants