
Gemini Rate Limit Issue #786

Open
gurugecl opened this issue Jan 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@gurugecl

gurugecl commented Jan 3, 2025

I am running into a rate limit issue when using Gemini 2.0 Flash. According to this thread in the LiteLLM repo (7188), I believe the model is supported. The rate limit error comes on the very first query to the docs object, and no amount of additional waiting or exponential backoff seems to resolve it, so I suspect something happening behind the scenes in either PaperQA or LiteLLM; however, the LiteLLM maintainers do not believe this is an issue on their end (7358).

Here is how I am using paperqa. Can you please confirm whether this is set up correctly to avoid rate limits, and if not, how I should configure it? Thank you!

import io
import os
import pickle

from paperqa import Docs, Settings

# pdf_path, pdf_file, relevancy, and model_input are defined elsewhere
docs = Docs()

docs.add(pdf_path, pdf_file, docname=pdf_file)

# Round-trip the Docs object through an in-memory pickle buffer
pickle_buffer = io.BytesIO()
pickle.dump(docs, pickle_buffer)
pickle_buffer.seek(0)

docs = pickle.load(pickle_buffer)

model = 'gemini/gemini-2.0-flash-exp'

settings = Settings(
    llm=model,
    summary_llm=model,
    llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get('GEMINI_API_KEY')
            }
        }],
        "rate_limit": {
            "gemini/gemini-2.0-flash-exp": "10 per 1 minute"
        }
    },
    summary_llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get('GEMINI_API_KEY')
            }
        }],
        "rate_limit": {
            "gemini/gemini-2.0-flash-exp": "10 per 1 minute"
        }
    }
)

max_choices = len(list(docs.docnames))
settings.answer.answer_max_sources = max_choices
settings.answer.evidence_k = relevancy * max_choices

model_response = docs.query(model_input, settings=settings)

Here are some of the logs/errors:

litellm.acompletion(model=gpt-4o-2024-08-06) 200 OK
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrying request to /embeddings in 0.377732 seconds
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Using in-memory rate limiter.
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

Error during paper query after 0 retries: Timeout (60.00999999999663 secs): rate limit for key: ('client', 'gemini/gemini-2.0-flash-exp')

dosubot bot commented Jan 3, 2025

Hey there, @gurugecl! I'm here to help you with any bugs, questions, or contributions you may have.

The rate limit issue you're encountering with the Gemini 2.0 Flash model in the paperqa library might be related to how the Docs object and the settings are configured. Here are a few points to consider:

  1. Concurrent Requests: The Docs class in paperqa uses asynchronous methods, which can lead to multiple concurrent requests. Ensure that the max_concurrent_requests setting is configured to a value that respects the rate limits of the Gemini API. If this setting is too high, it could cause rate limit issues even on the first query [1].

  2. Rate Limit Configuration: You have specified a rate limit of "10 per 1 minute" for the Gemini model. Verify that this matches the actual rate limit provided by the Gemini API; if the API's limit is lower, adjust your configuration accordingly (see the sketch after this list).

  3. In-Memory Rate Limiter: The logs mention using an in-memory rate limiter. Ensure that this rate limiter is correctly configured to handle the rate limits specified in your settings. If the rate limiter is not properly managing the requests, it could lead to immediate rate limit errors.

  4. Retry Logic: The error message indicates a timeout after 0 retries. Consider implementing a more robust retry mechanism with exponential backoff to handle transient rate limit errors more effectively (see the backoff sketch below).

  5. API Key and Model Configuration: Double-check that the API key and model configurations in litellm_params are correct and that the API key has the necessary permissions for the Gemini model.
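
For point 2, note that paperqa's rate_limit values are limits-style strings of the form "<count> per <interval>", as in your own config. A minimal sketch, with a placeholder count that you should replace with your actual Gemini quota:

import os

from paperqa import Settings

model = "gemini/gemini-2.0-flash-exp"
settings = Settings(
    llm=model,
    llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }],
        # limits-style string: "<count> per <interval>"; the count below is
        # a placeholder, replace it with your actual Gemini quota
        "rate_limit": {model: "1000000 per 1 minute"},
    },
)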

By addressing these points, you should be able to mitigate the rate limit issues you're experiencing. If the problem persists, further investigation into the specific interactions between paperqa, LiteLLM, and the Gemini API might be necessary.
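
For point 4, a generic exponential-backoff wrapper could look like the sketch below. This is not part of the paperqa API; narrow the except clause to the exception type your stack actually raises:

import time

def query_with_backoff(docs, prompt, settings, max_retries=5):
    # Retry docs.query with exponential backoff on transient errors
    for attempt in range(max_retries):
        try:
            return docs.query(prompt, settings=settings)
        except Exception as exc:  # narrow to your rate-limit exception type
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)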


@gurugecl
Author

gurugecl commented Jan 4, 2025

Resolved by modifying the Docs setup, even though it had seemed to work without issue initially.

@derspotter

I think your rate limit is wrong:

"gemini/gemini-2.0-flash-exp": "10 per 1 minute"

This limit is not RPM (requests per minute) but TPM (tokens per minute), so it should be 4 million.
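
In paperqa's limits-string format, that would be something like the line below (a sketch, assuming the 4 million tokens-per-minute quota mentioned above):

# a sketch: use this value in the "rate_limit" entries of both llm_config
# and summary_llm_config from the original post (4M TPM assumed)
rate_limit = {"gemini/gemini-2.0-flash-exp": "4000000 per 1 minute"}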

@Exemocaro

Hi @gurugecl, could you please share your working code? I'm having trouble setting this up with Gemini, and you seem to have made it work. I tried the code you posted here, but it's missing a lot of pieces.
