Integrate Polars Plugin for High-Volume Email Validation #5

Open
bnkc opened this issue Nov 30, 2024 · 0 comments
Labels
enhancement New feature or request Feature

bnkc commented Nov 30, 2024

Description:

I propose adding a Polars plugin to the emval library to enable high-performance email validation directly within Polars DataFrames. This integration would allow users to efficiently validate large datasets of email addresses, leveraging emval's speed and Polars' data manipulation strengths.

Benefits:

  • Performance: Validate entire columns of email addresses quickly by running the checks in Rust.
  • Integration: Seamlessly incorporate email validation into existing Polars workflows.
  • Scalability: Handle large datasets efficiently with minimal performance overhead.

Proposed Usage:

The plugin would enable email validation with the following syntax:

import polars as pl
from emval.polars import validate_email

df = pl.DataFrame({
    'email': [
        '[email protected]',
        'invalid-email',
        '[email protected]',
        'user@[192.168.1.1]',
        ''
    ]
})

# Apply the email validation plugin
df = df.with_columns(
    validated=validate_email(
        pl.col('email'),
        allow_smtputf8=True,
        allow_empty_local=False,
        allow_quoted_local=False,
        allow_domain_literal=False,
        deliverable_address=True,
    )
)

# Access the fields from the Struct column
df = df.with_columns(
    original=pl.col('validated').struct.field('original'),
    normalized=pl.col('validated').struct.field('normalized'),
    local_part=pl.col('validated').struct.field('local_part'),
    domain_name=pl.col('validated').struct.field('domain_name'),
    domain_address=pl.col('validated').struct.field('domain_address'),
    is_deliverable=pl.col('validated').struct.field('is_deliverable'),
).drop('validated')

print(df)

Proposed Project Structure:

emval/
├── __init__.py
├── validator.py
├── model.py
└── polars/
    ├── __init__.py
    └── plugin.py
src/
├── lib.rs             # Main module for emval
├── validators/        # Additional validation logic
└── polars_plugin.rs   # Polars plugin module
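
To make the split between `plugin.py` and `polars_plugin.rs` more concrete, here is a rough sketch of what the Python side could look like. It uses `register_plugin_function` from `polars.plugins`, which is the public entry point Polars provides for expression plugins. Everything else is an assumption for illustration: the `build_kwargs` helper is hypothetical, the defaults simply mirror the values in the usage example above, the Rust function is assumed to be exported as `validate_email`, and the compiled library is assumed to sit in the top-level `emval` package.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def build_kwargs(
    allow_smtputf8: bool = True,
    allow_empty_local: bool = False,
    allow_quoted_local: bool = False,
    allow_domain_literal: bool = False,
    deliverable_address: bool = True,
) -> dict[str, bool]:
    """Collect the validation options forwarded to the Rust side.

    Hypothetical helper; defaults mirror the usage example in this issue.
    """
    return {
        "allow_smtputf8": allow_smtputf8,
        "allow_empty_local": allow_empty_local,
        "allow_quoted_local": allow_quoted_local,
        "allow_domain_literal": allow_domain_literal,
        "deliverable_address": deliverable_address,
    }


def validate_email(expr: Any, **options: bool) -> Any:
    """Return a Polars expression that validates each value in `expr`."""
    # Imported lazily so that importing this module does not require Polars.
    from polars.plugins import register_plugin_function

    return register_plugin_function(
        # Assumption: the compiled library ships inside the `emval` package,
        # one directory above `emval/polars/`.
        plugin_path=Path(__file__).parent.parent,
        # Assumption: polars_plugin.rs exports a function with this name.
        function_name="validate_email",
        args=[expr],
        kwargs=build_kwargs(**options),
        # Each row is validated independently of the others.
        is_elementwise=True,
    )
```

Routing the options through one helper means an unknown keyword fails in Python with a `TypeError` before anything reaches Rust.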

Optional Installation:

The Polars plugin should be an optional dependency, installable via:

pip install "emval[polars]"

This ensures the base emval library remains lightweight for users who don’t require the plugin.
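
One possible way to keep the base package importable without Polars is a small guard that runs when `emval.polars` is first imported. The sketch below is only an illustration: `require_optional` is a hypothetical helper name, and the wording of the error message is an assumption.

```python
import importlib.util


def require_optional(module: str, extra: str) -> None:
    """Raise a helpful ImportError if a top-level optional module is missing.

    Hypothetical helper; `module` must be a top-level module name.
    """
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"'{module}' is required for this feature. "
            f'Install it with: pip install "emval[{extra}]"'
        )


# Hypothetical use at the top of emval/polars/__init__.py:
# require_optional("polars", "polars")
```

With a guard like this, `import emval` keeps working on a minimal install, and only `from emval.polars import validate_email` asks the user to install the extra.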

Reference Documentation:
