Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookupatts cannot be NoneType even if defaultid is set #71

Open
fromm1990 opened this issue Oct 1, 2024 · 3 comments
Open

Lookupatts cannot be NoneType even if defaultid is set #71

fromm1990 opened this issue Oct 1, 2024 · 3 comments

Comments

@fromm1990
Copy link
Contributor

Setting the defaultidvalue does not prevent insert when lookupatt is None

PostgreSQL throws the following error:

Exception has occurred: NotNullViolation
null value in column "country_name" of relation "country_dim" violates not-null constraint

Expected behaviour:
defaultidvalue is used when lookupatt(s) is None.

CREATE TABLE IF NOT EXISTS dim.country_dim(
    country_id               SMALLINT     PRIMARY KEY,
    country_name             VARCHAR(56)  UNIQUE NOT NULL,
    alpha2                   CHAR(2)      UNIQUE NOT NULL,
    alpha3                   CHAR(3)      UNIQUE NOT NULL
);

INSERT INTO dim.country_dim(country_id, country_name, alpha2, alpha3)
    VALUES (-1, 'unknown', 'XX', 'XXX');
from pygrametl import ConnectionWrapper
from pygrametl.tables import CachedDimension

cw = ConnectionWrapper(...)

dim = CachedDimension(
        name="dim.country_dim",
        key="country_id",
        attributes=[
            "country_name",
            "alpha2",
            "alpha3"
        ],
        lookupatts=["country_name"],
        targetconnection=cw,
        defaultidvalue=-1,
)
rows = [
    {"country_name": "Australia", "alpha2": "AU", "alpha3": "AUS"},
    {"country_name": "Brazil", "alpha2": "BR", "alpha3": "BRA"},
    {"country_name": "Canada", "alpha2": "CA", "alpha3": "CAN"},
    {"country_name": "Denmark", "alpha2": "DK", "alpha3": "DNK"},
    {"country_name": None, "alpha2": None, "alpha3": None},
]


for row in rows:
    dim.ensure(row)
@fromm1990
Copy link
Contributor Author

The error persist even with the lookupatt being set to "unknown".

rows = [
    {"country_name": "Australia", "alpha2": "AU", "alpha3": "AUS"},
    {"country_name": "Brazil", "alpha2": "BR", "alpha3": "BRA"},
    {"country_name": "Canada", "alpha2": "CA", "alpha3": "CAN"},
    {"country_name": "Denmark", "alpha2": "DK", "alpha3": "DNK"},
    {"country_name": "unknown", "alpha2": None, "alpha3": None},
]

Only by removing the defaultidvalue parameter does the error disappear.

dim = CachedDimension(
        name="dim.country_dim",
        key="country_id",
        attributes=[
            "country_name",
            "alpha2",
            "alpha3"
        ],
        lookupatts=["country_name"],
        targetconnection=cw
)

Am I using the defaultidvalue parameter incorrectly?

@chrthomsen
Copy link
Owner

Hi,

ensure tries to lookup the given row and if that does not succeed, ensure will insert the row. It does not make special assumptions about what to do if (some of) the lookupatts contain None. So for the first example, it tries to insert the given row (which happens to contain None for country_name), but this gets rejected by PostgreSQL (due to a NOT NULL violation). I'm not sure if it would make sense to have special behavior in pygrametl for None values in lookupatts. Should the special behavior only apply if they are all None or should it also apply when some of them are?

In the second example, ensure does lookup a row with "country_name"=="unknown" and that will find your preloaded member. But due to tables.py#L332, that value will not be returned when defaultidvalue has the same value. Instead, pygrametl will attempt to insert the row and thus there is another NOT NULL violation (due to alpha2/alpha3).

To solve your current problem, I guess you could make your code only call ensure if there are non-None values for all lookupatts (and when they contain None, you can make your fact row point to the preloaded (-1, "unknown", ...) dimension member.

And if dim.country_dim is preloaded, you can use lookup instead of ensure.

Best regards,
Christian

@fromm1990
Copy link
Contributor Author

@chrthomsen thanks for the quick response.

My current "workaround" is simply to substitute None values in lookupatt (country_name) with the "unknown" string. However this does only works if i remove the ´defaultidvalue´ parameter from the CachedDimension constructor. Otherwise I receive the NOT NULL violation on alpha2 and alpha3 (which is a bit surprising to me).

To sum it up, it seems that the ´defaultidvalue´ changes the insert behavior of the ensure method which I did not expect. This may be intended behavior, but surprising to me non the less : )

It's not at all a dealbreaker, just something i found interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants