-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Api key framework #100
base: master
Are you sure you want to change the base?
Api key framework #100
Conversation
"""Constants | ||
Changes here should be reflected in the corresponding keys.sql file if | ||
necessary | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't actually a docstring /shrug
user (str): user to get the key for | ||
|
||
Returns: | ||
(str) the user's key, or None if there is no matching row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this docstring format is different than what we use elsewhere in ocflib/ocfweb
cursor.execute( | ||
'SELECT `key`' | ||
'FROM `keys`' | ||
'WHERE `user` LIKE %s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LIKE should probably be =, we don't want to do wildcard matching here or to be case insensitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, should be exact for this
cursor.execute( | ||
'SELECT `user`' | ||
'FROM `keys`' | ||
'WHERE `key` LIKE %s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key should probably be hashed in the db? imo it'd be preferable to require api calls to provide both the user and the api key (you can combine them together if you want so that you use one Authorization header or whatever) and have the api keys hashed in the db. then we're not storing plaintext secrets in a database which can easily be abused in the event of an unnoticed leak. (you'll need to store the users in addition to the keys if you want to salt the hashes)
same comments here wrt docstring and LIKE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some online research and thought also agree salt+hash is good
|
||
# Re-generate key in case of collision | ||
while key_exists(key): | ||
key = _generate_key() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you just make the key long, you'll never have a collision. i think this is slightly overengineered.
-- length of key defined by KEY_LENGTH in keys.py | ||
CREATE TABLE IF NOT EXISTS `keys` ( | ||
`key` varchar(32) NOT NULL, | ||
`user` varchar(255) NOT NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is user meant to be unique? or are you allowing users to have multiple api keys?
@@ -0,0 +1,6 @@ | |||
-- length of key defined by KEY_LENGTH in keys.py | |||
CREATE TABLE IF NOT EXISTS `keys` ( | |||
`key` varchar(32) NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something is weird with your lengths. the keys you generate are 64 characters long since you hex'd them (even though you "only" have 32 bytes of random), either you'll need to make this column binary 32 bytes and convert back and forth, or make this longer.
'(`key`, `user`)' | ||
'VALUES (%s, %s, %s)', | ||
(key, user) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we won't be able to launch this unless there's also a way to expire/revoke compromised or unused keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely, this is just part 1 of a longer process
CREATE TABLE IF NOT EXISTS `keys` ( | ||
`key` varchar(32) NOT NULL, | ||
`user` varchar(255) NOT NULL | ||
PRIMARY KEY(`key`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd suggest also storing some data like time generated
@@ -0,0 +1,124 @@ | |||
from binascii import hexlify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why put the api key code in ocflib? should this be in ocfweb instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken
idea was to have ocflib handle stateful operations and ocfweb just calls out to it when necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be a terrible idea to have the code that interacts with the database itself, e.g. key creation, be in ocflib and have ocfweb call out to it.
There's not really much state being carried here at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken
I guess you could argue this, but it was never designed this way. I think the "database in a library" is a really bad pattern for a few reasons:
- There is no single source of truth; you can't make schema migrations in a sane way, because everything can be running old versions of ocflib (think virtualenvs especially) which tries to perform the old queries on a new schema. As a general rule, I'd argue datastores should have one owner and that owner should provide a reliable interface for consumers which allows for backend changes.
- The same argument for logic: let's say you change the length of API keys; ocfweb gets updated instantly but some other tool doesn't for a few days, and it's still generating the wrong API keys. If instead you make ocfweb the owner of the data and logic and provide an interface that things like CLIs can use, you both simplify the CLI code (and credential management!) and avoid buggy logic rollouts.
- ocflib is used everywhere and it already has too many dependencies. Why does everything that needs ocfweb need to depend on pymysql, pysnmp, sqlalchemy, paramiko, etc.
- Libraries can't do things like connection pooling, unlike applications; as a concrete example, we've had a couple cases of having to bump MySQL limits because we had too many connections from desktops using ocflib functions. They should've been hitting ocfweb API endpoints instead.
- Even if you do add it to ocflib, it's not going to be useful to most consumers anyway: only ocfweb is going to care about API keys or have the credentials to deal with them. You still can't write a CLI using the functions because you don't have a secure way to get it access to that database (unless you're gonna use sudo/setuid like makemysql, which we hate).
- Trying out your code gets harder; now you need to link your new ocflib against ocfweb so that you can actually see if you broke anything.
What is the benefit of adding another layer here? I think it just makes your lives harder in addition to the points above.
There's not really much state being carried here at all.
State isn't why I think we should avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it seems I was mistaken on the point regarding ocfweb not being tied to any databases. For some reason I thought that this was intentionally done.
In that case, I believe the other points you made make sense, and it would be much easier to test this if it was just on ocfweb instead of ocflib.
If we were to place this in ocfweb instead, would we want to actually connect ocfweb to a mysql database in settings.py
? And then for non-prod runs we would have it connect to like a sqlite instance instead of mysql so we don't have the problem of ocfweb being harder to start up. Just trying to gauge if that would be a good approach or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, yeah. We don't have prior art for that yet, but I think that's the right approach. If we're going to keep writing raw SQL (rather than using sqlalchemy or the django ORM), we'd want to use mysql in dev too though. Maybe make a shared dev database and put creds in the ocfweb config file (settings.py would read this, we don't want to embed creds in settings.py itself).
@@ -0,0 +1,6 @@ | |||
-- length of key defined by KEY_LENGTH in keys.py | |||
CREATE TABLE IF NOT EXISTS `keys` ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this definitely needs more fields, e.g. an ID, creation/expiration time, revocation status, and if possible, what endpoints this grants access to, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what endpoints this grants access to
we have 0 endpoints currently (except like tv/hours i guess lmao) so we'd need to figure out a good format for storing this
@@ -0,0 +1,124 @@ | |||
from binascii import hexlify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be a terrible idea to have the code that interacts with the database itself, e.g. key creation, be in ocflib and have ocfweb call out to it.
There's not really much state being carried here at all.
Initial Framework for API key-based auth system
This is a rough draft, would love to hear any concerns with the proposed schema.
Also are there any security concerns with storing keys in plain tied to users?
API for functions in keys.py could also use critique for signature or etc.
@Baisang
@douglaslwong