Added db.select_from_TABLE methods #1828
base: master
Conversation
I wasn't going to add the …
gramps/gen/db/generic.py (Outdated)

```python
@@ -2739,3 +2739,73 @@ def set_serializer(self, serializer_name):
            self.serializer = BlobSerializer
        elif serializer_name == "json":
            self.serializer = JSONSerializer

    def select_from_table(
        self, table_name, what=None, where=None, order_by=None, env=None
```
I know that currently you only pass a single table_name, but if you were to make this low-level method take an array of table_names, then I think everything exists to do cross-table selects.

With that it would be possible to add a generic filter UI that allows the user to enter a query defined in terms of tables, what, where and order_by strings. I readily acknowledge that this might be more of an "advanced user" feature, but it would be really powerful.

It will also allow code to execute more complex queries directly within the DB ❤️
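Something like this hypothetical call shape, where everything (the plural method name, the join written in `where`) is an assumption purely for illustration:

```python
# Hypothetical: a list of table names enables a cross-table select.
rows = db.select_from_tables(
    ["person", "family"],
    what=["person.gramps_id", "family.gramps_id"],
    where="person.handle == family.father_handle",
    order_by="person.gramps_id",
)
```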
@stevenyoungs, I like this idea! First, let's make sure that we can get this type of method added, and then think about expanding it.
(I'd like to see an idea of how you would represent a JOIN, but I don't want you to spend too much time if this PR gets rejected.)
@Nick-Hall and other reviewers, I've been thinking for a very long time about how to add a select-style method in Gramps while keeping a Python interface. This PR represents the best that I can come up with. Note that the syntax of the query strings is pure Python: the strings are parsed by Python into an Abstract Syntax Tree (AST) that is then used to generate the SQL syntax. I wrote the Evaluator with different DB engines in mind, in case they use different syntax for JSON extraction, etc. The code is fairly minimal and low in complexity, to make it easy to maintain and extend. Let me know if you have concerns or ideas for improvement.
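To make that concrete, here is a heavily simplified sketch of the approach; this is not the PR's actual Evaluator, and the `json_data` column name and the SQLite `json_extract` syntax are assumptions:

```python
import ast

# Parse a Python expression string with the standard `ast` module, then
# walk the tree and emit SQL text.  The JSON extraction call is the
# engine-specific part.
OPS = {ast.Eq: "=", ast.NotEq: "!=", ast.Lt: "<", ast.Gt: ">"}

def expr_to_sql(node):
    if isinstance(node, ast.Compare):
        left = expr_to_sql(node.left)
        op = OPS[type(node.ops[0])]
        right = expr_to_sql(node.comparators[0])
        return f"{left} {op} {right}"
    if isinstance(node, ast.Attribute):
        # person.gender -> extract the field from the stored JSON blob
        return f"json_extract(json_data, '$.{node.attr}')"
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(ast.dump(node))

def where_to_sql(where: str) -> str:
    """Translate a tiny subset of Python comparisons into SQL text."""
    return expr_to_sql(ast.parse(where, mode="eval").body)

print(where_to_sql("person.gender == 1"))
# -> json_extract(json_data, '$.gender') = 1
```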
@DavidMStraub this could serve as a replacement in gramps-web for both gramps-ql and object-ql, as it is converted into SQL. (It doesn't yet allow everything that the others do, though.)
One thing that I realize this doesn't respect is filters and proxies. But I think that can be fixed. Some options: …

Other ideas?
@Nick-Hall, actually, I'm realizing that we have a bigger issue: if you have a proxy/filter in place, then you might not be able to access all of the items in the JSON data. That means that `person_data.family_list != person_object.family_list` if a family does not appear in the filter/proxy. It could be that if we have a filter or proxy, we must force the `DataDict` to generate the object through methods like …
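A small sketch of the concern, using standard Gramps db method names (`proxy_db` is assumed to be a filtered view, such as a living-person proxy, and `handle` an existing person handle):

```python
# `proxy_db` wraps the real database and hides some objects.
person_object = proxy_db.get_person_from_handle(handle)  # filtered object
person_data = proxy_db.get_raw_person_data(handle)       # raw stored data

# If the proxy hides a family, the raw data may still reference it:
# person_data["family_list"] can contain handles that are absent from
# person_object.family_list.
```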
If I may ... I think it's really great that so much refactoring and improvement is happening, but I find it a bit strange that so many things are merged so quickly without (sorry - at least my impression) considering all the implications (I was triggered by the example with proxies and filters), while at the same time my simple PR which does nothing but enable static type checking has been open for half a year. Static type checking would make the refactoring less dangerous.
@DavidMStraub, nothing has been merged yet that has any effect on the implications I raised above. The implication is for the things being considered for merging. It would be great if we had more developers (like yourself) who would be able to comment on such implications. So, no, things aren't being merged "too quickly" and without thinking about consequences; working on what is next gives us insight into complex issues. So no need to get triggered by such a realization.

Regarding type checking: yes, I would have merged that PR many months ago because I am very familiar with the benefits of typing, and realize there are no downsides. But also, the implication above is the realization that a "type" (eg, …) …

In any event, we need to refactor this PR, and the filter refactor PR. And probably adjust the …
One of your optimisations is to keep the data in a …
@dsblank This PR reminds me of the `db.collection.find` method in MongoDB. It may be worth a quick look if you are unfamiliar with it; you may get some ideas. I like how you have made the query pythonic. This is better than previous SQL-like designs and the JSON queries of MongoDB.

@DavidMStraub We seem to have been discussing this on and off for about 7 or 8 years now, so I don't think that the progress is too fast. There have also been a couple of prototypes. The static type checking PR makes changes to 51 files. I tend to leave this type of change until fairly close to release in order to avoid potential conflicts when merging up fixes from the maintenance branch. Also, the smaller changes tend to be easier to fit in when I have time available. Your PR is on my schedule though.

@stevenyoungs Yes. Proxies are mainly used in the report and export code. I don't mind if these are not optimised to use the new code, but we must make sure that they don't run significantly slower than at present. Some people already have to wait a long time for certain reports to run. I don't regard this PR as essential for the next release, but it may be worth continuing to investigate our options.
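For comparison, a rough sketch of the two styles side by side (the MongoDB line is standard `find` syntax; the Gramps call follows this PR's `select_from_TABLE` naming and is an assumption, with `db` an open database):

```python
from gramps.gen.lib import Person

# MongoDB: query and projection are JSON-style documents, e.g.
#   db.person.find({"gender": 1}, {"handle": 1, "gramps_id": 1})

# This PR: the query is a plain Python expression in a string.
rows = db.select_from_person(
    what=["person.handle", "person.gramps_id"],
    where="person.gender == Person.MALE",
    env={"Person": Person},
)
```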
BTW, … (The problem is that …)
Here is an example of where we would need to be careful about falling back to a regular loop through the data in the case of a proxy: if #1794 is merged, it has an optimization for looking for … But we could add code to the …

But, if … A fix is to add: …

then the `rule.map` is not created, and the standard `rule.apply_to_one()` would do the regular check. Here are the time comparisons (seconds) without and with the select/map: …

Finally, if you wanted to force a loop in a proxy, you could still do this:
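A minimal sketch of that kind of loop, assuming an open proxy database (the specific Gramps ID is an arbitrary example):

```python
# Handle iteration goes through the proxy, so filtered-out people are
# simply never seen by the loop.
for handle in proxy_db.get_person_handles():
    person = proxy_db.get_person_from_handle(handle)
    if person.get_gramps_id() == "I0044":  # arbitrary example ID
        break
```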
Bah! @stevenyoungs pointed out that the proxies properly process raw data. All my worrying above, and some of my comments about the …
#1839 will allow efficient `get_raw_*` functions in proxies.
This PR adds methods designed to be implemented in a low-level DB system, like SQL. The human-facing code is all Python, and gets parsed into SQL. All of the code that is converted into SQL is written as strings. This allows coders to write in the same syntax that is supported by the `DataDict` interface (minus the object-creation variation).

For example, you could select all of the male people with:
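A sketch based on the signature in the diff above; the per-table method name follows the `select_from_TABLE` pattern in the title, and `db` is an open database:

```python
from gramps.gen.lib import Person

rows = db.select_from_person(
    where="person.gender == Person.MALE",
    env={"Person": Person},
)
```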
(`Person` is defined in the environment that the query is evaluated in.)
By default, the method returns a `DataDict` per row. But you can optionally select one attribute (`"person.handle"`) or a list of attributes (`["person.handle", "person.gramps_id"]`) using the `what` parameter.

All arguments are optional.
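For instance (same assumptions as the sketch above):

```python
# One attribute per row:
handles = db.select_from_person(what="person.handle")

# A list of attributes per row:
rows = db.select_from_person(what=["person.handle", "person.gramps_id"])
```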
Further Examples:
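A few illustrative sketches, again under the same assumptions:

```python
# All people, ordered by Gramps ID:
rows = db.select_from_person(order_by="person.gramps_id")

# Gramps IDs of all females, in order:
ids = db.select_from_person(
    what="person.gramps_id",
    where="person.gender == Person.FEMALE",
    order_by="person.gramps_id",
    env={"Person": Person},
)
```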