You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
db_cursor.execute(f"SELECT data FROM {self.db_source_table} WHERE data->>'hostname' = %s", [hostname])
This gives, for example, the following query plan:
# explain(analyse,buffers) select data from hosts_source WHERE data->>'hostname' = 'battersea.uio.no';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on hosts_source (cost=0.00..1364.44 rows=64 width=696) (actual time=0.829..11.086 rows=1 loops=1)
Filter: ((data ->> 'hostname'::text) = 'battersea.uio.no'::text)
Rows Removed by Filter: 12762
Buffers: shared hit=1217
Planning Time: 0.054 ms
Execution Time: 11.120 ms
(6 rows)
PostgreSQL has no choice but to perform a sequential scan on the whole table. This is pretty fast, around 11ms, as the table is only 9.5MB. However, the query can run approx 250 times faster if we create a GIN index and rewrite the query to use the jsonb containment operator:
# create index hosts_source_gin_idx on hosts_source using GIN (data jsonb_path_ops);
CREATE INDEX
# explain(analyse,buffers) select data from hosts_source WHERE data @> '{"hostname": "battersea.uio.no"}';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on hosts_source (cost=8.01..10.02 rows=1 width=696) (actual time=0.020..0.021 rows=1 loops=1)
Recheck Cond: (data @> '{"hostname": "battersea.uio.no"}'::jsonb)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on hosts_source_gin_idx (cost=0.00..8.01 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1)
Index Cond: (data @> '{"hostname": "battersea.uio.no"}'::jsonb)
Buffers: shared hit=4
Planning:
Buffers: shared hit=16
Planning Time: 0.190 ms
Execution Time: 0.042 ms
(11 rows)
We can see that postgreSQL could use the index and required significantly less IO (buffers) than before.
Note the this will also speed up other queries (including updates and deletes) that search for hostname. For example:
db_cursor.execute(f"SELECT DISTINCT data->>'hostname' FROM {self.db_source_table} WHERE data->'sources' ? %s", [source])
It is possible to create an index that can be used here, eg
# create index hosts_source_gin_sources_idx on hosts_source using GIN ((data -> 'sources'));
But any performance improvement is somewhat dependent on how many rows can be discarded by the filter and how big the table is. The postgresql documentation says that if querying for particular items within the "sources" key is common, defining an index like this may be worthwhile. Some quick experiments led to 0x to 3-4x speed improvements depending on the search key used.
A final note is that on for example postgresql 14, jsonpath queries will use the jsonb_path_ops index created above.
The text was updated successfully, but these errors were encountered:
Database tables
hosts
andhosts_source
do not define any indexes. Even with small(ish) tables, this can lead to extra IO and 'slow' queries.Background info:
https://www.postgresql.org/docs/14/datatype-json.html#JSON-INDEXING
https://www.postgresql.org/docs/11/datatype-json.html#JSON-INDEXING
In these examples
hosts_source
has 12763 rows and is approx 9.5MB.A typical query:
zabbix-auto-config/zabbix_auto_config/processing.py
Line 258 in 13d42cf
This gives, for example, the following query plan:
PostgreSQL has no choice but to perform a sequential scan on the whole table. This is pretty fast, around 11ms, as the table is only 9.5MB. However, the query can run approx 250 times faster if we create a GIN index and rewrite the query to use the jsonb containment operator:
We can see that postgreSQL could use the index and required significantly less IO (buffers) than before.
Note the this will also speed up other queries (including updates and deletes) that search for hostname. For example:
zabbix-auto-config/zabbix_auto_config/processing.py
Line 182 in 13d42cf
Here, PostgreSQL could use the index to find all the rows matching the hostname first and then apply the filter on
sources
.However this query will not use the index:
zabbix-auto-config/zabbix_auto_config/processing.py
Line 171 in 13d42cf
It is possible to create an index that can be used here, eg
But any performance improvement is somewhat dependent on how many rows can be discarded by the filter and how big the table is. The postgresql documentation says that if querying for particular items within the "sources" key is common, defining an index like this may be worthwhile. Some quick experiments led to 0x to 3-4x speed improvements depending on the search key used.
A final note is that on for example postgresql 14, jsonpath queries will use the
jsonb_path_ops
index created above.The text was updated successfully, but these errors were encountered: