The api.jobs_GET endpoint currently has very poor performance
(~8s latency) because it has to do a full table scan. The same query is
performed by jobs.index but it appears that the performance measurement
there is drowned out by not logged in users opening the page, which
skips the query. Adding an index on owner_id should change the full
table scan to an index lookup, improving performance.
---
Found this while looking through metrics.sr.ht. I don't have a database
large enough to check if this improves performance, so some testing on that
side might be required.
The chart for the api.jobs_GET endpoint latency can be seen here
https://metrics.sr.ht/chart.svg?title=api.jobs_GET%20request%20latency%20in%20seconds&query=rate(request_time_sum{instance%3D%22builds.sr.ht%3A80%22%2Croute%3D%22api.jobs_GET%22}[1d])%20%2F%20rate(request_time_count[1d])&max=10&min=0&since=336h&step=10000&height=3&width=10
...14e_add_index_for_owner_id_to_job_table.py | 22 +++++++++++++++++++
buildsrht/types/job.py | 2 +-
2 files changed, 23 insertions(+), 1 deletion(-)
create mode 100644 buildsrht/alembic/versions/808a8f41714e_add_index_for_owner_id_to_job_table.py
diff --git a/buildsrht/alembic/versions/808a8f41714e_add_index_for_owner_id_to_job_table.py b/buildsrht/alembic/versions/808a8f41714e_add_index_for_owner_id_to_job_table.py
new file mode 100644
index 0000000..bcd3003
--- /dev/null
@@ -0,0 +1,22 @@
+"""Add index for owner_id to Job table
+
+Revision ID: 808a8f41714e
+Revises: e0956b115701
+Create Date: 2021-03-12 01:35:07.919372
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = '808a8f41714e'
+down_revision = 'e0956b115701'
+
+from alembic import op
+import sqlalchemy as sa
+
+
+def upgrade():
+ op.create_index(op.f('ix_job_owner_id'), 'job', ['owner_id'], unique=False)
+
+
+def downgrade():
+ op.drop_index(op.f('ix_job_owner_id'), table_name='job')
diff --git a/buildsrht/types/job.py b/buildsrht/types/job.py
index 6456b43..2ef807e 100644
--- a/buildsrht/types/job.py
@@ -19,7 +19,7 @@ class Job(Base):
created = sa.Column(sa.DateTime, nullable=False)
updated = sa.Column(sa.DateTime, nullable=False)
manifest = sa.Column(sa.Unicode(16384), nullable=False)
- owner_id = sa.Column(sa.Integer, sa.ForeignKey('user.id'), nullable=False)
+ owner_id = sa.Column(sa.Integer, sa.ForeignKey('user.id'), index=True, nullable=False)
owner = sa.orm.relationship('User', backref=sa.orm.backref('jobs'))
job_group_id = sa.Column(sa.Integer, sa.ForeignKey('job_group.id'))
job_group = sa.orm.relationship('JobGroup', backref=sa.orm.backref('jobs'))
--
2.25.1
On 12/03/2021 14.33, Drew DeVault wrote:
> This doesn't make sense. Foreign keys are always indexed.
PostgreSQL does not automatically create indices for foreign keys, it
only does so on unique constraints and primary keys.
I've looked through several other endpoints with high latency on
builds.sr.ht, it seems like most of them would benefit from this index
(/~username/tag, /~username/tag.svg, /~username.svg, /~username/rss.xml)
which is 6 out of 8 builds.sr.ht endpoints with average latency >100ms.