Commit Graph

34 Commits

Author SHA1 Message Date
Neil Alexander
6348486a13
Transactional isolation for /sync (#2745)
This should transactional snapshot isolation for `/sync` etc requests.

For now we don't use repeatable read due to some odd test failures with
invites.
2022-09-30 12:48:10 +01:00
Neil Alexander
3f9e38e80a
Consistent *sql.Tx usage across sync API (#2744)
This tidies up the `storage` package so that everything takes a
transaction parameter instead of something things that do and some that
don't.
2022-09-28 10:18:03 +01:00
PiotrKozimor
12649ccedd
Improve selectRoomIDsWithAnyMembershipSQL performance (#2738)
Recently I have observed that dendrite spends a lot of time (~390s) in
`selectRoomIDsWithAnyMembershipSQL` query

```
dendrite_syncapi=# select total_exec_time, left(query,100) from pg_stat_statements order by total_exec_time desc limit 5 ;
  total_exec_time   |                                                 left
--------------------+------------------------------------------------------------------------------------------------------
  747826.5800519128 | SELECT event_id, id, headered_event_json, session_id, exclude_from_sync, transaction_id, history_vis
  389130.5490339942 | SELECT DISTINCT room_id, membership FROM syncapi_current_room_state WHERE type = $2 AND state_key =
 376104.17514700035 | SELECT psd.datname, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_
   363644.164092031 | SELECT event_type_nid, event_state_key_nid, event_nid FROM roomserver_events WHERE event_nid = ANY($
  58570.48104699995 | SELECT event_id, headered_event_json FROM syncapi_current_room_state WHERE room_id = $1 AND ( $2::te
(5 rows)
```

Explain analyze showed correct usage of `syncapi_room_state_unique`
index:

```
dendrite_syncapi=#
explain analyze SELECT distinct room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2749.38..2749.56 rows=24 width=52) (actual time=2.933..2.956 rows=65 loops=1)
   ->  Sort  (cost=2749.38..2749.44 rows=24 width=52) (actual time=2.932..2.937 rows=65 loops=1)
         Sort Key: room_id, membership
         Sort Method: quicksort  Memory: 34kB
         ->  Index Scan using syncapi_room_state_unique on syncapi_current_room_state  (cost=0.41..2748.83 rows=24 width=52) (actual time=0.030..2.890 rows=65 loops=1)
               Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.140 ms
 Execution Time: 2.990 ms
(8 rows)
```

Multi-column indexes in Postgres shall perform well for leftmost
columns, but I gave it a try and created
`syncapi_current_room_state_type_state_key_idx` index. I could observe
significant performance improvement. Execution time dropped from 2.9 ms
to 0.24 ms:

```
explain analyze SELECT distinct room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=96.46..96.64 rows=24 width=52) (actual time=0.199..0.218 rows=65 loops=1)
   ->  Sort  (cost=96.46..96.52 rows=24 width=52) (actual time=0.199..0.202 rows=65 loops=1)
         Sort Key: room_id, membership
         Sort Method: quicksort  Memory: 34kB
         ->  Bitmap Heap Scan on syncapi_current_room_state  (cost=4.53..95.91 rows=24 width=52) (actual time=0.048..0.139 rows=65 loops=1)
               Recheck Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
               Heap Blocks: exact=59
               ->  Bitmap Index Scan on syncapi_current_room_state_type_state_key_idx  (cost=0.00..4.53 rows=24 width=0) (actual time=0.037..0.037 rows=65 loops=1)
                     Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.236 ms
 Execution Time: 0.242 ms
(11 rows)
```

Next improvement is skipping DISTINCT and rely on map assignment in
`SelectRoomIDsWithAnyMembership`. Execution time drops by almost half:

```
explain analyze SELECT room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on syncapi_current_room_state  (cost=4.53..95.91 rows=24 width=52) (actual time=0.032..0.113 rows=65 loops=1)
   Recheck Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
   Heap Blocks: exact=59
   ->  Bitmap Index Scan on syncapi_current_room_state_type_state_key_idx  (cost=0.00..4.53 rows=24 width=0) (actual time=0.021..0.021 rows=65 loops=1)
         Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.087 ms
 Execution Time: 0.136 ms
(7 rows)
```

In our env we spend only 1s on inserting to table, so the write penalty
of creating an index should be small.
```
dendrite_syncapi=# select total_exec_time, left(query,100) from pg_stat_statements where query like '%INSERT%syncapi_current_room_state%' order by total_exec_time desc;
  total_exec_time   |                                                 left
--------------------+------------------------------------------------------------------------------------------------------
 1139.9057619999971 | INSERT INTO syncapi_current_room_state (room_id, event_id, type, sender, contains_url, state_key, he
(1 row)
``` 

This PR does not require test modifications.

### Pull Request Checklist

<!-- Please read docs/CONTRIBUTING.md before submitting your pull
request -->

* [x] I have added added tests for PR _or_ I have justified why this PR
doesn't need tests.
* [x] Pull request includes a [sign
off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)

Signed-off-by: `Piotr Kozimor <p1996k@gmail.com>`
2022-09-27 09:41:36 +01:00
Neil Alexander
955e69a3b7
Optimise SharedUsers again by using complete composite index 2022-09-09 14:18:45 +01:00
Neil Alexander
6ee758df63
Optimise shared users query in Synx API slightly by removing a potential sort 2022-09-09 13:50:50 +01:00
Till
9fe509b18d
Fix syncapi shared users query & device lists (#2614)
* Fix query issue, only add "changed" users if we actually share a room

* Avoid log spam if context is done

* Undo changes to filterSharedUsers

* Add logging again..

* Fix SQLite shared users query

* Change query to include invited users
2022-08-03 18:35:17 +02:00
Till
081f5e7226
Update database migrations, remove goose (#2264)
* Add new db migration

* Update migrations
Remove goose

* Add possibility to test direct upgrades

* Try to fix WASM test

* Add checks for specific migrations

* Remove AddMigration
Use WithTransaction
Add Dendrite version to table

* Fix linter issues

* Update tests

* Update comments, outdent if

* Namespace migrations

* Add direct upgrade tests, skipping over one version

* Split migrations

* Update go version in CI

* Fix copy&paste mistake

* Use contexts in migrations

Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-25 10:39:22 +01:00
Till
a7e92f8cb9
History visibility database changes (#2533)
* Add new history_visibility column

* Update SQL queries to include history_visibility

* Store the history visibilty calculated by the roomserver

* Update GMSL

* Update migrations

* Fix migration

* Update GMSL

* Fix `go.sum`

* Update GMSL to use sql.Scanner & sql.Valuer

* Re-order migration/table creation

* Update gomatrixserverlib

* Add history_visibility column to current_room_state

* Fix migrations

* Return error instead of Fatal log

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-18 14:46:15 +02:00
Neil Alexander
90bf01d8b1
Use sync API database in filterSharedUsers (#2572)
* Add function to the sync API storage package for filtering shared users

* Use the database instead of asking the RS API

* Fix unit tests

* Fix map handling in `filterSharedUsers`
2022-07-15 16:25:26 +01:00
Till
2a5b8e0306
Only load members of newly joined rooms (#2389)
* Only load members of newly joined rooms

* Comment that the query is prepared at runtime

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-28 18:53:28 +02:00
Till
29f2168789
Make /messages filterable (#2347)
* Make /messages filterable
Fix bug when determining if an event contains an URL

* Add newly passing test

* Fix test
2022-04-13 13:16:02 +02:00
kegsay
6d25bd6ca5
syncapi: add more tests; fix more bugs (#2338)
* syncapi: add more tests; fix more bugs

bugfixes:
 - The postgres impl of TopologyTable.SelectEventIDsInRange did not use the provided txn
 - The postgres impl of EventsTable.SelectEvents did not preserve the ordering of the input event IDs in the output events slice
 - The sqlite impl of EventsTable.SelectEvents did not use a bulk `IN ($1)` query.

Added tests:
 - `TestGetEventsInRangeWithTopologyToken`
 - `TestOutputRoomEventsTable`
 - `TestTopologyTable`

* -p 1 for now
2022-04-08 17:53:24 +01:00
S7evinK
d8facd6308
Fix SQL statement for PurgeRoomState (#2280) 2022-03-16 11:25:50 +01:00
Neil Alexander
507a8e6773
Don't range entire state for /sync (#2270)
* Don't range entire state for rooms the user has no reason to care about

* Remove unnecessary db field in postgresql
2022-03-11 12:48:45 +00:00
S7evinK
cf525d1f61
Implement /context (#2207)
* Add QueryEventsAfter

* Add /context

* Make all tests pass on sqlite

* Add queries to get the events for /context requests

* Move /context to the syncapi

* Revert "Add QueryEventsAfter"

This reverts commit 440a771d10632622e8c65d35fe90f0804bc98862.

* Simplify getting the required events

* Apply RoomEventFilter when getting events

* Add passing tests

* Remove logging

* Remove unused SQL statements
Update comments & add TODO
2022-02-21 17:12:22 +01:00
Neil Alexander
6e44450cc9
Don't re-request state events that are already in the timeline (#1739)
* Don't request state events if we already have the timeline events (Postgres only)

* Rename variable

* nocyclo

* Add SQLite

* Tweaks

* Revert query change

* Don't dedupe if asking for full state

* Update query
2021-02-04 12:20:37 +00:00
Neil Alexander
b70238f2d5
Basic sync filtering (#1721)
* Add some filtering (postgres only for now)

* Fix build error

* Try to use request filter

* Use default filter as a template when retrieving from the database

* Remove unused strut

* Update sytest-whitelist

* Add filtering to SelectEarlyEvents

* Fix Postgres selectEarlyEvents query

* Attempt filtering on SQLite

* Test limit, set field for limit/order in prepareWithFilters

* Remove debug logging, add comments

* Tweaks, debug logging

* Separate SQLite stream IDs

* Fix filtering in current state table

* Fix lock issues

* More tweaks

* Current state requires room ID

* Review comments
2021-01-19 18:00:42 +00:00
Neil Alexander
e1ace7e44a
Add event ID index on current state table (helps performance) (#1649) 2020-12-16 18:16:39 +00:00
Neil Alexander
bad81c028f
Don't recalculate event ID so often in sync (#1624)
* Don't bail so quickly in fetchMissingStateEvents

* Don't recalculate event IDs so often in sync API

* Add comments

* Fix comments

* Update to matrix-org/gomatrixserverlib@eb6a890
2020-12-09 18:07:17 +00:00
Neil Alexander
20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
Neil Alexander
965f068d1a
Handle state with input event as new events (#1415)
* SendEventWithState events as new

* Use cumulative state IDs for final event

* Error wrapping in calculateAndSetState

* Handle overwriting same event type and state key

* Hacky way to spot historical events

* Don't exclude from sync

* Don't generate output events when rewriting forward extremities

* Update output event check

* Historical output events

* Define output room event type

* Notify key changes on state

* Don't send our membership event twice

* Deduplicate state entries

* Tweaks

* Remove unnecessary nolint

* Fix current state upsert in sync API

* Send auth events as outliers, state events as rewrite

* Sync API don't consume state events

* Process events actually

* Improve outlier check

* Fix local room check

* Remove extra room check, it seems to break the whole damn world

* Fix federated join check

* Fix nil pointer exception

* Better comments on DeduplicateStateEntries

* Reflow forced federated joins

* Don't force federated join for possibly even local invites

* Comment SendEventWithState better

* Rewrite room state in sync API storage

* Add TODO

* Clean up all room data when receiving create event

* Don't generate output events for rewrites, but instead notify that state is rewritten on the final new event

* Rename to PurgeRoom

* Exclude backfilled messages from /sync

* Split out rewriting state from updating state from state res

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2020-09-15 11:17:46 +01:00
Neil Alexander
9d53351dc2
Component-wide TransactionWriters (#1290)
* Offset updates take place using TransactionWriter

* Refactor TransactionWriter in current state server

* Refactor TransactionWriter in federation sender

* Refactor TransactionWriter in key server

* Refactor TransactionWriter in media API

* Refactor TransactionWriter in server key API

* Refactor TransactionWriter in sync API

* Refactor TransactionWriter in user API

* Fix deadlocking Sync API tests

* Un-deadlock device database

* Fix appservice API

* Rename TransactionWriters to Writers

* Move writers up a layer in sync API

* Document sqlutil.Writer interface

* Add note to Writer documentation
2020-08-21 10:42:08 +01:00
Neil Alexander
5aaf32bbed
Select distinct on room memberships in sync API (#1292) 2020-08-21 09:57:52 +01:00
Neil Alexander
b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00
Henrik Sölver
83f038e12b
Don't use more than 999 variables in SQLite querys. (#1224)
Closes #1223

Signed-off-by: Henrik Sölver <henrik.solver@gmail.com>

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-07-27 13:19:30 +01:00
Neil Alexander
b6bc132485
Use TransactionWriter in other component SQLite (#1209)
* Use TransactionWriter on other component SQLites

* Fix sync API tests

* Fix panic in media API

* Fix a couple of transactions

* Fix wrong query, add some logging output

* Add debug logging into StoreEvent

* Adjust InsertRoomNID

* Update logging
2020-07-21 15:48:21 +01:00
Kegsay
ecd7accbad
Rehuffle where things are in the internal package (#1122)
renamed:    internal/eventcontent.go -> internal/eventutil/eventcontent.go
	renamed:    internal/events.go -> internal/eventutil/events.go
	renamed:    internal/types.go -> internal/eventutil/types.go
	renamed:    internal/http/http.go -> internal/httputil/http.go
	renamed:    internal/httpapi.go -> internal/httputil/httpapi.go
	renamed:    internal/httpapi_test.go -> internal/httputil/httpapi_test.go
	renamed:    internal/httpapis/paths.go -> internal/httputil/paths.go
	renamed:    internal/routing.go -> internal/httputil/routing.go
	renamed:    internal/basecomponent/base.go -> internal/setup/base.go
	renamed:    internal/basecomponent/flags.go -> internal/setup/flags.go
	renamed:    internal/partition_offset_table.go -> internal/sqlutil/partition_offset_table.go
	renamed:    internal/postgres.go -> internal/sqlutil/postgres.go
	renamed:    internal/postgres_wasm.go -> internal/sqlutil/postgres_wasm.go
	renamed:    internal/sql.go -> internal/sqlutil/sql.go
2020-06-12 14:55:57 +01:00
Kegsay
24d8df664c
Fix #897 and shuffle directory around (#1054)
* Fix #897 and shuffle directory around

* Update find-lint

* goimports

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-05-21 14:40:13 +01:00
Kegsay
1b34130a5b
Finish merging syncserver.go (#1033)
* Refactor all postgres tables; start work on sqlite

* wip sqlite merges; database is locked errors to investigate and failing tests

* Revert "wip sqlite merges; database is locked errors to investigate and failing tests"

This reverts commit 26cbfc5b75ae2dc4fb31a838b917aa39d758f162.

* convert current room state table

* port over sqlite topology table

* remove a few functions

* remove more functions

* Share more code

* factor out completesync and a bit more

* Remove remaining code
2020-05-14 16:11:37 +01:00
Neil Alexander
ad5849d222
HeaderedEvents in sync API (#922)
* Use HeaderedEvent in syncapi

* Update notifier test

* Fix persisting headered event

* Clean up unused API function

* Fix overshadowed err from linter

* Write headered JSON to invites table too

* Rename event_json to headered_event_json in syncapi database schemae

* Fix invites_table queries

* Update QueryRoomVersionCapabilitiesResponse comment

* Fix syncapi SQLite
2020-03-19 12:07:01 +00:00
Prateek Sachan
c019ad7086
Log errors from rows.Close (#920)
* Log errors from rows.Close

* fixed imports

* Added contextual messages

* fixed review changes
2020-03-18 10:17:18 +00:00
Kegsay
a97b8eafd4
Add peer-to-peer support into Dendrite via libp2p and fetch (#880)
* Use a fork of pq which supports userCurrent on wasm

* Use sqlite3_js driver when running in JS

* Add cmd/dendritejs to pull in sqlite3_js driver for wasm only

* Update to latest go-sqlite-js version

* Replace prometheus with a stub. sigh

* Hard-code a config and don't use opentracing

* Latest go-sqlite3-js version

* Generate a key for now

* Listen for fetch traffic rather than HTTP

* Latest hacks for js

* libp2p support

* More libp2p

* Fork gjson to allow us to enforce auth checks as before

Previously, all events would come down redacted because the hash
checks would fail. They would fail because sjson.DeleteBytes didn't
remove keys not used for hashing. This didn't work because of a build
tag which included a file which no-oped the index returned.

See https://github.com/tidwall/gjson/issues/157

When it's resolved, let's go back to mainline.

* Use gjson@1.6.0 as it fixes https://github.com/tidwall/gjson/issues/157

* Use latest gomatrixserverlib for sig checks

* Fix a bug which could cause exclude_from_sync to not be set

Caused when sending events over federation.

* Use query variadic to make lookups actually work!

* Latest gomatrixserverlib

* Add notes on getting p2p up and running

Partly so I don't forget myself!

* refactor: Move p2p specific stuff to cmd/dendritejs

This is important or else the normal build of dendrite will fail
because the p2p libraries depend on syscall/js which doesn't work
on normal builds.

Also, clean up main.go to read a bit better.

* Update ho-http-js-libp2p to return errors from RoundTrip

* Add an LRU cache around the key DB

We actually need this for P2P because otherwise we can *segfault*
with things like: "runtime: unexpected return pc for runtime.handleEvent"
where the event is a `syscall/js` event, caused by spamming sql.js
caused by "Checking event signatures for 14 events of room state" which
hammers the key DB repeatedly in quick succession.

Using a cache fixes this, though the underlying cause is probably a bug
in the version of Go I'm on (1.13.7)

* breaking: Add Tracing.Enabled to toggle whether we do opentracing

Defaults to false, which is why this is a breaking change. We need
this flag because WASM builds cannot do opentracing.

* Start adding conditional builds for wasm to handle lib/pq

The general idea here is to have the wasm build have a `NewXXXDatabase`
that doesn't import any postgres package and hence we never import
`lib/pq`, which doesn't work under WASM (undefined `userCurrent`).

* Remove lib/pq for wasm for syncapi

* Add conditional building to remaining storage APIs

* Update build script to set env vars correctly for dendritejs

* sqlite bug fixes

* Docs

* Add a no-op main for dendritejs when not building under wasm

* Use the real prometheus, even for WASM

Instead, the dendrite-sw.js must mock out `process.pid` and
`fs.stat` - which must invoke the callback with an error (e.g `EINVAL`)
in order for it to work:

```
    global.process = {
        pid: 1,
    };
    global.fs.stat = function(path, cb) {
        cb({
            code: "EINVAL",
        });
    }
```

* Linting
2020-03-06 10:23:55 +00:00
Kegsay
5caae6f3a0
sqlite: fixes from sytest (#872)
* bugfix: fix panic on new invite events from sytest

I'm unsure why the previous code didn't work, but it's
clearer, quicker and easier to read the `LastInsertID()` way.
Previously, the code would panic as the SELECT would fail
to find the last inserted row ID.

* sqlite: Fix UNIQUE violations and close more cursors

- Add missing `defer rows.Close()`
- Do not have the state block NID as a PRIMARY KEY else it breaks for blocks
  with >1 state event in them. Instead, rejig the queries so we can still
  have monotonically increasing integers without using AUTOINCREMENT (which
  mandates PRIMARY KEY).

* sqlite: Add missing variadic function

* Use LastInsertId because empirically it works over the SELECT form (though I don't know why that is)

* sqlite: Fix invite table by using the global stream pos rather than one specific to invites

If we don't use the global, clients don't get notified about any invites
because the position is too low.

* linting: shadowing

* sqlite: do not use last rowid, we already know the stream pos!

* sqlite: Fix account data table in syncapi by commiting insert txns!

* sqlite: Fix failing federation invite

Was failing with 'database is locked' due to multiple write txns
being taken out.

* sqlite: Ensure we return exactly the number of events found in the database

Previously we would return exactly the number of *requested* events, which
meant that several zero-initialised events would bubble through the system,
failing at JSON serialisation time.

* sqlite: let's just ignore the problem for now....

* linting
2020-02-20 09:28:03 +00:00
Kegsay
b6ea1bc67a
Support sqlite in addition to postgres (#869)
* Move current work into single branch

* Initial massaging of clientapi etc (not working yet)

* Interfaces for accounts/devices databases

* Duplicate postgres package for sqlite3 (no changes made to it yet)

* Some keydb, accountdb, devicedb, common partition fixes, some more syncapi tweaking

* Fix accounts DB, device DB

* Update naffka dependency for SQLite

* Naffka SQLite

* Update naffka to latest master

* SQLite support for federationsender

* Mostly not-bad support for SQLite in syncapi (although there are problems where lots of events get classed incorrectly as backward extremities, probably because of IN/ANY clauses that are badly supported)

* Update Dockerfile -> Go 1.13.7, add build-base (as gcc and friends are needed for SQLite)

* Implement GET endpoints for account_data in clientapi

* Nuke filtering for now...

* Revert "Implement GET endpoints for account_data in clientapi"

This reverts commit 4d80dff4583d278620d9b3ed437e9fcd8d4674ee.

* Implement GET endpoints for account_data in clientapi (#861)

* Implement GET endpoints for account_data in clientapi

* Fix accountDB parameter

* Remove fmt.Println

* Fix insertAccountData SQLite query

* Fix accountDB storage interfaces

* Add empty push rules into account data on account creation (#862)

* Put SaveAccountData into the right function this time

* Not sure if roomserver is better or worse now

* sqlite work

* Allow empty last sent ID for the first event

* sqlite: room creation works

* Support sending messages

* Nuke fmt.println

* Move QueryVariadic etc into common, other device fixes

* Fix some linter issues

* Fix bugs

* Fix some linting errors

* Fix errcheck lint errors

* Make naffka use postgres as fallback, fix couple of compile errors

* What on earth happened to the /rooms/{roomID}/send/{eventType} routing

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-02-13 17:27:33 +00:00