Commit Graph

91 Commits

Author SHA1 Message Date
Neil Alexander
f0c8a03649
Membership updater refactoring (#2541)
* Membership updater refactoring

* Pass in membership state

* Use membership check rather than referring to state directly

* Delete irrelevant membership states

* We don't need the leave event after all

* Tweaks

* Put a log entry in that I might stand a chance of finding

* Be less panicky

* Tweak invite handling

* Don't freak if we can't find the event NID

* Use event NID from `types.Event`

* Clean up

* Better invite handling

* Placate the almighty linter

* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand

* Fix the sytest after all (thanks @S7evinK for the spot)
2022-07-22 14:44:04 +01:00
Neil Alexander
3ea21273bc
Ristretto cache (#2563)
* Try Ristretto cache

* Tweak

* It's beautiful

* Update GMSL

* More strict keyable interface

* Fix that some more

* Make less panicky

* Don't enforce mutability checks for now

* Determine mutability using deep equality

* Tweaks

* Namespace keys

* Make federation caches mutable

* Update cost estimation, add metric

* Update GMSL

* Estimate cost for metrics better

* Reduce counters a bit

* Try caching events

* Some guards

* Try again

* Try this

* Use separate caches for hopefully better hash distribution

* Fix bug with admitting events into cache

* Try to fix bugs

* Check nil

* Try that again

* Preserve order jeezo this is messy

* thanks VS Code for doing exactly the wrong thing

* Try this again

* Be more specific

* aaaaargh

* One more time

* That might be better

* Stronger sorting

* Cache expiries, async publishing of EDUs

* Put it back

* Use a shared cache again

* Cost estimation fixes

* Update ristretto

* Reduce counters a bit

* Clean up a bit

* Update GMSL

* 1GB

* Configurable cache sizes

* Tweaks

* Add `config.DataUnit` for specifying friendly cache sizes

* Various tweaks

* Update GMSL

* Add back some lazy loading caching

* Include key in cost

* Include key in cost

* Tweak max age handling, config key name

* Only register prometheus metrics if requested

* Review comments @S7evinK

* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)

* Review comments

* Update sample configs

* Update GHA Workflow

* Update Complement images to Go 1.18

* Remove the cache test from the federation API as we no longer guarantee immediate cache admission

* Don't check the caches in the renewal test

* Possibly fix the upgrade tests

* Update to matrix-org/gomatrixserverlib#322

* Update documentation to refer to Go 1.18
2022-07-11 14:31:31 +01:00
Till
5087b36af0
Fix QuerySharedUsers for the SyncAPI keychange consumer (#2554)
* Make more use of base.BaseDendrite

* Fix QuerySharedUsers if no UserIDs are supplied
2022-07-05 14:50:56 +02:00
Neil Alexander
b50a24c666
Roomserver producers package (#2546)
* Give the roomserver a producers package

* Change init point

* Populate ACLs API

* Fix build issues

* `RoomEventProducer` naming
2022-07-01 10:54:07 +01:00
Neil Alexander
4c2a10f1a6
Handle state before, send history visibility in output (#2532)
* Check state before event

* Tweaks

* Refactor a bit, include in output events

* Don't waste time if soft failed either

* Tweak control flow, comments, use GMSL history visibility type
2022-06-13 15:11:10 +01:00
Neil Alexander
27948fb304
Optimise loadAuthEvents, add roomserver tracing 2022-06-07 14:23:26 +01:00
Neil Alexander
02e5c74101
Revert #2457
Squashed commit of the following:

commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jun 1 09:55:54 2022 +0100

    Revert resolving old extremities as well as new

    This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
2022-06-01 10:09:27 +01:00
Neil Alexander
3d9fe20748
Fix bugs related to state resolution (#2507)
* Fix bugs related to state resolution

* Clean up `resolve-state`

* Don't panic when entries can't be found

* Ensure we have state entries for the auth events

* Revert "Ensure we have state entries for the auth events"

This reverts commit 9b13b7ed37f40ce6d1301d9cb423a27b0db9c897.

* Revert "Revert "Ensure we have state entries for the auth events""

This reverts commit d86db197e3e317f7d64ec6722cc60533872f4617.

* Fix bug

* Try that again

* Update gomatrixserverlib

* Remove recursion from `loadAuthEvents`
2022-06-01 09:46:21 +01:00
Neil Alexander
9eb4fec33b
Make logging output for state deletions a bit better 2022-05-26 10:38:46 +01:00
Neil Alexander
6940c7c7dd
Try to spot state deletions when they happen (#2489) 2022-05-25 16:40:31 +01:00
kegsay
6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
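The race described in the commit above (joined hosts removed, then re-calculated, without a transaction) can be sketched with a small in-memory model. This is not Dendrite's storage code: `joinedHosts`, `ReplaceNonAtomic`, `ReplaceAtomic`, `Destinations`, and the `between` hook are all hypothetical names, and a mutex stands in for the database transaction.

```go
package main

import "sync"

// joinedHosts is a hypothetical in-memory stand-in for a joined-hosts table.
// It is an illustration only, not Dendrite's actual storage layer.
type joinedHosts struct {
	mu    sync.RWMutex
	hosts map[string][]string // room ID -> joined server names
}

func newJoinedHosts() *joinedHosts {
	return &joinedHosts{hosts: make(map[string][]string)}
}

// ReplaceNonAtomic mirrors the buggy shape: the delete and the re-insert are
// two separate critical sections, so a reader that runs between them sees a
// room with no joined hosts. The between hook exists only so that the gap can
// be observed deterministically.
func (j *joinedHosts) ReplaceNonAtomic(roomID string, servers []string, between func()) {
	j.mu.Lock()
	delete(j.hosts, roomID)
	j.mu.Unlock()

	if between != nil {
		between() // a key change event arriving here finds no destinations
	}

	j.mu.Lock()
	j.hosts[roomID] = append([]string(nil), servers...)
	j.mu.Unlock()
}

// ReplaceAtomic performs the delete and the re-insert under a single lock,
// which plays the role of the transaction: readers see either the old set or
// the new set, never the empty interim state.
func (j *joinedHosts) ReplaceAtomic(roomID string, servers []string) {
	j.mu.Lock()
	defer j.mu.Unlock()
	delete(j.hosts, roomID)
	j.hosts[roomID] = append([]string(nil), servers...)
}

// Destinations returns a copy of the servers currently joined to the room.
func (j *joinedHosts) Destinations(roomID string) []string {
	j.mu.RLock()
	defer j.mu.RUnlock()
	return append([]string(nil), j.hosts[roomID]...)
}
```

Calling `Destinations` from the `between` hook of `ReplaceNonAtomic` returns an empty slice, which corresponds to the brief period in which key change events would not be sent; with `ReplaceAtomic` there is no point at which a reader can observe that state.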
Neil Alexander
be9be2553f
Resolve over old and new extremities (#2457)
* Feed existing state into state res when calculating state from new extremities

* Remove duplicates

* Fix bug

* Sort and unique

* Update to matrix-org/gomatrixserverlib#308

* Trim the slice properly

* Update gomatrixserverlib again

* Update to matrix-org/gomatrixserverlib#308
2022-05-13 11:52:04 +01:00
Neil Alexander
09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
Neil Alexander
6bc6184d70
Simplify calculateLatest (#2430)
* Simplify `calculateLatest`

* Comments
2022-05-06 15:52:44 +01:00
kegsay
9957752a9d
Define component interfaces based on consumers (2/2) (#2425)
* convert remaining interfaces

* Tidy up the userapi interfaces
2022-05-05 19:30:38 +01:00
Neil Alexander
530fd488a9
Don't log consumer errors on shutdown 2022-05-05 13:29:39 +01:00
Neil Alexander
4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00
Till
e8dd37d533
Add metrics for internal API requests (#2310)
* Add response size and requests total to internal handler

* Move MustRegister calls to New* funcs

* Move MustRegister back to init

* Init at some place, minimize changes
2022-04-08 12:24:40 +02:00
kegsay
7499147550
Add test infrastructure code for dendrite unit/integ tests (#2331)
* Add test infrastructure code for dendrite unit/integ tests

Start re-enabling some syncapi storage tests in the process.

* Linting

* Add postgres service to unit tests

* dendrite not syncv3

* Skip test which doesn't work

* Linting

* Add `jetstream.PrepareForTests`

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-08 10:12:30 +01:00
Neil Alexander
4d9d9cc9b1
Update to matrix-org/gomatrixserverlib#300 2022-04-05 14:43:44 +01:00
Neil Alexander
98a5e410d7
Per-room consumers (#2293)
* Roomserver input refactoring — again!

* Ensure the actor runs again

* Preserve consumer after unsubscribe

* Another sprinkling of magic

* Rename `TopicFor` to `Prefixed`

* Recreate the stream if the config is bad

* Check streams too

* Prefix subjects, preserve inboxes

* Recreate if subjects wrong

* Remove stream subject

* Reconstruct properly

* Fix mutex unlock

* Comments

* Fix tests

* Don't drop events

* Review comments

* Separate `queueInputRoomEvents` function

* Re-jig control flow a bit
2022-03-23 10:20:18 +00:00
Neil Alexander
e30aa38fb0
Stream tweaks, use same codepath for sync vs async input room events, wait for error response via NATS messages (#2283) 2022-03-16 14:21:11 +00:00
Neil Alexander
67de4dbd0c
Don't send adds_state_events in roomserver output events anymore (#2258)
* Don't send `adds_state_events` in roomserver output events anymore

* Set `omitempty` on some output fields that aren't always set

* Add `AddsState` helper function

* No-op if no added state event IDs

* Revert "No-op if no added state event IDs"

This reverts commit 71a0ef3df10e0d94234d916246c30b0a4e82b26e.

* Revert "Add `AddsState` helper function"

This reverts commit c9fbe45475eb12ae44d2a8da7c0fc3a002ad9819.
2022-03-07 17:17:16 +00:00
Neil Alexander
24df85b428
Mark soft-failed events as rejected in roomserver_events (#2252) 2022-03-04 15:27:10 +00:00
Neil Alexander
a23fda6626
Update Events call-sites which now don't return an error, update parsedRespState to sort (#2227)
* Topologically sort with `SendEventWithState`, so that earlier events should satisfy auth for later ones

* Revert "Topologically sort with `SendEventWithState`, so that earlier events should satisfy auth for later ones"

This reverts commit b0cd706012b4c9b6724b11e16f19c4cb732ab286.

* Update to matrix-org/gomatrixserverlib#293

* `Events` no longer returns an error, other tweaks

* Make sure `Events` is sorted for `parsedRespState` too
2022-02-28 14:51:40 +00:00
Neil Alexander
fea8d152e7
Relax roomserver input transactional isolation (#2224)
* Don't force full transactional isolation on roomserver input

* Set succeeded

* Tweak `MissingAuthPrevEvents`
2022-02-23 15:41:32 +00:00
Neil Alexander
0b123b29f5
Use process context for roomserver input (#2198) 2022-02-17 15:58:54 +00:00
Neil Alexander
7dfc7c3d70
Don't re-send sent events in add_state_events (#2195)
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before

* Filter on event NIDs instead, hopefully bring joy to SQLite

* UnsentFilter, review comments
2022-02-17 13:53:48 +00:00
Neil Alexander
5106cc807c
Ensure only one transaction is used for RS input per room (#2178)
* Ensure the input API only uses a single transaction

* Remove more of the dead query API call

* Tidy up

* Fix tests hopefully

* Don't do unnecessary work for rooms that don't exist

* Improve error, fix another case where transaction wasn't used properly

* Add a unit test for checking single transaction on RS input API

* Fix logic oops when deciding whether to use a transaction in storeEvent
2022-02-11 17:40:14 +00:00
Neil Alexander
2782ae3d56
Fix fetching missing state (#2163)
* Check that we have a populated state snapshot when determining if we closed the gap

* Do the same in the query API

* Use HasState more opportunistically

* Try to avoid falling down the hole of using a trustworthy but empty state snapshot for non-create events

* Refactor missing state and make sure that we really solve the problem for the new event

* Comments

* Review comments

* Tweak that check again

* Tidy up that create check further

* Fix build hopefully

* Update sendOutliers to use OrderAuthAndStateEvents

* Don't go out of bounds on missingEvents
2022-02-10 10:05:14 +00:00
Neil Alexander
37cbe263ce
Fix transaction issues in events table in PSQL (#2165)
* Revert "Revert "Fix storage bug in PSQL events table""

This reverts commit cf447dd52a.

* Membership updater to use updater

* Fix membership updater to use transactions properly
2022-02-10 09:30:16 +00:00
kegsay
aa5c3b88de
Unmarshal events at the Dendrite level not GMSL level (#2164)
* Use new event json types in gmsl

* Fix EventJSON to actually unmarshal events

* Update GMSL

* Bump GMSL and improve error messages

* Send back the correct RespState

* Update GMSL
2022-02-09 20:31:24 +00:00
Neil Alexander
457a07eac5
More relaxed auth event fetching (#2161)
* Tweaks around auth event fetching

* More tweaking
2022-02-08 17:06:13 +00:00
Neil Alexander
a572f4db03
Fix bugs that could wedge rooms (#2154)
* Don't flake so badly for rejected events

* Moar

* Fix panic

* Don't count rejected events as missing

* Don't treat rejected events without state as missing

* Revert "Don't count rejected events as missing"

This reverts commit 4b6139b62eb91ba059b47415b0275964b37d9b43.

* Missing events should be KindOld

* If we have state, use it, regardless of memberships which could be stale now

* Fetch missing state for KindOld too

* Tweak the condition again

* Clean up a bit

* Use room updater to get latest events in a race-free way

* Return the correct error

* Improve errors
2022-02-07 19:10:01 +00:00
Neil Alexander
532f445c4e
Remove roomserver input deadlines (#2144)
It isn't really clear that the deadlines actually help in any way. Currently we can use up our 2 minutes doing something, run out of context time and then return an error, which causes the transaction to roll back and forget everything we've done. If the message came to us from NATS then we will probably end up retrying, only to land in the same situation. We'd be a lot better off if we just spent the time reconciling the problem in the first place, and then we're much less likely to need to fetch those missing auth or prev events in the future.

Also includes matrix-org/gomatrixserverlib#287 so we don't wait so long for servers that are obviously dead.
2022-02-04 12:13:07 +00:00
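The reasoning in the commit above can be sketched as a toy model: work done inside a transaction is discarded when the context expires, so a retry under the same limit makes no net progress. Everything here is an assumption for illustration (`fakeTxn`, `attempt`, and the `expireAfter` parameter are hypothetical), and an explicit cancellation stands in for the deadline so that the behaviour is deterministic.

```go
package main

import "context"

// fakeTxn is a hypothetical stand-in for a database transaction. Work is
// staged in pending and only becomes durable on Commit.
type fakeTxn struct {
	pending   []string
	committed []string
}

// Commit makes all staged work durable.
func (t *fakeTxn) Commit() {
	t.committed = append(t.committed, t.pending...)
	t.pending = nil
}

// Rollback discards all staged work.
func (t *fakeTxn) Rollback() {
	t.pending = nil
}

// attempt runs the steps in order inside a fresh transaction. The context is
// cancelled just before step index expireAfter, simulating a deadline that
// runs out partway through; a negative expireAfter means no limit. On expiry
// the transaction is rolled back, so the completed steps are lost.
func attempt(steps []string, expireAfter int) (*fakeTxn, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	txn := &fakeTxn{}
	for i, step := range steps {
		if i == expireAfter {
			cancel() // simulated deadline expiry
		}
		if err := ctx.Err(); err != nil {
			txn.Rollback()
			return txn, err
		}
		txn.pending = append(txn.pending, step)
	}
	txn.Commit()
	return txn, nil
}
```

With three steps and `expireAfter` set to 2, each call to `attempt` completes two steps, then rolls back and returns an error, and a retry repeats exactly the same work. With a negative `expireAfter` the same input commits all three steps in a single pass.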
Neil Alexander
eb352a5f6b
Full roomserver input transactional isolation (#2141)
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input

* Better transaction management

* Tweak order

* Handle cases where the room does not exist

* Other fixes

* More tweaks

* Fill some gaps

* Fill in the gaps

* good lord it gets worse

* Don't roll back transactions when events rejected

* Pass through errors properly

* Fix bugs

* Fix incorrect error check

* Don't panic on nil txns

* Tweaks

* Hopefully fix panics for good in SQLite this time

* Fix rollback

* Minor bug fixes with latest event updater

* Some review comments

* Revert "Some review comments"

This reverts commit 0caf8cf53e62c33f7b83c52e9df1d963871f751e.

* Fix a couple of bugs

* Clearer commit and rollback results

* Remove unnecessary prepares
2022-02-04 10:39:34 +00:00
Neil Alexander
4d9f5b2e57
Fix panic from closing the input channel before the workers complete (it'll get GC'd either way) 2022-02-02 17:46:37 +00:00
Neil Alexander
893aa3b141
More logging tweaks 2022-01-31 16:01:54 +00:00
Neil Alexander
07d0e72a8b
Improve roomserver logging 2022-01-31 15:33:00 +00:00
Neil Alexander
d21f3eace0
Roomserver fixes (#2133)
* Improve server selection somewhat

* Remove things from the map when we're done

* Be less panicky about auth event signatures in case they are not fatal after all

* Accept HasState in all cases

* Send join asynchronously

* Revert "Send join asynchronously"

This reverts commit 5b685bfcd0b1150a66c7b1e70fb3a3eda509efd1.

* Joins and leaves use background context
2022-01-31 14:36:59 +00:00
Neil Alexander
f9547a53d2
Tweak roomserver logging for rejected events 2022-01-31 12:01:53 +00:00
Neil Alexander
ba1a9b98b7
Tweak some logging (#2130)
* Modify some log levels

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@336334f

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@cde7ac8

* Demote warning about key change producer

* Add more useful roomserver logging

* Further tweaking
2022-01-31 10:48:28 +00:00
Neil Alexander
eb8e770e99
Revert consumer change 2022-01-31 10:42:41 +00:00
Neil Alexander
a271fde8f5
Only limit context for fetching missing auth/prev events (#2131) 2022-01-31 10:39:33 +00:00
Neil Alexander
8e4002831f
Call hooks for outliers (#2119)
* Move hook call when processing room events

* Fix build

* Call hooks for outliers too
2022-01-28 13:11:56 +00:00
Neil Alexander
e9fbad6f20
Move hook call when processing room events (#2118)
* Move hook call when processing room events

* Fix build
2022-01-28 12:33:31 +00:00
Neil Alexander
48789ebec5
Don't flood Sentry with context cancelled/deadline exceeded errors (#2115) 2022-01-28 10:27:28 +00:00
Neil Alexander
a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271b5fca8c104e1554711187420dbebca.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander
16035b9737
NATS JetStream tweaks (#2086)
* Use named NATS durable consumers

* Build fixes

* Remove dupe call to SetFederationAPI

* Use namespaced consumer name

* Fix namespacing

* Fix unit tests hopefully
2022-01-07 17:31:57 +00:00
Neil Alexander
a422321435
Fix panic at startup if roomserver was not given federation API reference by the time NATS consumes an event, tweak backpressure metrics 2022-01-07 13:41:53 +00:00