Commit Graph

271 Commits

Author SHA1 Message Date
Neil Alexander
b5a497a0c0
Allow defers to run in TestMain in federation API tests 2022-05-23 14:54:43 +01:00
Kegan Dougal
ac92e04772 Remove debug logging 2022-05-17 13:31:48 +01:00
kegsay
6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
kegsay
236b16aa6c
Begin adding syncapi component tests (#2442)
* Add very basic syncapi tests

* Add a way to inject jetstream messages

* implement add_state_ids

* bugfixes

* Unbreak tests

* Remove now un-needed API call

* Linting
2022-05-09 17:23:02 +01:00
Neil Alexander
79da75d483
Federation consumer adds_state_event_ids tweak (#2441)
* Don't ask roomserver for events we already have in federation API

* Check number of events returned is as expected

* Preallocate array

* Improve shape a bit
2022-05-09 16:19:35 +01:00
Neil Alexander
09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
kegsay
85704eff20
Clean up interface definitions (#2427)
* tidy up interfaces

* remove unused GetCreatorIDForAlias

* Add RoomserverUserAPI interface

* Define more interfaces

* Use AppServiceInternalAPI for consistent naming

* clean up federationapi constructor a bit

* Fix monolith in -http mode
2022-05-06 12:39:26 +01:00
kegsay
9957752a9d
Define component interfaces based on consumers (2/2) (#2425)
* convert remaining interfaces

* Tidy up the userapi interfaces
2022-05-05 19:30:38 +01:00
kegsay
506de4bb3d
Define component interfaces based on consumers (1/2) (#2423)
* Specify interfaces used by appservice, do half of clientapi

* convert more deps of clientapi to finer-grained interfaces

* Convert mediaapi and rest of clientapi

* Somehow this got missed
2022-05-05 13:17:38 +01:00
Neil Alexander
dd061a172e
Tidy up AddPublicRoutes (#2412)
* Simplify federation API `AddPublicRoutes`

* Simplify client API `AddPublicRoutes`

* Simplify media API `AddPublicRoutes`

* Simplify sync API `AddPublicRoutes`

* Simplify `AddAllPublicRoutes`
2022-05-03 17:17:02 +01:00
Neil Alexander
4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00
Neil Alexander
31799a3b2a
Device list display name fixes (#2405)
* Get device names from `unsigned` in `/user/devices`

* Fix display name updates

* Fix bug

* Fix another bug
2022-04-29 16:02:55 +01:00
Neil Alexander
6deb10f3f6
Don't answer expensive federation requests for rooms we no longer belong to (#2398)
This includes `/state`, `/state_ids`, `/get_missing_events` and `/backfill`.

This should fix #2396.
2022-04-28 11:45:56 +01:00
Neil Alexander
923f789ca3
Fix graceful shutdown 2022-04-27 15:29:49 +01:00
Neil Alexander
54ff4cf690
Don't try to federated-join via ourselves (#2383) 2022-04-27 12:23:55 +01:00
Till
e8ab2154aa
Return M_NOT_FOUND for rejected events (#2371)
* Return M_NOT_FOUND for rejected events

* Add passing tests
2022-04-25 19:05:01 +02:00
Neil Alexander
aad81b7b4d
Only call key update process functions if there are updates, don't send things to ourselves over federation 2022-04-25 14:22:46 +01:00
Till
446819e4ac
Store the EDU type in the database (#2370) 2022-04-25 11:56:50 +02:00
Neil Alexander
6d78c4d67d
Fix retrieving cross-signing signatures in /user/devices/{userId} (#2368)
* Fix retrieving cross-signing signatures in `/user/devices/{userId}`

We need to know the target device IDs in order to get the signatures and we weren't populating those.

* Fix up signature retrieval

* Fix SQLite

* Always include the target's own signatures as well as the requesting user
2022-04-22 14:58:24 +01:00
Till
e8dd37d533
Add metrics for internal API requests (#2310)
* Add response size and requests total to internal handler

* Move MustRegister calls to New* funcs

* Move MustRegister back to init

* Init at some place, minimize changes
2022-04-08 12:24:40 +02:00
Neil Alexander
99ef547295
Simplify presence stringification (should help with vector-im/element-android#5712) 2022-04-07 10:10:28 +01:00
Till
e5e3350ce1
Add presence module V2 (#2312)
* Syncapi presence

* Clientapi http presence handler

* Why is this here?

* Missing files

* FederationAPI presence implementation

* Add new presence stream

* Pinecone update

* Pinecone update

* Add passing tests

* Make linter happy

* Add presence producer

* Add presence config option

* Set user to unavailable after x minutes

* Only set currently_active if online
Avoid unneeded presence updates when syncing

* Tweaks

* Query devices for last_active_ts
Fixes & tweaks

* Export SharedUsers/SharedUsers

* Presence stream in MemoryStorage

* Remove status_msg_nil

* Fix sytest crashes

* Make presence types const and use stringer for it

* Change options to allow inbound/outbound presence

* Fix option & typo

* Update configs

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-06 13:11:19 +02:00
Neil Alexander
4d9d9cc9b1
Update to matrix-org/gomatrixserverlib#300 2022-04-05 14:43:44 +01:00
Neil Alexander
9b316ac64c
Slower federation warm-up (#2320)
* Wake destination queues gradually, rather than all at once

* Delay device list updates too

* Maximum two minute warmup period
2022-04-04 15:14:10 +01:00
S7evinK
49dc49b232
Remove eduserver (#2306)
* Move receipt sending to own JetStream producer

* Move SendToDevice to producer

* Remove most parts of the EDU server

* Fix SendToDevice & copyrights

* Move structs, cleanup EDU Server traces

* Use HeadersOnly subscription

* Missing file

* Fix linter issues

* Move consumers to own files

* Rename durable consumer; Consumer cleanup

* Docs/config cleanup
2022-03-29 14:14:35 +02:00
Neil Alexander
98a5e410d7
Per-room consumers (#2293)
* Roomserver input refactoring — again!

* Ensure the actor runs again

* Preserve consumer after unsubscribe

* Another sprinkling of magic

* Rename `TopicFor` to `Prefixed`

* Recreate the stream if the config is bad

* Check streams too

* Prefix subjects, preserve inboxes

* Recreate if subjects wrong

* Remove stream subject

* Reconstruct properly

* Fix mutex unlock

* Comments

* Fix tests

* Don't drop events

* Review comments

* Separate `queueInputRoomEvents` function

* Re-jig control flow a bit
2022-03-23 10:20:18 +00:00
Neil Alexander
9572f5ed19
Wait for safe shutdown of NATS Server (#2289) 2022-03-21 10:32:34 +00:00
S7evinK
8336ce972e
Remove unused partition_offset_table (#2288) 2022-03-21 10:47:41 +01:00
Neil Alexander
e30aa38fb0
Stream tweaks, use same codepath for sync vs async input room events, wait for error response via NATS messages (#2283) 2022-03-16 14:21:11 +00:00
Neil Alexander
e485f9c2bd
64-bit stream IDs for device list updates (#2267) 2022-03-10 13:17:28 +00:00
Neil Alexander
67de4dbd0c
Don't send adds_state_events in roomserver output events anymore (#2258)
* Don't send `adds_state_events` in roomserver output events anymore

* Set `omitempty` on some output fields that aren't always set

* Add `AddsState` helper function

* No-op if no added state event IDs

* Revert "No-op if no added state event IDs"

This reverts commit 71a0ef3df10e0d94234d916246c30b0a4e82b26e.

* Revert "Add `AddsState` helper function"

This reverts commit c9fbe45475eb12ae44d2a8da7c0fc3a002ad9819.
2022-03-07 17:17:16 +00:00
kegsay
f1b92de017
MSC2946: Spaces Summary (round 2) (#2232)
* Initial cut at fixing up MSC2946 to work with latest spec

* bugfix: send response back correctly

* Initial working version of MSC2946

* msc2946: handle suggested_only; remove custom database

As the MSC doesn't require reverse lookups, we can just pull
the room state and inspect via the roomserver database. To
handle this, expand QueryCurrentState to support wildcards.

Use all this and handle `?suggested_only`.

* Sort child rooms

* msc2946: Make TestClientSpacesSummary pass

* msc2946: allow invited rooms to be spidered

* msc2946: support basic federation requests

* fix up go mod
2022-03-01 13:40:07 +00:00
Neil Alexander
34116178e8
Remove logging line in PerformInvite 2022-02-22 13:47:14 +00:00
Neil Alexander
5106cc807c
Ensure only one transaction is used for RS input per room (#2178)
* Ensure the input API only uses a single transaction

* Remove more of the dead query API call

* Tidy up

* Fix tests hopefully

* Don't do unnecessary work for rooms that don't exist

* Improve error, fix another case where transaction wasn't used properly

* Add a unit test for checking single transaction on RS input API

* Fix logic oops when deciding whether to use a transaction in storeEvent
2022-02-11 17:40:14 +00:00
S7evinK
a4e7d471af
Remove FederationDisabled error type (#2174) 2022-02-11 18:15:44 +01:00
Neil Alexander
88b45d5cd2
Drop m.room.create events in federation /send transaction (#2179) 2022-02-11 15:18:14 +00:00
kegsay
aa5c3b88de
Unmarshal events at the Dendrite level not GMSL level (#2164)
* Use new event json types in gmsl

* Fix EventJSON to actually unmarshal events

* Update GMSL

* Bump GMSL and improve error messages

* Send back the correct RespState

* Update GMSL
2022-02-09 20:31:24 +00:00
S7evinK
cc688a9a38
Avoid unnecessary logs and marshaling (#2167)
Co-authored-by: kegsay <kegan@matrix.org>
2022-02-09 15:46:52 +01:00
S7evinK
ac25065a54
Fix sytest uploading signed devices gets propagated over federation (#2162)
* Remove unneeded logging

* Add MasterKey & SelfSigningKey to update
Avoid panic if signatures are not present

* Add passing test

* Revert "Add MasterKey & SelfSigningKey to update"

This reverts commit 2c81b34884be8b5b875a33420c0f985b578d3fb8.

* Send MasterKey & SelfSigningKey with update

* Debugging

* Remove delete() so we also query signingkeys
2022-02-09 13:11:43 +01:00
S7evinK
2771d93748
Remove OutputKeyChangeEvent consumer on keyserver (#2160)
* Remove keyserver consumer

* Remove keyserver from eduserver

* Directly upload device keys without eduserver

* Add passing tests
2022-02-08 18:13:38 +01:00
Neil Alexander
a84f50f4fb
Demote logging entry for backoff 2022-02-08 16:49:49 +00:00
S7evinK
9de7efa0b0
Remove sarama/saramajetstream dependencies (#2138)
* Remove dependency on saramajetstream & sarama

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Remove internal.ContinualConsumer from federationapi

* Remove internal.ContinualConsumer from syncapi

* Remove internal.ContinualConsumer from keyserver

* Move to new Prepare function

* Remove saramajetstream & sarama dependency

* Delete unneeded file

* Remove duplicate import

* Log error instead of silently irgnoring it

* Move `OffsetNewest` and `OffsetOldest` into keyserver types, change them to be more sane values

* Fix comments

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-02-04 13:08:13 +00:00
Neil Alexander
2a5c38fee2
Use background contexts during federated join for clarity (#2134)
* Use background contexts for clarity

* Don't wait for the context to expire before trying to return

* Actually we don't really need a goroutine here
2022-02-02 17:33:36 +00:00
Neil Alexander
c773b038bb
Use pull consumers (#2140)
* Pull consumers

* Pull consumers

* Only nuke consumers if they are push consumers

* Clean up old consumers

* Better error handling

* Update comments
2022-02-02 13:32:48 +00:00
Neil Alexander
ba1a9b98b7
Tweak some logging (#2130)
* Modify some log levels

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@336334f

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@cde7ac8

* Demote warning about key change producer

* Add more useful roomserver logging

* Further tweaking
2022-01-31 10:48:28 +00:00
Neil Alexander
a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271b5fca8c104e1554711187420dbebca.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander
8a1bc70524
Exclude our own server name in GetJoinedHostsForRooms (#2110)
* Exclude our own servername

* Make excluding self behaviour optional
2022-01-25 17:00:39 +00:00
Neil Alexander
16035b9737
NATS JetStream tweaks (#2086)
* Use named NATS durable consumers

* Build fixes

* Remove dupe call to SetFederationAPI

* Use namespaced consumer name

* Fix namespacing

* Fix unit tests hopefully
2022-01-07 17:31:57 +00:00
kegsay
173b1e8d3e
Fix #2084 - incorrect /event_auth response (#2085)
* Fix #2084

* Return early

* Linting
2022-01-06 17:13:34 +00:00
S7evinK
161f145176
Add NATS JetStream support (#1866)
* Add NATS JetStream support
Update shopify/sarama

* Fix addresses

* Don't change Addresses in Defaults

* Update saramajetstream

* Add missing error check

Keep typing events for at least one minute

* Use all configured NATS addresses

* Update saramajetstream

* Try setting up with NATS

* Make sure NATS uses own persistent directory (TODO: make this configurable)

* Update go.mod/go.sum

* Jetstream package

* Various other refactoring

* Build fixes

* Config tweaks, make random jetstream storage path for CI

* Disable interest policies

* Try to sane default on jetstream base path

* Try to use in-memory for CI

* Restore storage/retention

* Update nats.go dependency

* Adapt changes to config

* Remove unneeded TopicFor

* Dep update

* Revert "Remove unneeded TopicFor"

This reverts commit f5a4e4a339b6f94ec215778dca22204adaa893d1.

* Revert changes made to streams

* Fix build problems

* Update nats-server

* Update go.mod/go.sum

* Roomserver input API queuing using NATS

* Fix topic naming

* Prometheus metrics

* More refactoring to remove saramajetstream

* Add missing topic

* Don't try to populate map that doesn't exist

* Roomserver output topic

* Update go.mod/go.sum

* Message acknowledgements

* Ack tweaks

* Try to resume transaction re-sends

* Try to resume transaction re-sends

* Update to matrix-org/gomatrixserverlib@91dadfb

* Remove internal.PartitionStorer from components that don't consume keychanges

* Try to reduce re-allocations a bit in resolveConflictsV2

* Tweak delivery options on RS input

* Publish send-to-device messages into correct JetStream subject

* Async and sync roomserver input

* Update dendrite-config.yaml

* Remove roomserver tests for now (they need rewriting)

* Remove roomserver test again (was merged back in)

* Update documentation

* Docker updates

* More Docker updates

* Update Docker readme again

* Fix lint issues

* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

* Go 1.16 instead of Go 1.13 for upgrade tests and Complement

* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

This reverts commit 368675283fc44501f227639811bdb16dd5deef8c.

* Don't report any errors on `/send` to see what fun that creates

* Fix panics on closed channel sends

* Enforce state key matches sender

* Do the same for leave

* Various tweaks to make tests happier

Squashed commit of the following:

commit 13f9028e7a63662759ce7c55504a9d2423058668
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:47:14 2022 +0000

    Do the same for leave

commit e6be7f05c349fafbdddfe818337a17a60c867be1
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:33:42 2022 +0000

    Enforce state key matches sender

commit 85ede6d64bf10ce9b91cdd6d80f87350ee55242f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 14:07:04 2022 +0000

    Fix panics on closed channel sends

commit 9755494a98bed62450f8001d8128e40481d27e15
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:38:22 2022 +0000

    Don't report any errors on `/send` to see what fun that creates

commit 3bb4f87b5dd56882febb4db5621db484c8789b7c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:00:26 2022 +0000

    Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

    This reverts commit 368675283fc44501f227639811bdb16dd5deef8c.

commit fe2673ed7be9559eaca134424e403a4faca100b0
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 12:09:34 2022 +0000

    Go 1.16 instead of Go 1.13 for upgrade tests and Complement

commit 368675283fc44501f227639811bdb16dd5deef8c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 11:51:45 2022 +0000

    Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

commit b028dfc08577bcf52e6cb498026e15fa5d46d07c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 10:29:08 2022 +0000

    Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork

* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers

* Fix consumer component name in  federation API

* Add comment explaining where streams are defined

* Tweaks to roomserver input with comments

* Finish that sentence that I apparently forgot to finish in INSTALL.md

* Bump version number of config to 2

* Add comments around asynchronous sends to roomserver in processEventWithMissingState

* More useful error message when the config version does not match

* Set version in generate-config

* Fix version in config.Defaults

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-01-05 17:44:49 +00:00
Neil Alexander
3113210f17
Fix keyring regressions in previous P2P demo 2021-12-13 13:24:49 +00:00
Neil Alexander
c9419e51af
Don't populate config defaults where it doesn't make sense (#2058)
* Don't populate config defaults where it doesn't make sense

* Fix dendritejs builds
2021-11-24 11:57:39 +00:00
Neil Alexander
ec716793eb
Merge federationapi, federationsender, signingkeyserver components (#2055)
* Initial federation sender -> federation API refactoring

* Move base into own package, avoids import cycle

* Fix build errors

* Fix tests

* Add signing key server tables

* Try to fold signing key server into federation API

* Fix dendritejs builds

* Update embedded interfaces

* Fix panic, fix lint error

* Update configs, docker

* Rename some things

* Reuse same keyring on the implementing side

* Fix federation tests, `NewBaseDendrite` can accept freeform options

* Fix build

* Update create_db, configs

* Name tables back

* Don't rename federationsender consumer for now
2021-11-24 10:45:23 +00:00
Neil Alexander
323a6fb54f
Resume federation sends (#2039)
* Resume federation sends

* Review comments

* Fix build error
2021-11-08 09:24:16 +00:00
Neil Alexander
fbd1a0ab13
Update to matrix-org/gomatrixserverlib@5e02b64 2021-11-02 10:13:38 +00:00
Ryan W
1cd4d50181
Added .well-known/matrix/server endpoint (#1988)
* Added .well-known/matrix/server endpoint

Signed-off-by: Ryan Whittington <twentybitdev@gmail.com>

* Replaced tabs with spaces

Signed-off-by: Ryan Whittington <twentybitdev@gmail.com>
2021-09-10 10:05:31 +01:00
Ryan W
a624eab309
- Removed double imports (#1989)
- Lower cased error messages

Signed-off-by: Ryan Whittington <twentybitdev@gmail.com>

Co-authored-by: kegsay <kegan@matrix.org>
2021-09-08 17:31:03 +01:00
kegsay
7dc8fb1fe7
Add more logs (#2005)
* Add more logs

To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.

* Fix query
2021-09-07 15:07:14 +01:00
Neil Alexander
51b119107c
Don't return nonsense canonical room aliases in the public rooms responses (#1992) 2021-08-27 16:50:30 +01:00
Neil Alexander
2dd5fd1fd6
publicRooms should accept POST as well as GET (#1991) 2021-08-27 15:48:27 +01:00
Neil Alexander
ff21675c5b
Cross-signing fixes, notifications via sync, federation (#1974)
* Initial work on signing key update EDUs

* Fix build

* Produce/consume EDUs

* Producer logging

* Only produce key change notifications for local users

* Better naming

* Try to notify sync

* Enable feature

* Use key change topic

* Don't bother verifying signatures, validate key lengths if we can, notifier fixes

* Copyright notices

* Remove tests from whitelist until matrix-org/sytest#1117

* Some review comment fixes

* Update to matrix-org/gomatrixserverlib@f9416ac

* Remove unneeded parameter
2021-08-17 13:44:30 +01:00
Neil Alexander
b1377d991a
Cross-signing signature handling (#1965)
* Handle other signatures

* Decorate key ID properly

* Match by key IDs

* Tweaks

* Fixes

* Fix /user/keys/query bug, review comments, update sytest-whitelist

* Various wtweaks

* Fix wiring for keyserver in API mode

* Additional fixes
2021-08-09 14:35:24 +01:00
Neil Alexander
e95b1fd238
Cross-signing validation for self-sigs, expose signatures over /user/keys/query and /user/devices/{userId} (#1962)
* Enable unstable feature again

* Try to verify when a device signs a key

* Try to verify when a key signs a device

* It's the self-signing key, not the master key

* Fix error

* Try to verify master key uploads

* Actually we can't guarantee we can do that so nevermind

* Add signatures into /devices/list request

* Fix nil pointer

* Reprioritise map creation

* Don't skip devices that don't have signatures

* Add some debug logging

* Fix logic error in QuerySignatures

* Fix bugs

* Expose master and self-signing keys on /devices/list hopefully

* maps are tedious

* Expose signatures via /keys/query

* Upload signatures when uploading keys

* Fixes

* Disable the feature again
2021-08-06 10:13:35 +01:00
kegsay
e3df612953
Add tracing to user API (#1948)
Use the trace version in tests so we can just implement the required API functions.
2021-08-03 11:23:25 +01:00
Meenal Trivedi
fa1ec482a7
fix:Inviting to an unsupported room version return M_BAD_JSON instead of Incompatible_Version (#1930)
* fix:Inviting to an unsupported room version return M_BAD_JSON instead of M_UNSUPPORTED_ROOM_VERSION

Signed-off-by: Meenal Trivedi <meenaltrivedi6102@gmail.com>

* fix

Signed-off-by: Meenal Trivedi <meenaltrivedi6102@gmail.com>

* fix

Signed-off-by: Meenal Trivedi <meenaltrivedi6102@gmail.com>

* feat: make requested changes

Signed-off-by: Meenal Trivedi <meenaltrivedi6102@gmail.com>

* Use error typecast from matrix-org/gomatrixserverlib#272

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-07-26 10:41:58 +01:00
kegsay
af64e648d7
Fix failing Complement tests (#1931)
* Check for missing state keys to avoid panicking

* Check for not allowed errors on send_leave

* More logging

* handle send_join errors too

* Additional send_join checks

* s/join/gmsl.json/
2021-07-19 13:15:19 +01:00
kegsay
728061db03
fedsender: try to satisfy all notary key requests from the cache first (#1925)
* fedsender: try to satisfy all notary key requests from the cache first

* Linting
2021-07-16 11:35:42 +01:00
kegsay
c102adaf43
fedsender: add cache tables for notary keys (#1923)
* Add notary server tables for postgres

* Add sqlite tables

* fedsender: GetServerKeys -> QueryServerKeys

As it now checks a cache and can return multiple responses
2021-07-15 17:45:37 +01:00
kegsay
7df3e691f2
Fix failing complement test (#1917)
Specifically `TestBannedUserCannotSendJoin`
2021-07-13 12:22:27 +01:00
Neil Alexander
c8408a6387
Add more optimised code path for checking if we're in a room (#1909)
* Add more optimised code path for checking if we're in a room

* Fix database queries

* Fix federation API test

* Fix logging

* Review comments

* Make separate API call for room membership
2021-07-09 16:36:45 +01:00
Neil Alexander
f2974721d5
Fix concurrent map reads/writes on t.hadEvents (#1902)
* Fix concurrent map reads/writes on t.hadEvents

* Add hadEvent function
2021-07-07 18:55:44 +01:00
Neil Alexander
bcd3ef38d0
Track expiry rate on pduCountTotal 2021-07-05 13:47:37 +01:00
Neil Alexander
99d8e1c107
Federation API fixes (#1899)
* Ensure worker has work before starting goroutine

* Revert "Remove processEventWithMissingStateMutex"

This reverts commit 7f02eab47d.

* Use request context when processing transactions

* Keep goroutine count down by not starting work for things where the caller gave up

* Remove mutex, start workers at correct time
2021-07-05 12:14:31 +01:00
Neil Alexander
7f02eab47d
Remove processEventWithMissingStateMutex 2021-07-05 09:14:24 +01:00
Neil Alexander
57320897cb
Federation API workers for /send to reduce memory usage (#1897)
* Try to process rooms concurrently in FS /send

* Clean up

* Use request context so that dead things don't linger for so long

* Remove mutex

* Free up pdus slice so only references remaining are in channel

* Revert "Remove mutex"

This reverts commit 8558075e8c9bab3c1d8b2252b4ab40c7eaf774e8.

* Process EDUs in parallel

* Try refactoring /send concurrency

* Fix waitgroup

* Release on waitgroup

* Respond to transaction

* Reduce CPU usage, fix unit tests

* Tweaks

* Move into one file
2021-07-02 12:33:27 +01:00
Neil Alexander
2647f6e9c5
Fix concurrent map read/write on haveEvents (#1893) 2021-06-30 12:32:20 +01:00
Neil Alexander
b7a2d369c0
Change how servers are selected for missing auth/prev events (#1892)
* Change how servers are selected for missing auth/prev events

* Shuffle order

* Move ServersInRoomProvider into api package
2021-06-30 12:05:58 +01:00
Neil Alexander
0e69212206
Give up on loops when the context expires (#1891) 2021-06-30 10:39:47 +01:00
Neil Alexander
3afb161352
Reduce memory usage in federation /send endpoint (#1890)
* More aggressive event caching

* Deduplicate /state results

* Deduplicate more

* Ensure we use the correct list of events when excluding repeated state

* Fixes

* Ensure we track all events we already knew about properly
2021-06-30 10:01:56 +01:00
Neil Alexander
e2b6a90d90
Put gmectx back to 5 minutes 2021-06-29 10:22:26 +01:00
Neil Alexander
f645646ca9
Restore the getServers RS query (needs optimisation) 2021-06-29 09:37:28 +01:00
Neil Alexander
4417f24678
Protect processEventWithMissingState with per-room mutex, to prevent mass CPU burn/RAM usage
Squashed commit of the following:

commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 15:06:52 2021 +0100

    Fix processEventWithMissingStateMutexes

commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:59:44 2021 +0100

    Use internal.MutexByRoom

commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:50:18 2021 +0100

    Try to slow things down per room

commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:41:27 2021 +0100

    Try to slow things down

commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:40:33 2021 +0100

    Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2

commit 4a37b19a8f6fe8af02e979827253d22a0ccdedb8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:34:54 2021 +0100

    Add comments

commit f9ab3f4b8157a42d657735101bc2c768c663e814
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:31:21 2021 +0100

    Tweaks

commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:15:59 2021 +0100

    Ask origin only for missing things for now

commit 8fd878c75a4066abb21597d524a4eb4670a392d4
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:18:11 2021 +0100

    Make sure someone wakes up

commit b63f699f1b74948d180885449398f999fafb18c8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:12:58 2021 +0100

    Use a FIFO queue instead of a channel to reduce backpressure
2021-06-28 15:11:59 +01:00
Neil Alexander
080ae6a829
Move room mutex in federation API (#1830)
* Move room mutex in federation API to surround resolveStatesAndCheck

* Guard processEventWithMissingState instead

* Revert "Guard processEventWithMissingState instead"

This reverts commit 0ce88036aa79b0ce97e967afb70fe932a547d5f0.
2021-04-13 11:13:07 +01:00
Kegsay
b769d5a25e
Optimise memory usage when calling /g_m_e (#1819)
* Optimise memory usage when calling /g_m_e

* cache more events

* refactor handling of device list update pokes

* Sigh
2021-04-08 13:50:39 +01:00
Bruce MacDonald
d27607af78
Implement OpenID module (#599) (#1812)
* Implement OpenID module (#599)

- Unrelated: change Riot references to Element in client API routing

Signed-off-by: Bruce MacDonald <contact@bruce-macdonald.com>

* OpenID module tweaks (#599)

- specify expiry is ms rather than vague ts
- add OpenID token lifetime to configuration
- use Go naming conventions for the path params
- store plaintext token rather than hash
- remove openid table sqllite mutex

* Add default OpenID token lifetime (#599)

* Update dendrite-config.yaml

Co-authored-by: Kegsay <kegsay@gmail.com>
Co-authored-by: Kegsay <kegan@matrix.org>
2021-04-07 13:26:20 +01:00
Kegsay
f8d3a762c4
Add a per-room mutex to federationapi when processing transactions (#1810)
* Add a per-room mutex to federationapi when processing transactions

This has numerous benefits:
 - Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
   in a state resolution being performed immediately, without checking if we were already doing this in
   a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
 - Prevents memory usage from growing too large as a result and potentially OOMing.

And costs:
 - High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
   though this has always been an issue as roomserver has a per-room mutex already.

* Fix unit tests

* Correct mutex lock ordering
2021-03-30 10:01:32 +01:00
Kegsay
af41f6d454
Add Sentry support (#1803)
* Add Sentry support

* Use HTTP Sentry properly maybe

* Capture panics

* Log fed Sentry stuff correctly

* British english linter
2021-03-24 10:25:24 +00:00
Kegsay
802f1c96f8
Add more metrics (#1802)
* Add more metrics

* Linting
2021-03-23 15:22:00 +00:00
Kegsay
a1b7e4ef3f
log less for failed key querys, add counters for incoming pdus/edus (#1801)
* log less for failed key querys, add counters for incoming pdus/edus

* use labels

* Blacklist flakey test

* Fix metrics
2021-03-23 11:33:36 +00:00
Will Hunt
9557ccada4
Fix appsevice alias queries part 2 (#1684)
* Check membership of room

* Use QueryStateAfterEventsResponse

* Fix complexity

* Add field ShouldHitAppservice to GetRoomIDForAlias

* Hit appservice when trying to join a non-existent alias

* remove unused

* Changes that I made a long time ago

* Rename to appserviceJoinedAtEvent

* Check membership in GetMemberships

* Update QueryMembershipsForRoom

* Tweaks in client API

* Update appserviceJoinedAtEvent

* Comments

* Try QueryMembershipForUser instead

* Undo some changes to client API that shouldn't be needed

* More /event tweaks

* Refactor /event bit

* Go back to QueryMembershipsForRoom because appservices are hard

* Fix bugs in onMessage

* Add comments

* More logical naming, clean up a bit

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-03-03 17:00:31 +00:00
Neil Alexander
d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783) 2021-03-03 14:35:57 +00:00
Neil Alexander
5d74a1757f
Don't query for servers so often in /send (#1766)
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events

* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254

* Fix resolve-state

* Initialise t.servers on first use
2021-02-16 17:12:17 +00:00
Neil Alexander
6757b67a32
NewClient and NewFederationClient updates (#1730)
* Use matrix-org/gomatrixserverlib#252

* Add missing WithSkipVerify to test

* Functions instead

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#252

* Fix disabling TLS validation
2021-01-22 16:09:05 +00:00
Kegsay
93942f8ab6
Gate peeking behind msc flags (#1731) 2021-01-22 16:08:47 +00:00
Matthew Hodgson
0571d395b5
Peeking over federation via MSC2444 (#1391)
* a very very WIP first cut of peeking via MSC2753.

doesn't yet compile or work.
needs to actually add the peeking block into the sync response.
checking in now before it gets any bigger, and to gather any initial feedback on the vague shape of it.

* make PeekingDeviceSet private

* add server_name param

* blind stab at adding a `peek` section to /sync

* make it build

* make it launch

* add peeking to getResponseWithPDUsForCompleteSync

* cancel any peeks when we join a room

* spell out how to runoutside of docker if you want speed

* fix SQL

* remove unnecessary txn for SelectPeeks

* fix s/join/peek/ cargocult fail

* HACK: Track goroutine IDs to determine when we write by the wrong thread

To use: set `DENDRITE_TRACE_SQL=1` then grep for `unsafe`

* Track partition offsets and only log unsafe for non-selects

* Put redactions in the writer goroutine

* Update filters on writer goroutine

* wrap peek storage in goid hack

* use exclusive writer, and MarkPeeksAsOld more efficiently

* don't log ascii in binary at sql trace...

* strip out empty roomd deltas

* re-add txn to SelectPeeks

* re-add accidentally deleted field

* reject peeks for non-worldreadable rooms

* move perform_peek

* fix package

* correctly refactor perform_peek

* WIP of implementing MSC2444

* typo

* Revert "Merge branch 'kegan/HACK-goid-sqlite-db-is-locked' into matthew/peeking"

This reverts commit 3cebd8dbfbccdf82b7930b7b6eda92095ca6ef41, reversing
changes made to ed4b3a58a7855acc43530693cc855b439edf9c7c.

* (almost) make it build

* clean up bad merge

* support SendEventWithState with optional event

* fix build & lint

* fix build & lint

* reinstate federated peeks in the roomserver (doh)

* fix sql thinko

* todo for authenticating state returned by /peek

* support returning current state from QueryStateAndAuthChain

* handle SS /peek

* reimplement SS /peek to prod the RS to tell the FS about the peek

* rename RemotePeeks as OutboundPeeks

* rename remote_peeks_table as outbound_peeks_table

* add perform_handle_remote_peek.go

* flesh out federation doc

* add inbound peeks table and hook it up

* rename ambiguous RemotePeek as InboundPeek

* rename FSAPI's PerformPeek as PerformOutboundPeek

* setup inbound peeks db correctly

* fix api.SendEventWithState with no event

* track latestevent on /peek

* go fmt

* document the peek send stream race better

* fix SendEventWithRewrite not to bail if handed a non-state event

* add fixme

* switch SS /peek to use SendEventWithRewrite

* fix comment

* use reverse topo ordering to find latest extrem

* support postgres for federated peeking

* go fmt

* back out bogus go.mod change

* Fix performOutboundPeekUsingServer

* Fix getAuthChain -> GetAuthChain

* Fix build issues

* Fix build again

* Fix getAuthChain -> GetAuthChain

* Don't repeat outbound peeks for the same room ID to the same servers

* Fix lint

* Don't omitempty to appease sytest

Co-authored-by: Kegan Dougal <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-01-22 14:55:08 +00:00
Neil Alexander
05324b6861
Send/state tweaks (#1681)
* Check missing event count

* Don't use request context for /send
2021-01-04 13:47:48 +00:00
Kegsay
b507312d4c
MSC2836 threading: part 2 (#1596)
* Update GMSL

* Add MSC2836EventRelationships to fedsender

* Call MSC2836EventRelationships in reqCtx

* auth remote servers

* Extract room ID and servers from previous events; refactor a bit

* initial cut of federated threading

* Use the right client/fed struct in the response

* Add QueryAuthChain for use with MSC2836

* Add auth chain to federated response

* Fix pointers

* under CI: more logging and enable mscs, nil fix

* Handle direction: up

* Actually send message events to the roomserver..

* Add children and children_hash to unsigned, with tests

* Add logic for exploring threads and tracking children; missing storage functions

* Implement storage functions for children

* Add fetchUnknownEvent

* Do federated hits for include_children if we have unexplored children

* Use /ev_rel rather than /event as the former includes child metadata

* Remove cross-room threading impl

* Enable MSC2836 in the p2p demo

* Namespace mscs db

* Enable msc2836 for ygg

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-12-04 14:11:01 +00:00
Neil Alexander
be7d8595be
Peeking updates (#1607)
* Add unpeek

* Don't allow peeks into encrypted rooms

* Fix send tests

* Update consumers
2020-12-03 11:11:46 +00:00
Neil Alexander
b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander
265cf5e835
Protect txnReq.newEvents with mutex (#1587)
* Protect txnReq.newEvents and txnReq.haveEvents with mutex

* Missing defer

* Remove t.haveEventsMutex
2020-11-18 11:31:58 +00:00