Adds wakeup broadcast handling to the pinecone demos.
This will reset their blacklist status and interrupt any ongoing
federation queue backoffs currently in progress for this peer.
The end result is that any queued events will quickly be sent to the
peer if they had disconnected while attempting to send events to them.
Adds `PUT
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}` and
`DELTE
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}`
support, as well as the ability to filter `/publicRooms` on networkID
and including all networks.
This is a refactor of the federation destination queues.
It fixes a few things, namely:
- actually retry outgoing events with backoff behaviour
- obtain enough events from the database to fill messages as much as
possible
- minimize the amount of running goroutines
- use pure timers for backoff
- don't restart queue unless necessary
- close the background task when backing off
- increase max edus in a transaction to match the spec
- cleanup timers more aggresively to reduce memory usage
- add jitter to backoff timers to reduce resource spikes
- add a bunch of tests (with real and fake databases) to ensure
everything is working
This fixes some edge cases where federation queue backoffs and
blacklisting weren't behaving as expected.
It also adds new tests for the federation queues to ensure their
behaviour continues to work correctly.
This ensures that the joined hosts in the federation API are correct
after the state is rewritten. This might fix some races around the time
of joining federated rooms.
If the private key file is lost, it's often possible to retrieve the
public key from another server elsewhere, so we should make it possible
to configure it in that way.
Some tweaks for the send-to-device consumers/producers:
- use `json.RawMessage` without marshalling it first
- try further devices (if available) if we failed to `PublishMsg` in the
producers
- some logging changes (to better debug E2EE issues)
We were `json.Unmarshal`ing the EDU and `json.Marshal`ing right before
sending the EDU to the stream. Those are now removed and the consumer
does `json.Unmarshal` once.
`If a device list update goes missing, the server resyncs on the next
one` was failing because a previous test would receive a `waitTime` of
1h, resulting in the test timing out.
This now tries to handle the returned errors differently, e.g. by using
the default `waitTime` of 2s. Also doesn't try further users in the
list, if one of the errors would cause a longer `waitTime`.
This makes the following changes:
* The various `Defaults` functions are now responsible for setting sane defaults if `generate` is specified, rather than hiding them in `generate-config`
* Some configuration options have been marked as `omitempty` so that they don't appear in generated configs unnecessarily (monolith-specific vs. polylith-specific options)
* A new option `-polylith` has been added to `generate-config` to create a config that makes sense for polylith deployments (i.e. including the internal/external API listeners and per-component database sections)
* A new option `-normalise` has been added to `generate-config` to take an existing file and add any missing options and/or defaults
In some conditions (fast CPUs), this test would race the clock for EDU expiration when all we want to make sure of is that the expired EDUs are properly deleted. Given this, we set the expiry time to 0 so the specified EDUs are always deleted when DeleteExpiredEDUs is called.
Fixes#2650.
Signed-off-by: Winter <winter@winter.cafe>
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)
* Add `PerformInvite`
* More tweaks
* Fix metric name
* Fix LookupStateIDs
* Lots of changes to clients
* Some serverside stuff
* Some error handling
* Use paths as metric names
* Revert "Use paths as metric names"
This reverts commit a9323a6a343f5ce6461a2e5bd570fe06465f1b15.
* Namespace metric names
* Remove duplicate entry
* Remove another duplicate entry
* Tweak error handling
* Some more tweaks
* Update error behaviour
* Some more error tweaking
* Fix API path for `PerformDeleteKeys`
* Fix another path
* Tweak federation client proxying
* Fix another path
* Don't return typed nils
* Some more tweaks, not that it makes any difference
* Tweak federation client proxying
* Maybe fix the key backup test
* Add housekeeping function to delete old/expired EDUs
* Add migrations
* Evict EDUs from cache
* Fix queries
* Fix upgrade
* Use map[string]time.Duration to specify different expiry times
* Fix copy & paste mistake
* Set expires_at to tomorrow
* Don't allow NULL
* Add comment
* Add tests
* Use new testrig package
* Fix migrations
* Never expire m.direct_to_device
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Add race testing to tests, and fix a few small race conditions in the tests
* Enable run-sytest on MacOS
* Remove deadlock detecting mutex, per code review feedback
* Remove autoformatting related changes and a closure that is not needed
* Adjust to importing nats client as 'natsclient'
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Clarify the use of gooseMutex to proect goose internal state
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Remove no longer needed mutex for guarding goose
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Add new db migration
* Update migrations
Remove goose
* Add possibility to test direct upgrades
* Try to fix WASM test
* Add checks for specific migrations
* Remove AddMigration
Use WithTransaction
Add Dendrite version to table
* Fix linter issues
* Update tests
* Update comments, outdent if
* Namespace migrations
* Add direct upgrade tests, skipping over one version
* Split migrations
* Update go version in CI
* Fix copy&paste mistake
* Use contexts in migrations
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Membership updater refactoring
* Pass in membership state
* Use membership check rather than referring to state directly
* Delete irrelevant membership states
* We don't need the leave event after all
* Tweaks
* Put a log entry in that I might stand a chance of finding
* Be less panicky
* Tweak invite handling
* Don't freak if we can't find the event NID
* Use event NID from `types.Event`
* Clean up
* Better invite handling
* Placate the almighty linter
* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand
* Fix the sytest after all (thanks @S7evinK for the spot)
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
This should avoid coercions between signed and unsigned ints which might fix problems like `sql: converting argument $5 type: uint64 values with high bit set are not supported`.
* Add `QueryRestrictedJoinAllowed`
* Add `Resident` flag to `QueryRestrictedJoinAllowedResponse`
* Check restricted joins on federation API
* Return `Restricted` to determine if the room was restricted or not
* Populate `AuthorisedVia` properly
* Sign the event on `/send_join`, return it in the `/send_join` response in the `"event"` key
* Kick back joins with invalid authorising user IDs, use event from `"event"` key if returned in `RespSendJoin`
* Use invite helper in `QueryRestrictedJoinAllowed`
* Only use users with the power to invite, change error bubbling a bit
* Placate the almighty linter
One day I will nuke `gocyclo` from orbit and everything in the world will be much better for it.
* Review comments
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* Add very basic syncapi tests
* Add a way to inject jetstream messages
* implement add_state_ids
* bugfixes
* Unbreak tests
* Remove now un-needed API call
* Linting
* Don't ask roomserver for events we already have in federation API
* Check number of events returned is as expected
* Preallocate array
* Improve shape a bit
* tidy up interfaces
* remove unused GetCreatorIDForAlias
* Add RoomserverUserAPI interface
* Define more interfaces
* Use AppServiceInternalAPI for consistent naming
* clean up federationapi constructor a bit
* Fix monolith in -http mode
* Specify interfaces used by appservice, do half of clientapi
* convert more deps of clientapi to finer-grained interfaces
* Convert mediaapi and rest of clientapi
* Somehow this got missed
* Simplify federation API `AddPublicRoutes`
* Simplify client API `AddPublicRoutes`
* Simplify media API `AddPublicRoutes`
* Simplify sync API `AddPublicRoutes`
* Simplify `AddAllPublicRoutes`
* Fix retrieving cross-signing signatures in `/user/devices/{userId}`
We need to know the target device IDs in order to get the signatures and we weren't populating those.
* Fix up signature retrieval
* Fix SQLite
* Always include the target's own signatures as well as the requesting user
* Add response size and requests total to internal handler
* Move MustRegister calls to New* funcs
* Move MustRegister back to init
* Init at some place, minimize changes
* Move receipt sending to own JetStream producer
* Move SendToDevice to producer
* Remove most parts of the EDU server
* Fix SendToDevice & copyrights
* Move structs, cleanup EDU Server traces
* Use HeadersOnly subscription
* Missing file
* Fix linter issues
* Move consumers to own files
* Rename durable consumer; Consumer cleanup
* Docs/config cleanup
* Roomserver input refactoring — again!
* Ensure the actor runs again
* Preserve consumer after unsubscribe
* Another sprinkling of magic
* Rename `TopicFor` to `Prefixed`
* Recreate the stream if the config is bad
* Check streams too
* Prefix subjects, preserve inboxes
* Recreate if subjects wrong
* Remove stream subject
* Reconstruct properly
* Fix mutex unlock
* Comments
* Fix tests
* Don't drop events
* Review comments
* Separate `queueInputRoomEvents` function
* Re-jig control flow a bit
* Don't send `adds_state_events` in roomserver output events anymore
* Set `omitempty` on some output fields that aren't always set
* Add `AddsState` helper function
* No-op if no added state event IDs
* Revert "No-op if no added state event IDs"
This reverts commit 71a0ef3df10e0d94234d916246c30b0a4e82b26e.
* Revert "Add `AddsState` helper function"
This reverts commit c9fbe45475eb12ae44d2a8da7c0fc3a002ad9819.