* Try to process rooms concurrently in FS /send
* Clean up
* Use request context so that dead things don't linger for so long
* Remove mutex
* Free up pdus slice so only references remaining are in channel
* Revert "Remove mutex"
This reverts commit 8558075e8c9bab3c1d8b2252b4ab40c7eaf774e8.
* Process EDUs in parallel
* Try refactoring /send concurrency
* Fix waitgroup
* Release on waitgroup
* Respond to transaction
* Reduce CPU usage, fix unit tests
* Tweaks
* Move into one file
* More aggressive event caching
* Deduplicate /state results
* Deduplicate more
* Ensure we use the correct list of events when excluding repeated state
* Fixes
* Ensure we track all events we already knew about properly
Squashed commit of the following:
commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 15:06:52 2021 +0100
Fix processEventWithMissingStateMutexes
commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:59:44 2021 +0100
Use internal.MutexByRoom
commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:50:18 2021 +0100
Try to slow things down per room
commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:41:27 2021 +0100
Try to slow things down
commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:40:33 2021 +0100
Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2
commit 4a37b19a8f6fe8af02e979827253d22a0ccdedb8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:34:54 2021 +0100
Add comments
commit f9ab3f4b8157a42d657735101bc2c768c663e814
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:31:21 2021 +0100
Tweaks
commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:15:59 2021 +0100
Ask origin only for missing things for now
commit 8fd878c75a4066abb21597d524a4eb4670a392d4
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:18:11 2021 +0100
Make sure someone wakes up
commit b63f699f1b74948d180885449398f999fafb18c8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:12:58 2021 +0100
Use a FIFO queue instead of a channel to reduce backpressure
* Implement OpenID module (#599)
- Unrelated: change Riot references to Element in client API routing
Signed-off-by: Bruce MacDonald <contact@bruce-macdonald.com>
* OpenID module tweaks (#599)
- specify expiry is ms rather than vague ts
- add OpenID token lifetime to configuration
- use Go naming conventions for the path params
- store plaintext token rather than hash
- remove openid table sqllite mutex
* Add default OpenID token lifetime (#599)
* Update dendrite-config.yaml
Co-authored-by: Kegsay <kegsay@gmail.com>
Co-authored-by: Kegsay <kegan@matrix.org>
* Add a per-room mutex to federationapi when processing transactions
This has numerous benefits:
- Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
in a state resolution being performed immediately, without checking if we were already doing this in
a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
- Prevents memory usage from growing too large as a result and potentially OOMing.
And costs:
- High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
though this has always been an issue as roomserver has a per-room mutex already.
* Fix unit tests
* Correct mutex lock ordering
* Check membership of room
* Use QueryStateAfterEventsResponse
* Fix complexity
* Add field ShouldHitAppservice to GetRoomIDForAlias
* Hit appservice when trying to join a non-existent alias
* remove unused
* Changes that I made a long time ago
* Rename to appserviceJoinedAtEvent
* Check membership in GetMemberships
* Update QueryMembershipsForRoom
* Tweaks in client API
* Update appserviceJoinedAtEvent
* Comments
* Try QueryMembershipForUser instead
* Undo some changes to client API that shouldn't be needed
* More /event tweaks
* Refactor /event bit
* Go back to QueryMembershipsForRoom because appservices are hard
* Fix bugs in onMessage
* Add comments
* More logical naming, clean up a bit
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events
* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks
* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254
* Fix resolve-state
* Initialise t.servers on first use
* a very very WIP first cut of peeking via MSC2753.
doesn't yet compile or work.
needs to actually add the peeking block into the sync response.
checking in now before it gets any bigger, and to gather any initial feedback on the vague shape of it.
* make PeekingDeviceSet private
* add server_name param
* blind stab at adding a `peek` section to /sync
* make it build
* make it launch
* add peeking to getResponseWithPDUsForCompleteSync
* cancel any peeks when we join a room
* spell out how to runoutside of docker if you want speed
* fix SQL
* remove unnecessary txn for SelectPeeks
* fix s/join/peek/ cargocult fail
* HACK: Track goroutine IDs to determine when we write by the wrong thread
To use: set `DENDRITE_TRACE_SQL=1` then grep for `unsafe`
* Track partition offsets and only log unsafe for non-selects
* Put redactions in the writer goroutine
* Update filters on writer goroutine
* wrap peek storage in goid hack
* use exclusive writer, and MarkPeeksAsOld more efficiently
* don't log ascii in binary at sql trace...
* strip out empty roomd deltas
* re-add txn to SelectPeeks
* re-add accidentally deleted field
* reject peeks for non-worldreadable rooms
* move perform_peek
* fix package
* correctly refactor perform_peek
* WIP of implementing MSC2444
* typo
* Revert "Merge branch 'kegan/HACK-goid-sqlite-db-is-locked' into matthew/peeking"
This reverts commit 3cebd8dbfbccdf82b7930b7b6eda92095ca6ef41, reversing
changes made to ed4b3a58a7855acc43530693cc855b439edf9c7c.
* (almost) make it build
* clean up bad merge
* support SendEventWithState with optional event
* fix build & lint
* fix build & lint
* reinstate federated peeks in the roomserver (doh)
* fix sql thinko
* todo for authenticating state returned by /peek
* support returning current state from QueryStateAndAuthChain
* handle SS /peek
* reimplement SS /peek to prod the RS to tell the FS about the peek
* rename RemotePeeks as OutboundPeeks
* rename remote_peeks_table as outbound_peeks_table
* add perform_handle_remote_peek.go
* flesh out federation doc
* add inbound peeks table and hook it up
* rename ambiguous RemotePeek as InboundPeek
* rename FSAPI's PerformPeek as PerformOutboundPeek
* setup inbound peeks db correctly
* fix api.SendEventWithState with no event
* track latestevent on /peek
* go fmt
* document the peek send stream race better
* fix SendEventWithRewrite not to bail if handed a non-state event
* add fixme
* switch SS /peek to use SendEventWithRewrite
* fix comment
* use reverse topo ordering to find latest extrem
* support postgres for federated peeking
* go fmt
* back out bogus go.mod change
* Fix performOutboundPeekUsingServer
* Fix getAuthChain -> GetAuthChain
* Fix build issues
* Fix build again
* Fix getAuthChain -> GetAuthChain
* Don't repeat outbound peeks for the same room ID to the same servers
* Fix lint
* Don't omitempty to appease sytest
Co-authored-by: Kegan Dougal <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Update GMSL
* Add MSC2836EventRelationships to fedsender
* Call MSC2836EventRelationships in reqCtx
* auth remote servers
* Extract room ID and servers from previous events; refactor a bit
* initial cut of federated threading
* Use the right client/fed struct in the response
* Add QueryAuthChain for use with MSC2836
* Add auth chain to federated response
* Fix pointers
* under CI: more logging and enable mscs, nil fix
* Handle direction: up
* Actually send message events to the roomserver..
* Add children and children_hash to unsigned, with tests
* Add logic for exploring threads and tracking children; missing storage functions
* Implement storage functions for children
* Add fetchUnknownEvent
* Do federated hits for include_children if we have unexplored children
* Use /ev_rel rather than /event as the former includes child metadata
* Remove cross-room threading impl
* Enable MSC2836 in the p2p demo
* Namespace mscs db
* Enable msc2836 for ygg
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* fix conversion from int to string yields a string of one rune, not a string of digits
* Add receipts table to syncapi
* Use StreamingToken as the since value
* Add required method to testEDUProducer
* Make receipt json creation "easier" to read
* Add receipts api to the eduserver
* Add receipts endpoint
* Add eduserver kafka consumer
* Add missing kafka config
* Add passing tests to whitelist
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Fix copy & paste error
* Fix column count error
* Make outbound federation receipts pass
* Make "Inbound federation rejects receipts from wrong remote" pass
* Don't use errors package
* - Add TODO for batching requests
- Rename variable
* Return a better error message
* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright
* Better creation/usage of JoinResponse
* Query all joined rooms instead of just one
* Update gomatrixserverlib
* Add sqlite3 migration
* Add postgres migration
* Ensure required sequence exists before running migrations
* Clarification on comment
* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}
* Remove dead code
Use key for timestamp
* Fix postgres query...
* Remove single purpose struct
* Use key/value directly
* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.
* Actually update the id, so it is correctly send in syncs
* Set receipt on request to /read_markers
* Fix issue with receipts getting overwritten
* Use fmt.Errorf instead of pkg/errors
* Revert "Add postgres migration"
This reverts commit 722fe5a04628882b787d096942459961db159b06.
* Revert "Add sqlite3 migration"
This reverts commit d113b03f6495a4b8f8bcf158a3d00b510b4240cc.
* Fix selectRoomReceipts query
* Make golangci-lint happy
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add basic storage methods
* Add internal api handler
* Add check for forgotten room
* Add /rooms/{roomID}/forget endpoint
* Add missing rsAPI method
* Remove unused parameters
* Add passing tests
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Add missing file
* Add postgres migration
* Add sqlite migration
* Use Forgetter to forget room
* Remove empty line
* Update HTTP status codes
It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add KindOld
* Don't process latest events/memberships for old events
* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events
* Signal to downstream components if an event has become a forward extremity
* Don't exclude from sync
* Soft-fail checks on KindNew
* Don't run the latest events updater at all for KindOld
* Don't make federation sender change after all
* Kind in federation sender join
* Don't send isForwardExtremity
* Fix syncapi
* Update comments
* Fix SendEventWithState
* Update sytest-whitelist
* Generate old output events
* Sync API consumes old room events
* Update comments
* Capture errors
* Don't request only state key tuples needed for auth (we end up discarding room state this way)
* QueryStateAfterEvent returns all state when no tuples supplied
* Resolve state
* Comments
* Recursively fetch auth events if needed
* Fix processEvent call
* Ask more servers in lookupEvent
* Don't panic!
* Panic at the Disco
* Find servers more aggressively
* Add getServers
* Fix number of servers to 5, don't bail making RespState if auth events missing
* Fix panic
* Ignore missing state events too
* Report number of servers correctly
* Don't reuse request context for /send_join
* Update federation API tests
* Don't recurse processEvents
* Implement getEvents differently
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events
* Not finished refactor
* Fix test
* Remove isInboundTxn
* Remove debug logging
* Try to ask other servers in the room for missing events if the origin won't provide them
* Logging
* More logging
* Implement QueryMissingAuthPrevEvents
* Try to get missing auth events badly
* Use processEvent
* Logging
* Update QueryMissingAuthPrevEvents
* Try to find missing auth events
* Patchy fix for test
* Logging tweaks
* Send auth events as outliers
* Update check in QueryMissingAuthPrevEvents
* Error responses
* More return codes
* Don't return error on reject/soft-fail since it was ultimately handled
* More tweaks
* More error tweaks
* Sanity-check room version on RS event input
* Update gomatrixserverlib
* Reject make_join when no room members are left
* Revert some changes from wrong branch
* Distinguish between room not existing and room being abandoned on this server
* nolint
* WIP Event rejection
* Still send back errors for rejected events
Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.
* Implement rejected events
Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.
* Update test to match reality
* Modify InputRoomEvents to no longer return an error
Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.
* Remove redundant returns; linting
* Update blacklist
* Remove QueryBulkStateContent from current state server
Expected fail due to db impl not existing
* Implement query bulk state content
* Fix up rejecting invites over federation
* Fix bulk content marshalling
* Use background context when processing event with missing state
* Five minute timeout
* Remove context from txnreq, thread through instead
* Fix unit tests
* Move currentstateserver API to roomserver
Stub out DB functions for now, nothing uses the roomserver version yet.
* Allow it to startup
* Implement some current-state-server storage interface functions
* Add missing package
* Initial FIFOing of roomserver inputs
* Remove EventID response from api.InputRoomEventsResponse
* Don't send back event ID unnecessarily
* Fix ordering hopefully
* Reduce copies, use buffered task channel to reduce contention on other rooms
* Fix error handling
* Add Queryer and use embedded structs
* Add Inputer and factor out more RS API stuff
This neatly splits up the RS API based on the functionality it provides,
whilst providing a useful place for code sharing via the `helpers` package.
* First pass at server ACLs (not efficient)
* Use transaction origin, update whitelist
* Fix federation API test
It's sufficient for us to return nothing in response to current state, so that the server ACL check returns no ACLs.
* More efficient server ACLs - hopefully
* Fix queries
* Fix queries
* Avoid panics by nil pointers
* Bug fixes
* Fix state event type
* Fix mutex
* Update logging
* Ignore port when matching servername
* Use read mutex
* Fix bugs
* Fix sync API test
* Comments
* Add tests, tweaks to behaviour
* Fix test output
* Initial pass at refactoring config (not finished)
* Don't forget current state and EDU servers
* More shifting around
* Update server key API tests
* Fix roomserver test
* Fix more tests
* Further tweaks
* Fix current state server test (sort of)
* Maybe fix appservices
* Fix client API test
* Include database connection string in database options
* Fix sync API build
* Update config test
* Fix unit tests
* Fix federation sender build
* Fix gobind build
* Set Listen address for all services in HTTP monolith mode
* Validate config, reinstate appservice derived in directory, tweaks
* Tweak federation API test
* Set MaxOpenConnections/MaxIdleConnections to previous values
* Update generate-config
* Add InputDeviceListUpdate
* Unbreak unit tests
* Process inbound device list updates from federation
- Persist the keys in the keyserver and produce key changes
- Does not currently fetch keys from the remote server if the prev IDs are missing
* Linting
* Add QueryDeviceMessages to serve up device keys and stream IDs
* Consume key change events in fedsender
Don't yet send them to destinations as we haven't worked them out yet
* Send device list updates to all required servers
* Glue it all together
* Use content_value instead of membership
* Fix build
* Replace publicroomsapi with a combination of clientapi/roomserver/currentstateserver
- All public rooms paths are now handled by clientapi
- Requests to (un)publish rooms are sent to the roomserver via `PerformPublish`
which are stored in a new `published_table.go`
- Requests for public rooms are handled in clientapi by:
* Fetch all room IDs which are published using `QueryPublishedRooms` on the roomserver.
* Apply pagination parameters to the slice.
* Do a `QueryBulkStateContent` request to the currentstateserver to pull out
required state event *content* (not entire events).
* Aggregate and return the chunk.
Mostly but not fully implemented (DB queries on currentstateserver are missing)
* Fix pq query
* Make postgres work
* Make sqlite work
* Fix tests
* Unbreak pagination tests
* Linting
* Add PerformInvite and refactor how errors get handled
- Rename `JoinError` to `PerformError`
- Remove `error` from the API function signature entirely. This forces
errors to be bundled into `PerformError` which makes it easier for callers
to detect and handle errors. On network errors, HTTP clients will make a
`PerformError`.
* Unbreak everything; thanks Go!
* Send back JSONResponse according to the PerformError
* Update federation invite code too
We would return a 403 first (as the server would not be allowed to
see this event) and only then return a 404 if the event is not in
the given room. We now invert those checks for /state and /state_ids
to make the tests pass.
* Fix rooms v3 url paths for good - with tests
- Add a test rig around `federationapi` to test routing.
- Use `JSONVerifier` over `KeyRing` so we can stub things out more easily.
- Add `test.NopJSONVerifier` which verifies nothing.
- Add `base.BaseMux` which is the original `mux.Router` used to spawn public/internal routers.
- Listen on `base.BaseMux` and not the default serve mux as it cleans paths which we don't want.
- Factor out `ListenAndServe` to `test.ListenAndServe` and add flag for listening on TLS.
* Fix comments
* Linting
* Minor perf/debugging improvements
- publicroomsapi: Don't call QueryEventsByID with no event IDs
- appservice: Consume only if there are 1 or more ASes
- roomserver: don't keep a copy of the request "for debugging" - we trace now
* fedsender: return early if we have no destinations
* Unbreak tests
* s/QueryBackfill/PerformBackfill/g
* OutputEvent now includes AddStateEvents which contain the full event of extra state events
* Only include adds not the current event
* Get adding state right
* Remove clientapi producers which aren't actually producers
They are actually just convenience wrappers around the internal APIs
for roomserver/eduserver. Move their logic to their respective `api`
packages and call them directly.
* Remove TODO
* unbreak ygg
* Use MissingAuthEventHandler on performjoin to try and speed up cases where we have missing events
* Update gomatrixserverlib
* Use supplied room version
* Use AuthChainProvider
* Tweaks
* Update gomatrixserverlib
* Signature checks
* Return bad request on CS API /send if bad JSON
* Return some more M_BAD_JSON in the right places
* nolint because damnit gocyclo all I added was a type check for an error
* Update gomatrixserverlib
* Update gomatrixserverlib
* Update sytest-whitelist
* Update gomatrixserverlib
* Update sytest-whitelist
* NotJSON -> BadJSON
* Groundwork for send-to-device messaging
* Update sample config
* Add unstable routing for now
* Send to device consumer in sync API
* Start the send-to-device consumer
* fix indentation in dendrite-config.yaml
* Create send-to-device database tables, other tweaks
* Add some logic for send-to-device messages, add them into sync stream
* Handle incoming send-to-device messages, count them with EDU stream pos
* Undo changes to test
* pq.Array
* Fix sync
* Logging
* Fix a couple of transaction things, fix client API
* Add send-to-device test, hopefully fix bugs
* Comments
* Refactor a bit
* Fix schema
* Fix queries
* Debug logging
* Fix storing and retrieving of send-to-device messages
* Try to avoid database locks
* Update sync position
* Use latest sync position
* Jiggle about sync a bit
* Fix tests
* Break out the retrieval from the update/delete behaviour
* Comments
* nolint on getResponseWithPDUsForCompleteSync
* Try to line up sync tokens again
* Implement wildcard
* Add all send-to-device tests to whitelist, what could possibly go wrong?
* Only care about wildcard when targeted locally
* Deduplicate transactions
* Handle tokens properly, return immediately if waiting send-to-device messages
* Fix sync
* Update sytest-whitelist
* Fix copyright notice (need to do more of this)
* Comments, copyrights
* Return errors from Do, fix dendritejs
* Review comments
* Comments
* Constructor for TransactionWriter
* defletions
* Update gomatrixserverlib, sytest-blacklist
* Separate muxes for public and internal APIs
* Update client-api-proxy and federation-api-proxy so they don't add /api to the path
* Tidy up
* Consistent HTTP setup
* Set up prefixes properly
* sytest: Make 'Inbound federation can backfill events' pass
This breaks 'Outbound federation can backfill events' because now
we are returning the right number of events, which the previous
test was relying on.
Previously, /messages was backfilling the membership event, causing
the test to pass. Now we are no longer backfilling the membership
event due to the change in this commit, causing the test to fail.
The test should instead be returning the membership event locally
from synacpis database, but it doesn't do it fast enough, resulting
in a no-op /sync response with a next_batch=s0_0 which will never
pick up the local membership event when it rolls in. The test
does attempt to retry, but doesn't take the new next_batch=s1_0
resulting in it missing from the /messages response.
* Linting