* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271b5fca8c104e1554711187420dbebca.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>
* Add NATS JetStream support
Update shopify/sarama
* Fix addresses
* Don't change Addresses in Defaults
* Update saramajetstream
* Add missing error check
Keep typing events for at least one minute
* Use all configured NATS addresses
* Update saramajetstream
* Try setting up with NATS
* Make sure NATS uses own persistent directory (TODO: make this configurable)
* Update go.mod/go.sum
* Jetstream package
* Various other refactoring
* Build fixes
* Config tweaks, make random jetstream storage path for CI
* Disable interest policies
* Try to sane default on jetstream base path
* Try to use in-memory for CI
* Restore storage/retention
* Update nats.go dependency
* Adapt changes to config
* Remove unneeded TopicFor
* Dep update
* Revert "Remove unneeded TopicFor"
This reverts commit f5a4e4a339b6f94ec215778dca22204adaa893d1.
* Revert changes made to streams
* Fix build problems
* Update nats-server
* Update go.mod/go.sum
* Roomserver input API queuing using NATS
* Fix topic naming
* Prometheus metrics
* More refactoring to remove saramajetstream
* Add missing topic
* Don't try to populate map that doesn't exist
* Roomserver output topic
* Update go.mod/go.sum
* Message acknowledgements
* Ack tweaks
* Try to resume transaction re-sends
* Try to resume transaction re-sends
* Update to matrix-org/gomatrixserverlib@91dadfb
* Remove internal.PartitionStorer from components that don't consume keychanges
* Try to reduce re-allocations a bit in resolveConflictsV2
* Tweak delivery options on RS input
* Publish send-to-device messages into correct JetStream subject
* Async and sync roomserver input
* Update dendrite-config.yaml
* Remove roomserver tests for now (they need rewriting)
* Remove roomserver test again (was merged back in)
* Update documentation
* Docker updates
* More Docker updates
* Update Docker readme again
* Fix lint issues
* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)
* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that
* Go 1.16 instead of Go 1.13 for upgrade tests and Complement
* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"
This reverts commit 368675283fc44501f227639811bdb16dd5deef8c.
* Don't report any errors on `/send` to see what fun that creates
* Fix panics on closed channel sends
* Enforce state key matches sender
* Do the same for leave
* Various tweaks to make tests happier
Squashed commit of the following:
commit 13f9028e7a63662759ce7c55504a9d2423058668
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 15:47:14 2022 +0000
Do the same for leave
commit e6be7f05c349fafbdddfe818337a17a60c867be1
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 15:33:42 2022 +0000
Enforce state key matches sender
commit 85ede6d64bf10ce9b91cdd6d80f87350ee55242f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 14:07:04 2022 +0000
Fix panics on closed channel sends
commit 9755494a98bed62450f8001d8128e40481d27e15
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 13:38:22 2022 +0000
Don't report any errors on `/send` to see what fun that creates
commit 3bb4f87b5dd56882febb4db5621db484c8789b7c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 13:00:26 2022 +0000
Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"
This reverts commit 368675283fc44501f227639811bdb16dd5deef8c.
commit fe2673ed7be9559eaca134424e403a4faca100b0
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 12:09:34 2022 +0000
Go 1.16 instead of Go 1.13 for upgrade tests and Complement
commit 368675283fc44501f227639811bdb16dd5deef8c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 11:51:45 2022 +0000
Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that
commit b028dfc08577bcf52e6cb498026e15fa5d46d07c
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 10:29:08 2022 +0000
Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)
* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork
* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers
* Fix consumer component name in federation API
* Add comment explaining where streams are defined
* Tweaks to roomserver input with comments
* Finish that sentence that I apparently forgot to finish in INSTALL.md
* Bump version number of config to 2
* Add comments around asynchronous sends to roomserver in processEventWithMissingState
* More useful error message when the config version does not match
* Set version in generate-config
* Fix version in config.Defaults
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add more logs
To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.
* Fix query
* Add more optimised code path for checking if we're in a room
* Fix database queries
* Fix federation API test
* Fix logging
* Review comments
* Make separate API call for room membership
* Ensure worker has work before starting goroutine
* Revert "Remove processEventWithMissingStateMutex"
This reverts commit 7f02eab47d.
* Use request context when processing transactions
* Keep goroutine count down by not starting work for things where the caller gave up
* Remove mutex, start workers at correct time
* Try to process rooms concurrently in FS /send
* Clean up
* Use request context so that dead things don't linger for so long
* Remove mutex
* Free up pdus slice so only references remaining are in channel
* Revert "Remove mutex"
This reverts commit 8558075e8c9bab3c1d8b2252b4ab40c7eaf774e8.
* Process EDUs in parallel
* Try refactoring /send concurrency
* Fix waitgroup
* Release on waitgroup
* Respond to transaction
* Reduce CPU usage, fix unit tests
* Tweaks
* Move into one file
* More aggressive event caching
* Deduplicate /state results
* Deduplicate more
* Ensure we use the correct list of events when excluding repeated state
* Fixes
* Ensure we track all events we already knew about properly
Squashed commit of the following:
commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 15:06:52 2021 +0100
Fix processEventWithMissingStateMutexes
commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:59:44 2021 +0100
Use internal.MutexByRoom
commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:50:18 2021 +0100
Try to slow things down per room
commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:41:27 2021 +0100
Try to slow things down
commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:40:33 2021 +0100
Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2
commit 4a37b19a8f6fe8af02e979827253d22a0ccdedb8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:34:54 2021 +0100
Add comments
commit f9ab3f4b8157a42d657735101bc2c768c663e814
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:31:21 2021 +0100
Tweaks
commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:15:59 2021 +0100
Ask origin only for missing things for now
commit 8fd878c75a4066abb21597d524a4eb4670a392d4
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:18:11 2021 +0100
Make sure someone wakes up
commit b63f699f1b74948d180885449398f999fafb18c8
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:12:58 2021 +0100
Use a FIFO queue instead of a channel to reduce backpressure
* Add a per-room mutex to federationapi when processing transactions
This has numerous benefits:
- Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
in a state resolution being performed immediately, without checking if we were already doing this in
a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
- Prevents memory usage from growing too large as a result and potentially OOMing.
And costs:
- High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
though this has always been an issue as roomserver has a per-room mutex already.
* Fix unit tests
* Correct mutex lock ordering
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events
* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks
* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254
* Fix resolve-state
* Initialise t.servers on first use
* fix conversion from int to string yields a string of one rune, not a string of digits
* Add receipts table to syncapi
* Use StreamingToken as the since value
* Add required method to testEDUProducer
* Make receipt json creation "easier" to read
* Add receipts api to the eduserver
* Add receipts endpoint
* Add eduserver kafka consumer
* Add missing kafka config
* Add passing tests to whitelist
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Fix copy & paste error
* Fix column count error
* Make outbound federation receipts pass
* Make "Inbound federation rejects receipts from wrong remote" pass
* Don't use errors package
* - Add TODO for batching requests
- Rename variable
* Return a better error message
* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright
* Better creation/usage of JoinResponse
* Query all joined rooms instead of just one
* Update gomatrixserverlib
* Add sqlite3 migration
* Add postgres migration
* Ensure required sequence exists before running migrations
* Clarification on comment
* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}
* Remove dead code
Use key for timestamp
* Fix postgres query...
* Remove single purpose struct
* Use key/value directly
* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.
* Actually update the id, so it is correctly send in syncs
* Set receipt on request to /read_markers
* Fix issue with receipts getting overwritten
* Use fmt.Errorf instead of pkg/errors
* Revert "Add postgres migration"
This reverts commit 722fe5a04628882b787d096942459961db159b06.
* Revert "Add sqlite3 migration"
This reverts commit d113b03f6495a4b8f8bcf158a3d00b510b4240cc.
* Fix selectRoomReceipts query
* Make golangci-lint happy
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add KindOld
* Don't process latest events/memberships for old events
* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events
* Signal to downstream components if an event has become a forward extremity
* Don't exclude from sync
* Soft-fail checks on KindNew
* Don't run the latest events updater at all for KindOld
* Don't make federation sender change after all
* Kind in federation sender join
* Don't send isForwardExtremity
* Fix syncapi
* Update comments
* Fix SendEventWithState
* Update sytest-whitelist
* Generate old output events
* Sync API consumes old room events
* Update comments
* Capture errors
* Don't request only state key tuples needed for auth (we end up discarding room state this way)
* QueryStateAfterEvent returns all state when no tuples supplied
* Resolve state
* Comments
* Recursively fetch auth events if needed
* Fix processEvent call
* Ask more servers in lookupEvent
* Don't panic!
* Panic at the Disco
* Find servers more aggressively
* Add getServers
* Fix number of servers to 5, don't bail making RespState if auth events missing
* Fix panic
* Ignore missing state events too
* Report number of servers correctly
* Don't reuse request context for /send_join
* Update federation API tests
* Don't recurse processEvents
* Implement getEvents differently
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events
* Not finished refactor
* Fix test
* Remove isInboundTxn
* Remove debug logging
* Try to ask other servers in the room for missing events if the origin won't provide them
* Logging
* More logging
* Implement QueryMissingAuthPrevEvents
* Try to get missing auth events badly
* Use processEvent
* Logging
* Update QueryMissingAuthPrevEvents
* Try to find missing auth events
* Patchy fix for test
* Logging tweaks
* Send auth events as outliers
* Update check in QueryMissingAuthPrevEvents
* Error responses
* More return codes
* Don't return error on reject/soft-fail since it was ultimately handled
* More tweaks
* More error tweaks
* WIP Event rejection
* Still send back errors for rejected events
Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.
* Implement rejected events
Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.
* Update test to match reality
* Modify InputRoomEvents to no longer return an error
Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.
* Remove redundant returns; linting
* Update blacklist