Some tweaks for the send-to-device consumers/producers:
- use `json.RawMessage` without marshalling it first
- try further devices (if available) if we failed to `PublishMsg` in the
producers
- some logging changes (to better debug E2EE issues)
The newly introduced index improves select query performance: in one example, the execution time of the `selectSendToDeviceMessagesSQL` query dropped from 80 ms to 15 ms. No sytest modifications are required.
### Pull Request Checklist
* [x] I have added tests for PR _or_ I have justified why this PR doesn't need tests.
* [x] Pull request includes a [sign off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)
Signed-off-by: `Piotr Kozimor <p1996k@gmail.com>`
This should fix an issue where we return fewer membership events than expected when doing an initial sync.
When doing an initial sync, the state limit is set to `math.MaxInt32`, while the default filter is set to 20.
- Reverts 9dc57122d9 as it was causing issues https://github.com/matrix-org/dendrite/issues/2660
- Updates the GMSL `DefaultStateFilter` to use a limit of 20 events
- Uses the timeline events to determine the new position instead of the state events
This should hopefully deflake `Backfill works correctly with history visibility set to joined`, as we were using the default `shared` visibility even if the events are set to `joined` (or something else).
* Use existing current room state if we have it
* Don't dedupe before applying the history vis filter
* Revert "Don't dedupe before applying the history vis filter"
This reverts commit d27c4a0874dabb77c2eda6b23eb7c00478bc9e90.
* Revert "Use existing current room state if we have it"
This reverts commit 5819b4a7ce511204c4fb48d3c4741612b136e2ea.
* Tweaks
* Only return non-retired invites
* Revert "Only return non-retired invites"
This reverts commit 1150aa7f385b7d7cf5378297f3e17566d5aabcc6.
* Check if we're doing an initial sync in the stream
* Add possibility to set history_visibility and user AccountType
* Add new DB queries
* Add actual history_visibility changes for /messages
* Add passing tests
* Extract check function
* Cleanup
* Cleanup
* Fix build on 386
* Move ApplyHistoryVisibilityFilter to internal
* Move queries to topology table
* Add filtering to /sync and /context
Some cleanup
* Add passing tests; Remove failing tests :(
* Re-add passing tests
* Move filtering to own function to avoid duplication
* Re-add passing test
* Use newly added GMSL HistoryVisibility
* Update gomatrixserverlib
* Set the visibility when creating events
* Default to shared history visibility
* Remove unused query
* Update history visibility checks to use gmsl
Update tests
* Remove unused statement
* Update migrations to set "correct" history visibility
* Add method to fetch the membership at a given event
* Tweaks and logging
* Use actual internal rsAPI, default to shared visibility in tests
* Revert "Move queries to topology table"
This reverts commit 4f0d41be9c194a46379796435ce73e79203edbd6.
* Remove noise/unneeded code
* More cleanup
* Try to optimize database requests
* Fix imports
* PR review fixes/changes
* Move setting history visibility to own migration, be more restrictive
* Fix unit tests
* Lint
* Fix missing entries
* Tweaks for incremental syncs
* Adapt generic changes
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)
* Add `PerformInvite`
* More tweaks
* Fix metric name
* Fix LookupStateIDs
* Lots of changes to clients
* Some serverside stuff
* Some error handling
* Use paths as metric names
* Revert "Use paths as metric names"
This reverts commit a9323a6a343f5ce6461a2e5bd570fe06465f1b15.
* Namespace metric names
* Remove duplicate entry
* Remove another duplicate entry
* Tweak error handling
* Some more tweaks
* Update error behaviour
* Some more error tweaking
* Fix API path for `PerformDeleteKeys`
* Fix another path
* Tweak federation client proxying
* Fix another path
* Don't return typed nils
* Some more tweaks, not that it makes any difference
* Tweak federation client proxying
* Maybe fix the key backup test
* Bypass lazyLoadCache if we're doing an initial sync
* Make the linter happy again?
* Revert "Make the linter happy again?"
This reverts commit 52a5691ba3c17c05698bcc6a13092090f27ace63.
* Try that again
* Invalidate LazyLoadCache on initial syncs
* Remove unneeded check
* Add TODO
* Rename Invalite -> InvalidateLazyLoadedUser
* Thanks IDE
* Fix notification query
* Also for SQLite
* Move tests to whitelist
* Revert "Move tests to whitelist"
This reverts commit a7d0120019a111ce45a447ba40233d9c101e6e9b.
* Add race testing to tests, and fix a few small race conditions in the tests
* Enable run-sytest on MacOS
* Remove deadlock detecting mutex, per code review feedback
* Remove autoformatting related changes and a closure that is not needed
* Adjust to importing nats client as 'natsclient'
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Clarify the use of gooseMutex to protect goose internal state
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Remove no longer needed mutex for guarding goose
Signed-off-by: Brian Meek <brian@hntlabs.com>
* Fix query issue, only add "changed" users if we actually share a room
* Avoid log spam if context is done
* Undo changes to filterSharedUsers
* Add logging again..
* Fix SQLite shared users query
* Change query to include invited users
Issue: during a conversation, under some conditions, the sync cookie is not advanced. As a result, the client loops on the same sync API call, creating high traffic and CPU load.
Fix: the pdu component of the cookie was being updated incorrectly; this is now corrected.
* Add new db migration
* Update migrations
Remove goose
* Add possibility to test direct upgrades
* Try to fix WASM test
* Add checks for specific migrations
* Remove AddMigration
Use WithTransaction
Add Dendrite version to table
* Fix linter issues
* Update tests
* Update comments, outdent if
* Namespace migrations
* Add direct upgrade tests, skipping over one version
* Split migrations
* Update go version in CI
* Fix copy&paste mistake
* Use contexts in migrations
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Membership updater refactoring
* Pass in membership state
* Use membership check rather than referring to state directly
* Delete irrelevant membership states
* We don't need the leave event after all
* Tweaks
* Put a log entry in that I might stand a chance of finding
* Be less panicky
* Tweak invite handling
* Don't freak if we can't find the event NID
* Use event NID from `types.Event`
* Clean up
* Better invite handling
* Placate the almighty linter
* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand
* Fix the sytest after all (thanks @S7evinK for the spot)
* Add function to the sync API storage package for filtering shared users
* Use the database instead of asking the RS API
* Fix unit tests
* Fix map handling in `filterSharedUsers`
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizes
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
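The `config.DataUnit` entry in the list above refers to a type for writing cache sizes in a friendly form. The following is a rough sketch of how such a type could work, not the implementation from this change: the accepted suffixes, the use of powers of 1024, and the `UnmarshalText` approach are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DataUnit is a size in bytes. This is a hypothetical re-creation for
// illustration; the real type may differ.
type DataUnit int64

// UnmarshalText parses values such as "512kb", "64mb" or "1gb"
// (case-insensitive). A bare number or a "b" suffix means bytes.
// Multipliers are powers of 1024, which is an assumption of this sketch.
func (d *DataUnit) UnmarshalText(text []byte) error {
	s := strings.ToLower(strings.TrimSpace(string(text)))
	multiplier := int64(1)
	switch {
	case strings.HasSuffix(s, "kb"):
		multiplier, s = 1<<10, strings.TrimSuffix(s, "kb")
	case strings.HasSuffix(s, "mb"):
		multiplier, s = 1<<20, strings.TrimSuffix(s, "mb")
	case strings.HasSuffix(s, "gb"):
		multiplier, s = 1<<30, strings.TrimSuffix(s, "gb")
	case strings.HasSuffix(s, "b"):
		s = strings.TrimSuffix(s, "b")
	}
	n, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil {
		return fmt.Errorf("invalid data unit %q: %w", text, err)
	}
	*d = DataUnit(n * multiplier)
	return nil
}

func main() {
	var size DataUnit
	if err := size.UnmarshalText([]byte("1gb")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(int64(size))
}
```

Implementing `encoding.TextUnmarshaler` means that decoders which honour that interface can fill the field directly from a string in a config file.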
This should avoid coercions between signed and unsigned ints which might fix problems like `sql: converting argument $5 type: uint64 values with high bit set are not supported`.
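For context, the quoted error comes from Go's `database/sql` package, whose default parameter converter refuses `uint64` values that do not fit into an `int64`. A small sketch reproducing that behaviour without a database (`convertArg` is a helper name introduced here for illustration):

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// convertArg runs a value through the default parameter converter that
// database/sql applies to query arguments.
func convertArg(v interface{}) (driver.Value, error) {
	return driver.DefaultParameterConverter.ConvertValue(v)
}

func main() {
	// A uint64 with the high bit set cannot be represented as an int64,
	// so the conversion fails.
	_, err := convertArg(uint64(1) << 63)
	fmt.Println("uint64 with high bit set:", err)

	// An int64 is passed through unchanged.
	v, err := convertArg(int64(42))
	fmt.Println("int64:", v, err)
}
```

Keeping such values as signed integers from end to end avoids the conversion path that fails.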
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
* syncapi: don't return early for no-op incremental syncs
Comments explain why, but basically it's an inefficient use
of bandwidth and some sytests rely on /sync to block.
* Honour timeouts
* Actually return a response with timeout=0
* bugfix: fix race condition when updating presence via /sync
Previously when presence is updated via /sync, we would send the presence update
asynchronously via NATS. This created a race condition:
- If the presence update is processed quickly, the /sync which triggered the presence
update would see an online presence.
- If the presence update was processed slowly, the /sync which triggered the presence
update would see an offline presence.
This is the root cause behind the flakey sytest: 'User sees their own presence in a sync'.
The fix is to ensure we update the database/advance the stream position synchronously
for local users.
* Bugfix for test
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* Fix OTK spam
* Update comment
* Optimize selectKeysCountSQL to only return max 100 keys
* Return CurrentPosition if the request timed out
* Revert "Return CurrentPosition if the request timed out"
This reverts commit 7dbdda964189f5542048c06ce5ffc6d4da1814e6.
Co-authored-by: kegsay <kegan@matrix.org>
* Add very basic syncapi tests
* Add a way to inject jetstream messages
* implement add_state_ids
* bugfixes
* Unbreak tests
* Remove now un-needed API call
* Linting
* Don't ask roomserver for events we already have in federation API
* Check number of events returned is as expected
* Preallocate array
* Improve shape a bit
* syncapi: use finer-grained interfaces when making the syncapi
* Use specific interfaces for syncapi-roomserver interactions
* Define query access token api for shared http auth code
* Initial phone home stats queries
* Add userAgent to UpdateDeviceLastSeen
Add new table for tracking daily user visits
* Add user_daily_visits table
* Fix queries
* userapi stats tables & queries
* userapi interface and internal api
* syncapi stats queries
* testing phone home stats
* Add complete config to syncapi
* add missing files
* Fix queries
* Send empty request
* Add version & monolith stats
* Add configuration for phone home stats
* Move WASM to its own file, add config and comments
* Add tracing methods
* Add total rooms
* Add more fields, actually send data somewhere
* Move stats to the userapi
* Move phone home stats to util package
* Cleanup
* Linter & parts of GH comments
* More GH comments changes
- Move comments to SQL statements
- Shrink interface, add struct for stats
- No fatal errors, use defaults
* Be more explicit when querying
* Fix wrong calculation & wrong query params
Add tests
* Add Windows stats
* Add build constraint
* Use new testing structure
Fix issues with getting values when using SQLite
Fix wrong AddDate value
Export UpdateUserDailyVisits
* Fix query params
* Fix test
* Add comment about countR30UsersSQL and countR30UsersV2SQL; fix test
* Update config
* Also update example config file
* Use OS level proxy, update logging
Co-authored-by: kegsay <kegan@matrix.org>
* Simplify federation API `AddPublicRoutes`
* Simplify client API `AddPublicRoutes`
* Simplify media API `AddPublicRoutes`
* Simplify sync API `AddPublicRoutes`
* Simplify `AddAllPublicRoutes`
* Only load members of newly joined rooms
* Comment that the query is prepared at runtime
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Use filter and limit presence count
* More limiting
* More limiting
* Fix unit test
* Also limit presence by last_active_ts
* Update query, use "from" as the initial lastPos
* Get 1000 presence events, they are filtered later
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Don't create fictitious presence entries for users that don't have any
* Update whitelist, since that test probably shouldn't be passing
* Fix panics
Squashed commit of the following:
commit 0ec8de57261d573a5f88577aa9d7a1174d3999b9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:56:30 2022 +0100
Select filter onto provided target filter
commit da40b6fffbf5737864b223f49900048f557941f9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:48:00 2022 +0100
Specify other field too
commit ffc0b0801f63bb4d3061b6813e3ce5f3b4c8fbcb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:45:44 2022 +0100
Send as much account data as possible during complete sync
* Initial work on lazyloading
* Partially implement lazy loading on /sync
* Rename methods
* Make missing tests pass
* Preallocate slice, even if it will end up with fewer values
* Let the cache handle the user mapping
* Linter
* Cap cache growth
* Precompute values for `userIDSet` in sync notifier
* Mutexes
* Fixes
* Sensible initial value
* Update syncapi/notifier/notifier.go
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
* Placate the almighty linter
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
* syncapi: add more tests; fix more bugs
bugfixes:
- The postgres impl of TopologyTable.SelectEventIDsInRange did not use the provided txn
- The postgres impl of EventsTable.SelectEvents did not preserve the ordering of the input event IDs in the output events slice
- The sqlite impl of EventsTable.SelectEvents did not use a bulk `IN ($1)` query.
Added tests:
- `TestGetEventsInRangeWithTopologyToken`
- `TestOutputRoomEventsTable`
- `TestTopologyTable`
* -p 1 for now
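The `SelectEvents` ordering bug listed under the bugfixes above is a common pitfall: a bulk `IN (...)` query returns rows in whatever order the database chooses. One way to restore the caller's order is sketched below. This is an illustration only, not the code from this change; the `event` type and `orderByInput` function are hypothetical.

```go
package main

import "fmt"

// event is a hypothetical, minimal stand-in for a stored event row.
type event struct {
	ID string
}

// orderByInput returns the rows reordered to match the order of ids.
// IDs with no matching row are skipped.
func orderByInput(ids []string, rows []event) []event {
	byID := make(map[string]event, len(rows))
	for _, row := range rows {
		byID[row.ID] = row
	}
	ordered := make([]event, 0, len(ids))
	for _, id := range ids {
		if ev, ok := byID[id]; ok {
			ordered = append(ordered, ev)
		}
	}
	return ordered
}

func main() {
	// Rows as a database might return them, in arbitrary order.
	rows := []event{{ID: "$a"}, {ID: "$b"}, {ID: "$c"}}
	for _, ev := range orderByInput([]string{"$c", "$a", "$b"}, rows) {
		fmt.Println(ev.ID)
	}
}
```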
* Add response size and requests total to internal handler
* Move MustRegister calls to New* funcs
* Move MustRegister back to init
* Init at some place, minimize changes
* Add test infrastructure code for dendrite unit/integ tests
Start re-enabling some syncapi storage tests in the process.
* Linting
* Add postgres service to unit tests
* dendrite not syncv3
* Skip test which doesn't work
* Linting
* Add `jetstream.PrepareForTests`
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add ignore users
* Ignore users in pushrules
Add passing tests
* Update sytest lists
* Store ignore knowledge in the sync API
* Fix copyrights
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Micro-optimisations, lock fixes
* Refactor `SharedUsers`
* Reuse map to reduce allocations/GC pressure
* oh yeah, initialise it
* Leave room for the user ID we'll no doubt append afterward
* Include joined and invite member counts in room summary
This should fix #2314 and also fix the problem where some clients like Element Android, Fluffychat etc. would display the wrong member count for a given room.
* Improve SQLite query precision
* Check existence of state key for membership events
* Move receipt sending to own JetStream producer
* Move SendToDevice to producer
* Remove most parts of the EDU server
* Fix SendToDevice & copyrights
* Move structs, cleanup EDU Server traces
* Use HeadersOnly subscription
* Missing file
* Fix linter issues
* Move consumers to own files
* Rename durable consumer; Consumer cleanup
* Docs/config cleanup
* Use latest event position in response for advancing the stream position in an incremental sync
* Create some calm
* Use To in worst case
* Don't waste CPU cycles on an empty response after all
* Bug fixes
* Fix another bug