Fixes https://github.com/matrix-org/dendrite/issues/3240 and potentially
a root cause for state resets.
While testing, I've had added some more debug logging:
```
time="2023-12-16T18:13:11.319458084Z" level=warning msg="already processed event" event_id="$qFYMl_F2vb1N0yxmvlFAMhqhGhLKq4kA-o_YCQKH7tQ" kind=KindNew times=2
time="2023-12-16T18:13:14.537389126Z" level=warning msg="already processed event" event_id="$EU-LTsKErT6Mt1k12-p_3xOHfiLaK6gtwVDlZ35lSuo" kind=KindNew times=5
time="2023-12-16T18:13:16.789551206Z" level=warning msg="already processed event" event_id="$dIPuAfTL5x0VyG873LKPslQeljCSxFT1WKxUtjIMUGE" kind=KindNew times=5
time="2023-12-16T18:13:17.383838767Z" level=warning msg="already processed event" event_id="$7noSZiCkzerpkz_UBO3iatpRnaOiPx-3IXc0GPDQVGE" kind=KindNew times=2
time="2023-12-16T18:13:22.091946597Z" level=warning msg="already processed event" event_id="$3Lvo3Wbi2ol9-nNbQ93N-E2MuGQCJZo5397KkFH-W6E" kind=KindNew times=1
time="2023-12-16T18:13:23.026417446Z" level=warning msg="already processed event" event_id="$lj1xS46zsLBCChhKOLJEG-bu7z-_pq9i_Y2DUIjzGy4" kind=KindNew times=4
```
So we did receive the same event over and over again. Given they are
`KindNew`, we don't short circuit if we already processed them, which
potentially caused the state to be calculated with a now wrong state
snapshot.
Also fixes the back pressure metric. We now correctly increment the
counter once we sent the message to NATS and decrement it once we
actually processed an event.
This should fix#3004 by making sure we also update our in-memory ACLs
after joining a new room.
Also makes use of more caching in `GetStateEvent`
Bonus: Adds some tests, as I was about to use `GetBulkStateContent`, but
turns out that `GetStateEvent` is basically doing the same, just that it
only gets the `eventTypeNID`/`eventStateKeyNID` once and not for every
call.
Fixes include:
- Translating state keys that contain user IDs to their respective room
keys for both querying and sending state events
- **NOTE**: there may be design discussion needed on what should happen
when sender keys cannot be found for users
- A simple fix for kicking guests from rooms properly
- Logic for boundary history visibilities was slightly off (I'm
surprised this only manifested in pseudo ID room versions)
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
They are fundamentally different concepts, so should be represented as
such. Proto events are exchanged in /make_xxx calls over federation, and
made as "fledgling" events in /createRoom and general event sending.
*Building* events is a reasonably complex VERSION SPECIFIC process which
needs amongst other things, auth event providers, prev events, signing
keys, etc.
Requires https://github.com/matrix-org/gomatrixserverlib/pull/379
Requires https://github.com/matrix-org/gomatrixserverlib/pull/376
This has numerous upsides:
- Less type casting to `*Event` is required.
- Making Dendrite work with `PDU` interfaces means we can swap out Event
impls more easily.
- Tests which represent weird event shapes are easier to write.
Part of a series of refactors on GMSL.
We only use it in a few places currently, enough to get things to
compile and run. We should be using it in much more places.
Similarly, in some places we cast []PDU back to []*Event, we need to not
do that. Likewise, in some places we cast PDU to *Event, we need to not
do that. For now though, hopefully this is a start.
Replaced with types.HeaderedEvent _for now_. In reality we want to move
them all to gmsl.Event and only use HeaderedEvent when we _need_ to
bundle the version/event ID with the event (seriailsation boundaries,
and even then only when we don't have the room version).
Requires https://github.com/matrix-org/gomatrixserverlib/pull/373
Adds tests for `QueryRestrictedJoinAllowed`, `IsServerAllowed` and
`PerformRoomUpgrade`. Refactors the `QueryRoomVersionForRoom` method to
accept a string and return a `gmsl.RoomVersion` instead of req/resp
structs.
Adds some more caching for `GetStateEvent`
This should also fix#2912 by ignoring state events belonging to other
users.
We need to check the redaction PL in Dendrite, if we do it in GMSL, we
end up not sending the event to the output stream because it will be
rejected.
---------
Co-authored-by: kegsay <kegan@matrix.org>
This PR changes the following:
- `StoreEvent` now only stores an event (and possibly prev event),
instead of also doing redactions
- Adds a `MaybeRedactEvent` (pulled out from `StoreEvent`), which should
be called after storing events
- a few other things
This PR changes a few things:
- It pulls out the creation of several NIDs from the `StoreEvent`
function to make the functions more reusable
- Uses more caching when using those NIDs to avoid DB round trips
Needs https://github.com/matrix-org/sytest/pull/1315, as otherwise the
membership events aren't persisted yet when hitting `/state` after
kicking guest users.
Makes the following tests pass:
```
Guest users denied access over federation if guest access prohibited
Guest users are kicked from guest_access rooms on revocation of guest_access
Guest users are kicked from guest_access rooms on revocation of guest_access over federation
```
Todo (in a follow up PR):
- Restrict access to CS API Endpoints as per
https://spec.matrix.org/v1.4/client-server-api/#client-behaviour-14
Co-authored-by: kegsay <kegan@matrix.org>
Makes the following tests pass
```
/upgrade moves remote aliases to the new room
Local and remote users' homeservers remove a room from their public directory on upgrade
```
Fixes `outliers whose auth_events are in a different room are correctly
rejected`, by validating that auth events are all from the same room and
not using rejected events for event auth.
commit 1929b688e31987c46e0c8a546f0f9cb0a46bf9a3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 10:09:44 2022 +0100
Still process state-before for soft-failed events
commit e83c0b701d40d78b92072c4643f6bc6f71b72800
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 10:06:50 2022 +0100
Improve logging
commit 29e26124bc27cb83d449de2a4214b253c594aa93
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 09:58:13 2022 +0100
Don't store soft-failed events as rejected
* Reprocess outliers that were previously rejected
* Might as well do all events this way
* More useful errors
* Fix queries
* Tweak condition
* Don't wrap errors
* Report more useful error
* Flatten error on `r.Queryer.QueryStateAfterEvents`
* Some more debug logging
* Flatten error in `QueryRestrictedJoinAllowed`
* Revert "Flatten error in `QueryRestrictedJoinAllowed`"
This reverts commit 1238b4184c30e0c31ffb0f364806fa1275aba483.
* Tweak `QueryStateAfterEvents`
* Handle MissingStateError too
* Scope to room
* Clean up
* Fix the error
* Only apply rejection check to outliers
* Add possibility to set history_visibility and user AccountType
* Add new DB queries
* Add actual history_visibility changes for /messages
* Add passing tests
* Extract check function
* Cleanup
* Cleanup
* Fix build on 386
* Move ApplyHistoryVisibilityFilter to internal
* Move queries to topology table
* Add filtering to /sync and /context
Some cleanup
* Add passing tests; Remove failing tests :(
* Re-add passing tests
* Move filtering to own function to avoid duplication
* Re-add passing test
* Use newly added GMSL HistoryVisibility
* Update gomatrixserverlib
* Set the visibility when creating events
* Default to shared history visibility
* Remove unused query
* Update history visibility checks to use gmsl
Update tests
* Remove unused statement
* Update migrations to set "correct" history visibility
* Add method to fetch the membership at a given event
* Tweaks and logging
* Use actual internal rsAPI, default to shared visibility in tests
* Revert "Move queries to topology table"
This reverts commit 4f0d41be9c194a46379796435ce73e79203edbd6.
* Remove noise/unneeded code
* More cleanup
* Try to optimize database requests
* Fix imports
* PR peview fixes/changes
* Move setting history visibility to own migration, be more restrictive
* Fix unit tests
* Lint
* Fix missing entries
* Tweaks for incremental syncs
* Adapt generic changes
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
* Add response size and requests total to internal handler
* Move MustRegister calls to New* funcs
* Move MustRegister back to init
* Init at some place, minimize changes
* Ensure the input API only uses a single transaction
* Remove more of the dead query API call
* Tidy up
* Fix tests hopefully
* Don't do unnecessary work for rooms that don't exist
* Improve error, fix another case where transaction wasn't used properly
* Add a unit test for checking single transaction on RS input API
* Fix logic oops when deciding whether to use a transaction in storeEvent
* Check that we have a populated state snapshot when determining if we closed the gap
* Do the same in the query API
* Use HasState more opportunistically
* Try to avoid falling down the hole of using a trustworthy but empty state snapshot for non-create events
* Refactor missing state and make sure that we really solve the problem for the new event
* Comments
* Review comments
* Tweak that check again
* Tidy up that create check further
* Fix build hopefully
* Update sendOutliers to use OrderAuthAndStateEvents
* Don't go out of bounds on missingEvents
* Use new event json types in gmsl
* Fix EventJSON to actually unmarshal events
* Update GMSL
* Bump GMSL and improve error messages
* Send back the correct RespState
* Update GMSL