Dendrite to Synapse Migration - TODO
Goal
Migrate local PostgreSQL data from Dendrite to Synapse. Minimum: users, rooms, messages, files.
Status: ALL PHASES COMPLETE AND VALIDATED
Tested against real Dendrite DB dump (bocken.org):
- 1194 users, 492 rooms, 51474 events, 2747 media files, 3309 thumbnails
- Full migration runs in ~8 seconds
- Synapse starts cleanly, admin API returns correct data
- Messages, room state, memberships, media metadata all verified
Architecture Notes
Dendrite Schema (Go, NID-based)
- Uses numeric IDs (NIDs) for rooms, events, event types, state keys
- Event JSON stored separately in roomserver_event_json
- Event types mapped via roomserver_event_types (nid -> string)
- State keys mapped via roomserver_event_state_keys (nid -> string)
- Membership uses numeric nid references (target_nid = event_state_key_nid of user)
- Media uses base64hash (SHA-256) for dedup, stored in mediaapi_media_repository
- Media files: {base}/{hash[0]}/{hash[1]}/{hash[2:]}/file
- Thumbnails: {base}/{hash[0]}/{hash[1]}/{hash[2:]}/thumbnail-{w}x{h}-{method}
- Accounts in userapi_accounts, profiles in userapi_profiles
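The NID indirection can be sketched as two dictionary lookups. This is a minimal illustration, not the migration script's actual code: the dictionaries stand in for rows loaded from roomserver_event_types and roomserver_event_state_keys, the function name is hypothetical, and NID 9 for m.room.message is an invented example value. The special cases (state key NID 0 = not a state event, NID 1 = empty string) follow the Findings log.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory lookups, as they might be loaded from
# roomserver_event_types and roomserver_event_state_keys.
EVENT_TYPES: Dict[int, str] = {
    1: "m.room.create",
    5: "m.room.member",
    9: "m.room.message",  # invented NID, for illustration only
}
STATE_KEYS: Dict[int, str] = {1: "", 2: "@alice:example.com"}


def denormalize(type_nid: int, state_key_nid: int) -> Tuple[str, Optional[str]]:
    """Resolve Dendrite NIDs to the text values Synapse stores directly."""
    event_type = EVENT_TYPES[type_nid]
    # NID 0 means "not a state event"; NID 1 is the empty-string state key.
    state_key = None if state_key_nid == 0 else STATE_KEYS[state_key_nid]
    return event_type, state_key
```

Keeping None distinct from the empty string matters here, because an empty state key still marks a state event while None does not.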
Synapse Schema (Python, text-based)
- Uses text IDs directly everywhere
- Event JSON in event_json table, metadata in events table
- State managed via state_groups + state_groups_state with delta chains
- Membership in room_memberships + local_current_membership
- Media in local_media_repository (uses media_id as filesystem key)
- Media files: {base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}
- Thumbnails: {base}/local_thumbnails/{id[0:2]}/{id[2:4]}/{id[4:]}/{w}-{h}-{top}-{sub}-{method}
- Accounts in users, profiles in profiles
- Schema versioned (currently v93-94), needs Synapse pre-init to create schema
Key Mapping: Dendrite -> Synapse
| Dendrite Table | Synapse Table | Notes |
|---|---|---|
| userapi_accounts | users | password_hash, created_ts (ms->s), account_type->is_guest/admin |
| userapi_profiles | profiles | user_id=localpart, full_user_id=@user:server |
| userapi_devices | devices + access_tokens | direct map; access_token preserved so clients don't re-login |
| roomserver_rooms | rooms | room_id, room_version; creator from m.room.create events |
| roomserver_events + event_json | events + event_json | denormalize NIDs, topological_ordering=depth |
| syncapi_current_room_state | current_state_events | direct map |
| syncapi_current_room_state (member) | room_memberships + local_current_membership | |
| mediaapi_media_repository | local_media_repository | media_id, type, size, upload_name, user_id |
| mediaapi_thumbnail | local_media_repository_thumbnails | |
| syncapi_receipts | receipts_linearized + receipts_graph | partial unique index for NULL thread_id |
| roomserver_redactions | redactions | |
Tasks
Phase 0: Setup
- Explore Dendrite schema
- Explore Synapse schema
- Create migration plan
- Create script skeleton with connection handling + CLI args
Phase 1: Users & Profiles
- Migrate userapi_accounts -> users (created_ts ms->s conversion)
- Migrate userapi_profiles -> profiles (user_id=localpart, full_user_id=@user:server)
- Migrate userapi_devices -> devices
- Tested: 1194 users, 1194 profiles, 13 devices
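The account conversion might look roughly like the sketch below. Treat the details as assumptions rather than the script's real code: the numeric account_type values (1=user, 2=guest, 3=admin) and the Synapse column names used as dictionary keys (name, creation_ts, admin, is_guest) should be checked against the schemas actually in use, and convert_account is a hypothetical helper name.

```python
from typing import Any, Dict

# Assumed Dendrite account_type values; verify against the source database.
ACCOUNT_TYPE_GUEST = 2
ACCOUNT_TYPE_ADMIN = 3


def convert_account(
    localpart: str,
    server_name: str,
    password_hash: str,
    created_ts_ms: int,
    account_type: int,
) -> Dict[str, Any]:
    """Build a Synapse-style users row from a Dendrite userapi_accounts row."""
    return {
        "name": f"@{localpart}:{server_name}",  # full user ID
        "password_hash": password_hash,
        "creation_ts": created_ts_ms // 1000,  # milliseconds -> seconds
        "admin": 1 if account_type == ACCOUNT_TYPE_ADMIN else 0,
        "is_guest": 1 if account_type == ACCOUNT_TYPE_GUEST else 0,
    }
```

Integer division is used for the timestamp so the result stays an integer, matching a seconds-resolution column.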
Phase 2: Rooms
- Migrate roomserver_rooms -> rooms
- Extract room creator from m.room.create events
- Migrate roomserver_room_aliases -> room_aliases + room_alias_servers
- Tested: 492 rooms, correct creators
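Because room_alias_servers has no unique constraint (see the Findings log), re-running this phase needs a check-before-insert rather than ON CONFLICT. The sketch below shows one way to do that in a single statement. It uses an in-memory SQLite database purely as a stand-in so the example is runnable; against PostgreSQL with psycopg2 the placeholders would be %s instead of ?. The helper names and the two-column table layout are assumptions for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a stand-in table with no unique constraint, like the real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE room_alias_servers (room_alias TEXT, server TEXT)")
    return conn


def insert_alias_server(conn: sqlite3.Connection, room_alias: str, server: str) -> None:
    """Insert (room_alias, server) only if that exact pair is not already present."""
    conn.execute(
        "INSERT INTO room_alias_servers (room_alias, server) "
        "SELECT ?, ? WHERE NOT EXISTS ("
        "  SELECT 1 FROM room_alias_servers"
        "  WHERE room_alias = ? AND server = ?)",
        (room_alias, server, room_alias, server),
    )


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM room_alias_servers").fetchone()[0]
```

Calling insert_alias_server twice with the same pair leaves a single row, which makes the phase safe to repeat.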
Phase 3: Events (Core)
- Build event_type NID->string and state_key NID->string lookups
- Migrate events with denormalized types/state_keys
- stream_ordering = global sequential, topological_ordering = depth
- internal_metadata = "{}" (stream_ordering/outlier read from events columns)
- format_version mapped from room version (v1-2->1, v3->2, v4-10->3, v11+->4)
- processed = True for migrated events
- Migrate event_json with correct format
- Populate state_events (events where state_key IS NOT NULL)
- Build event_edges from prev_events in event JSON
- Build event_auth from auth_events in event JSON
- Forward extremities from Dendrite's latest_event_nids
- room_depth from MIN(depth) per room
- Tested: 51474 events, 24609 state events, 489 fwd extremities
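Two of the steps above are small enough to sketch directly. The first function applies the format_version rule exactly as listed; the second extracts event_edges pairs from prev_events, allowing for the older event format where each entry is an [event_id, hashes] pair rather than a plain ID string. Function names are hypothetical, and non-numeric (unstable) room versions are deliberately not handled here.

```python
from typing import Any, Dict, List, Tuple


def format_version_for(room_version: str) -> int:
    """Map a room version to an event format version (v1-2->1, v3->2, v4-10->3, v11+->4)."""
    version = int(room_version)  # raises ValueError for non-numeric versions
    if version <= 2:
        return 1
    if version == 3:
        return 2
    if version <= 10:
        return 3
    return 4


def event_edges(event_id: str, event_json: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (event_id, prev_event_id) pairs from an event's prev_events."""
    edges = []
    for prev in event_json.get("prev_events", []):
        # Room v1-2 events carry [event_id, {hashes}] pairs; later formats carry plain IDs.
        prev_id = prev[0] if isinstance(prev, (list, tuple)) else prev
        edges.append((event_id, prev_id))
    return edges
```

The same extraction would apply to auth_events when building event_auth, since that field uses the same two shapes.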
Phase 4: Room State
- current_state_events from syncapi_current_room_state
- Incremental state groups: one per state event, delta chains via state_group_edges
- All events mapped to correct state group via event_to_state_groups
- Tested: 24609 state groups, 51474 event mappings, 0 unmapped events
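The incremental scheme can be illustrated in memory: each state event opens a new group holding a single delta entry and pointing at the room's previous group, while non-state events reuse the current group. This is a sketch under assumptions, not the script itself; the tuple layout and function names are invented, and the real tables (state_groups, state_groups_state, state_group_edges, event_to_state_groups) would receive the equivalent rows.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, str, Optional[str]]  # (event_id, room_id, type, state_key)


def build_state_groups(events: List[Event]):
    """Assign state groups to events given in stream order."""
    deltas: Dict[int, Tuple[str, str, str, str]] = {}  # group -> the one changed entry
    edges: Dict[int, int] = {}  # group -> previous group in the same room
    event_to_group: Dict[str, Optional[int]] = {}
    current: Dict[str, int] = {}  # room_id -> latest group
    next_group = 1
    for event_id, room_id, ev_type, state_key in events:
        if state_key is not None:  # state event: open a new group
            group = next_group
            next_group += 1
            deltas[group] = (room_id, ev_type, state_key, event_id)
            if room_id in current:
                edges[group] = current[room_id]
            current[room_id] = group
        # None here would mean an event preceding any state event in its room.
        event_to_group[event_id] = current.get(room_id)
    return deltas, edges, event_to_group


def resolve_state(group: Optional[int], deltas, edges) -> Dict[Tuple[str, str], str]:
    """Walk the delta chain back to the root to reconstruct the full state."""
    state: Dict[Tuple[str, str], str] = {}
    while group is not None:
        _, ev_type, state_key, event_id = deltas[group]
        state.setdefault((ev_type, state_key), event_id)  # newer entries win
        group = edges.get(group)
    return state
```

Storing one delta per group keeps state_groups_state small, at the cost of longer chains to walk when full state is resolved.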
Phase 5: Membership
- Migrate from syncapi_current_room_state (type=m.room.member) -> room_memberships
- Populate local_current_membership for local users
- Include event_stream_ordering FK
- Tested: 7254 memberships, 3220 local memberships
Phase 6: Media
- Migrate mediaapi_media_repository -> local_media_repository
- Migrate mediaapi_thumbnail -> local_media_repository_thumbnails
- Copy content files: Dendrite {base}/{hash[0]}/{hash[1]}/{hash[2:]}/file -> Synapse {base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}
- Copy thumbnails: Dendrite thumbnail-{w}x{h}-{method} -> Synapse {w}-{h}-{top}-{sub}-{method}
- Tested: 2747 media, 3309 thumbnails, file paths verified
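The path templates above translate into a few small helpers. This is an illustrative sketch with hypothetical function names; it assumes {top} and {sub} are the two halves of the thumbnail's content type (for example image and png), and it only builds paths without touching the filesystem.

```python
from pathlib import PurePosixPath


def dendrite_media_path(base: str, b64hash: str) -> PurePosixPath:
    """{base}/{hash[0]}/{hash[1]}/{hash[2:]}/file"""
    return PurePosixPath(base) / b64hash[0] / b64hash[1] / b64hash[2:] / "file"


def synapse_media_path(base: str, media_id: str) -> PurePosixPath:
    """{base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}"""
    return PurePosixPath(base) / "local_content" / media_id[0:2] / media_id[2:4] / media_id[4:]


def dendrite_thumbnail_name(width: int, height: int, method: str) -> str:
    """thumbnail-{w}x{h}-{method}"""
    return f"thumbnail-{width}x{height}-{method}"


def synapse_thumbnail_name(width: int, height: int, content_type: str, method: str) -> str:
    """{w}-{h}-{top}-{sub}-{method}, splitting the content type into its two parts."""
    top, sub = content_type.split("/", 1)
    return f"{width}-{height}-{top}-{sub}-{method}"
```

Note the different split points: Dendrite shards on single characters of the hash, while Synapse shards on two-character pairs of the media ID.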
Phase 7: Auxiliary Data
- Migrate receipts (receipts_linearized + receipts_graph, partial unique index)
- Migrate redactions
- Populate room_stats_current (member counts by type)
- Populate room_stats_state (room name, topic, encryption, etc.)
- Update events_stream_seq sequence
- Populate user_stats_current
- Tested: 857 receipts, 216 redactions, 492 room stats
Validation
- Synapse starts against migrated DB without errors
- Admin API: 488 rooms visible with correct names and member counts
- Messages accessible and readable via API
- Room state correct (creator, version, state types)
- Media metadata accessible via admin statistics API
- Background updates run normally post-migration
Findings / Issues Log
- Dendrite event_state_key_nid 0 = not a state event, nid 1 = '' (empty string)
- Dendrite event_type_nid preassigned: 1=m.room.create, 2=power_levels, 3=join_rules, 4=third_party_invite, 5=member, 6=redaction, 7=history_visibility
- Synapse topological_ordering = depth (NOT a per-room counter)
- Synapse internal_metadata JSON should be "{}" - stream_ordering and outlier loaded from events table columns
- Synapse format_version: room v1-2=1, v3=2, v4-10=3, v11+=4
- Synapse receipts_linearized has partial unique index WHERE thread_id IS NULL
- Synapse room_alias_servers has no unique constraint - must check-before-insert
- Synapse profiles unique on user_id (localpart), NOT on full_user_id
- Forward extremities: use Dendrite's latest_event_nids, don't compute from graph
- 2262 rejected events in Dendrite skipped during migration
- 5548 orphan event edges (referencing federated events we don't have) - normal
- Synapse background updates recalculate some stats after startup - normal
- E2EE: three things must be migrated together for encrypted history to survive:
  (1) userapi_devices.access_token -> access_tokens, so clients don't re-login (re-login usually wipes the local Megolm store and always changes device_id, breaking Olm continuity);
  (2) syncapi_send_to_device -> device_inbox, so undelivered m.room.encrypted Olm messages (Megolm key shares to offline devices) reach the recipient;
  (3) device_lists_stream seeded from local devices, so clients re-verify e2e_device_keys_json on first sync (otherwise: stale cache mismatches).
  v1 of the migration only moved the Synapse-native E2EE tables; that left 1480 pending to-device messages stranded and all clients forced to re-login, which is the root cause of "partial key loss" reported against v1.
- E2EE: keyserver_fallback_keys -> e2e_fallback_keys_json added so new Olm sessions still succeed after OTKs are exhausted.
- E2EE: e2e_cross_signing_keys.stream_id now drawn from the matching sequence (e2e_cross_signing_keys_sequence) to avoid UNIQUE(stream_id) collisions on subsequent Synapse writes.
- Re-running Phase 8 duplicates e2e_cross_signing_signatures rows (no unique constraint on the Synapse side). TRUNCATE before re-running, or run once.
Usage
# Prerequisites: Synapse must be initialized first (creates schema)
python3 -m synapse.app.homeserver --config-path homeserver.yaml # start+stop once
# Full migration
python3 migrate.py \
--dendrite-db "dbname=dendrite host=/run/postgresql" \
--synapse-db "dbname=synapse host=/run/postgresql" \
--server-name "example.com" \
--dendrite-media-path /var/lib/dendrite/media \
--synapse-media-path /var/lib/synapse/media_store \
--phase 1,2,3,4,5,6,7
# Selective phases (e.g., just re-run media)
python3 migrate.py ... --phase 6
# Dry run (no commits)
python3 migrate.py ... --dry-run
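
For reference, the command-line interface shown above could be parsed along these lines. This is a sketch of one plausible argparse setup, not the script's actual parser: which flags are required, the default phase list, and the helper names are all assumptions.

```python
import argparse
from typing import List


def parse_phases(value: str) -> List[int]:
    """Turn a comma-separated string such as "1,2,3" into a sorted list of ints."""
    try:
        return sorted({int(part) for part in value.split(",") if part.strip()})
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid phase list: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="migrate.py")
    parser.add_argument("--dendrite-db", required=True, help="libpq connection string")
    parser.add_argument("--synapse-db", required=True, help="libpq connection string")
    parser.add_argument("--server-name", required=True)
    # Assumed optional: only the media phase needs these paths.
    parser.add_argument("--dendrite-media-path")
    parser.add_argument("--synapse-media-path")
    parser.add_argument("--phase", type=parse_phases, default=[1, 2, 3, 4, 5, 6, 7])
    parser.add_argument("--dry-run", action="store_true", help="roll back instead of committing")
    return parser
```

Parsing the phase list through a type function means a malformed value is rejected with a usage error before any database connection is opened.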