# Dendrite to Synapse Migration - TODO

## Goal

Migrate local PostgreSQL data from Dendrite to Synapse.
Minimum: users, rooms, messages, files.

## Status: ALL PHASES COMPLETE AND VALIDATED

Tested against real Dendrite DB dump (bocken.org):
- 1194 users, 492 rooms, 51474 events, 2747 media files, 3309 thumbnails
- Full migration runs in ~8 seconds
- Synapse starts cleanly, admin API returns correct data
- Messages, room state, memberships, media metadata all verified

## Architecture Notes

### Dendrite Schema (Go, NID-based)
- Uses numeric IDs (NIDs) for rooms, events, event types, state keys
- Event JSON stored separately in `roomserver_event_json`
- Event types mapped via `roomserver_event_types` (nid -> string)
- State keys mapped via `roomserver_event_state_keys` (nid -> string)
- Membership uses numeric nid references (target_nid = event_state_key_nid of user)
- Media uses `base64hash` (SHA-256) for dedup, stored in `mediaapi_media_repository`
- Media files: `{base}/{hash[0]}/{hash[1]}/{hash[2:]}/file`
- Thumbnails: `{base}/{hash[0]}/{hash[1]}/{hash[2:]}/thumbnail-{w}x{h}-{method}`
- Accounts in `userapi_accounts`, profiles in `userapi_profiles`

### Synapse Schema (Python, text-based)
- Uses text IDs directly everywhere
- Event JSON in `event_json` table, metadata in `events` table
- State managed via `state_groups` + `state_groups_state` with delta chains
- Membership in `room_memberships` + `local_current_membership`
- Media in `local_media_repository` (uses media_id as filesystem key)
- Media files: `{base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}`
- Thumbnails: `{base}/local_thumbnails/{id[0:2]}/{id[2:4]}/{id[4:]}/{w}-{h}-{top}-{sub}-{method}`
- Accounts in `users`, profiles in `profiles`
- Schema versioned (currently v93-94), needs Synapse pre-init to create schema

### Key Mapping: Dendrite -> Synapse

| Dendrite Table | Synapse Table | Notes |
|---|---|---|
| userapi_accounts | users | password_hash, created_ts (ms->s), account_type->is_guest/admin |
| userapi_profiles | profiles | user_id=localpart, full_user_id=@user:server |
| userapi_devices | devices + access_tokens | direct map; access_token preserved so clients don't re-login |
| roomserver_rooms | rooms | room_id, room_version; creator from m.room.create events |
| roomserver_events + event_json | events + event_json | denormalize NIDs, topological_ordering=depth |
| syncapi_current_room_state | current_state_events | direct map |
| syncapi_current_room_state (member) | room_memberships + local_current_membership | |
| mediaapi_media_repository | local_media_repository | media_id, type, size, upload_name, user_id |
| mediaapi_thumbnail | local_media_repository_thumbnails | |
| syncapi_receipts | receipts_linearized + receipts_graph | partial unique index for NULL thread_id |
| roomserver_redactions | redactions | |

## Tasks

### Phase 0: Setup
- [x] Explore Dendrite schema
- [x] Explore Synapse schema
- [x] Create migration plan
- [x] Create script skeleton with connection handling + CLI args

### Phase 1: Users & Profiles
- [x] Migrate userapi_accounts -> users (created_ts ms->s conversion)
- [x] Migrate userapi_profiles -> profiles (user_id=localpart, full_user_id=@user:server)
- [x] Migrate userapi_devices -> devices
- [x] Tested: 1194 users, 1194 profiles, 13 devices

### Phase 2: Rooms
- [x] Migrate roomserver_rooms -> rooms
- [x] Extract room creator from m.room.create events
- [x] Migrate roomserver_room_aliases -> room_aliases + room_alias_servers
- [x] Tested: 492 rooms, correct creators

### Phase 3: Events (Core)
- [x] Build event_type NID->string and state_key NID->string lookups
- [x] Migrate events with denormalized types/state_keys
- [x] stream_ordering = global sequential, topological_ordering = depth
- [x] internal_metadata = "{}" (stream_ordering/outlier read from events columns)
- [x] format_version mapped from room version (v1-2->1, v3->2, v4-10->3, v11+->4)
- [x] processed = True for migrated events
- [x] Migrate event_json with correct format
- [x] Populate state_events (events where state_key IS NOT NULL)
- [x] Build event_edges from prev_events in event JSON
- [x] Build event_auth from auth_events in event JSON
- [x] Forward extremities from Dendrite's latest_event_nids
- [x] room_depth from MIN(depth) per room
- [x] Tested: 51474 events, 24609 state events, 489 fwd extremities

### Phase 4: Room State
- [x] current_state_events from syncapi_current_room_state
- [x] Incremental state groups: one per state event, delta chains via state_group_edges
- [x] All events mapped to correct state group via event_to_state_groups
- [x] Tested: 24609 state groups, 51474 event mappings, 0 unmapped events

### Phase 5: Membership
- [x] Migrate from syncapi_current_room_state (type=m.room.member) -> room_memberships
- [x] Populate local_current_membership for local users
- [x] Include event_stream_ordering FK
- [x] Tested: 7254 memberships, 3220 local memberships

### Phase 6: Media
- [x] Migrate mediaapi_media_repository -> local_media_repository
- [x] Migrate mediaapi_thumbnail -> local_media_repository_thumbnails
- [x] Copy content files: Dendrite `{base}/{hash[0]}/{hash[1]}/{hash[2:]}/file` -> Synapse `{base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}`
- [x] Copy thumbnails: Dendrite `thumbnail-{w}x{h}-{method}` -> Synapse `{w}-{h}-{top}-{sub}-{method}`
- [x] Tested: 2747 media, 3309 thumbnails, file paths verified

### Phase 7: Auxiliary Data
- [x] Migrate receipts (receipts_linearized + receipts_graph, partial unique index)
- [x] Migrate redactions
- [x] Populate room_stats_current (member counts by type)
- [x] Populate room_stats_state (room name, topic, encryption, etc.)
- [x] Update events_stream_seq sequence
- [x] Populate user_stats_current
- [x] Tested: 857 receipts, 216 redactions, 492 room stats

### Validation
- [x] Synapse starts against migrated DB without errors
- [x] Admin API: 488 rooms visible with correct names and member counts
- [x] Messages accessible and readable via API
- [x] Room state correct (creator, version, state types)
- [x] Media metadata accessible via admin statistics API
- [x] Background updates run normally post-migration

## Findings / Issues Log

- Dendrite event_state_key_nid 0 = not a state event, nid 1 = '' (empty string)
- Dendrite event_type_nid preassigned: 1=m.room.create, 2=power_levels, 3=join_rules, 4=third_party_invite, 5=member, 6=redaction, 7=history_visibility
- Synapse topological_ordering = depth (NOT a per-room counter)
- Synapse internal_metadata JSON should be "{}" - stream_ordering and outlier loaded from events table columns
- Synapse format_version: room v1-2=1, v3=2, v4-10=3, v11+=4
- Synapse receipts_linearized has partial unique index WHERE thread_id IS NULL
- Synapse room_alias_servers has no unique constraint - must check-before-insert
- Synapse profiles unique on user_id (localpart), NOT on full_user_id
- Forward extremities: use Dendrite's latest_event_nids, don't compute from graph
- 2262 rejected events in Dendrite skipped during migration
- 5548 orphan event edges (referencing federated events we don't have) - normal
- Synapse background updates recalculate some stats after startup - normal
- E2EE: three things must be migrated together for encrypted history to survive: (1) `userapi_devices.access_token` -> `access_tokens` so clients don't re-login (re-login usually wipes the local Megolm store and always changes device_id, breaking Olm continuity), (2) `syncapi_send_to_device` -> `device_inbox` so undelivered m.room.encrypted Olm messages (Megolm key shares to offline devices) reach the recipient, (3) `device_lists_stream` seeded from local devices so clients re-verify e2e_device_keys_json on first sync (otherwise: stale cache mismatches). v1 of the migration only moved the Synapse-native E2EE tables; that left 1480 pending to-device messages stranded and all clients forced to re-login, which is the root cause of "partial key loss" reported against v1.
- E2EE: `keyserver_fallback_keys` -> `e2e_fallback_keys_json` added so new Olm sessions still succeed after OTKs are exhausted.
- E2EE: `e2e_cross_signing_keys.stream_id` now drawn from the matching sequence (`e2e_cross_signing_keys_sequence`) to avoid UNIQUE(stream_id) collisions on subsequent Synapse writes.
- Re-running Phase 8 duplicates `e2e_cross_signing_signatures` rows (no unique constraint on the Synapse side). TRUNCATE before re-running, or run once.

## Usage

```bash
# Prerequisites: Synapse must be initialized first (creates schema)
python3 -m synapse.app.homeserver --config-path homeserver.yaml  # start+stop once

# Full migration
python3 migrate.py \
    --dendrite-db "dbname=dendrite host=/run/postgresql" \
    --synapse-db "dbname=synapse host=/run/postgresql" \
    --server-name "example.com" \
    --dendrite-media-path /var/lib/dendrite/media \
    --synapse-media-path /var/lib/synapse/media_store \
    --phase 1,2,3,4,5,6,7

# Selective phases (e.g., just re-run media)
python3 migrate.py ... --phase 6

# Dry run (no commits)
python3 migrate.py ... --dry-run
```
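For reference, the media path layouts from the Architecture Notes and the format_version mapping from the Findings log can be sketched as a few pure functions. This is only an illustration, not the code in `migrate.py`: all function names are hypothetical, `{top}`/`{sub}` are assumed to come from splitting the thumbnail's content type on `/`, and room versions are assumed to be plain integers (unstable or custom version strings would need separate handling).

```python
from pathlib import Path


def dendrite_media_path(base: str, base64hash: str) -> Path:
    """Dendrite layout: {base}/{hash[0]}/{hash[1]}/{hash[2:]}/file"""
    return Path(base) / base64hash[0] / base64hash[1] / base64hash[2:] / "file"


def synapse_media_path(base: str, media_id: str) -> Path:
    """Synapse layout: {base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}"""
    return Path(base) / "local_content" / media_id[0:2] / media_id[2:4] / media_id[4:]


def synapse_thumbnail_path(base: str, media_id: str, width: int, height: int,
                           content_type: str, method: str) -> Path:
    """Synapse layout: .../local_thumbnails/.../{w}-{h}-{top}-{sub}-{method}"""
    # Assumption: {top}/{sub} are the two halves of the MIME type, e.g. image/png.
    top, sub = content_type.split("/", 1)
    name = f"{width}-{height}-{top}-{sub}-{method}"
    return (Path(base) / "local_thumbnails"
            / media_id[0:2] / media_id[2:4] / media_id[4:] / name)


def format_version_for(room_version: str) -> int:
    """Map a room version to an event format_version: v1-2->1, v3->2, v4-10->3, v11+->4."""
    # Assumption: only integer room versions; int() raises ValueError otherwise.
    version = int(room_version)
    if version <= 2:
        return 1
    if version == 3:
        return 2
    if version <= 10:
        return 3
    return 4


def ms_to_s(created_ts_ms: int) -> int:
    """created_ts conversion used for userapi_accounts -> users (ms -> s)."""
    return created_ts_ms // 1000
```

Keeping these mappings as side-effect-free helpers makes them easy to check in isolation before any files are copied or rows inserted.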