Dendrite to Synapse Migration - TODO

Goal

Migrate local PostgreSQL data from Dendrite to Synapse. Minimum scope: users, rooms, messages, and media files.

Status: ALL PHASES COMPLETE AND VALIDATED

Tested against real Dendrite DB dump (bocken.org):

  • 1194 users, 492 rooms, 51474 events, 2747 media files, 3309 thumbnails
  • Full migration runs in ~8 seconds
  • Synapse starts cleanly, admin API returns correct data
  • Messages, room state, memberships, media metadata all verified

Architecture Notes

Dendrite Schema (Go, NID-based)

  • Uses numeric IDs (NIDs) for rooms, events, event types, state keys
  • Event JSON stored separately in roomserver_event_json
  • Event types mapped via roomserver_event_types (nid -> string)
  • State keys mapped via roomserver_event_state_keys (nid -> string)
  • Membership uses numeric nid references (target_nid = event_state_key_nid of user)
  • Media uses base64hash (SHA-256) for dedup, stored in mediaapi_media_repository
  • Media files: {base}/{hash[0]}/{hash[1]}/{hash[2:]}/file
  • Thumbnails: {base}/{hash[0]}/{hash[1]}/{hash[2:]}/thumbnail-{w}x{h}-{method}
  • Accounts in userapi_accounts, profiles in userapi_profiles
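The Dendrite on-disk media layout above can be captured in a couple of path helpers. This is a minimal sketch. It assumes `base64hash` is the value read straight from mediaapi_media_repository rather than recomputed, and the function names are hypothetical, not taken from migrate.py.

```python
from pathlib import Path


def dendrite_media_dir(base: Path, base64hash: str) -> Path:
    """Directory for one Dendrite upload: {base}/{hash[0]}/{hash[1]}/{hash[2:]}."""
    if len(base64hash) < 3:
        raise ValueError(f"hash too short: {base64hash!r}")
    return base / base64hash[0] / base64hash[1] / base64hash[2:]


def dendrite_content_path(base: Path, base64hash: str) -> Path:
    # The original upload is always stored as "file" inside the hash directory
    return dendrite_media_dir(base, base64hash) / "file"


def dendrite_thumbnail_path(base: Path, base64hash: str,
                            width: int, height: int, method: str) -> Path:
    # Thumbnails sit next to "file" as thumbnail-{w}x{h}-{method}
    return dendrite_media_dir(base, base64hash) / f"thumbnail-{width}x{height}-{method}"
```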

Synapse Schema (Python, text-based)

  • Uses text IDs directly everywhere
  • Event JSON in event_json table, metadata in events table
  • State managed via state_groups + state_groups_state with delta chains
  • Membership in room_memberships + local_current_membership
  • Media in local_media_repository (uses media_id as filesystem key)
  • Media files: {base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}
  • Thumbnails: {base}/local_thumbnails/{id[0:2]}/{id[2:4]}/{id[4:]}/{w}-{h}-{top}-{sub}-{method}
  • Accounts in users, profiles in profiles
  • Schema is versioned (currently v93-94); Synapse must be started once beforehand so that it creates the schema
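The Synapse layout can likewise be sketched as two path helpers with hypothetical names. Note that the thumbnail file name needs the thumbnail's content type, split into its top-level type and subtype.

```python
from pathlib import Path


def _split_media_id(media_id: str) -> tuple:
    # Synapse shards by the first two pairs of characters of the media ID
    if len(media_id) < 5:
        raise ValueError(f"media_id too short: {media_id!r}")
    return media_id[0:2], media_id[2:4], media_id[4:]


def synapse_content_path(base: Path, media_id: str) -> Path:
    """{base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}"""
    a, b, rest = _split_media_id(media_id)
    return base / "local_content" / a / b / rest


def synapse_thumbnail_path(base: Path, media_id: str, width: int, height: int,
                           content_type: str, method: str) -> Path:
    """{base}/local_thumbnails/{id[0:2]}/{id[2:4]}/{id[4:]}/{w}-{h}-{top}-{sub}-{method}"""
    a, b, rest = _split_media_id(media_id)
    top, sub = content_type.split("/", 1)  # e.g. "image/png" -> ("image", "png")
    name = f"{width}-{height}-{top}-{sub}-{method}"
    return base / "local_thumbnails" / a / b / rest / name
```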

Key Mapping: Dendrite -> Synapse

| Dendrite table | Synapse table | Notes |
|---|---|---|
| userapi_accounts | users | password_hash, created_ts (ms -> s), account_type -> is_guest/admin |
| userapi_profiles | profiles | user_id=localpart, full_user_id=@user:server |
| userapi_devices | devices + access_tokens | direct map; access_token preserved so clients don't have to re-login |
| roomserver_rooms | rooms | room_id, room_version; creator from m.room.create events |
| roomserver_events + roomserver_event_json | events + event_json | denormalize NIDs, topological_ordering=depth |
| syncapi_current_room_state | current_state_events | direct map |
| syncapi_current_room_state (m.room.member) | room_memberships + local_current_membership | |
| mediaapi_media_repository | local_media_repository | media_id, type, size, upload_name, user_id |
| mediaapi_thumbnail | local_media_repository_thumbnails | |
| syncapi_receipts | receipts_linearized + receipts_graph | partial unique index for NULL thread_id |
| roomserver_redactions | redactions | |

Tasks

Phase 0: Setup

  • Explore Dendrite schema
  • Explore Synapse schema
  • Create migration plan
  • Create script skeleton with connection handling + CLI args

Phase 1: Users & Profiles

  • Migrate userapi_accounts -> users (created_ts ms->s conversion)
  • Migrate userapi_profiles -> profiles (user_id=localpart, full_user_id=@user:server)
  • Migrate userapi_devices -> devices
  • Tested: 1194 users, 1194 profiles, 13 devices
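A sketch of the per-row conversion for Phase 1. It assumes Dendrite's AccountType numbering (1 = user, 2 = guest, 3 = admin, 4 = appservice) and takes already-fetched column values as arguments. The function and dataclass names are hypothetical, and only a subset of Synapse's users and profiles columns is shown.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed Dendrite AccountType values: 1=user, 2=guest, 3=admin, 4=appservice
ACCOUNT_TYPE_GUEST = 2
ACCOUNT_TYPE_ADMIN = 3


@dataclass
class SynapseUserRow:
    name: str                    # full MXID, e.g. @alice:example.com
    password_hash: Optional[str]
    creation_ts: int             # seconds; Dendrite's created_ts is milliseconds
    admin: int
    is_guest: int
    deactivated: int


def convert_account(localpart: str, server_name: str, created_ts_ms: int,
                    password_hash: Optional[str], account_type: int,
                    is_deactivated: bool) -> SynapseUserRow:
    return SynapseUserRow(
        name=f"@{localpart}:{server_name}",
        password_hash=password_hash,
        creation_ts=created_ts_ms // 1000,  # ms -> s
        admin=1 if account_type == ACCOUNT_TYPE_ADMIN else 0,
        is_guest=1 if account_type == ACCOUNT_TYPE_GUEST else 0,
        deactivated=1 if is_deactivated else 0,
    )


def convert_profile(localpart: str, server_name: str,
                    display_name: Optional[str], avatar_url: Optional[str]) -> dict:
    # profiles is unique on user_id (the localpart); full_user_id holds the MXID
    return {
        "user_id": localpart,
        "full_user_id": f"@{localpart}:{server_name}",
        "displayname": display_name,
        "avatar_url": avatar_url,
    }
```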

Phase 2: Rooms

  • Migrate roomserver_rooms -> rooms
  • Extract room creator from m.room.create events
  • Migrate roomserver_room_aliases -> room_aliases + room_alias_servers
  • Tested: 492 rooms, correct creators
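Creator extraction can be sketched as below. Room versions 1-10 carry the creator in content.creator; room version 11 removed that field, so the sender of m.room.create is used as the fallback. The function name is hypothetical.

```python
from typing import Any, Mapping, Optional


def room_creator(create_event: Mapping[str, Any]) -> Optional[str]:
    """Return the room creator from a parsed m.room.create event."""
    if create_event.get("type") != "m.room.create":
        raise ValueError("not an m.room.create event")
    content = create_event.get("content", {})
    creator = content.get("creator")  # present in room versions 1-10
    # Room v11+ dropped content.creator; the create event's sender is the creator
    return creator if creator is not None else create_event.get("sender")
```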

Phase 3: Events (Core)

  • Build event_type NID->string and state_key NID->string lookups
  • Migrate events with denormalized types/state_keys
  • stream_ordering = global sequential, topological_ordering = depth
  • internal_metadata = "{}" (stream_ordering/outlier read from events columns)
  • format_version mapped from room version (v1-2->1, v3->2, v4-10->3, v11+->4)
  • processed = True for migrated events
  • Migrate event_json with correct format
  • Populate state_events (events where state_key IS NOT NULL)
  • Build event_edges from prev_events in event JSON
  • Build event_auth from auth_events in event JSON
  • Forward extremities from Dendrite's latest_event_nids
  • room_depth from MIN(depth) per room
  • Tested: 51474 events, 24609 state events, 489 fwd extremities
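A sketch of the NID denormalization and the event_edges/event_auth extraction. The function names and input tuples are hypothetical. Rows are assumed to arrive already ordered (for example by ascending event_nid), so enumerating them yields the global stream_ordering. The reference parser accepts both formats: room versions 1 and 2 list prev_events/auth_events as [event_id, {hashes}] pairs, while v3+ list bare event IDs.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def resolve_state_key(nid: int, state_keys: Dict[int, str]) -> Optional[str]:
    """Map a Dendrite event_state_key_nid to text; 0 means 'not a state event'."""
    if nid == 0:
        return None
    return state_keys[nid]  # nid 1 maps to the empty string ''


def denormalize(rows: Iterable[Tuple[str, str, int, int, int]],
                event_types: Dict[int, str],
                state_keys: Dict[int, str],
                start: int = 1) -> List[dict]:
    """rows: (event_id, room_id, event_type_nid, event_state_key_nid, depth), pre-ordered."""
    out = []
    for stream_ordering, (event_id, room_id, type_nid, sk_nid, depth) in enumerate(rows, start):
        out.append({
            "event_id": event_id,
            "room_id": room_id,
            "type": event_types[type_nid],
            "state_key": resolve_state_key(sk_nid, state_keys),
            "stream_ordering": stream_ordering,  # global and sequential
            "topological_ordering": depth,       # depth, not a per-room counter
        })
    return out


def _ref_ids(refs: list) -> List[str]:
    # v1-2: [["$id:server", {"sha256": ...}], ...]; v3+: ["$hash", ...]
    return [r[0] if isinstance(r, list) else r for r in refs]


def edges_and_auth(event_id: str, event_json: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (event_edges, event_auth) pairs parsed from the stored event JSON."""
    ev = json.loads(event_json)
    edges = [(event_id, p) for p in _ref_ids(ev.get("prev_events", []))]
    auth = [(event_id, a) for a in _ref_ids(ev.get("auth_events", []))]
    return edges, auth
```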

Phase 4: Room State

  • current_state_events from syncapi_current_room_state
  • Incremental state groups: one per state event, delta chains via state_group_edges
  • All events mapped to correct state group via event_to_state_groups
  • Tested: 24609 state groups, 51474 event mappings, 0 unmapped events
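The incremental state-group scheme can be sketched as follows. Each state event opens a new group that stores only its own (type, state_key) delta and points at the previous group through state_group_edges; non-state events reuse the current group. The sketch assumes each room's events are already in one linear order (the DAG is flattened), and the row tuples are simplified stand-ins for the Synapse tables. resolve_state walks the edges back to the chain root and replays the deltas, which is handy for spot-checking migrated chains.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class StateGroupRows:
    # (id, room_id, event_id)
    state_groups: List[Tuple[int, str, str]] = field(default_factory=list)
    # (state_group, prev_state_group)
    state_group_edges: List[Tuple[int, int]] = field(default_factory=list)
    # (state_group, room_id, type, state_key, event_id): only the delta per group
    state_groups_state: List[Tuple[int, str, str, str, str]] = field(default_factory=list)
    # (event_id, state_group): the state *after* the event
    event_to_state_groups: List[Tuple[str, int]] = field(default_factory=list)


def build_state_groups(room_id: str,
                       events: Iterable[Tuple[str, str, Optional[str]]],
                       next_group_id: int = 1) -> Tuple[StateGroupRows, int]:
    """events: (event_id, type, state_key or None) in the room's linear order."""
    rows = StateGroupRows()
    current: Optional[int] = None
    for event_id, ev_type, state_key in events:
        if state_key is not None:
            group = next_group_id
            next_group_id += 1
            rows.state_groups.append((group, room_id, event_id))
            if current is not None:
                rows.state_group_edges.append((group, current))
            rows.state_groups_state.append((group, room_id, ev_type, state_key, event_id))
            current = group
        if current is None:
            raise ValueError(f"{event_id} precedes any state event in {room_id}")
        rows.event_to_state_groups.append((event_id, current))
    return rows, next_group_id


def resolve_state(group: int, rows: StateGroupRows) -> Dict[Tuple[str, str], str]:
    """Rebuild full state for a group by replaying deltas from the chain root."""
    prev = dict(rows.state_group_edges)
    deltas: Dict[int, List[Tuple[Tuple[str, str], str]]] = {}
    for g, _room, ev_type, state_key, event_id in rows.state_groups_state:
        deltas.setdefault(g, []).append(((ev_type, state_key), event_id))
    chain: List[int] = []
    cursor: Optional[int] = group
    while cursor is not None:
        chain.append(cursor)
        cursor = prev.get(cursor)
    state: Dict[Tuple[str, str], str] = {}
    for g in reversed(chain):
        state.update(deltas.get(g, []))
    return state
```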

Phase 5: Membership

  • Migrate from syncapi_current_room_state (type=m.room.member) -> room_memberships
  • Populate local_current_membership for local users
  • Include event_stream_ordering FK
  • Tested: 7254 memberships, 3220 local memberships
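A sketch of the membership step. It assumes each m.room.member event from syncapi_current_room_state has been parsed into a dict with its event_id attached from the row (v3+ event JSON does not carry the ID), and that the Phase 3 stream orderings are available as a lookup. Keys follow Synapse's room_memberships and local_current_membership columns, but only a subset is filled; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def is_local(user_id: str, server_name: str) -> bool:
    # MXIDs are @localpart:server_name; the server part follows the first colon
    return user_id.partition(":")[2] == server_name


def membership_rows(member_events: Iterable[dict],
                    stream_ordering: Dict[str, int],
                    server_name: str) -> Tuple[List[dict], List[dict]]:
    room_memberships: List[dict] = []
    local_current: List[dict] = []
    for ev in member_events:
        content = ev.get("content", {})
        # KeyError here means the event was skipped in Phase 3 (e.g. rejected)
        so = stream_ordering[ev["event_id"]]
        room_memberships.append({
            "event_id": ev["event_id"],
            "user_id": ev["state_key"],
            "sender": ev["sender"],
            "room_id": ev["room_id"],
            "membership": content["membership"],
            "display_name": content.get("displayname"),
            "avatar_url": content.get("avatar_url"),
            "event_stream_ordering": so,
        })
        # local_current_membership only tracks users on this homeserver
        if is_local(ev["state_key"], server_name):
            local_current.append({
                "room_id": ev["room_id"],
                "user_id": ev["state_key"],
                "event_id": ev["event_id"],
                "membership": content["membership"],
                "event_stream_ordering": so,
            })
    return room_memberships, local_current
```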

Phase 6: Media

  • Migrate mediaapi_media_repository -> local_media_repository
  • Migrate mediaapi_thumbnail -> local_media_repository_thumbnails
  • Copy content files: Dendrite {base}/{hash[0]}/{hash[1]}/{hash[2:]}/file -> Synapse {base}/local_content/{id[0:2]}/{id[2:4]}/{id[4:]}
  • Copy thumbnails: Dendrite thumbnail-{w}x{h}-{method} -> Synapse {w}-{h}-{top}-{sub}-{method}
  • Tested: 2747 media, 3309 thumbnails, file paths verified
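Translating the thumbnail file name needs the thumbnail's content type, since Synapse splits it into {top}-{sub}. A minimal sketch: the regular expression assumes Dendrite names look exactly like thumbnail-{w}x{h}-{method} with a lowercase method such as crop or scale, and copy_media_file is a hypothetical helper that skips existing targets so re-running Phase 6 stays idempotent.

```python
import re
import shutil
from pathlib import Path

# Assumed Dendrite thumbnail file name pattern
DENDRITE_THUMB_RE = re.compile(r"^thumbnail-(\d+)x(\d+)-([a-z]+)$")


def synapse_thumbnail_name(dendrite_name: str, thumbnail_type: str) -> str:
    """thumbnail-{w}x{h}-{method} + 'image/png' -> {w}-{h}-image-png-{method}"""
    m = DENDRITE_THUMB_RE.match(dendrite_name)
    if m is None:
        raise ValueError(f"not a Dendrite thumbnail name: {dendrite_name!r}")
    width, height, method = m.groups()
    top, sub = thumbnail_type.split("/", 1)
    return f"{width}-{height}-{top}-{sub}-{method}"


def copy_media_file(src: Path, dst: Path) -> bool:
    """Copy src to dst, creating parent dirs; return False if dst already exists."""
    if dst.exists():
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # preserves mtime, useful for later spot checks
    return True
```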

Phase 7: Auxiliary Data

  • Migrate receipts (receipts_linearized + receipts_graph, partial unique index)
  • Migrate redactions
  • Populate room_stats_current (member counts by type)
  • Populate room_stats_state (room name, topic, encryption, etc.)
  • Update events_stream_seq sequence
  • Populate user_stats_current
  • Tested: 857 receipts, 216 redactions, 492 room stats
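The receipts workaround relies on the conflict target repeating the partial index's predicate (WHERE thread_id IS NULL); without it the database cannot match the ON CONFLICT clause to the partial index and rejects the statement. The sketch below uses sqlite3 as a stand-in because it accepts the same upsert syntax as PostgreSQL. The table is a simplified subset of receipts_linearized, the index name is made up, and migrated receipts are assumed to be unthreaded. With psycopg2 the placeholders would be %s instead of ?.

```python
import sqlite3

# Simplified stand-in for Synapse's receipts_linearized (subset of columns)
SCHEMA = """
CREATE TABLE receipts_linearized (
    stream_id INTEGER NOT NULL,
    room_id TEXT NOT NULL,
    receipt_type TEXT NOT NULL,
    user_id TEXT NOT NULL,
    event_id TEXT NOT NULL,
    thread_id TEXT,
    data TEXT NOT NULL
);
-- Partial unique index: uniqueness is only enforced for unthreaded receipts
CREATE UNIQUE INDEX receipts_unthreaded_unique
    ON receipts_linearized (room_id, receipt_type, user_id)
    WHERE thread_id IS NULL;
"""

# The conflict target repeats the index predicate so the partial index is inferred
UPSERT = """
INSERT INTO receipts_linearized
    (stream_id, room_id, receipt_type, user_id, event_id, thread_id, data)
VALUES (?, ?, ?, ?, ?, NULL, ?)
ON CONFLICT (room_id, receipt_type, user_id) WHERE thread_id IS NULL
DO UPDATE SET stream_id = excluded.stream_id,
              event_id = excluded.event_id,
              data = excluded.data
"""


def upsert_receipt(conn: sqlite3.Connection, stream_id: int, room_id: str,
                   receipt_type: str, user_id: str, event_id: str, data: str) -> None:
    conn.execute(UPSERT, (stream_id, room_id, receipt_type, user_id, event_id, data))
```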

Validation

  • Synapse starts against migrated DB without errors
  • Admin API: 488 rooms visible with correct names and member counts
  • Messages accessible and readable via API
  • Room state correct (creator, version, state types)
  • Media metadata accessible via admin statistics API
  • Background updates run normally post-migration

Findings / Issues Log

  • Dendrite event_state_key_nid 0 means "not a state event"; nid 1 is the empty-string state key ('')
  • Dendrite event_type_nid values are preassigned: 1=m.room.create, 2=m.room.power_levels, 3=m.room.join_rules, 4=m.room.third_party_invite, 5=m.room.member, 6=m.room.redaction, 7=m.room.history_visibility
  • Synapse topological_ordering = depth (NOT a per-room counter)
  • Synapse internal_metadata JSON should be "{}" - stream_ordering and outlier loaded from events table columns
  • Synapse format_version: room v1-2=1, v3=2, v4-10=3, v11+=4
  • Synapse receipts_linearized has partial unique index WHERE thread_id IS NULL
  • Synapse room_alias_servers has no unique constraint - must check-before-insert
  • Synapse profiles unique on user_id (localpart), NOT on full_user_id
  • Forward extremities: use Dendrite's latest_event_nids, don't compute from graph
  • 2262 rejected events in Dendrite skipped during migration
  • 5548 orphan event edges (referencing federated events we don't have) - normal
  • Synapse background updates recalculate some stats after startup - normal
  • E2EE: three things must be migrated together for encrypted history to survive:
      (1) userapi_devices.access_token -> access_tokens, so clients don't have to re-login. A re-login usually wipes the local Megolm store and always assigns a new device_id, which breaks Olm session continuity.
      (2) syncapi_send_to_device -> device_inbox, so undelivered m.room.encrypted Olm messages (Megolm key shares sent to offline devices) still reach their recipients.
      (3) device_lists_stream seeded from local devices, so clients re-query e2e_device_keys_json on first sync instead of keeping a stale, mismatched device-key cache.
    v1 of the migration moved only the Synapse-native E2EE tables. That left 1480 pending to-device messages stranded and forced every client to re-login, which is the root cause of the "partial key loss" reported against v1.
  • E2EE: keyserver_fallback_keys -> e2e_fallback_keys_json is now migrated, so new Olm sessions can still be established after a device's one-time keys (OTKs) are exhausted.
  • E2EE: e2e_cross_signing_keys.stream_id now drawn from the matching sequence (e2e_cross_signing_keys_sequence) to avoid UNIQUE(stream_id) collisions on subsequent Synapse writes.
  • Re-running Phase 8 duplicates e2e_cross_signing_signatures rows (the Synapse table has no unique constraint). TRUNCATE the table before re-running, or run the phase only once.
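Because room_alias_servers has no unique constraint, ON CONFLICT has nothing to attach to, so the insert has to guard itself. A minimal sketch using a conditional INSERT ... SELECT ... WHERE NOT EXISTS, again with sqlite3 standing in for PostgreSQL and a simplified two-column table (an assumption). This is only safe because the migration is the sole writer; with concurrent writers, two sessions could both pass the NOT EXISTS check.

```python
import sqlite3

# Simplified stand-in: no unique constraint, mirroring the Synapse table
SCHEMA = "CREATE TABLE room_alias_servers (room_alias TEXT NOT NULL, server TEXT NOT NULL)"

INSERT_IF_ABSENT = """
INSERT INTO room_alias_servers (room_alias, server)
SELECT ?, ?
WHERE NOT EXISTS (
    SELECT 1 FROM room_alias_servers WHERE room_alias = ? AND server = ?
)
"""


def add_alias_server(conn: sqlite3.Connection, room_alias: str, server: str) -> bool:
    """Insert (room_alias, server) unless present; return True if a row was added."""
    before = conn.total_changes
    conn.execute(INSERT_IF_ABSENT, (room_alias, server, room_alias, server))
    return conn.total_changes > before
```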

Usage

# Prerequisites: Synapse must be initialized first (creates schema)
python3 -m synapse.app.homeserver --config-path homeserver.yaml  # start once so it creates the schema, then stop

# Full migration
python3 migrate.py \
    --dendrite-db "dbname=dendrite host=/run/postgresql" \
    --synapse-db "dbname=synapse host=/run/postgresql" \
    --server-name "example.com" \
    --dendrite-media-path /var/lib/dendrite/media \
    --synapse-media-path /var/lib/synapse/media_store \
    --phase 1,2,3,4,5,6,7

# Selective phases (e.g., just re-run media)
python3 migrate.py ... --phase 6

# Dry run (no commits)
python3 migrate.py ... --dry-run
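For readers adapting the CLI, a hypothetical sketch of how --phase and --dry-run could be wired up with argparse. It is not the actual migrate.py: the flag names come from the examples above, while treating a missing --phase as "all phases" and implementing --dry-run as a final rollback are assumptions. finish() only uses the DB-API commit()/rollback() methods, so it would work with psycopg2 or, as in a quick local test, sqlite3.

```python
import argparse
from typing import List


def parse_phases(value: str) -> List[int]:
    """'1,2,3' -> [1, 2, 3]; duplicates are dropped and the order normalized."""
    try:
        phases = sorted({int(p) for p in value.split(",") if p.strip()})
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid phase list: {value!r}")
    if not phases or phases[0] < 1:
        raise argparse.ArgumentTypeError(f"invalid phase list: {value!r}")
    return phases


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="migrate.py")
    p.add_argument("--dendrite-db", required=True)   # libpq connection string
    p.add_argument("--synapse-db", required=True)
    p.add_argument("--server-name", required=True)
    p.add_argument("--dendrite-media-path")          # only needed for media
    p.add_argument("--synapse-media-path")
    p.add_argument("--phase", type=parse_phases, default=None)  # None = all (assumed)
    p.add_argument("--dry-run", action="store_true")
    return p


def finish(conn, dry_run: bool) -> None:
    # --dry-run: every statement runs, then the whole transaction is discarded
    if dry_run:
        conn.rollback()
    else:
        conn.commit()
```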