FIP: Snapchain #207

varunsrin · 2024-11-25T06:48:24Z

varunsrin
Nov 25, 2024
Maintainer

Title: Snapchain
Type: Implementation FIP
Authors: @sanjay, @v
Acknowledgements: @deodad, @jneu, @cassie, @horsefacts.eth, @vrypan.eth, @sds, @dynemyte, jeremy

Abstract

Snapchain is a blockchain-like network for storing and syncing Farcaster's social data. It has stronger and faster consistency guarantees than the current deltagraph system, which struggles to keep all nodes in sync in near real time. The tradeoff for these consistency improvements is a new consensus step that introduces more complexity and failure modes, which must be addressed.

Problem

A decentralized social network is one where two users can find each other and communicate, even under adverse conditions. Users must be able to run a node and use it to communicate with each other. Each node must reach consensus about a user’s state and stay in sync with other nodes. If Alice follows Bob at one node, it must make sure that she wasn’t already following Bob and then update this relationship on every other node.

Users generate a lot of transactions and expect real-time delivery. Twitter, for example, has 200M daily users, sees 10k TPS and is likely to see 1TB - 10TB/day in state growth. Existing decentralized networks can’t handle this kind of load with real-time delivery. It’s not because it’s impossible, but because they make tradeoffs to solve different user problems. Blockchains move money and are designed to prevent double spends, which makes sharding and pruning data difficult. Federated systems like email are shard-able but have weak decentralization and consistency, which makes apps harder to build. See Appendix D for more details.

Farcaster has used a CRDT-based system called a deltagraph to decentralize social data. By defining every transaction as a CRDT operation, consensus is reached immediately without coordination at the local node. The changes are then gossiped out to peers which can lazily update their own state. The network served 100k users doing 500 TPS with 2GB/day state growth in early 2024.

As the network grew to thousands of nodes, some of them fell out of sync due to gossip failures. Since CRDTs are unordered, a node can only detect gossip failures by syncing manually with every other node and comparing all transactions. This becomes slow and eventually infeasible as the number of nodes and valid messages crosses some threshold (see Appendix C). The lack of ordering also means that the network cannot enforce global rate limits, so they must be localized to each node. The side effect is that a transaction that passes the limits on one node might be rejected by another. Without strict ordering, it's hard to guarantee both real-time delivery and strong consistency.

Specification

Snapchain introduces transaction ordering and blockchain-like semantics to Farcaster. A block production step is added which groups and orders user transactions. Syncing is much simpler since a node only needs to find and download missing blocks. Snapchain, like the deltagraph before it, relies on an external blockchain to handle account creation and fee collection.

Snapchain is different from most blockchains because its transactions are not Turing complete, are account independent and are pruned often. A transaction is a "post", "like" or other social operation which only affects a single account. This is important for scaling since it prevents the network from being used for non-social purposes and makes sharding by account easy. Older transactions are pruned to clear data from inactive users or negating transactions, such as when a user likes and then unlikes the same thing.

The initial release of snapchain should support a TPS of > 9000 which would support 2 million daily users.

1. Accounts

Users create and manage accounts using an external blockchain. This incurs some fees during setup but is necessary for the strong security and consistency guarantees. Calling the registry contract onchain issues a unique account number or farcaster id to the wallet. Signed messages from this wallet are treated by Snapchain as authorized actions from the account. Accounts can be transferred between wallets at any time, though an address may only own one account at a time.

Accounts can acquire human-friendly ENS usernames by proving ownership with an onchain or offchain proof. All references to the account are made to the farcaster id, which in turn is mapped to the verified ENS username by clients. This lets users change their username without having to re-sign all data on the network. This system can also be extended to non-ENS name systems if desired.

Accounts can issue "app keys" onchain which are keys with a narrower set of permissions. They can post messages on behalf of the account but cannot change ownership of the account or modify other app keys. They are used like auth tokens to delegate permissions to clients safely. It may be possible to implement app keys on Snapchain in future, avoiding onchain fees for modifying them.

Account recovery is built into the registry contract, which lets the wallet nominate another address that also controls the farcaster id. This could be set to the user's primary wallet, an m-of-n social recovery multisig or an institutional recovery wallet. Users may also compose their own recovery systems by converting the wallet into a smart wallet which can implement custom recovery logic.

2. Transactions

A blockchain transaction is a Farcaster-specific transaction that happens on an external blockchain. An example is when Alice makes a transaction to the registry contract to get her farcaster id and set up her app keys. Snapchain nodes listen to and store blockchain transactions in their history.

A snapchain transaction is a social action like making a new post. Alice says “Hello World” by making an add-post transaction, signing it with her app key and broadcasting it. Nodes verify that every transaction is correctly signed according to the specification. Common actions like deleting posts or following other users have their own transaction types. Snapchain transactions are self-authenticating and anyone can trace the authenticity from the message to the app key to the wallet to the farcaster id.

3. Account State

An account comes into existence when a blockchain transaction is made to create a new account in the registry. Its state is simply the set of blockchain and snapchain transactions that it generates. A deterministic state root can be computed by putting all the transaction ids into a merkle trie. Transactions made by one account cannot affect the state of another account. Enforcing this restriction makes Snapchain more scalable since account-level sharding becomes trivial to implement.

When a new transaction is accepted, it may be added to the state, replace a previous transaction in the state, or delete a previous transaction entirely. In the example below, we see Alice’s account state changing as she creates an account, adds a post and then deletes it.

Formal definition: There exists a state (S) for an account (A) made up of transactions. S is a subset of all transactions made by a user (S ⊆ T_a). A merge function M accepts a state S and a transaction t and returns a new state S′ (M: S × t → S′). Each transaction is idempotent but not associative or commutative.
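The merge function can be sketched in a few lines. This is only an illustration under stated assumptions: the transaction kinds ("add", "replace", "delete"), the `Tx` fields, and the flat hash used for `state_root` are hypothetical stand-ins, and a real node would use a merkle trie rather than a hash over sorted ids.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tx:
    tx_id: str                    # unique id, e.g. a hash of the signed message
    kind: str                     # "add", "replace" or "delete" (hypothetical names)
    target: Optional[str] = None  # id of the earlier transaction being replaced or deleted


def merge(state: frozenset, tx: Tx) -> frozenset:
    """M: S x t -> S'. Returns a new state and never mutates the old one."""
    if tx.kind == "add":
        return state | {tx.tx_id}
    if tx.kind == "replace":
        return (state - {tx.target}) | {tx.tx_id}
    if tx.kind == "delete":
        # Assumption: the delete itself is not kept in the state.
        return state - {tx.target}
    raise ValueError(f"unknown transaction kind: {tx.kind}")


def state_root(state: frozenset) -> str:
    """Deterministic digest of the state (simplified stand-in for a merkle trie root)."""
    digest = hashlib.sha256()
    for tx_id in sorted(state):
        digest.update(tx_id.encode())
        digest.update(b"\x00")  # separator so ids cannot run together
    return digest.hexdigest()
```

Applying the same transaction twice leaves the state unchanged, while swapping the order of an add and the delete that targets it gives a different result, which is why ordering matters here.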

4. Blocks

Snapchain and blockchain transactions are sequenced into blocks. A block must have a signature from the block producer, a link to the previous block and a global state root. The global state root is the root of the global state trie, whose leaves are the roots of each account state trie. If the state of any account changes, the global state root also changes.
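A rough sketch of how the global state root and the block link fit together is shown below. It assumes SHA-256 and a plain binary merkle tree over account roots sorted by farcaster id; the actual trie layout, hash function and block fields are not specified here, so every name in the sketch is hypothetical.

```python
import hashlib
from dataclasses import dataclass


def h(*parts: bytes) -> bytes:
    """Hash length-prefixed parts so that concatenation is unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary merkle root; an unpaired node is carried up to the next level."""
    if not leaves:
        return h(b"empty")
    level = [h(b"leaf", leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def global_state_root(account_roots: dict[int, bytes]) -> bytes:
    """Root over every account's state root, ordered by farcaster id."""
    leaves = [fid.to_bytes(8, "big") + root for fid, root in sorted(account_roots.items())]
    return merkle_root(leaves)


@dataclass(frozen=True)
class Block:
    height: int
    prev_hash: bytes      # link to the previous block
    state_root: bytes     # global state root after applying this block
    producer_sig: bytes   # placeholder; signing is out of scope for this sketch

    def block_hash(self) -> bytes:
        return h(self.height.to_bytes(8, "big"), self.prev_hash, self.state_root)
```

Changing any single account root changes the value returned by `global_state_root`, which is the property the paragraph above relies on.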

Blocks are produced by a committee of block validators and tendermint is used to reach consensus. A leader is chosen to produce the block and at least two-thirds of other validators must sign off. Snapchain is Byzantine fault tolerant and up to one-third of the network can be malicious without affecting block production. Validators are selected through a voting committee which is described in Appendix A.
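For concrete numbers, the thresholds can be worked out with a small helper. It assumes equal voting power per validator and the usual tendermint rule that a commit needs strictly more than two-thirds of the votes; the text above says "at least two-thirds", so treat the exact boundary as an assumption.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest vote count that is strictly greater than two-thirds of n."""
    return (2 * n) // 3 + 1
```

With the 11 validators described in Appendix A this gives a quorum of 8 and tolerance for 3 faulty validators. Under the same rule a set of 3 validators needs all 3 votes, so it cannot lose a validator without stalling.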

Blocks are grouped into epochs that are K blocks in length. A special epoch block is published at the beginning which contains additional metadata used to re-configure chain parameters. These blocks must be preserved forever and cannot be pruned. One example of epoch metadata is the leader rotation schedule. Leaders must be rotated periodically or if they fail to produce a block. The schedule for the next K blocks is determined using a deterministic function and included in every epoch block.
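The deterministic function for the leader schedule is not defined here. One possible shape, purely as an assumption, is a round-robin over the sorted validator set with a starting offset derived from a seed that every validator already agrees on (for example the previous epoch block's hash). The names `leader_schedule` and `epoch_seed` are hypothetical.

```python
import hashlib


def leader_schedule(validators: list[str], epoch_seed: bytes, k: int) -> list[str]:
    """Return the leader for each of the next k blocks.

    Every node that knows the same validator set and seed computes the same list.
    """
    if not validators:
        raise ValueError("validator set is empty")
    ordered = sorted(validators)  # canonical order, independent of input order
    offset = int.from_bytes(hashlib.sha256(epoch_seed).digest(), "big") % len(ordered)
    return [ordered[(offset + i) % len(ordered)] for i in range(k)]
```

A round-robin keeps every validator in the rotation, and skipping a failed leader amounts to moving to the next entry in the list.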

Nodes get new blocks from their peers and update account states. After a week, non-epoch blocks can be pruned by nodes to free up disk space. Pruning permanently removes deleted posts and likes, which is a desirable feature for users. The week’s delay ensures that nodes that go offline even for a few days can catch up by streaming blocks from their peers.

Nodes that go offline for long periods (or that start from scratch) must use snapshot sync instead. The protocol will publish daily snapshots of the global state to a file server as a public good. The snapshot is tamper-proof since modifying transactions will invalidate block signatures and omitting transactions will invalidate the global root. Nodes can download the state snapshot and then stream blocks from their peers to catch up.

5. State Rent

Decentralized networks can be flooded with transactions which consume disk space, bandwidth and compute. Blockchains control this by imposing a per-transaction fee, but this isn’t great for a social network. If users have to worry about fees for each post, they will post less frequently which is bad for the network.

Snapchain gives users practically unlimited transactions if they pay a yearly fee. Users must rent a storage unit on the external blockchain after creating an account. Each unit gives them a rate limit (500 tx/hour) and a storage limit (10,000 txns) for their account state. Users can buy multiple units to increase these limits but in practice 99% of users rarely need more than one.
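A per-account rate limit of this kind can be sketched as a sliding window. The sketch assumes that limits scale linearly with the number of units and that the timestamp passed in is one every node agrees on (for example, taken from block ordering); neither detail is fixed by the text above.

```python
from collections import deque

TX_PER_HOUR_PER_UNIT = 500  # figure quoted above
WINDOW_SECONDS = 3600


class RateLimiter:
    """Sliding one-hour window over accepted transaction timestamps."""

    def __init__(self, units: int = 1):
        self.limit = TX_PER_HOUR_PER_UNIT * units  # assumption: linear in units
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= WINDOW_SECONDS:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

Because the decision depends only on the ordered history, two nodes replaying the same blocks reach the same accept or reject result.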

Usage feels “unlimited” because when storage limits are exceeded a user’s oldest transaction is discarded instead of preventing the newer transaction from confirming. Each transaction type (post, like, follow) has its own set of limits and a newer post will push out the oldest post. This generally works well in a timeline based social network because older posts are rarely revisited and most users are comfortable with the ephemeral behavior. Those who want more permanence can pay for additional storage units or archive data elsewhere.
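The eviction behaviour can be illustrated with a small per-type store. The per-type limits used below are made-up values for the example, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class AccountStore:
    """Keeps the newest transactions of each type and evicts the oldest on overflow."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.by_type = {tx_type: OrderedDict() for tx_type in limits}

    def add(self, tx_type: str, tx_id: str, body: str) -> list[str]:
        """Store a transaction and return the ids evicted to make room for it."""
        bucket = self.by_type[tx_type]
        bucket[tx_id] = body  # re-adding an existing id keeps its position
        evicted = []
        while len(bucket) > self.limits[tx_type]:
            old_id, _ = bucket.popitem(last=False)  # oldest entry first
            evicted.append(old_id)
        return evicted
```

For example, with a limit of two posts, adding a third post evicts the first one while likes are unaffected, since each type has its own bucket.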

The benefit of this system is that users don’t really have to think about storage and can just keep using the network. One downside is that a single storage unit must have separate, fixed limits for each type, and users with different usage patterns may feel that they are wasting storage. Another downside is that while expiring the oldest message is a reasonable decision for posts, it may not be the right tradeoff for something like a follow. Apps may need to implement safeguards to protect users from blowing away certain historical data when limits are exceeded.

6. Sharding

Snapchain can be sharded into N segments using N+1 tendermint chains to improve scalability. Accounts are assigned to a chain using a deterministic function. In the example below, odd numbered accounts are assigned to one shard and even ones to the other. The N+1th chain is used to unite all the shards so that they appear as a single chain. Our approach to sharding is inspired by NEAR’s Nightshade.

A shard chain must have at least three validators and store all relevant account state. Validators may be automatically or manually rotated between shards through a validator schedule in the epoch block. Erasure coding is used to distribute account state from one shard across validators in other shards so that the data is still available even if all validators within a shard fail.

Block production is triggered when the previous block is finalized. Each shard chain bundles transactions into a block and computes a shard root, which is like the global root but limited to accounts in a shard. The N+1th shard chain waits for the N shards to be produced and then performs another tendermint step bundling them into a single block and computes a global state root across the shard roots.
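The two-level structure can be sketched as follows, assuming the modulo assignment from the odd/even example and a flat SHA-256 digest standing in for each trie root. The function names are hypothetical and consensus itself is left out.

```python
import hashlib


def shard_for(fid: int, num_shards: int) -> int:
    """Deterministic account-to-shard assignment (odd/even when num_shards is 2)."""
    return fid % num_shards


def root_of(items: list[bytes]) -> bytes:
    """Flat digest over length-prefixed items; a stand-in for a trie root."""
    digest = hashlib.sha256()
    for item in items:
        digest.update(len(item).to_bytes(4, "big"))
        digest.update(item)
    return digest.digest()


def shard_roots(account_roots: dict[int, bytes], num_shards: int) -> list[bytes]:
    """One root per shard, covering only the accounts assigned to that shard."""
    buckets = [[] for _ in range(num_shards)]
    for fid in sorted(account_roots):
        leaf = fid.to_bytes(8, "big") + account_roots[fid]
        buckets[shard_for(fid, num_shards)].append(leaf)
    return [root_of(bucket) for bucket in buckets]


def global_root(roots: list[bytes]) -> bytes:
    """Root computed by the N+1th chain across the N shard roots."""
    return root_of(roots)
```

A change to one account only alters the root of its own shard and the global root, so the other shards can keep producing independently.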

7. Sync

Nodes rely on gossip as the primary mechanism for p2p communication. When a block is produced, the header is gossiped out on a topic and shards are sent out on separate topics. Gossip failures are reasonably easy to recover from due to ordering. If a block is skipped, a sequence jump will be detected and the node knows that it missed a block. All nodes will expose rpc endpoints which can be used to fetch older blocks.
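Gap detection falls out of block heights directly. The sketch below assumes blocks carry a `height` field and that `fetch_block` is some rpc call to a peer; both names are hypothetical and no real client API is implied.

```python
from typing import Callable


class BlockFollower:
    """Applies gossiped blocks in order and backfills any heights that were skipped."""

    def __init__(self, height: int, fetch_block: Callable[[int], dict]):
        self.height = height            # height of the last applied block
        self.fetch_block = fetch_block  # hypothetical rpc call: height -> block
        self.applied: list[int] = []

    def on_gossip(self, block: dict) -> list[int]:
        """Apply a gossiped block and return the heights that had to be fetched first."""
        if block["height"] <= self.height:
            return []  # duplicate or stale gossip
        missing = list(range(self.height + 1, block["height"]))
        for height in missing:
            self._apply(self.fetch_block(height))
        self._apply(block)
        return missing

    def _apply(self, block: dict) -> None:
        if block["height"] != self.height + 1:
            raise ValueError("block does not extend the local chain")
        self.applied.append(block["height"])
        self.height = block["height"]
```

A node at height 10 that receives block 14 over gossip would fetch 12 and 13 after having applied 11, then apply 14, without comparing state with every peer.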

Validators also rely on gossip to manage the mempool and for inter-validator communication when consensus is being reached on the state of a block. All the tendermint consensus steps happen via gossip messages. Validators may also expose rpc endpoints for failure recovery.

8. Handling Failures

Validators can fail in a variety of ways and we must define how the network behaves in each scenario. Let’s start with the honest malfunctions:

Shard leader fails to produce a shard — after 5 seconds, consensus changes leadership according to the rotation. We can tolerate the failure of up to 1/3 of the validators.
A shard is not produced in time for the block — block production continues. If they fail to produce a shard for an entire epoch, the chain is halted.
A block is not produced — after 5 seconds, consensus changes leadership according to the rotation. We can tolerate the failure of up to 1/3 of the block leaders.
Blockchain (OP Mainnet) reorgs — TBD, need to consider this in some detail and should also cover the simpler case of chain stalls.

If nodes are behaving maliciously, there are more attack scenarios that are possible:

Block leader excludes shards or halts production — mitigated by rotating leaders, but governance action is needed to evict them permanently and solve the issue.
Shard leader excludes a user’s transactions — mitigated by rotating shard leaders, but governance action is needed to evict them permanently and solve the issue.
Shard validator majority excludes a user’s transactions — if more than 2/3 of a shard's validators collude they can censor a user, and governance action is needed to resolve.
Block validator majority excludes a shard — if more than 2/3 of block validators collude they can censor a shard, and governance action is needed to resolve.
Shard validator majority can reissue a shard before block finality — TBD. If a majority of a shard's validators are malicious, they can reissue a shard for a block before it gets finalized.
If a > 2/3 majority of block validators and a > 2/3 majority of one shard's validators collude, they can reissue a block, which would cause a network fork. This requires a refork and restart of the network.

9. Open Questions

Should “app keys” be moved back onto Snapchain instead of being onchain? See discussion: https://warpcast.com/v/0x12ec1470
Should storage rent be paid by applications and not by users? See discussion: https://warpcast.com/v/0xde06c50b
Should snapchain have timestamp guarantees?
Should storage limits be fixed-size and in bytes, with apps implementing the “rolling” logic?
Do we need some form of cross shard communication for usernames? Can we bring farcaster usernames into the Snapchain system?
Should validator keys be ECDSA or EdDSA?

Rationale

What exactly is hard about sync today?
(See Appendix C)

Why not fork a blockchain instead of designing a new one?
(See Appendix D)

Why was tendermint chosen as the consensus algorithm?
It has been used in production systems for many years, has fast finality and good liveness guarantees. There are also well written implementations in Go and even one in Rust.

Will validators be able to censor users?
Censorship will be challenging with as few as ten globally distributed validators. There is no direct economic gain or loss caused by censorship. Users being censored can amplify their message via others and censorship is provable by observing transactions in the mempool. If all validators do collude, the voting committee described in Appendix A acts as a check and balance to change the validator set. If all the validators and voters collude, it may be possible to censor.

Should we take a different approach that makes censorship even harder?
It is possible to design even more decentralized forms of governance and block production to make censorship less practical. The argument against this is that censorship is already reasonably impractical and most of these designs come with great cost to system complexity or user experience, which makes the network less likely to be useful. It is also important to remember that Snapchain has been upgraded in the past as requirements have changed, and can be upgraded again in the future if necessary.

Release

There are three major milestones for the development of Snapchain:

Milestone	Date	Summary
Alpha	Dec 2024	A fully functional node that can be run locally
Testnet	Jan 2025	A testnet that mirrors the deltagraph's state on mainnet.
Mainnet	Feb 2025	A mainnet that replaces the deltagraph.

More details will be added about the specific migration path once the alpha milestone is achieved.

Appendix

Appendix A: Consensus System

A group of 15 voters are chosen through rough consensus from the Farcaster community. They must be technical enough to understand the tradeoffs of decentralization.
A group of 11 validators are chosen by the voters. A vote is called every 6 months and each validator must receive votes from a majority of participating voters. At least 80% of voters must participate. Ties are resolved with a majority vote between the two tied results.
Votes are cast and stored in a GitHub repository as public signatures from the voter's wallet.
Voters may call a vote to replace a malfunctioning node as long as there is sufficient proof.
Voters may call a vote to replace a malicious or absentee voter with a majority vote.
Voters must replace one of their number every 6 months through a majority vote.
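The participation and majority rules above reduce to simple integer checks. The sketch assumes "majority" means strictly more than half of the participating voters and does not model tie-break runoffs; the function names are hypothetical.

```python
def has_quorum(participating: int, total_voters: int = 15) -> bool:
    """At least 80% of voters must participate (integer math avoids float rounding)."""
    return participating * 100 >= total_voters * 80


def is_elected(votes_for: int, participating: int, total_voters: int = 15) -> bool:
    """A candidate needs a strict majority of participating voters in a valid vote."""
    if not has_quorum(participating, total_voters):
        return False
    return 2 * votes_for > participating
```

With 15 voters, a vote needs at least 12 participants, and a candidate with 7 of 12 votes would pass while one with 6 of 12 would not.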

Appendix B: Extensions

Partial Sync

Apps may want to operate nodes that sync only a subset of the network's data because syncing everything is too expensive. This isn’t a huge concern today since the network fits easily onto cloud instances at under 200 GB. However, if this becomes a problem it will be possible to let some nodes sync specific accounts. The tradeoff is that they can’t contribute to block propagation on the network and will be relegated to being “edge nodes”.

Appendix C: Why is sync hard today?

A question that’s come up a few times about Snapchain is some variant of “why is syncing hard today?”

There is no source of truth to sync from - Messages can be added or removed from any node at any point in history due to the eventually consistent nature of CRDTs. Changes are gossiped out when they happen, but this could fail for a variety of reasons. The only way for a node to catch up 100% is to 1) sync with every other node and compare every message and 2) prevent messages from entering the network until this is completed. There are 4000 nodes x 150 million messages today, with 100s of messages changing every second, making this impossible.
Rate limits cause nodes to diverge — Rate limits are important to protect the network since we do not charge transaction fees. Global rate limiting is impossible with CRDTs, so limits are implemented per node. It is possible for a message to be temporarily rejected by one hub due to rate limits, but accepted by others.
Pruning complicates things — Pruning means that when one message is received another, older message might be removed. This means that older state is constantly being modified by newer messages, so it's hard to be efficient about comparing message ids and hard to reason about why two nodes diverge.
Unidirectional sync is slow. A node can be “ahead” of another node for some accounts' state and “behind” it for others. In order for these nodes to get into sync, both of them must pull data from the other node (bidirectional sync) before any state change happens. In practice, this is challenging to implement and we rely on unidirectional sync, which means that only some state converges.

One class of solutions was “partial ordering” — the basic idea was that we would chain messages by having each message reference the previous one. The chains would either be per user or per app, instead of the total ordering that Snapchain proposes. The benefit of this approach is that we do not need a heavyweight consensus model since in the happy path each chain is typically only edited by one node at a time.

One way to think about this is that it reduces the sync space. Our nodes today must compare the total set of messages which is 150M items. If you can have a chain per user, that’s down to 1M items. If you have a chain per app that’s probably closer to just ~1000 unique items to compare per sync.

But there are still some unsolved problems:

Pruning is not possible — because there is a chain, we cannot easily prune older state because the tombstones are necessary for sync to function.
Rate limits are still hard — there’s no way to reach consensus across users or apps, so the limits would still be local and diverge.
Forking causes a lot of thrash — a user or app can “fork” their chain by introducing a conflicting message at some point in history. This would invalidate all future messages, which causes a lot of sync thrash and is an easy way to DDoS the network.
There is still no source of truth — a node still has to sync with every other node to converge because we are using CRDTs. We have reduced the search space from 4000 nodes * 150M items to 4000 nodes * 1000 app chains. But nodes will still be slightly out of sync with each other, and the problem will return as we add more nodes or items.
The migration path is messy — since messages need to be chained to other messages, we have to update older messages to this new format. The problem is that messages are signed, and unless the user comes online with their key the message cannot be upgraded. We cannot ensure that users return, so we must either deprecate older data after some cutoff or keep both sync models built into hubs for a really long time.

Appendix D: Why not fork a blockchain?

An alternative to building snapchain would be to fork an existing blockchain to have similar properties. We would modify the VM so that the set of opcodes is limited to social actions and modify the transaction model to mirror snapchain's rate-limit + pruning approach to metering usage. There are two challenges with this approach:

Sharding - Given our tx volume and data size, we're going to need sharding soon. Snapchain can be sharded by account easily because transactions are independent across accounts. Blockchains have much more complicated sharding systems and we haven't found any that work in production yet, so there's a lot of implementation risk and unnecessary complexity.
Pruning - Most chains we've looked at don't really have an easy way to bolt on pruning, or the ability to arbitrarily discard data from points in time cleanly. We would have to do a large refactor that touches most abstractions in the system.

Blockchains are doing a lot of work in both these areas and it is quite possible that in 2-3 years our POV on this has changed. But if we are making a decision today about the best solution for a 5 year time horizon, Snapchain seems like a better bet.

vrypan · 2024-11-25T07:10:31Z

vrypan
Nov 25, 2024

Is there a way to spin up a new node if the snapshot server goes down? Even if it's painful.

4 replies

varunsrin Dec 5, 2024
Maintainer Author

if you're willing to trust a specific node, you can ask the node to produce a snapshot at a specific future block and then download it. you could also do it over m of n nodes if you dont want to trust a specific node. the challenge of course is that this is pretty storage and performance intensive and can be subject to a ddos attack, so its not something that all nodes will expose publicly.

i think a better strategy is to have multiple snapshot servers run by different parties to ensure high availability.

vrypan Dec 5, 2024

I think snapshots can be incremental. This makes a lot of things easier, including storage and generation.
It would be great if we used IPFS CIDs for the file names, even if we don't use IPFS. Again, this means that even if the protocol does not specify IPFS, it is easy for other parties to copy the snapshots on IPFS (they will have the same name/CID, so it's easy for me to know I'm downloading a valid one), and can reduce the load and dependency on a single node.

vrypan Dec 27, 2024

Related: farcasterxyz/hub-monorepo#2458

vrypan · 2024-11-25T07:23:16Z

vrypan
Nov 25, 2024

How is the outcome of the vote translated to addition/removal of a validator? I would expect the vote to happen onchain so that everyone (and the validators themselves) knows who is an active validator.

12 replies

shazow Dec 8, 2024

Is the only time that this repo will be read is when a new vote event occurs? Doesn't this need to get read every time a node gets booted to verify who is a valid member of the permissioned consortium? If this is true, doesn't that prevent nodes from booting if Github is not accessible? Or worse, doesn't this allow an attacker to override the consortium if a committer or ci is compromised?

varunsrin Dec 8, 2024
Maintainer Author

is the only time that this repo will be read is when a new vote event occurs

the validators will all read once from github, agree on the new validator set and then finalize that as metadata in the next epoch block header, which becomes the source of truth.

the github data should still be kept around for full context and should be migrated to the smart contract version in the future, but it is no longer essential for the security of the chain.

shazow Dec 8, 2024

So if validators/nodes get DoS'd and have to reboot, they'll never touch Github?

roninjin10 Dec 8, 2024

What is the complexity being avoided of putting it onchain if it's literally just a one time bootstrapping mechanism that you want to stick around for context? Shouldn't this work?

contract BootstrapingContractV0 is Upgradeable {
    string public constant bootstrapingJson = "...";
}

shazow Dec 9, 2024

Set the owner to a 5 of M safe multisig and that's already better than Github with custom signature checking conventions.

stevenjoe0906 · 2024-12-01T05:58:51Z

stevenjoe0906
Dec 1, 2024

How is the TPS on this architecture. Will it be 5000 or higher?

1 reply

varunsrin Dec 3, 2024
Maintainer Author

Ankarrr · 2024-12-13T03:26:30Z

Ankarrr
Dec 13, 2024

A blockchain transaction is a Farcaster specific transactions that happens on an external blockchain. An example is when Alice makes a transaction to the registry contract to get her farcaster id and set up her app keys. Snapchain nodes listen to and store blockchain transactions in their history.

why still rely on external for account management? Is it unable to create tx types for account management?

5 replies

vrypan Dec 13, 2024

Moving app key management from L2 to the snapchain is something that has been discussed and may be implemented at some point in the future. But I think that for now, it's better not to touch this part, the list of changes is already long enough :-)

Ankarrr Dec 16, 2024

I see that. May I know where could I find the latest discussion about this? I still think if we need to re-build the whole thing, leaving a core part (account management) as an external dependency doesn't sound right to me.

vrypan Dec 16, 2024

Hmm.. I don't remember. It could be in this thread somewhere #193, or on discussions that took place on Farcaster.

androidsixteen Dec 16, 2024

@Ankarrr -- until snapchain has greater security (perhaps comparable to an ETH L2), it may not make sense to move the accounts registry / management to it

There is greater risk to account integrity being compromised by the limited set of block producers in snapchain

varunsrin Dec 19, 2024
Maintainer Author

two reasons:

there is no way to collect fees on snapchain, which is necessary for account creation
snapchain does not have the same level of security. part of the reason its able to reach such high throughput is that it outsources security to the L2s.

androidsixteen · 2024-12-16T18:40:19Z

androidsixteen
Dec 16, 2024

Some questions / comments @varunsrin:

In the example below, we see Alice’s account state changing as she creates an account, adds a post and then deletes it.

Is there a degradation issue if a malicious user creates a bunch of posts and then deletes them? Because these are stored in a Merkle trie, there is the added overhead of recalculating hashes for every level above the post that was removed

This generally works well in a timeline based social network because older posts are rarely revisited and most users are comfortable with the ephemeral behavior. Those who want more permanence can pay for additional storage units or archive data elsewhere.

One thing to consider is that not all posts are made equal. Some older posts are very high quality and should not be culled automatically by the sliding post window. Would be great if there was a way to intentionally flag an old post from being evicted, or perhaps this kicks in once a "banger threshold" has been hit

Apps may need to implement safeguards to protect users from blowing away certain historical data when limits are exceeded.

Isn't this a collective action problem if handled at the app level? Meaning the app that has the worst safeguards causes the user to lose data, even if all other apps have some type of standard for notification / safeguards.

A shard chain must have at least three validators and store all relevant account state.

Are the shard validators a subset of global validators? Or are they distinct entities?

In the example below, odd numbered accounts are assigned to one shard and even ones to the other.

Will say it once again -- this is such a cool property of Farcaster! Because there is no contention between accounts, you can "pack" shards much more efficiently than any market / app-driven sharding scheme. Love this quality!

A shard is not produced in time for the block — block production continues. If they fail to produce a shard for an entire epoch, the chain is halted.

Which chain is halted? The shard or the entire snapchain? Asking because if the latter, somebody can just target 2 validators within the same shard to cause global liveness failure (rather than 4 total validators / 36% of total set)

Blockchain (OP Mainnet) reorgs — TBD, need to consider this in some detail and should also cover the simpler case of chain stalls.

Agree that sequencer liveness failure is the bigger concern. With potential rollbacks from fraud proofs, would it make sense to store accounts that are still within the challenge window in a distinct way from accounts that have been "finalized"? This way eviction or graceful re-submission of external blockchain txs is easier?

Shard leader excludes a user’s transactions

How would the network know if this is happening? If this requires on 1/3 shard validators inspecting their mempool and noticing censorship, that's a pretty tough whistle to blow (meaning even well intentioned folks wouldn't know to check for this unless there's some active monitoring of mempools built in)

Do we need some form of cross shard communication for usernames? Can we bring farcaster usernames into the Snapchain system?

Would be better to have a "name shard" and specialize concerns IMO -- similar to Polkadot in that certain chains can be "common goods parachains" that handle a global service like name registry or bridged asset escrow / minting

Voters must replace one of their number every 6 months through a majority vote.

What does this mean? Does it mean replace one of the voters?

5 replies

vrypan Dec 16, 2024

One thing to consider is that not all posts are made equal. Some older posts are very high quality and should not be culled automatically by the sliding post window. Would be great if there was a way to intentionally flag an old post from being evicted, or perhaps this kicks in once a "banger threshold" has been hit

I think that ordering is helping here. The current (in production) design lets you "resurrect" (same hash, same date) a pruned message (not a deleted one) by re-submitting it. As far as I can tell, Snapchain will allow this too, with the added bonus that we practically have a FIFO queue and a "resurrected" message is re-posted to the top of the heap. If all of the above is correct (@varunsrin to confirm), it means we can have a service (or many services) where we flag messages to be preserved and it automatically resurrects them when they are pruned.

Which is great... could it be used as an attack vector? Example:

I have submitted 10k casts in two years, and only have one storage unit. This means that 5k casts are pruned for sure.
Can someone censor me by re-submitting these 5k casts every time I cast something new, causing my new cast to be pruned?

androidsixteen Dec 16, 2024

Oh interesting -- if you have a "nonce" on an account (thanks to ordering), you can basically bump any message by resending it with the current nonce to keep it within the storage window's allowance

How would a validator know whether the old post was valid and pruned vs. being a new post?

I think your attack vector could be solved by only permitting services that have app keys registered or messages that are explicitly signed by the user to "resurrect" an old post. Otherwise, it would fail if submitted by an attacker

vrypan Dec 17, 2024

I don't think there's an account nonce, but you could consider the block height as such, in a way.

Currently hubs will accept any valid message whose hash is not in their DB. When you delete a message, there is a tombstone message with the hash of the original one, so you can't resubmit the original. But when it's pruned, the hash is no longer there, so you can resubmit it.
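
A rough sketch of the accept / delete / prune behaviour described above. Everything here is an assumption made for illustration: the `MessageStore` class, SHA-256 standing in for the real message hash, a per-account `capacity` standing in for storage units, and first-accepted-first-pruned eviction (the FIFO reading suggested earlier in the thread). It is not how hubs or Snapchain are actually implemented.

```python
import hashlib


class MessageStore:
    """Toy per-account store: tombstones block resubmission, pruning does not."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # assumed stand-in for storage units
        self.messages: dict[str, tuple[int, str]] = {}  # hash -> (timestamp, body)
        self.tombstones: set[str] = set()  # hashes of explicitly deleted messages

    @staticmethod
    def hash_of(timestamp: int, body: str) -> str:
        # Same timestamp and body always give the same hash (assumed scheme).
        return hashlib.sha256(f"{timestamp}:{body}".encode()).hexdigest()

    def submit(self, timestamp: int, body: str) -> bool:
        h = self.hash_of(timestamp, body)
        if h in self.messages or h in self.tombstones:
            return False                  # duplicate, or deleted earlier
        self.messages[h] = (timestamp, body)
        self._prune()
        return True

    def delete(self, h: str) -> None:
        if self.messages.pop(h, None) is not None:
            self.tombstones.add(h)        # remember the hash so it cannot return

    def _prune(self) -> None:
        # Evict in acceptance order; pruned hashes are simply forgotten.
        while len(self.messages) > self.capacity:
            oldest = next(iter(self.messages))
            del self.messages[oldest]


store = MessageStore(capacity=2)
store.submit(1, "a")
store.submit(2, "b")
store.submit(3, "c")                      # "a" is pruned here
resurrected = store.submit(1, "a")        # accepted again, which prunes "b"
store.delete(MessageStore.hash_of(3, "c"))
rejected = store.submit(3, "c")           # blocked by the tombstone
```

In this toy model, resubmitting a pruned message pushes out a live one, which is the shape of the attack asked about above.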

varunsrin Dec 19, 2024
Maintainer Author

Is there a degradation issue if a malicious user creates a bunch of posts and then deletes them? Because these are stored in a Merkle trie, there is the added overhead of recalculating hashes for every level above the post that was removed

This is where per-account rate limits come in, to prevent users from flooding the network with messages that cause disk thrash. It's not yet clear what they need to be set to; that depends on the performance metrics we're able to achieve on testnet.

One thing to consider is that not all posts are made equal. Some older posts are very high quality and should not be culled automatically by the sliding post window. Would be great if there was a way to intentionally flag an old post from being evicted, or perhaps this kicks in once a "banger threshold" has been hit

This can be implemented at the app level: when you run into your limits, apps can automatically select some less engaging cast to remove so that you don't cross your limit, or they can just stop you from posting and ask you to manually delete stuff.

Isn't this a collective action problem if handled at the app level? Meaning the app that has the worst safeguards causes the user to lose data, even if all other apps have some type of standard for notification / safeguards.

Yes, I think that's the tradeoff of this design. The alternative is to simply stop a user from posting when they reach their limits, but a badly designed app might simply just encode logic to delete the oldest cast, so it may not be much of an improvement.
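
To make the two app-level policies discussed above concrete, here is a small sketch. The `Cast` fields, the use of `likes` as an engagement proxy, and the `pick_cast_to_remove` helper are all hypothetical; real apps would use their own signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cast:
    hash: str
    timestamp: int
    likes: int  # assumed engagement proxy


def pick_cast_to_remove(casts: list[Cast], policy: str = "least_engaging") -> Optional[Cast]:
    """Choose which cast an app would delete when the user is at their limit."""
    if not casts:
        return None
    if policy == "oldest":
        # The naive policy: drop the oldest cast regardless of quality.
        return min(casts, key=lambda c: c.timestamp)
    # Otherwise drop the least-liked cast, breaking ties by age.
    return min(casts, key=lambda c: (c.likes, c.timestamp))


casts = [
    Cast("0xaa", timestamp=100, likes=250),  # old but popular
    Cast("0xbb", timestamp=200, likes=0),
    Cast("0xcc", timestamp=300, likes=3),
]
by_age = pick_cast_to_remove(casts, policy="oldest")
by_engagement = pick_cast_to_remove(casts)
```

Here the "oldest" policy removes the popular cast while the engagement-based one keeps it, which is the difference between a careless app and a careful one.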

Are the shard validators a subset of global validators? Or are they distinct entities?

Initially the validators of shard 0 will be the chain/global validators as well, since it's not too hard to do both. At some point, this will have to change and there will be distinct global validators.

Which chain is halted? The shard or the entire snapchain? Asking because if the latter, somebody can just target 2 validators within the same shard to cause global liveness failure (rather than 4 total validators / 36% of total set)

In the early days, when we have only 2 shards, it's catastrophic even if one shard fails, so there isn't much of a distinction. We have to figure out how to avoid halts entirely, which is something that L2s have figured out quite well.
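
The arithmetic behind the liveness question quoted above can be sketched as follows, assuming a BFT-style rule where progress needs strictly more than two thirds of validators online. The set sizes used (4 validators in a shard, 11 in total) are illustrative guesses chosen to reproduce the "2 validators" and "4 / 36%" figures in the question; they are not taken from the FIP.

```python
def quorum(n: int) -> int:
    """Smallest validator count that is strictly more than 2/3 of n."""
    return (2 * n) // 3 + 1


def min_offline_to_halt(n: int) -> int:
    """Fewest validators that must go offline so a quorum can no longer form."""
    return n - quorum(n) + 1


shard_size = 4    # assumed shard validator count
total_size = 11   # assumed total validator count

shard_halt = min_offline_to_halt(shard_size)
total_halt = min_offline_to_halt(total_size)
total_fraction = total_halt / total_size

print(shard_halt, total_halt, round(total_fraction * 100))
```

Under these assumptions, stalling a single shard takes fewer offline validators than stalling the whole set, which is why it matters whether a shard halt also halts the chain.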

Agree that sequencer liveness failure is the bigger concern. With potential rollbacks from fraud proofs, would it make sense to store accounts that are still within the challenge window in a distinct way from accounts that have been "finalized"? This way eviction or graceful re-submission of external blockchain txs is easier?

Yeah we still need to figure this out in the implementation. It's messy.

How would the network know if this is happening? If this requires on 1/3 shard validators inspecting their mempool and noticing censorship, that's a pretty tough whistle to blow (meaning even well intentioned folks wouldn't know to check for this unless there's some active monitoring of mempools built in)

If you know it's happening to you, you can just push a message to the mempool and tell others to check that it's there. It's pretty easily testable if it's gross censorship.

Would be better to have a "name shard" and specialize concerns IMO -- similar to Polkadot in that certain chains can be "common goods parachains" that handle a global service like name registry or bridged asset escrow / minting

We're exploring a model where we push name resolution to a service outside the chain and just ingest the relevant names at some cadence (once a day?). @sanjayprabhu is writing this up separately.

androidsixteen Dec 20, 2024

@varunsrin thanks for the thorough answers!

serkanmax · 2025-01-08T19:41:44Z

serkanmax
Jan 8, 2025

Suggestion for "FIP: Snapchain" Implementation
To address gossip failures in syncing nodes, have you considered integrating an adaptive gossip protocol (e.g., Plumtree or HyParView)? Such protocols dynamically manage peer selection, ensuring resilience against network partitions while reducing overhead. This may improve both real-time consistency and recovery time for nodes that fall behind.

Additionally, a clearer fallback mechanism for leader failures during block production, such as introducing a "grace block buffer," could minimize disruption in shard leadership rotations without halting epochs.

0 replies

