Proposal to Reinstate the Uptime Mission in the Shielded Expedition Incentivised Testnet

Thanks Chainflow and all other contributors for putting this proposal together. We at Citadel One are aligned with your reasoning and are in support of the initiative.

We’d like to take this chance to encourage the Namada team to conduct a thorough review of the testnet rules. In our opinion, the testnet would have benefited greatly from more transparency and clarity regarding participation and submission rules, which we consider crucial to retaining existing contributors and attracting new ones in the future.

Regardless of the outcome here, we hope that we can use this as a learning opportunity for future events as we consider clearer guidelines and increased transparency important to maintain and increase contributions to Namada.

@ChainflowPOS and others, thanks for your attention to this matter. I’m finding that there’s too much written in this proposal and the topic thread for me to afford the attention that the matter deserves, so please consider replying clearly and concisely to the assertions we’ve made in the topic I’ve posted here: Concerns about uptime mission removals

I’ve also included a post-mortem for the issues mentioned in Assertion 1 (the primary reason for our proposal to remove the uptime missions) here: Concerns about uptime mission removals - #5 by Gavin

(Sharing reply here also for visibility)

It was pruned by passing the cometbft data directory through cosmprund, which makes it a little lighter. That is how the data can be lightened, and it is what LavenderFive did.
Everything together doesn’t weigh much. I think the SE2 dir weighed less than 20GB when the SE ended.

Nodeify shared Discord evidence of someone posting how they grabbed a snapshot on 2/19 to get back up. Not much else to say.

Linked here is all the evidence and screenshots gathered from 19/02 in the Shielded Expedition channels shielded-expedition, se-community support and se-100, organised in the 6 sections summarized below:

  1. Resyncs fixing issue: several validators discussing how they managed to fix issues

  2. Increasing timeout fixing issue: Citadel advised increasing timeouts to fix the issue, and several validators thanked him since it worked. This was BEFORE the patch release was announced

  3. Issues fixed with resync from snapshot: validators confirming that the issue was fixed by resyncing from snapshot

  4. Validators with low-spec cloud VPS servers having timeout issues: Citadel, Gavin and Adrian said that if the machine isn’t fast enough, operators hit the timeout issue, and advised migrating to a faster machine

  5. Validators having issues because of inexperience or lack of knowledge, low-spec VPS, or being late for restarts, and suggesting it was better to wait for the more skilled se-100 to get the chain up first:
    -Pretoro says he uses a VPS and asks what a bare metal server is. When Gavin says the timeout issue is related to slow machines, Pretoro says his machine is very powerful (not true, since by his own account it runs on a cloud VPS)
    -Pretoro worries and asks whether the upgrade will be an easy ‘install and run’ or complicated, and describes himself as non-skilled
    -Pretoro suggests it is better to wait for the SE-100 validators to get the network up and running
    -Pretoro was not paying attention and was late for the upgrade; Labisque tagged him and asked if he was sleeping. Pretoro came back and rushed to catch up with what was going on

  6. Cosmostation provides a tool for post-genesis validators to check missed liveness: seeing the inexperience of many post-genesis validators, Cosmostation provided a tool, similar to Tenderduty, to easily check for missed liveness

For all the evidence and screenshots for the 6 sections above, check the link included above. I didn’t add all the screenshots here, to keep this brief and clear.

And about the ‘unjail bug’ presented as the reason for not being able to achieve the Uptime mission:
There are 91 epochs in the Shielded Expedition testnet (0-90). For post-genesis validators that is 89 epochs, since the first 2 epochs, when they cannot be part of the active set, are not counted. The jail period for downtime is a minimum of 2 epochs, which means that after any jail event it was already impossible to achieve the 99% Uptime mission, even if normal unjailing had been possible and there had been no unjail bug. This is because 99% uptime requires at least 88.11 epochs (99% of 89), but after a jail event the maximum possible is 87 epochs (89 - 2). So the unjail bug was not a reason for not achieving the 99% Uptime mission.
Therefore any discussion about Uptime missions should concern only the 95% mission, and the 99% Uptime mission should always remain. The 95% uptime can be calculated after removing the Feb 19-March 26 period if that’s the only issue.
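
For anyone who wants to check the epoch arithmetic above, here is a minimal sketch. It assumes, as the post does, that uptime is counted in whole epochs and that a single jail event costs at least 2 epochs; the function and constant names are made up for illustration and are not taken from any Namada tooling.

```python
# Figures taken from the post above; names are hypothetical.
TOTAL_EPOCHS = 91                 # epochs 0-90
POST_GENESIS_SKIPPED_EPOCHS = 2   # post-genesis validators cannot be active in the first 2
MIN_JAIL_EPOCHS = 2               # minimum jail period for downtime


def required_epochs(eligible_epochs, target_percent):
    """Epochs of uptime needed to reach target_percent of the eligible epochs."""
    return eligible_epochs * target_percent / 100


def max_epochs_after_one_jail(eligible_epochs, jail_epochs=MIN_JAIL_EPOCHS):
    """Best case after one jail event: every epoch except the jailed ones."""
    return eligible_epochs - jail_epochs


def reachable_after_one_jail(eligible_epochs, target_percent):
    """True if the target is still reachable after a single minimum-length jail."""
    best_case = max_epochs_after_one_jail(eligible_epochs)
    return best_case >= required_epochs(eligible_epochs, target_percent)


post_genesis_epochs = TOTAL_EPOCHS - POST_GENESIS_SKIPPED_EPOCHS

for target in (99, 95):
    print(
        f"{target}% target: need {required_epochs(post_genesis_epochs, target):.2f} epochs, "
        f"best case after one jail {max_epochs_after_one_jail(post_genesis_epochs)}, "
        f"reachable: {reachable_after_one_jail(post_genesis_epochs, target)}"
    )
```

Under these assumptions the 99% target is out of reach after one jail (87 epochs against 88.11 needed), while the 95% target is still reachable (87 against 84.55).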

The 2pilot team fully supports the proposal to retain the uptime mission.
Unless there was some issue with the SDK that prevented only a subset of validator addresses from properly signing blocks (which could easily be proved with a GitHub link to the culprit revision), there is no reason to remove this task.

  • The unjailing bug cannot be an excuse: if you got jailed, it wasn’t possible to reach 99% uptime anyway
  • DDoS attacks, either external or internal, that could freeze the node could be spotted with proper monitoring and solved simply by restarting the node. I encountered this several times myself, but it didn’t stop me from achieving 99% uptime.
  • Restarts after a network halt could be handled by resyncing or by using snapshots provided by the community.

All those issues could happen on mainnet, and potential validators should be ready to deal with them. If no one had cared about recovering from network halts and everyone had focused on S-class tasks, the Shielded Expedition might still not be finished.

Maintaining 99% uptime was challenging but achievable with extra effort. Similar to other tasks with few successful completions due to external issues, such as S5 or S6, we believe the uptime task should be retained. If it is removed or rewarded separately from the main competition, the same logic should be applied to other problematic tasks to preserve integrity.

We understand the logic behind the removal of uptime to recognize the few participants in the Shielded Expedition. However, we must support the proposal to reinstate it for the aforementioned reasons of fairness, consistency, and recognition of efforts during the Expedition. We believe that contributions and performance should be fairly rewarded in proportion to the effort put in.

I have had a number of replies to my (co-signed) post above, and as I said, I will not be engaging in further debate. It is remarkable, however, that a number of the same disingenuous arguments keep being repeated instead of taking into account the very real counterarguments offered in, amongst others, my post above. It’s impossible to have any meaningful dialogue this way.

You argue that there was a bug that caused several validators to get jailed. However, it wasn’t impossible to avoid getting jailed: many validators found ways to maintain uptime and fix the issue without getting jailed. Gavin acknowledged this, and there are plenty of testimonies and evidence on Discord, as well as the fact that many achieved the 99% uptime mission.

In contrast, the faucet was broken for an extended period of time, and many validators couldn’t vote on many proposals while they waited for Adrian to send them tokens. So for all these validators it was actually impossible to vote on many proposals.

Summary:
Uptime: Possible to avoid getting jailed due to the restart bug → yet some validators were not skilled enough to fix the issue and avoid getting jailed → Uptime mission removed

Governance: Impossible to vote on several proposals and achieve 99% governance participation since the faucet was broken → Governance mission kept