Background
Disclaimer: Due to link restrictions, we have provided a single link to the full proposal, which contains all the necessary evidence and references. Please read it there and share your comments here on the forum.
This is the community’s response, led by Chainflow, Cosmic Validator, L0vd, Stakeup, Stakepool, Daniel from Keplr, Crouton Digital, Encipher, Nodeify and 2pilot, to the recent announcement on the Namada Forum dated May 16th.
We (validators/pilots/contestants of the Namada Shielded Expedition event) disagree with the removal of the entire uptime category, which was announced and applied after the conclusion of the Namada Shielded Expedition. We believe this change should be revoked for several reasons:
- Changing rules post-competition undermines the integrity and fairness of the Shielded Expedition event.
- Pilots relied on the initial uptime missions and point system to guide their strategies and efforts.
- Removing the uptime category after the fact retroactively disadvantages pilots who performed well under the original criteria.
- Such changes can erode trust in the organizing body and discourage future participation.
Moreover, we feel that insufficient evidence was provided to justify removing the category. Uptime is the most crucial baseline metric for assessing validator performance, and no competition can effectively measure a validator’s performance without it. We plan to discuss this further in the Arguments section and provide our counterarguments to the reasoning presented by the SE host team for removing the Uptime category.
Proposal
We request the Namada team to reinstate the uptime mission in the Shielded Expedition event.
This will ensure:
- Fairness in the competition and the event
- Consistency in the rules, maintaining the original terms
- Recognition and reward for the most diligent participants based on their efforts
- Accurate rankings, providing a reliable tool for selecting validators for the mainnet
Arguments
Upon registration, all participants are informed of the rules and task specifics, and they agree to them by entering the competition. We believe that no competition should change its rules after it has ended, especially right before the final results are announced and without formal recourse to debate the unilateral decision. This undermines the integrity and fairness of the competition and the credibility of those who hold the power to set the rules, while devaluing the efforts of the most diligent participants.
Considering all of the above, we will examine the key arguments provided by the SE host team, which they found sufficient to justify removing the vital uptime performance category and changing the rules after the competition had ended.
Argument 1: “A limited set of participants were able to compete for the uptime missions & only 2 post genesis validators achieved the uptime task and they are outliers.”
Our counterarguments:
- When uptime is counted correctly for post-genesis validators (e.g., excluding the first 2 epochs; see the counting sketch after this list), a total of 13 post-genesis validators completed the uptime task.
- Overall, 55 pilots achieved the uptime task, similar to those who completed the governance task, which the SE host team decided to keep in the final rankings.
- 257 participants deployed validator nodes and competed for the uptime task, with 21.4% succeeding. This statistic demonstrates that the mission was challenging yet far from impossible.
- The entire point of testnets such as the Shielded Expedition is to surface the validators with the dedication and skill to perform well and to differentiate them from the crowd. To draw an analogy, this argument is akin to voiding a soccer match because only one of the two teams won.
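For illustration only, here is a minimal sketch of the counting adjustment referred to above: uptime is computed over the epochs in which a post-genesis validator could actually have signed, rather than from genesis. The data shapes, the 2-epoch exclusion, and the 95% threshold are assumptions made for this example, not the SE's official scoring code.

```python
# Illustrative only: how uptime could be counted for a post-genesis validator
# by excluding the epochs before it could realistically be in the active set.
# The 2-epoch exclusion and the 95% threshold are assumptions for this sketch,
# not the official Shielded Expedition scoring logic.

def uptime_ratio(signed_per_epoch, total_per_epoch, skip_first_epochs=0):
    """Fraction of blocks signed, optionally ignoring the first epochs."""
    signed = sum(signed_per_epoch[skip_first_epochs:])
    total = sum(total_per_epoch[skip_first_epochs:])
    return signed / total if total else 0.0

# Hypothetical post-genesis validator: not yet in the set for epochs 0-1,
# then signing almost every block afterwards.
signed = [0, 0, 9900, 9950, 9980]
total  = [10000, 10000, 10000, 10000, 10000]

naive    = uptime_ratio(signed, total)                       # counted from genesis
adjusted = uptime_ratio(signed, total, skip_first_epochs=2)  # first 2 epochs excluded

print(f"naive:    {naive:.1%}")     # ~59.7% -> appears to fail the task
print(f"adjusted: {adjusted:.1%}")  # ~99.4% -> clearly completes it
print("passes assumed 95% threshold:", adjusted >= 0.95)
```

The only point of the sketch is that including epochs in which a post-genesis validator could not yet sign drags an otherwise near-perfect record below any reasonable threshold.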
Argument 2: “There was a client bug that caused a significant number of validators to be jailed (node kept restarting with no way to stop it).”
Our counterarguments:
- There was a way to fix this issue, which is why 55 validators completed the uptime task.
- This bug resulted in some downtime. However, with proper monitoring, alerts, a redundancy setup, and backups, no jailing would have occurred, as proven by the 55 validators who were unaffected and completed the uptime task (see the monitoring sketch after this list).
- Ways to avoid being jailed included, but were not limited to, spinning up another node and restoring from a snapshot, running Horcrux, or debugging with Docker.
- This was not specific to validators. The so-called “client bug” manifested on RPC nodes, or when certain commands were executed on a machine with low specs. It was a matter of operator practice and server sizing, not of the “bug” itself.
- We found only a handful of reports (roughly 4-7) of this bug occurring. Such cases are rare and should be classified as outliers.
- The SE host team did not provide evidence of how many participants were impacted by the “silent bug”, nor a post-mortem confirming that jailing was impossible to avoid because of something tied to the SDK and a validator’s address. The testimony of some validators suggests it was in fact possible to avoid jailing: “During upgrades I had to resync once because of some issue (don’t remember exactly what it was, maybe restarts) and the other time I used a backup node to avoid losing blocks”
- This bug was never reported again after the Hardfork upgrade.
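As an illustration of the kind of basic monitoring mentioned above, here is a minimal sketch that polls a node's CometBFT RPC /status endpoint and raises an alert when the node stops advancing or falls out of sync, giving the operator time to fail over to a backup or restore from a snapshot before a jailing occurs. The URL, polling interval, and alerting hook are assumptions for this example; production setups typically rely on Prometheus, Grafana, or similar tooling.

```python
# Minimal monitoring sketch (illustrative, not a production setup): poll the
# node's local CometBFT RPC /status endpoint and raise an alert if the node
# stops advancing or falls out of sync, so the operator can fail over to a
# backup node or restore from a snapshot before being jailed for downtime.
# The URL, polling interval, and alerting hook are assumptions for this sketch.
import json
import time
import urllib.request

STATUS_URL = "http://127.0.0.1:26657/status"  # local CometBFT RPC, assumed reachable locally
POLL_SECONDS = 30

def node_status():
    with urllib.request.urlopen(STATUS_URL, timeout=5) as resp:
        return json.loads(resp.read())["result"]["sync_info"]

def alert(message):
    # Placeholder: in practice this would page via PagerDuty, Telegram, etc.
    print(f"ALERT: {message}")

last_height = 0
while True:
    try:
        sync = node_status()
        height = int(sync["latest_block_height"])
        if sync["catching_up"]:
            alert("node is catching up and not signing")
        elif height <= last_height:
            alert(f"block height stuck at {height}; check the node or fail over")
        last_height = height
    except Exception as exc:  # an unreachable RPC usually means the node is down
        alert(f"status query failed: {exc}")
    time.sleep(POLL_SECONDS)
```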
Argument 3: “There was another bug that prevented unjailing for a month.”
Our counterarguments:
- There are cases where inexperienced pilots misinterpreted the reason for jailing. They confused the double-signing event, in which all their stake was slashed, with downtime jailing. As a result, they drew incorrect conclusions about why unjailing was impossible.
- To be affected by the bug, a validator first had to mishandle the update of their CometBFT keys, or allow downtime and get jailed.
- This bug didn’t affect the competition results. Regardless of whether a validator encountered the “unjailing for a month” bug, it was already impossible to complete the uptime mission for anyone jailed due to operator error rather than a code error.
- This bug was fixed after the Hardfork upgrade.
Argument 4: “Post-genesis validators received delayed information about restarts or upgrades.”
Our counterarguments:
- Namada team members and pre-genesis validators provided frequent updates to post-genesis validators. For example, Bengt:
  - Announced upcoming restarts in the shielded-expedition Discord channel on Feb 5th at 11:51, on Feb 6th, and on Feb 9th at 16:36.
  - Created a GitHub issue to reduce the requirement to join the active set to 1k NAAN, along with other important updates.
- There is no evidence in the post-genesis validators’ chat history of major complaints or discussions regarding a lack or delay of information.
- There is evidence that post-genesis validators participated in various upgrades and restarts.
- When counted correctly, 13 post-genesis validators completed the uptime mission. This would only have been possible if they had received timely information about upgrades and restarts.
Argument 5: “Only the 257 validators in the active set of the SE could compete for the Uptime mission.”
Our counterarguments:
- It is self-evident that only validators in the active set sign blocks and can therefore have uptime metrics.
- The distribution of NAAN was not due to a bug; all validators received the amount of NAAN allocated by the Anoma Foundation. The Anoma Foundation and the SE host team were aware of this NAAN distribution and of the uptime mission from the beginning, and they maintained the distribution when the SE started.
- There were not “thousands” of validators trying to get into the 257-slot validator set. In fact, for most of the SE the active validator set was not even full, meaning real and active post-genesis validators had the same chance of staying in the active set as pre-genesis validators.
Argument 6: “Approximately 3 DoS were discovered during the SE. These DoS vulnerabilities caused the validator to freeze with a single query. In addition, this vulnerability was a bug that could be triggered externally or internally by the namada protocol.”
Our counterarguments:
- Externally, a validator should not expose any connections other than p2p: no RPC, no API, no gRPC, etc. Closing the RPC port to avoid such issues is basic validator hygiene. The same applies on any Cosmos chain; some chains even refuse to delegate to validators with an open RPC port. (A minimal port-exposure check is sketched after this list.)
- Internally, if such a vulnerability were triggered, not only your validator but all others would crash. If triggered internally at random, it could affect the attacker as well.
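To make the point about external exposure concrete, here is a minimal sketch of a port check run from a host outside the validator's network: only the p2p port should answer. The hostname and the CometBFT/Cosmos-style default port numbers are assumptions for this example and may differ on a given setup.

```python
# Illustrative port-exposure check (run from a host outside the validator's
# network): only the p2p port should accept connections from the internet.
# The hostname and default CometBFT/Cosmos-style port numbers are assumptions.
import socket

VALIDATOR_HOST = "validator.example.com"  # hypothetical address
PORTS = {
    26656: ("p2p", True),    # peer-to-peer: must stay reachable
    26657: ("rpc", False),   # RPC: should NOT be exposed externally
    1317:  ("api", False),   # REST API: should NOT be exposed externally
    9090:  ("grpc", False),  # gRPC: should NOT be exposed externally
}

def is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port, (name, should_be_open) in PORTS.items():
    state = "open" if is_open(VALIDATOR_HOST, port) else "closed"
    verdict = "OK" if (state == "open") == should_be_open else "REVIEW"
    print(f"{name:5s} ({port}): {state:6s} -> {verdict}")
```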
Our additional counterargument to reinstate the Uptime mission:
During a Validators Circle call on February 21st, hosted by the Namada SE team, all potential uptime challenges for post-genesis validators were discussed. This discussion took place in the 43-46 minute segment of the call. It was noted that genesis validators might have a slight advantage due to the calculation approach. However, the team emphasized that this approach had been chosen for the task from the start and reassured participants that the S-class tasks would ensure overall balance.
Despite recognizing the uptime challenges for post-genesis validators, the Namada SE hosts did not find compelling reasons to alter the competition rules during the event. There was no official proposal to remove the uptime missions, reflecting the team’s stance that the initial rules and score calculation setup remained appropriate.
Conclusion
As a result, beyond objecting to the decision to change the rules unilaterally AFTER the competition ENDED, we did not find any of the arguments compelling enough to justify removing the entire uptime category, and we believe the host team overstated their significance for the competition. While bugs in the testnet made task completion more challenging, they did not directly impact the results. Moreover, the number of participants who completed the task further supports our arguments.
Summary (TL;DR)
We (validators/pilots/contestants of the Namada Shielded Expedition event) disagree with the removal of the entire uptime mission, which happened after the conclusion of the Namada Shielded Expedition. Changing rules post-competition undermines trust in the project and devalues participants’ efforts. We propose reinstating the uptime missions for fairness, competition integrity, and accurate rankings. In our proposal, we outlined our reasons for reconsidering the removal of the uptime missions and responded to the key arguments in favor of it.
Proposal prepared and supported by:
Chainflow, Cosmic Validator, L0vd, Stakeup, Stakepool, Daniel from Keplr, Crouton Digital, Encipher, 2pilot, & Nodeify