Proposal to Reinstate the Uptime Mission in the Shielded Expedition Incentivised Testnet

Background

Disclaimer: Due to link restrictions on this forum, we have provided a link to the full proposal containing all the necessary evidence and references, so you can read it there and share your comments here on the forum.

This is the community’s response, led by Chainflow, Cosmic Validator, L0vd, Stakeup, Stakepool, Daniel from Keplr, Crouton Digital, Encipher, Nodeify and 2pilot, to the recent announcement on the Namada Forum dated May 16th:

We (validators/pilots/contestants of the Namada Shielded Expedition event) disagree with the removal of the entire uptime category, which was announced and carried out after the conclusion of the Namada Shielded Expedition. We believe this change should be revoked for several reasons:

  1. Changing rules post-competition undermines the integrity and fairness of the Shielded Expedition event.
  2. Pilots relied on the initial uptime missions and point system to guide their strategies and efforts.
  3. Removing the uptime category after the fact retroactively disadvantages pilots who performed well under the original criteria.
  4. Such changes can erode trust in the organizing body and discourage future participation.

Moreover, we feel that insufficient evidence was provided to justify removing the category. Uptime is the most crucial baseline metric for assessing validator performance, and no competition can effectively measure a validator’s performance without it. We plan to discuss this further in the Arguments section and provide our counterarguments to the reasoning presented by the SE host team for removing the Uptime category.

Proposal

We request that the Namada team reinstate the uptime mission in the Shielded Expedition event.

This will ensure:

  1. Fairness in the competition and the event
  2. Consistency in the rules, maintaining the original terms
  3. Recognition and reward for the most diligent participants based on their efforts
  4. Accurate rankings, providing a reliable tool for selecting validators for the mainnet

Arguments

Upon registration, all participants are informed of the rules and task specifics, and they agree to them by entering the competition. We believe that no competition should change its rules after it has ended, especially right before the final results are announced and without formal recourse to debate the unilateral decision. This undermines the integrity and fairness of the competition and of those who hold the power to make and set the rules, while devaluing the efforts of the most diligent participants.

Considering all of the above, we will examine the key arguments provided by the SE host team, which they found sufficient to justify removing the vital uptime performance category and changing the rules after the competition had ended.

Argument 1: “A limited set of participants were able to compete for the uptime missions & only 2 post genesis validators achieved the uptime task and they are outliers.”

Our counterarguments:

  • When correctly counting uptime for post-genesis validators (e.g., excluding the first 2 epochs), a total of 13 post-genesis validators completed the uptime task.
  • Overall, 55 pilots achieved the uptime task, a number similar to those who completed the governance task, which the SE host team decided to keep in the final rankings.
  • 257 participants deployed validator nodes and competed for the uptime task, with 21.4% succeeding. This stat demonstrates that this mission is challenging yet far from impossible.
  • The entire point of testnets such as the Shielded Expedition is to surface the validators with the dedication and skill to perform well, differentiating them from the crowd. To draw an analogy, this argument is similar to saying that because only one team won the soccer match, the match should be invalidated, since not both teams won.

Argument 2: ‘There was a client bug that caused a significant number of validators to be jailed (node kept restarting with no way to stop it).’

Our counterarguments:

  • There was a way to fix this issue, and that is why 55 validators completed the uptime task.
  • This bug resulted in some downtime. However, with proper monitoring, alerts, redundancy setup, and backups, no jailing event would have happened, as proven by the 55 validators who were unaffected and completed the uptime task.
  • Ways to avoid being jailed included but were not limited to spinning up another node and restoring from a snapshot, running Horcrux, or debugging with docker.
  • This issue did not affect validator nodes as such. The so-called ‘client bug’ occurred on RPC nodes, or when certain commands were executed on machines with low specs. It is a matter of the operator and the server, not of the “bug” itself.
  • We found only a handful of reports (~4-7) of this bug occurring. These cases are very rare and should be classified as outliers.
  • The SE host team didn’t provide any evidence of the number of participants impacted by the “silent bug”, nor a post-mortem confirming that it was impossible to avoid getting jailed, i.e. something tied to the SDK and a validator’s address. The testimony of some validators suggests it was actually possible to avoid jailing: ‘During upgrades I had to resync once because of some issue (don’t remember exactly what it was, maybe restarts) and the other time I used a backup node to avoid losing blocks’
  • This bug was never reported again after the Hardfork upgrade.

Argument 3: ‘There was another bug that prevented unjailing for a month.’

Our counterarguments:

  • There are cases where inexperienced pilots misinterpreted the reason for jailing. They confused the double-signing event where all their stake was slashed with the downtime jailing. As a result, they incorrectly assumed the reasons why unjailing was impossible.
  • To be affected by the bug, a validator had to either fail to properly update their CometBFT keys or allow downtime and get jailed.
  • This bug didn’t affect the competition results. Regardless of whether a validator encountered the “unjailing for a month” bug or not, it was impossible to complete an uptime mission for anyone who was already jailed due to operator error rather than code error.
  • This bug was fixed after the Hardfork upgrade.

Argument 4: ‘Post-genesis validators received delayed information about restarts or upgrades.’

Our counterarguments:

  • Namada team members and pre-genesis validators provided frequent updates to post-genesis validators. For example, Bengt:
    • Announced upcoming restarts in the shielded-expedition discord channel on 5th Feb at 11:51, on 6th Feb, on 9th Feb at 16:36.
    • Created a GitHub issue to reduce the requirement to 1k NAAN to join the active set, and shared other important updates.
  • There is no evidence in the post-genesis validators’ chat history of major complaints or discussions regarding a lack or delay of information.
  • There is evidence that post-genesis validators participated in various upgrades and restarts.
  • When counting correctly, 13 post-genesis validators managed to complete Uptime missions. This would only be possible if they received timely information about the upgrades/restarts.

Argument 5: ‘Only the 257 validators in the active set of the SE could compete for the Uptime mission’

Our counterarguments:

  • It is clear and obvious that only validators in the active set sign blocks and can have uptime metrics.
  • The distribution of NAAN was not due to a bug; all validators received the amount of NAAN allocated by the Anoma Foundation. The Anoma Foundation and the SE host team were aware of this NAAN distribution and of the Uptime mission from the beginning, and they maintained the NAAN distribution when the SE started.
  • There were not ‘thousands’ of validators trying to get into the 257-slot validator set. In fact, for most of the SE the active validator set was not even full, meaning that real, active post-genesis validators had the same chance of staying in the active set as pre-genesis validators.

Argument 6: ‘Approximately 3 DoS were discovered during the SE. These DoS vulnerabilities caused the validator to freeze with a single query. In addition, this vulnerability was a bug that could be triggered externally or internally by the namada protocol.’

Our counterarguments:

  • Externally, a validator should not have any connections open to the outside other than p2p: no RPC, no API, no gRPC, etc. Closing the RPC port to avoid such issues is basic knowledge for a validator. This can happen on any Cosmos chain; some chains even refuse to delegate to validators with an open RPC port. (A small illustrative check is sketched after this list.)
  • If the issue is triggered internally, then not only your validator but all other validators would crash as well. And if it can be triggered randomly from inside the protocol, it can affect the attacker too.
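To illustrate the external-exposure point above, here is a minimal Python sketch of the kind of check an operator can run from an outside machine. The port numbers are the usual CometBFT/Cosmos-SDK defaults and may differ on a given Namada setup, and the hostname is a hypothetical placeholder; only the p2p port should be reachable from the internet.

```python
# Illustrative external exposure check for a validator host.
# Port numbers are typical CometBFT/Cosmos-SDK defaults (assumption);
# "validator.example.com" is a hypothetical hostname.
import socket

PORTS = {
    26656: "p2p   (expected open)",
    26657: "rpc   (should be closed externally)",
    1317:  "api   (should be closed externally)",
    9090:  "grpc  (should be closed externally)",
}

def check_exposure(host: str, timeout: float = 2.0) -> None:
    for port, role in PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            status = "OPEN" if s.connect_ex((host, port)) == 0 else "closed"
            print(f"{host}:{port:<5} {role:<38} -> {status}")

if __name__ == "__main__":
    check_exposure("validator.example.com")  # replace with your own host
```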

Our additional counterargument to reinstate the Uptime mission:

During a Validators Circle call on February 21st, hosted by the Namada SE team, all potential uptime challenges for post-genesis validators were discussed. This discussion took place within the 43-46 minute segment of the call. It was noted that the genesis validators might have a slight advantage due to the calculation approach. However, the team emphasized that this approach was initially chosen for the task and reassured participants that the S-class tasks would ensure overall balance.

Despite recognizing the uptime challenges for post-genesis validators, the Namada SE hosts did not find compelling reasons to alter the competition rules during the event. There was no official proposal to remove the uptime missions, reflecting the team’s stance that the initial rules and score calculation setups remain appropriate.

Conclusion:

As a result, beyond objecting to the decision to change the rules unilaterally AFTER the competition ENDED, we did not find any arguments compelling enough to justify removing the entire uptime category, and we believe that the host team overstated their significance for the competition. While bugs in the testnet made task completion more challenging, they did not directly impact the results. Moreover, the number of participants who completed the task further supports our arguments.

Summary (TL;DR)

We (validators/pilots/contestants of the Namada Shielded Expedition event) disagree with the removal of the entire uptime mission which happened after the conclusion of the Namada Shielded Expedition. Changing rules post-competition undermines trust in the project and devalues participants’ efforts. We propose reinstating uptime missions for fairness, competition integrity, and accurate rankings. In our proposal, we outlined our reasons for reconsidering the removal of uptime missions and responded to key arguments in favor of it.

Proposal prepared and supported by:

Chainflow, Cosmic Validator, L0vd, Stakeup, Stakepool, Daniel from Keplr, Crouton Digital, Encipher, 2pilot, & Nodeify

13 Likes

I fully support the reinstatement of the uptime mission; the arguments laid out in the discussion are well thought through and reasoned.

3 Likes

The team of the pro-nodes75 validator supports the proposal to return points for uptime.
Thanks to the team of participants for making this suggestion! We too were somewhat puzzled by the removal of the main validator success metric from the competition leaderboard.
I am sure the Namada team will be able to open a dialogue on this issue.

3 Likes

X-Posting my own thoughts for visibility here.

Not everyone affected by the restart bug experienced significant issues. Operators with a backup node and proper monitoring would be promptly alerted to the problem and could switch to the running node, minimizing any disruptions. The fact that 100 people did not take these precautions does not mean that the issue was unfixable.

From the outset, the SE-100 cohort was intentionally kept small to facilitate initial operations and reduce the need for extensive coordination. Despite the PvP nature of this testnet, some SE-100 participants posted updates in the general channel, demonstrating engagement. Additionally, some post-genesis validators successfully met the uptime metrics, proving that maintaining these standards was achievable. On a personal note, I had requested that the team stop using @here in Discord. While it is important to retrospectively acknowledge the team’s misses, it is unjust to punish participants who had no control over these oversights; recognizing past mistakes is valuable, but punishing participants for issues beyond their control is not a reasonable approach.

This design flaw has been present from the beginning, and retroactively punishing participants for it is unfair. Instead, I propose taking a broader approach by calculating misses from the moment a validator posted the create-validator transaction to the marked end block of the testnet. However, this raises questions about when to determine eligibility, especially if a validator joined in later epochs like 10, 30, or 40. The task was heavily incentivized, and that motivation is why it was completed: Namada received prompt responses from validators throughout the testnet. Validators cannot submit more tasks once the testnet has ended, nor should the rules be changed after the fact.

In most jurisdictions, it is generally considered unethical and potentially illegal to change the rules of a contest after it has ended in a way that prevents winners from claiming their prizes. Contest rules form a binding agreement between the organizer and the participants.

3 Likes

Thanks for taking the initiative @ChainflowPOS.
Support from us as well.

4 Likes

The proposal posted contains many non-factual claims, and so warrants a response. Frankly, I would expect better from some of the signatories of this proposal.

I also will not engage in extended debate around this. This will more or less be my one-time reply, and it should be taken as such.

Phychain and lankou wanted to co-sign this. If anyone else wants to put their name on as support, lmk and I will edit to add.

A significant point of order:

There is no formal process for proposals governing the shielded expedition. The document from these testnet validators must therefore be seen as informal input, not a “proposal”, and should not be given more weight than any other input.

Addressing inaccurate claims in this proposal:

  1. “Argument 3”: The jailing bug was very real and confirmed by the team. It was not caused for the reasons speculated by the “proposal” authors, and it is well attested as a genuine bug which significantly affected both post-genesis and genesis validators (you can ask amadison if you don’t believe me). As the “proposal” authors state, it was fixed by updated code applied at the hardfork. This in itself proves it was a tangible bug.
  2. “Argument 4”: “There is no evidence in the post-genesis validators’ chat history of major complaints or discussions regarding a lack or delay of information.” This is an outright lie; there is plenty of evidence of complaints about exactly this. We had been complaining about it persistently before Gavin took over comms. Further, during the particular event where the chain was halted, we were told there would not be any restart for at least two hours (remember, this was after days of being on edge waiting for a restart), and then suddenly everyone (shielded-100) spun up the network within half an hour, coordinating in the dedicated channel the rest of us don’t have access to. This event specifically led to a number of validators getting jailed, and it also specifically led to Gavin taking over communications and zen being included in the se-100 channel to prevent this from happening again. The allegations made here are put forward in bad faith and have no basis in what actually happened.
  3. “Argument 5”: There is some merit to some of the assertions made here, but they are not key to the discussion at hand. Some aspects are being omitted, such as the initial NAAN distribution being so skewed that genesis validators were, for all practical purposes, guaranteed a place in the active set, while everyone else (crew and pilots alike) would have to fight over the remaining 150 spaces. While I agree this has not been a big problem practically during most of the expedition, even one or two epochs where one would be below the threshold would severely limit the possibility of being included in the award-winning categories. Therefore, this is not a minor issue given the 95 and 99 percent thresholds.
  4. A few other items could be mentioned, namely that there were some real issues along the testnet that required restarts, bad performance of the client, stuck blocks in the validator, etc., but these are not key either.
  5. One item that is key for a minor group is the bug around change-consensus-key, which led to a situation of jailing and inability to unjail before the convert-key feature was included in the CLI.
  6. Regarding the argument that if one got jailed, one would automatically not meet the strictest criteria, this is also a disingenuous argument. There are 91 epochs in the shielded expedition testnet (0-90). For a post-genesis validator, that is 89 epochs total where they could be part of the active set. The unjail mechanism works in such a way that if you go to jail, you will be in jail for a minimum of 2 epochs, given that the mechanism works as intended and you unjail promptly, which as we know it did not for this testnet. But for the sake of the argument, yes, that means that if one gets jailed for any reason, then one would not meet the 99% criterion. However, to meet the 95% criterion, one could have as little as 86.5 epochs in the active set for genesis (two jailing periods possible), and 84.6 out of 89 epochs for post-genesis (also two jailing periods possible, of two epochs each). In this way, it is simply not true that if one got jailed once and the unjail mechanism worked, one would automatically fail both of these reward categories. (A small arithmetic sketch of these thresholds follows this list.)
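As a reference, the threshold arithmetic quoted in point 6 can be reproduced directly. The epoch counts (91 total, 89 for post-genesis) and the 95%/99% thresholds are the figures from this thread; the snippet simply restates them.

```python
# Reproducing the epoch-threshold arithmetic from point 6
# (figures quoted in this thread; illustrative only).
TOTAL_EPOCHS = {"genesis": 91, "post-genesis": 89}  # post-genesis miss the first 2 epochs

for label, total in TOTAL_EPOCHS.items():
    for threshold in (0.95, 0.99):
        required = threshold * total
        can_miss = total - required
        print(f"{label:12s} {threshold:.0%}: needs ~{required:.2f} epochs "
              f"in the active set (can miss ~{can_miss:.2f})")
```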

Let me also briefly address the repeated claim about “when counting correctly”, as that refers to my calculations, which you are all using to draw inferences from:

  1. There is no “correct” way to count this. My main concern has been whether the team has mapped all validator-TMs correctly, and Fraccaman states they have. This surprised me, but I will have to take that at face value.
  2. As for the calculations themselves, mine are very simply based on recorded signatures in the Namadexer, divided by total blocks (subtracting the first two blocks for post-genesis, and doing some filtering logic to make sure no one is credited for any double-signs or such). I am aware (as per Daniel-keplr’s well-made comments on uptime counting, for instance) that there are probably much better ways to do so, and I can’t say whether my way is better than the team’s, simply because we have so little information. (A minimal sketch of this kind of calculation follows this list.)
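For concreteness, here is a minimal Python sketch of the signatures-divided-by-total-blocks style of calculation described in point 2. The data format, the validator address, and the cut-off parameter are hypothetical placeholders (real data would come from an indexer such as the Namadexer, with its own schema and edge cases); this is not the team’s or the author’s actual script.

```python
# Minimal sketch of an uptime calculation of the kind described above.
# Hypothetical input: block height -> set of addresses recorded as signing
# that block. Real indexer schemas, jailing epochs and double-sign filtering
# are more involved than this.
from typing import Dict, Set

def uptime_ratio(signatures: Dict[int, Set[str]],
                 validator: str,
                 first_block: int = 0) -> float:
    """Fraction of blocks at or above first_block that `validator` signed."""
    eligible = [h for h in signatures if h >= first_block]
    if not eligible:
        return 0.0
    signed = sum(1 for h in eligible if validator in signatures[h])
    return signed / len(eligible)

# Toy example: a validator signs 3 of 4 eligible blocks -> 75% uptime.
blocks = {1: {"valA"}, 2: {"valA", "valB"}, 3: {"valB"}, 4: {"valA"}}
print(f"{uptime_ratio(blocks, 'valA', first_block=1):.2%}")
```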

With regard to the inferences drawn from these assertions, the following logical fallacies need to be addressed:

  1. The fact that some validators managed to avoid being jailed, etc., does not necessarily mean everyone else could have done the same (nor does it mean the opposite). This is self-evident and will not be argued further. In other words, validators could have had real problems that you did not experience, and it does not necessarily have to be because you are better or smarter.
  2. Rules must be upheld regardless of what happens during the testnet: this is also pretty self-evident. I am for upholding rules, but if circumstances change, it can (though it does not have to) warrant changes to the terms of the competition and the rewards structure.

As for some of the inferences made in the document, I want to briefly remind the authors that the terms and conditions of the Shielded Expedition give the organizing team very wide powers over what they can change. I just wanted to get that in here too, though I do believe this power should be used as sparingly as possible.

Gavin said: ‘I’ve asked for more details about the bug to share here, because I don’t know there were some validators that were not affected’ and ‘i don’t have a post mortem to refer to, and the thread (inconnu) is quite a saga… i’ve asked Brent if recalls what the problem was’

There is no post-mortem that reveals that it was literally impossible to avoid jailing, meaning something tied into the SDK and a validator’s address.

Where is the evidence? I went through all the chat history in the shielded-expedition channel and I couldn’t find any evidence of complaints; anyone can verify this for themselves. If there were so much evidence, you would have shared it in your reply rather than just saying ‘there is plenty of evidence’ while showing none.

Here you confirm our counterargument: as a post-genesis validator, it was not a problem to remain in the active set of the SE. As mentioned, it was claimed that there were over 10,000 validator applications, but the truth is that there were fewer than 257 active validators in the SE, including pre- and post-genesis. Therefore, post-genesis validators didn’t have to ‘fight’ to be in the active set, since for most of the SE the active set was not even full, so all active post-genesis validators could stay in it. This is very different from the claims Gavin made: ‘the active set had 256 slots and there were thousands of Pilots with the same number of tokens that had to find ways to get enough NAAN to compete with one another to get into and stay in the active set’. No, there weren’t ‘thousands’ of post-genesis validators competing to get into the active set of the SE; there were only around 100 active post-genesis validators, and the active set was mostly never even full during the SE.

Changing the consensus key looks like the last thing I would do on my main validator. If I had to do this, I would test it on a secondary validator or on a different chain before attempting it in production. Also, this was a testnet, so such things were expected to break.
If all validators are running the same binaries and there really is a bug, all should be affected. If only a minority is affected, it is likely that the reasons for jailing are not due to a ‘bug’ but rather to mistakes made by the validator operator or bad setups.

You confirm that if there was a jailing event, then achieving 99% uptime was already not possible, as we said in our counterarguments. Moreover, we also said that there are cases of validators who got jailed for downtime, then unjailed, and could only achieve 95% uptime, confirming that achieving 99% was not possible. Therefore, you cannot claim that the unjail bug affected some validators’ ability to achieve the 99% uptime mission, since, as you yourself said, it was already impossible to achieve even without the unjail bug. So any discussion about removing uptime missions should be specifically about the 95% uptime mission, not the 99% uptime mission.

Anyone with the data available can verify that when the first two epochs are excluded, since post-genesis validators cannot join within the first two epochs, a total of 13 post-genesis validators achieved the Uptime mission, rather than only 2 as calculated incorrectly by the SE host team.

There is no post-mortem that reveals that it was literally impossible to avoid jailing, meaning something tied into the SDK and a validator’s address. We were all in the same boat, so unless there is a GitHub revision link to code that affects only a subset of validator addresses, this is not an argument.
Until a post-mortem proves the opposite, it was possible to avoid being jailed. If some validators got jailed, it was due to no monitoring or alerts, low-spec servers, or errors that other validators didn’t make. The SE is a competition to reward the best validators, not the validators making errors that lead to jailing.

On 21st Feb., during a validator call, the arguments now presented for removing the Uptime mission were already discussed. It was decided then that there wasn’t enough evidence or justification to remove the Uptime mission, and no new arguments or evidence have emerged since.

2 Likes
  1. I’m not denying the existence of a restart bug. I did not experience it, but I would like to see deterministic steps to reproduce the failure and evidence that it was impossible to fix. Were other backup nodes deployed to debug it quickly? How fast was the reaction time after missing blocks? Was it immediate or delayed by three hours?
  2. If you had monitoring like Tenderduty, you would immediately know whether the chain was moving and whether you were missing blocks (see the sketch after this list). We all knew which binary was coming, or some took the initiative to check GitHub. However, the timing wasn’t communicated quickly to everyone. It would have been clear what to do once the chain started moving, even if the timing wasn’t precise. Additionally, there were retrospective faults in communication. SE-100 had no choice and often relayed information when communication was lacking. I requested to stop using @here because if you weren’t online, you wouldn’t be pinged.
  3. This design was clear from the outset. People joined the competition knowing this in advance. By joining, you agreed to the rules.
  4. Of course, there were issues; it’s a testnet. We all dealt with the same challenges.
  5. Changing the consensus key on a validator is risky. We all had the same binaries. Was it not tested first on another validator? If you highly prioritize uptime, ask yourself: Would I do this on mainnet before testing?
  6. Achieving 99% was impossible if you were jailed, and that is one of the arguments being proposed for its removal.
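To make point 2 concrete, below is a minimal Python sketch of the kind of "is the chain moving and am I signing?" check that tools like Tenderduty automate. The endpoint path and field names are the standard CometBFT RPC ones; the URL, poll interval and alert thresholds are placeholder assumptions, not anyone's actual setup.

```python
# Illustrative liveness check: poll the node's local CometBFT RPC /status
# endpoint and warn when the latest block height stops advancing.
# URL, interval and thresholds are placeholders; a real setup would use a
# proper monitoring/alerting stack (Tenderduty, Prometheus, etc.).
import json
import time
import urllib.request

RPC_STATUS_URL = "http://localhost:26657/status"  # local RPC only; keep it closed externally

def latest_height() -> int:
    with urllib.request.urlopen(RPC_STATUS_URL, timeout=5) as resp:
        status = json.load(resp)
    return int(status["result"]["sync_info"]["latest_block_height"])

def watch(poll_seconds: int = 30, stalled_after: int = 3) -> None:
    last_height, stale_polls = latest_height(), 0
    while True:
        time.sleep(poll_seconds)
        height = latest_height()
        stale_polls = stale_polls + 1 if height <= last_height else 0
        if stale_polls >= stalled_after:
            print(f"ALERT: chain height stuck at {height} for {stale_polls} polls")
        last_height = height

if __name__ == "__main__":
    watch()
```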

You cannot completely rely on the team to get everything perfect; at some point, self-reliance needs to take charge.

3 Likes

Okay, so before I express my opinion on the matter, two things:

  1. I was not part of the jailfest, and
  2. My ranking on the leaderboard will not be affected whether uptime gets reinstated or not.

I wanted to point this out to show that I’m, competition-wise, quite unbiased when it comes to this particular case.


This would have been a good write-up, if it didn’t contain false statements and one-sided or skewed perceptions of the whole debacle.

First, let me say that this is not the first post-competition change that was made. Many of us were especially afflicted by the re-evaluations that happened, so the overall undertone that reinstating this particular category would act as some sort of redeemer of the “competition’s fairness”, “consistency of rules”, “trust in the project in general” or “accurate rankings” is just silly. If there was one thing that came as a surprise, it would definitely have been the S Class re-evaluations, not this uptime removal, since there were countless times where it became a point of discussion in Discord and validator circles.

Second, please don’t make maintaining uptime sound like it’s the hardest thing in the world. Most of you act as if you needed to be behind your server 24/7. There were far harder S classes than maintaining uptime. This comes from someone who had 96% uptime (I fell below the threshold at some point and didn’t get timely notice of chain starts/upgrades at the beginning of the competition).

The first real uptime problem that surfaced was that coordination only happened within SE-100 and no one bothered to keep us in the loop. This happened 2-3 times before comms got shifted to Gavin (I also received the role to “spy” at that time). This was clearly a fault of the team in not including every pilot, and we had to deal with that disparity. There’s actually enough evidence of this in the chats. The fact that no one in SE-100 even bothered to keep us posted back then already speaks volumes. I guess there’s no time to be altruistic during a competition, even if it would’ve been the fair thing to do, right?

I remember it was said that we’d get a minimum of 24 hours’ notice to make sure everyone had enough time to prepare. Unfortunately, this was not the case on multiple occasions. Imagine this happening on mainnet; it would have been a tremendous error on the team’s side, especially if it caused validators to get jailed.

Now, let’s talk about the jailfest. Back then many got jailed for a variety of reasons. I won’t go into them all; the team will probably be better placed to relay such info (concerning the bug). I just want to shed more light on one particular case, namely the confusion around change-consensus-key. I know how many had to learn about that command the hard way before a PR from Cosmostation got implemented. One would have assumed this command would automatically rotate the keys. Unfortunately it did not; validators started to miss blocks after the 2-epoch pipeline and voilà, they got jailed.

Now okay, whether the jailing incident was the fault of the validator or not, it didn’t mean you were unable to reach a minimum of 95% uptime. This post seems to treat 99% and 95% as equals, which they are not.

In my opinion, the real problem arose when no one was able to unjail anymore. This was very real and not due to getting “tombstoned” or whatever is being insinuated. Back then, if you were afflicted by it, you had to deal with the uncertainty of the situation: “Do I start a new validator or do I wait for the team to fix this? But I don’t have enough NAAN… crap. My uptime is getting ruined.”

Here we already started talking about the potential of removing the uptime mission, or at least invalidating this period of time, due to the rising level of unfairness of the situation (which had been growing ever since the lack of proper coordination).


Now, I understand there’s a lack of fairness for all pilots involved. It kind of sucks that everything went this way. But we could at least admit that the odds were in favor of the pre-genesis pilots. Especially for the 99% task (for reasons expressed by a plethora of people already).

The 95% task was perhaps more feasible if:

  1. we conclude that this whole jailfest happening was the fault of the operator, plus
  2. we don’t take the first two epochs into account for post-genesis.

But how many ROIDs would that get you?

I’m not so sure if that’s worth the fight.

You are an example that it was very possible for post-genesis validators to avoid getting jailed and achieve the uptime task. You actually achieved the 95% uptime task, so the removal does affect your total ROID points. Hence, you are indeed biased: you are arguing against this proposal despite the removal negatively affecting you, so there must be some bias for you to do this.

The vulnerabilities and protocol & cryptography improvements S class tasks were NOT removed. They were simply re-evaluated, not entirely removed from the competition. Re-evaluating a category and totally removing a category are two very different things.

It was discussed, for example, in the validator call on 21st Feb. from minutes 43-46, and it was decided that there wasn’t enough evidence or justification to remove the Uptime mission; since then there haven’t been any new arguments or evidence to reconsider the decision made then to maintain the Uptime mission.

The uptime missions give around 1.7 billion ROID points. This is more than several S class tasks such as RPCs, relayers or explorers, meaning that in terms of impact on the ranking and the final results, the Uptime mission is one of the most significant and relevant across all the A, B, C and S missions. And yes, achieving the Uptime task was indeed done by many: 21.4% of the 257 active-set validators achieved the Uptime mission.

From @kw1knode previous answer: ‘if you had monitoring like Tenderduty, you would immediately know if the chain was moving and if you were missing blocks. We all knew which binary was coming or some took the initiative to check GitHub’.

Bengt and several SE-100 validators provided timely updates in the shielded-expedition channel; evidence of this is provided above in the counterargument for Argument 4.

In the counterargument for Argument 4 we provided specific evidence, with dates and specific times, of when Bengt, for example, provided updates in the shielded-expedition channel. Where is the ‘enough evidence’ that you claim?

Sharing again my reply above: ‘Changing the consensus key looks like the last thing I would do on my main validator. If I had to do this, I would test it on a secondary validator or on a different chain before attempting it in production. Also, this was a testnet, so such things were expected to break.’

No, removing the Uptime mission is treating them as equal. It is confirmed that the 99% Uptime mission was not possible to achieve after a jailing event, independently of the unjail bug; therefore, any potential decision from Gavin to remove the Uptime mission should have considered only the 95% Uptime mission, not the 99% Uptime mission.

Starting a new validator was not an option because the uptime was counted only for the validator accounts registered when the SE started.

The 99% Uptime mission gives around 1 billion ROIDs and the 95% Uptime mission gives around 0.7 billion ROIDs. For comparison, 0.7 billion ROIDs is more than or similar to the S class tasks of RPCs, relayers or explorers. Also, 0.7 billion ROIDs is comparable to the whole governance mission (90% and 99%), which gives around 1 billion ROIDs.

1 Like

I already calculated it for myself. My ranking won’t change.

I’m talking about post-competition changes and how this is being used as an argument for “fairness” and a whole bunch of other ethical reasons. Re-evaluating two categories is, by the way, even worse than removing them altogether, because it selectively put participants in a good or a bad spot.

I can’t comment on any of this, that’s for the team to decide.

You’re taking my words out of context. I’m speaking of the overall sentiment being portrayed concerning uptime. Many speak about it as if they had to sacrifice a ton to keep their uptime on point. And I can believe some did, but most use it as fuel. The time and effort for some S classes, and for actually building something, was in my experience far more challenging.

Did you know that someone actually took a release off GitHub and updated, and it just so happened that the team then said that release was being skipped because it contained a bug? It should never be left in the hands of validators to upgrade on their own. There’s a reason why the chain is built on consensus, and for consensus to work, communication is key.

I was part of the convos back then. Here’s one, but there are a bunch, since it was a problem we had to face and it already made many of us doubtful about how things were coordinated (I think most of the convos were in #se-community-support).

That’s fair. I agree that this was a problematic command to try out. I myself was lucky enough to do this only after seeing Cosmostation’s PR and running it on a local net.

So removing 95%, but keeping 99%? So the other way around? Then, to put it in terms of fairness: why did post-genesis only receive ‘handicaps’ for competing in this mission? Or, let’s say, why did only pre-genesis receive special treatment? You know, if the coordination had been fair from the start I’d give you this one, because I do get what you mean by it. But it wasn’t; it kept leaning towards SE-100 getting the front seat, making an already imbalanced task even more crooked.

Ah okay. Then no one was able to fix their uptime.

Thanks for the breakdown!


Btw, pff, I’d be tired writing this much, Hector! I see you’re very invested in this. The reason I replied is also to show the other side of the tale. Back then I was helping others in Namada and had to experience and deal with the problems that started to unfold when we didn’t get any communication whatsoever. It felt to many of us as if the team didn’t care about post-genesis pilots (and crew members).

The decision to remove the Uptime mission was made with ONLY one side of the tale, in fact your side, which Gavin presented to the Anoma Foundation. It is just not right to say that you are the victim and present your version here, when the real victims are those who had the Uptime mission removed because of an incorrect and misleading version of the tale presented to the Anoma Foundation. This proposal is actually here to show the more accurate version of the facts related to Uptime.

But the ranking of your post-genesis friends will change. So although with the removal of Uptime you lose the 0.7 billion ROIDs from the 95% Uptime mission, you have a bias towards supporting the removal, because it doesn’t affect your ranking while it benefits your post-genesis friends.

But not all changes are the same: re-evaluating a category still maintains the category and its ROIDs, whereas removing a category entirely also removes all the ROIDs from that category. The re-evaluation greatly benefitted you, since only a few vulnerabilities were approved, including yours, so the points are much less diluted. I never heard you complain that the re-evaluation was worse than removing a category once you were among the approved vulnerabilities after the re-evaluation.

Timeline:

19/02 18:08 se-announcements, Gavin announces the release in the general se-announcements channel and mentions that he will inform about the restart plan, ideally later that day or maybe the next

19/02 21:01 shielded-expedition channel, Adrian shares the instructions in the shielded-expedition general channel and informs that the restart is ongoing

19/02 21:01 se-100 announcements, Adrian shares the same message at the same time 21:01 in the se-100 announcements channel

19/02 22:09 se-announcements, Gavin informs in the se-announcements general channel that the chain is making blocks at 22:09

19/02 22:30, Zenode in the se-100 channel missed the previous messages and announcements from Adrian and Gavin in the se-announcements and shielded-expedition general channels and incorrectly complains at 22:30 about delayed information for post-genesis validators

19/02 22:35-22:41, several SE-100 validators complain about the lack of an announcement and having only just seen the restart:

2 Likes

Please don’t rush your replies, and keep them short. You are simply missing the arguments and cherry-picking what you like and what suits your point. Zen in the screenshot mentions that there is a one-hour difference between the public and private announcements, so he did not miss the previous messages. I understand that the uptime task is your biggest achievement during the whole SE and that you feel frustration; that’s okay, just read others’ messages carefully, as they also have a point.

It is interesting how you conveniently leave messages out of your timeline that give context to Zen’s reaction in se-100:

  • awa announced after the previous restart that there would be a 24h notice period before any relaunch. Was this respected? No.
  • Adrian’s message states that the chain was already at 50%, which means that several pre-genesis validators had already restarted/patched, because you all synced through the hackmd file and the spreadsheet without us knowing. Also, Adrian posted in the expedition channel, not in announcements, which means that very few people saw the message (you can see that there are only 2 emojis on it, compared to tons of them on Gavin’s announcement a bit later).

But it seems to be a habit of yours to omit relevant material from your arguments, so let’s set another record straight:

  • “There is no evidence in the post-genesis validators’ chat history of major complaints or discussions regarding a lack or delay of information.”
    (Discord link)

Now, I know from experience of debating with you in Discord that this is always the same kind of half-truth to fit a specific side.
So I’ll go directly to my conclusion, to avoid spending 2 hours digging for old messages in Discord to prove you wrong:
I get it, pre-genesis validators are pissed that their specific tasks were removed; we are pissed that most of the security and protocol tasks that would have allowed post-genesis to make a comeback, and on which we spent a lot of time investigating, were cancelled. Every change of rule/task after the start of the competition is messed up.
If you want to balance the competition, reinstate all the tasks as they were initially evaluated, including shielded apps (and ditch all the crappy IBC shielded transfers CLI). THEN we all compete with fair rules, because THAT IS WHAT WAS STATED FROM THE START.

Go sit an exam for anything: if you change the exam sheet midway through the day, everyone gets screwed over one way or the other. Same here.

We were affected by both decisions :sweat_smile: (by the uptime removal and by the re-evaluation of S tasks).
And I think we should stop debating here; we are not the decision makers, same as you.

2 Likes

Okay, I’m only going to respond this one time, because the amount of word-bending and changing of what I’ve said is mind-boggling.


I never said I am a victim, nor did I present myself in such a way. It is actually a problem to see it in such a light: one who casts themselves as a victim and presents things in that manner will not be able to look at anything objectively. This post was mostly aimed at bending the narrative to SE-100’s perspective; my response was to shed some more light on the other side. Neither mine nor the OP’s was a complete, and in that sense accurate, version of the whole story.

I have friends in SE-100 who would not like this to be removed. Again, the other side of the tale also needs to be addressed for this to become an all-encompassing proposal that can be processed objectively.

This is a matter of perspective, because those who got disapproved basically also had all their work nullified due to changed criteria, through no wrongdoing of their own. And about the latter point you made about me… do you know how often I tried to stop this from going through? EVEN after I kept my approval. Many know this about me; I think I even pissed off Gavin at some point with how emotional I became, because it’s worse for a project if the community gets stabbed like this. What value does any one of us gain if a project potentially flunks due to how its community is treated?

…you understand that I said this precisely because I saw those two announcements and the time difference between the two posts, right? This was actually when we were trying to adapt to better comms, but the team still continued to communicate in SE-100 first and announce separately.

On another note, you also saw how my words were received and responded to in that screenshot?


Hector, I’ll be honest. This is being approached way too subjectively, and I can’t keep replying if this is what the back and forth looks like. I’ll refrain from further comments so as not to dilute this forum and to give room for others to share their views on the matter.

The timeline I provided consists of announcements made by Gavin and Adrian both in the announcement channels (se-100 announcements and se-announcements) and in the se-100 and shielded-expedition channels. But if you really want to talk only about the announcement channels: on 19/02 the only announcement in the se-100 announcements channel was at 21:01, whereas in the se-announcements channel there were announcements at 18:08 and 22:09. And 18:08 is around 3 hours earlier than 21:01, so if you want to talk only about the announcement channels, post-genesis/general got the announcement around 3 hours earlier than the se-100 announcement.

Gavin already announced the release at 18:08 and provided the link, so you were aware; and even if some were still not aware, the chain hadn’t started yet and there was time to upgrade without missing any blocks.

Gavin posted at 18:08 in the se-announcements channel, so people were aware and should have been ready and monitoring for the restart, which Gavin said at 18:08 in that same channel would ideally happen that day, meaning within a few hours.

07/02 at 19:12 se-100 announcements, no 24h notice for pre-genesis pilots

07/02 at 21:22 se-100, after some time debugging the first blocks

07/02 at 21:30 se-announcements, a few minutes later Awa’s announcement in se-announcements; missing a few blocks doesn’t lead to a downtime jail or significantly affect the uptime %

07/02 22:06 shielded-expedition, shortly after Awa’s announcement post-genesis validators were also signing blocks

1 Like

From the very beginning, I have disagreed with the policy of allowing outside participants, especially those who have never run a public testnet before. This decision, coupled with the availability of only 100 seats, has caused some consternation among us about the current and future policies. If the main goal was to test aspects other than node operation, such as development and bug-finding, the team should have organized a separate event. For example, they could invite white-hat hackers with an allocation of 20 million tokens, while the other 10 million tokens are allocated to node runners. In my opinion, this would be much fairer. Also, if the Nebb platform gave real-time rewards based on the XP earned by participants, instead of being fixed as it is now, the system would be fairer. All in all, I rate this testnet as a complete mess, and quite frankly, I’m very disappointed with the way things are going. That’s all from me, thank you.

1 Like

The unjustified removal of the Uptime missions at the last minute is having a major impact and influence on the final rankings. For example, Emberstake achieved governance and the 95% uptime mission, but missed the 99% uptime mission because of an early downtime jail. Because the SE host team removed the Uptime mission while keeping governance, they unfairly put Emberstake in the top 10 while pushing other validators like ourselves out of the top 10, where we should be with the Uptime missions:

Furthermore, by removing the Uptime missions at the last minute without any solid, proven evidence, they also pushed several validators that were in the top 25, such as Crouton Digital, Keplr, Kintsugi, P2P, DSRV, Nodeify and deNodes, out of the top 25:

By removing the Uptime missions without justification, they also pushed the following validators out of the top 100: Lavender Five, Chainflow, Stakecito, Swiss Staking, Stakin: https://namada.net/shielded-expedition/pilot-rankings

To understand the seriousness and significance of this, you can see that the size of the prize for the top 10, top 11-25 or top 26-100 is hugely different:

@cwgoes @adrian @awa Additional counterargument to reinstate Uptime missions:

Argument: ‘Approximately 3 DoS were discovered during the SE. These DoS vulnerabilities caused the validator to freeze with a single query. In addition, this vulnerability was a bug that could be triggered externally or internally by the namada protocol. Trying to calculate uptime is a nonsense.’

Counterargument: Externally, a validator should not have any connections open to the outside other than p2p: no RPC, no API, no gRPC, etc. If the issue is triggered internally, then not only your validator but all other validators would crash as well, and if triggered randomly from inside the protocol it can affect the attacker too. Closing the RPC port to avoid such issues is basic knowledge for a validator. This can happen on any Cosmos chain; some chains even refuse to delegate to validators with an open RPC port.
There are many validators in the SE who do not even know these basic things. Additionally, they do not have the ability to protect their validators from tx-spamming storms. They don’t know what a validator architecture is, and they don’t know how to take action. Even if a node froze due to a bug, they should have taken action, such as monitoring the situation and switching the validator over to an appropriate spare node. They didn’t do anything; they just got jailed. I’m afraid they are joining the mainnet. Whenever a problem occurs in the network, they will hinder the normal validators and make their work more difficult.

In addition to all the above, there are still obvious bugs and errors in the current Nebb calculations, as evidenced by this example:
The only differences between cryptosjnet’s old and new scores are the 10.41B from S6 (62.5B/6 = 10.41B) and the removed uptime, because in the old score he had S1, S2, S3, all C tasks, all B tasks, and all A tasks except ‘update steward commission’. So if we subtract S6 from the new score, we get 5.11 billion ROIDs, which should equal the old score minus uptime; adding uptime back should therefore give the old score of 6.39B ROIDs (if not exactly equal, it could only be less because of S1-S3 dilution, never more as in the current Nebb). Although counted incorrectly, in the old score uptime was calculated with 30 validators achieving 99% and 44 achieving 95%, i.e. 31.25/30 = 1.041 and 31.25/44 = 0.71, so 1.041 + 0.71 = 1.75. Adding that 1.75B back to 5.11B gives 6.86B, which is more than the old score of 6.39B; something is really off here, showing bugs in the Nebb calculations. This is just one example, chosen because it is so clear and evident, but there are surely many other errors in the current Nebb.



Here is the first screenshot with cryptosjnet’s old score, including S1, S2, S3 and all A, B, C tasks; he only missed ‘update steward commission’. The second screenshot shows the same score, just with uptime removed (-1.75B) and 10.41B added from S6. So 6.39B - 1.75B + 10.41B = 15.05B, but the current Nebb shows ‘15.52B’ ROIDs. And it should actually be less than 15.05B, because the S1-S3 in the old score were from the 29th March DB update, and with more S1-S3 submissions approved up to 11th April the ROIDs are diluted, so the score should be below 15.05B. Instead it shows 15.52B; something is really off and is affecting all validators. This is just one example where the bug is 100% clear.
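For readability, the consistency check above can be reproduced with a few lines of Python. All figures are the ROID values quoted in this thread (in billions), so this is only an illustration of the arithmetic, not an independent recalculation of Nebb.

```python
# Reproducing the Nebb consistency check described above.
# All figures (in billions of ROIDs) are the values quoted in this thread.
old_score  = 6.39                      # old score incl. uptime, S1-S3, A/B/C tasks
old_uptime = 31.25 / 30 + 31.25 / 44   # ~1.04 + ~0.71 = ~1.75 (old counting)
s6_share   = 62.5 / 6                  # ~10.41 added under the new scoring

expected_new = old_score - old_uptime + s6_share
print(f"expected new score: <= {expected_new:.2f}B ROIDs")  # ~15.05B
print("current Nebb shows:    15.52B ROIDs")
# With more S1-S3 approvals diluting the old S-class points, the new score
# should land below ~15.05B, not above it, hence the reported discrepancy.
```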

I support the proposal to return the uptime points for Validators who meet the criteria.

Even though I was one of the validator nodes that encountered problems from the very beginning and could not unjail, this issue could have been resolved if my operations had been more stringent. To the validators who were able to maintain uptime despite the various incidents during the testnet: that is down to your dedication and high standards, and it is something you truly deserve to be rewarded for.

One thing I would like to see improved is the communication, which was often unclear or not announced in advance for the various activities that occurred on the network.

Additionally, this Testnet has been a great experience for me, working alongside all the other validator node operators. It’s not often that we encounter such situations in other networks.

Thank you,
POR | ContributionDAO Team

2 Likes