Even though this post says exactly the thing that most Proper Analysts will say (and write long LinkedIn posts about, where other Proper Analysts congratulate them on standing up for Proper Analysis in the face of Evil And Stupid Business Dummies who just want to make bad decisions based on too little data), it's wrong. The Jedi Bell Curve meme is in full effect on this topic, and I say this as someone who took years to get over the midwit hump and correct my mistaken beliefs.
The business reality is, you aren't Google. You can't collect a hundred million data points for each experiment that you run so that you can reliably pick out 0.1% effects. Most experiments will have a much shorter window than any analyst wants them to, and will have far too few users, with no option to let them run longer. You still have to make a damned decision, now, and move on to the next feature (which will also be tested in a heavily underpowered manner).
Posts like this say that you should be really, REALLY careful about this: apply Bonferroni corrections, make sure you're not "peeking" (or if you do peek, apply corrections that are even more conservative), preregister, etc. All the math is fine, sure. But if you take it very seriously and are in the situation most startups are in, where the data is extremely thin and you need to move extremely fast, the end result is that you should reject almost every experiment (and, if you're leaning on tests, every feature). That's the "correct" decision, academically, because most features lie in the sub-5% impact range on almost any metric you care about, and with a small number of users you'll never have enough power to pick out effects that small (typically you'd want maybe 100k users, depending on the metric you're looking at, and YOU probably have a fraction of that).
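To put a rough number on that, here's a minimal back-of-the-envelope sample-size sketch using the standard two-proportion z-test approximation. The 5% baseline conversion rate and 5% relative lift are illustrative assumptions, not figures from the post:

```python
# Rough two-proportion z-test sample size (normal approximation), just to show
# the order of magnitude needed to detect a small lift. The baseline rate and
# lift below are assumed values for illustration only.
from scipy.stats import norm

def users_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2

# Assumed 5% baseline conversion with a 5% relative lift (5.00% -> 5.25%):
print(round(users_per_arm(0.05, 0.0525)))   # roughly 120k users per arm
```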
But obviously the right move is not to just never change the product because you can't prove that the changes are good - that's effectively applying a very strong prior in favor of the control group, and that's problematic. Nor should you just roll out whatever crap your product people throw at the wall: while there is a slight bias in most experiments in favor of the variant, it's very slight, so your feature designers are probably building harmful stuff about half the time. You should apply some filter to make sure they're helping the product and not just doing a random walk through design space.
The best simple strategy in a real world where most effect sizes are small and you never have the option to gather more data really is to do the dumb thing: run experiments for as long as you can, pick whichever variant seems like it's winning, rinse and repeat.
Yes, you're going to be picking the wrong variant way more often than your analysts would prefer, but that's way better than never changing the product, or holding out for the very few hugely impactful changes you are properly powered for. On average, over the long run, blindly picking the bigger number will stack small changes: a lot of them will turn out to be negative, but your testing biases the selection somewhat in favor of the positive ones, and the gains add up over time. And this strategy will provably beat one that does Proper Statistics and demands 95% confidence (or whatever equivalent Bayesian criterion you use), because it leaves room to accept the small improvements that make up the vast majority of feature space.
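Here's a toy simulation of that claim, a sketch only: it assumes true effects are mostly tiny (a normal distribution of relative lifts with a slight positive tilt, matching the "slight bias in favor of the variant" above) and per-arm samples far too small for significance, then compares always shipping the in-sample winner against only shipping on a significant result.

```python
# Toy Monte Carlo: "ship whichever arm looks better" vs. "only ship on a
# significant result" across many underpowered experiments. The effect-size
# distribution, baseline rate, and sample sizes are assumptions for illustration.
import numpy as np
from scipy.stats import norm

BASELINE, N_PER_ARM, N_EXPERIMENTS = 0.05, 2_000, 200

def final_rate(policy, seed=0):
    rng = np.random.default_rng(seed)             # same effect sequence for both policies
    rate = BASELINE
    for _ in range(N_EXPERIMENTS):
        relative_lift = rng.normal(0.005, 0.02)   # assumed: mostly tiny true effects
        variant_rate = float(np.clip(rate * (1 + relative_lift), 0.001, 0.999))
        c = rng.binomial(N_PER_ARM, rate)         # control conversions
        v = rng.binomial(N_PER_ARM, variant_rate) # variant conversions
        pooled = (c + v) / (2 * N_PER_ARM)
        se = np.sqrt(2 * pooled * (1 - pooled) / N_PER_ARM)
        z = ((v - c) / N_PER_ARM) / se if se > 0 else 0.0
        ship = (v > c) if policy == "pick_winner" else (z > norm.ppf(0.975))
        if ship:
            rate = variant_rate                   # the variant becomes the new baseline
    return rate

for policy in ("pick_winner", "require_significance"):
    print(policy, round(final_rate(policy), 4))
```

The exact numbers depend entirely on the assumed effect distribution, but the mechanism described above shows up clearly: at these sample sizes the significance-gated policy almost never ships anything, while the pick-the-winner policy compounds its mostly small (and sometimes negative) wins.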
There's an equivalent and perhaps simpler way to justify this, which is to throw out the group labels: if we didn't know which one was the control and had to pick which option was better, then quite obviously, regardless of how much data we have, we'd just pick the one that shows better results in the sample. Including if there's just a single user in each group! In an early product, this is TOTALLY REASONABLE, because your current product sucks, and you have no particular reason to treat the way it is now as something that shouldn't be messed with. Late-lifecycle products probably have some Chesterton's fence stuff going on, so maybe there's more of an argument to privilege the control there, but those types of products should have enough users to run properly powered tests.
None of that seems at all user-hostile to me; it's literally all aimed at making sure that what the user is shown is more likely to actually be useful to them.
I guess this is a big and probably unbridgeable divide: some people think this sort of thing is obviously evil, and others, like me, actually prefer it very strongly over a world where all advertising is untargeted but there is massively more of it because it's so much less valuable...
There are a couple of other important things to keep in mind:
First: just like there can be individuals who lift up an entire team without ticking off tasks themselves, there can be apparently individually productive team members who slow the entire team down, for any of a number of reasons. In my experience it's usually that they a) are fast but terrible, b) have really weird coding styles that might not be bad but are very difficult for others to work with (architecture astronauts and people who barely speak the language often fall here), or c) are process bullies who try to set up the entire review system to enforce their preferences in a hardline way that suits them but delays everyone but them. Each needs to be dealt with very differently, to varying degrees of success, but my honest opinion at this stage is that no matter how productive these people seem by themselves, it's mostly harmful to have them on a team. Behavioral issues in senior people tend to be really tough to adjust, and they soak up a lot of a manager's energy that is better spent helping your top performers excel; that said, if you can get them to adjust, sometimes it's worth the effort.
Second: pair programming works great for some people, but it is terrible for others. I've measured this on teams by trial and error over fairly long periods, and unfortunately people don't segment neatly based on their preferences, so the obvious "let them choose" method isn't ideal. There are pairs of people who really love pair programming and desperately want to do it all the time who are almost fully 2x as productive when split apart instead (yes, even accounting for downstream bugs and follow-ons, meaning that they really are just having two people do the job of one), and there are people who hate pairing who see similar multiples when forced into it anyway. My rough intuition is that there are two independent variables here: a social factor that measures how much you enjoy pairing, and a style factor that determines how much benefit you derive from it, with the two not correlating much at all. There might even be a slight anticorrelation, because the more social people who love it have already naturally done as much knowledge sharing as is helpful, while the people who hate it are underinvested there and could use some more focus on the fact that they're part of a team.