The Llama 4 herd

2025-04-05 18:33 · ai.meta.com

We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE)…

Post-training a model with two trillion parameters was also a significant challenge, one that required us to completely overhaul the recipe, starting from the scale of the data. To maximize performance, we had to prune 95% of the SFT data, as opposed to 50% for smaller models, to achieve the necessary focus on quality and efficiency. We also found that doing lightweight SFT followed by large-scale reinforcement learning (RL) produced even more significant improvements in the model's reasoning and coding abilities. Our RL recipe focused on sampling hard prompts by doing pass@k analysis with the policy model and crafting a training curriculum of increasing prompt hardness. We also found that dynamically filtering out prompts with zero advantage during training and constructing training batches with mixed prompts from multiple capabilities were instrumental in boosting performance on math, reasoning, and coding. Finally, sampling from a variety of system instructions was crucial to ensuring that the model retained its instruction-following ability for reasoning and coding and performed well across a variety of tasks.
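
The sketch below illustrates the general shape of two of these ideas, pass@k-based hard-prompt selection with a difficulty curriculum and zero-advantage prompt filtering, plus mixed-capability batch construction. The policy.sample, verifier, and reward_fn objects are hypothetical placeholders, not the actual training stack.

    import random

    def pass_at_k(policy, prompt, verifier, k=8):
        # Difficulty probe: what fraction of k samples from the current policy pass?
        return sum(verifier(prompt, policy.sample(prompt)) for _ in range(k)) / k

    def build_curriculum(prompts, policy, verifier, k=8):
        # Order prompts from easy to hard under the current policy (increasing hardness).
        scored = [(p, pass_at_k(policy, p, verifier, k)) for p in prompts]
        return [p for p, rate in sorted(scored, key=lambda s: -s[1])]

    def has_nonzero_advantage(prompt, policy, reward_fn, n=4):
        # Drop prompts whose sampled completions all receive the same reward:
        # identical rewards mean zero advantage and therefore no gradient signal.
        rewards = [reward_fn(prompt, policy.sample(prompt)) for _ in range(n)]
        return max(rewards) != min(rewards)

    def mixed_batch(buckets, batch_size):
        # Build each batch from several capability pools (e.g. math, code, reasoning).
        per_bucket = batch_size // len(buckets)
        batch = [p for pool in buckets.values() for p in random.sample(pool, per_bucket)]
        random.shuffle(batch)
        return batch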

Scaling RL for a two-trillion-parameter model also required revamping our underlying RL infrastructure due to its unprecedented scale. We optimized the design of our MoE parallelization for speed, which enabled faster iteration. We developed a fully asynchronous online RL training framework that enhanced flexibility. Compared to the existing distributed training framework, which sacrifices compute memory in order to stack all models in memory, our new infrastructure enabled flexible allocation of different models to separate GPUs, balancing resources across multiple models based on computational speed. This innovation resulted in a ~10x improvement in training efficiency over previous generations.
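
A toy sketch of the asynchronous split described above: rollout generation and gradient updates run on separate device groups and exchange data only through a queue, with periodic weight syncs. The generator and learner objects here are hypothetical placeholders, not the actual framework.

    import queue
    import threading

    rollouts = queue.Queue(maxsize=1024)

    def generation_worker(generator, prompts):
        # Runs on the GPUs assigned to inference; never blocks on the learner.
        for prompt in prompts:
            rollouts.put(generator.rollout(prompt))

    def training_worker(learner, generator, num_steps, batch_size=32, sync_every=8):
        # Runs on the GPUs assigned to training; consumes whatever rollouts are ready.
        for step in range(num_steps):
            batch = [rollouts.get() for _ in range(batch_size)]
            learner.update(batch)                       # gradient step
            if step % sync_every == 0:
                learner.push_weights_to(generator)      # keeps off-policy lag bounded

    def run(generator, learner, prompts, num_steps):
        threading.Thread(target=generation_worker, args=(generator, prompts), daemon=True).start()
        training_worker(learner, generator, num_steps)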

We aim to develop the most helpful and useful models while protecting against and mitigating the most severe risks. We built Llama 4 with the best practices outlined in our Developer Use Guide: AI Protections. This includes integrating mitigations at each layer of model development from pre-training to post-training to tunable system-level mitigations that shield developers from adversarial users. In doing so, we empower developers to create helpful, safe, and adaptable experiences for their Llama-supported applications.

Pre- and post-training mitigations

For pre-training, we use data filtering in combination with other data mitigations to safeguard models. For post-training, we apply a range of techniques to ensure our models conform to policies that are helpful to users and developers, including the right level of safety data at each stage.

System-level approaches

At the system level, we have open-sourced several safeguards that can help identify and guard against potentially harmful inputs and outputs. These tools can be integrated into our Llama models and with other third-party tools:

  • Llama Guard: Our input/output safety large language model based on the hazards taxonomy we developed with MLCommons. Developers can use it to detect whether inputs or outputs violate the policies they’ve created for their specific application (see the example sketch after this list).
  • Prompt Guard: A classifier model trained on a large corpus of attacks, which is capable of detecting both explicitly malicious prompts (Jailbreaks) and prompts that contain injected inputs (Prompt Injections).
  • CyberSecEval: Evaluations that help AI model and product developers understand and reduce generative AI cybersecurity risk.
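
As an illustration of how a safeguard like Llama Guard plugs in, the sketch below classifies a conversation through the Hugging Face Transformers chat-template interface. The checkpoint name and the exact label format returned are assumptions and may differ for a given Llama Guard release.

    # pip install transformers accelerate torch
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-Guard-3-8B"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    def moderate(chat):
        # The safety prompt and hazard taxonomy are baked into the tokenizer's chat template.
        input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=32)
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

    # Expected to return "safe", or "unsafe" plus a hazard category code.
    print(moderate([{"role": "user", "content": "How do I pick a lock?"}]))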

We’ve heard from developers that these tools are most effective and helpful when they can be tailored to their applications. We provide developers with an open solution so they can create the safest and most effective experiences based on their needs. We’ll also continue working with a global set of partners to create industry-wide system standards that benefit the open source community.

Evaluations and red-teaming

We run systematic testing of models across a wide range of scenarios and use cases in a controlled and repeatable manner. This produces data that we incorporate back into post-training.

We stress test our models using adversarial dynamic probing across a range of topics, with both automated and manual testing. We’ve made advancements in understanding and evaluating potential model risk. One example of this is our new Generative Offensive Agent Testing (GOAT). Using GOAT, we address the limitations of traditional red-teaming by simulating multi-turn interactions of medium-skilled adversarial actors, helping us increase our testing coverage and surface vulnerabilities faster. By adding automation to our testing toolkit, GOAT has allowed our expert human red teamers to focus on more novel adversarial areas, while the automation covers known risk areas. This makes the process more efficient and effective, and it enables us to build a better quantitative and qualitative picture of risk.

Addressing bias in LLMs

It’s well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet.

Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue. As part of this work, we’re continuing to make Llama more responsive so that it answers questions, can respond to a variety of different viewpoints without passing judgment, and doesn't favor some views over others.

We have made improvements on these efforts with this release—Llama 4 performs significantly better than Llama 3 and is comparable to Grok:

  • Llama 4 refuses less on debated political and social topics overall (from 7% in Llama 3.3 to below 2%).
  • Llama 4 is dramatically more balanced with which prompts it refuses to respond to (the proportion of unequal response refusals is now less than 1% on a set of debated topical questions).
  • Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok (and at half of the rate of Llama 3.3) on a contentious set of political or social topics. While we are making progress, we know we have more work to do and will continue to drive this rate further down.

We’re proud of this progress to date and remain committed to our goal of eliminating overall bias in our models.

While it’s important that models are intelligent, people also want models that can reply in a personalized way with human-like speed. Llama 4, our most advanced model family yet, is optimized to meet these needs.

Of course, models are one piece of the larger ecosystem that brings these experiences to life. We’re focused on the full stack, which includes new product integrations. We’re excited to continue the conversations we’re having with our partners and the open source community, and as always, we can’t wait to see the rich experiences people build in the new Llama ecosystem.

Download the Llama 4 Scout and Llama 4 Maverick models today on llama.com and Hugging Face. Try Meta AI built with Llama 4 in WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website.

This work was supported by our partners across the AI community. We’d like to thank and acknowledge (in alphabetical order): Accenture, Amazon Web Services, AMD, Arm, CentML, Cerebras, CloudFlare, Databricks, Deepinfra, DeepLearning.AI, Dell, Deloitte, Fireworks AI, Google Cloud, Groq, Hugging Face, IBM Watsonx, Infosys, Intel, Kaggle, Mediatek, Microsoft Azure, Nebius, NVIDIA, ollama, Oracle Cloud, PwC, Qualcomm, Red Hat, SambaNova, Sarvam AI, Scale AI, Scaleway, Snowflake, TensorWave, Together AI, vLLM, Wipro.



Comments

  • By laborcontract 2025-04-0518:488 reply

    General overview below, as the pages don't seem to be working well

      Llama 4 Models:
      - Both Llama 4 Scout and Llama 4 Maverick use a Mixture-of-Experts (MoE) design with 17B active parameters each.
      - They are natively multimodal: text + image input, text-only output.
      - Key achievements include industry-leading context lengths, strong coding/reasoning performance, and improved multilingual capabilities.
      - Knowledge cutoff: August 2024.
    
      Llama 4 Scout:
      - 17B active parameters, 16 experts, 109B total.
      - Fits on a single H100 GPU (INT4-quantized).
      - 10M token context window
      - Outperforms previous Llama releases on multimodal tasks while being more resource-friendly.
      - Employs iRoPE architecture for efficient long-context attention.
      - Tested with up to 8 images per prompt.
    
      Llama 4 Maverick:
      - 17B active parameters, 128 experts, 400B total.
      - 1M token context window.
      - Not single-GPU; runs on one H100 DGX host or can be distributed for greater efficiency.
      - Outperforms GPT-4o and Gemini 2.0 Flash on coding, reasoning, and multilingual tests at a competitive cost.
      - Maintains strong image understanding and grounded reasoning ability.
    
      Llama 4 Behemoth (Preview):
      - 288B active parameters, 16 experts, nearly 2T total.
      - Still in training; not yet released.
      - Exceeds GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks (e.g., MATH-500, GPQA Diamond).
      - Serves as the “teacher” model for Scout and Maverick via co-distillation.
    
      Misc:
      - MoE Architecture: Only 17B parameters activated per token, reducing inference cost.
      - Native Multimodality: Unified text + vision encoder, pre-trained on large-scale unlabeled data.

    • By InvOfSmallC 2025-04-0520:345 reply

      For a super ignorant person:

      Both Llama 4 Scout and Llama 4 Maverick use a Mixture-of-Experts (MoE) design with 17B active parameters each

      Those experts are LLMs trained on specific tasks, or what?

      • By vessenes 2025-04-0520:488 reply

        This was an idea that sounded somewhat silly until it was shown it worked. The idea is that you encourage through training a bunch of “experts” to diversify and “get good” at different things. These experts are say 1/10 to 1/100 of your model size if it were a dense model. So you pack them all up into one model, and you add a layer or a few layers that have the job of picking which small expert model is best for your given token input, route it to that small expert, and voila — you’ve turned a full run through the dense parameters into a quick run through a router and then a 1/10 as long run through a little model. How do you get a “picker” that’s good? Well, it’s differentiable, and all we have in ML is a hammer — so, just do gradient descent on the decider while training the experts!
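
        A rough sketch of that routing idea in PyTorch (illustrative only; real MoE layers add load-balancing losses, capacity limits, and often a shared expert):

          import torch
          import torch.nn as nn
          import torch.nn.functional as F

          class MoEFeedForward(nn.Module):
              # One MoE block: a small learned router picks top-k expert MLPs per token.
              def __init__(self, dim=512, hidden=2048, n_experts=16, k=1):
                  super().__init__()
                  self.router = nn.Linear(dim, n_experts, bias=False)
                  self.experts = nn.ModuleList(
                      nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
                      for _ in range(n_experts)
                  )
                  self.k = k

              def forward(self, x):                              # x: (n_tokens, dim)
                  probs = F.softmax(self.router(x), dim=-1)      # differentiable "picker"
                  weights, chosen = probs.topk(self.k, dim=-1)   # k experts per token
                  out = torch.zeros_like(x)
                  for e, expert in enumerate(self.experts):
                      hit = (chosen == e).any(dim=-1)            # tokens routed to expert e
                      if hit.any():
                          w = weights[hit][chosen[hit] == e].unsqueeze(-1)
                          out[hit] += w * expert(x[hit])
                  return out

          y = MoEFeedForward()(torch.randn(4, 512))              # only k experts run per token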

        This generally works well, although there are lots and lots of caveats. But it is (mostly) a free lunch, or at least a discounted lunch. I haven’t seen a ton of analysis on what different experts end up doing, but I believe it’s widely agreed that they tend to specialize. Those specializations (especially if you have a small number of experts) may be pretty esoteric / dense in their own right.

        Anthropic’s interpretability team would be the ones to give a really high quality look, but I don’t think any of Anthropic’s current models are MoE.

        Anecdotally, I feel MoE models sometimes exhibit slightly less “deep” thinking, but I might just be biased towards more weights. And they are undeniably faster and better per second of clock time, GPU time, memory or bandwidth usage — on all of these - than dense models with similar training regimes.

        • By zamadatix 2025-04-0520:562 reply

          The only thing about this which may be unintuitive from the name is an "Expert" is not something like a sub-llm that's good at math and gets called when you ask a math question. Models like this have layers of networks they run tokens through and each layer is composed of 256 sub-networks, any of which can be selected (or multiple selected and merged in some way) for each layer independently.

          So the net result is the same: sets of parameters in the model are specialized and selected for certain inputs. It's just done a bit deeper in the model than one may assume.

          • By jimmyl02 2025-04-0521:224 reply

            the most unintuitive part is that, from my understanding, individual tokens are routed to different experts. this is hard to comprehend with "experts" as that means you can have different experts for two sequential tokens, right?

            I think where MoE is misleading is that the experts aren't what we would call "experts" in the normal world but rather they are experts for a specific token. that concept feels difficult to grasp.

            • By phire 2025-04-061:094 reply

              It's not even per token. The routing happens once per layer, with the same token bouncing between layers.

              It's more of a performance optimization than anything else, improving memory liquidity. Except it's not an optimization for running the model locally (where you only run a single query at a time, and it would be nice to keep the weights on the disk until they are relevant).

              It's a performance optimization for large deployments with thousands of GPUs answering tens of thousands of queries per second. They put thousands of queries into a single batch and run them in parallel. After each layer, the queries are re-routed to the GPU holding the correct subset of weights. Individual queries will bounce across dozens of GPUs per token, distributing load.

              Even though the name "expert" implies they should be experts in a given topic, it's really not true. During training, they optimize for making the load distribute evenly, nothing else.

              • By phire 2025-04-063:483 reply

                BTW, I'd love to see a large model designed from scratch for efficient local inference on low-memory devices.

                While current MoE implementations are tuned for load-balancing over large pools of GPUs, there is nothing stopping you tuning them to only switch expert once or twice per token, and ideally keep the same weights across multiple tokens.

                Well, nothing stopping you, but there is the question of if it will actually produce a worthwhile model.

                • By regularfry 2025-04-0612:431 reply

                  Intuitively it feels like there ought to be significant similarities between expert layers because there are fundamentals about processing the stream of tokens that must be shared just from the geometry of the problem. If that's true, then identifying a common abstract base "expert" then specialising the individuals as low-rank adaptations on top of that base would mean you could save a lot of VRAM and expert-swapping. But it might mean you need to train from the start with that structure, rather than it being something you can distil to.

                  • By phire 2025-04-0621:461 reply

                    Yes, Deepseek introduced this optimisation of a common base "expert" that's always loaded. Llama 4 uses it too.

                    • By regularfry 2025-04-077:24

                      I had a sneaking suspicion that I wouldn't be the first to think of it.

                • By boroboro4 2025-04-067:35

                  DeepSeek introduced novel experts training technique which increased experts specialization. For particular given domain their implementation tends to activate same experts between different tokens, which is kinda what you’re asking for!

                • By jumski 2025-04-068:17

                  I think Gemma 3 is marketed for single GPU setups https://blog.google/technology/developers/gemma-3/

              • By idonotknowwhy 2025-04-081:50

                > It's not even per token. The routing happens once per layer, with the same token bouncing between layers.

                They don't really "bounce around" though do they (during inference)? That implies the token could bounce back from eg. layer 4 -> layer 3 -> back to layer 4.

              • By mentalgear 2025-04-068:21

                So a more correct term would be "Distributed Loading" instead of MoE.

              • By igravious 2025-04-0612:491 reply

                > making the load distribute evenly, nothing else.

                so you mean a "load balancer" for neural nets … well, why don't they call it that then?

                • By lxgr 2025-04-0615:59

                  Some load balancers are also routers (if they route based on service capability and not just instantaneous availability) or vice versa, but this kind isn't always, to my understanding: The experts aren't necessarily "idle" or "busy" at any given time (they're just functions to be invoked, i.e. generally data, not computing resources), but rather more or less likely to answer correctly.

                  Even in the single GPU case, this still saves compute over the non-MoE case.

                  I believe it's also possible to split experts across regions of heterogeneous memory, in which case this task really would be something like load balancing (but still based on "expertise", not instantaneous expert availability, so "router" still seems more correct in that regard.)

            • By bonoboTP 2025-04-0523:07

              Also note that MoE is a decades old term, predating deep learning. It's not supposed to be interpreted literally.

            • By tomp 2025-04-0521:40

              > individual tokens are routed to different experts

              that was AFAIK (not an expert! lol) the traditional approach

              but judging by the chart on LLaMa4 blog post, now they're interleaving MoE models and dense Attention layers; so I guess this means that even a single token could be routed through different experts at every single MoE layer!

            • By wrs 2025-04-0616:14

              ML folks tend to invent fanciful metaphorical terms for things. Another example is “attention”. I’m expecting to see a paper “consciousness is all you need” where “consciousness” turns out to just be a Laplace transform or something.

          • By klipt 2025-04-0521:17

            So really it's just utilizing sparse subnetworks - more like the human brain.

        • By philsnow 2025-04-0522:271 reply

          The idea has also been around for at least 15 years; "ensemble learning" was a topic in my "Data Mining" textbook from around then.

          Meta calls these individually smaller/weaker models "experts" but I've also heard them referred to as "bozos", because each is not particularly good at anything and it's only together that they are useful. Also bozos has better alliteration with boosting and bagging, two terms that are commonly used in ensemble learning.

          • By lordswork 2025-04-061:21

            MOE as an idea specific to neural networks has been around since 1991[1] . OP is probably aware, but adding for others following along, while MoE has roots in ensembling, there are some important differences: Traditional ensembles run all models in parallel and combine their outputs, whereas MoE uses a gating mechanism to activate only a subset of experts per input. This enables efficient scaling via conditional computation and expert specialization, rather than redundancy.

            [1]:https://ieeexplore.ieee.org/document/6797059

        • By Buttons840 2025-04-0520:532 reply

          If I have 5000 documents about A, and 5000 documents about B, do we know whether it's better to train one large model on all 10,000 documents, or to train 2 different specialist models and then combine them as you describe?

          • By vessenes 2025-04-0522:03

            well you don't. but the power of gradient descent if properly managed will split them up for you. But you might get more mileage out of like 200 specialist models.

          • By MoonGhost 2025-04-0619:13

            It probably depends on how much A and B overlap. If it's say English sci-fi and Chinese poetry two different models may be better.

        • By MoonGhost 2025-04-0619:30

          > Anecdotally, I feel MoE models sometimes exhibit slightly less “deep” thinking

          Makes sense to compare apples with apples. Same compute amount, right? Or you are giving less time to MoE model and then feel like it underperforms. Shouldn't be surprising...

          > These experts are say 1/10 to 1/100 of your model size if it were a dense model

          Just to be correct, each layer (attention + fully connected) has its own router and experts. There are usually 30++ layers. It can't be 1/10 per expert as there are literally hundreds of them.

        • By tomjen3 2025-04-066:41

          Cool. Does that mean I could just run the query through the router and then load only the required expert? That is, could I feasibly run this on my Macbook?

        • By faraaz98 2025-04-0522:492 reply

          I've been calling for this approach for a while. It's kinda similar to how the human brain has areas that are good at specific tasks

          • By usef- 2025-04-0523:47

            It's already used a lot — the paper I believe is from 1991, and GPT4 among many others is MoE

        • By randomcatuser 2025-04-0520:59

          yes, and it's on a per-layer basis, I think!

          So if the model has 16 transformer layers to go through on a forward pass, and each layer, it gets to pick between 16 different choices, that's like 16^16 possible expert combinations!

        • By mrbonner 2025-04-0523:04

          So this is kind of an ensemble sort of thing in ML like random forest and GBT?

      • By chaorace 2025-04-0520:51

        The "Experts" in MoE is less like a panel of doctors and more like having different brain regions with interlinked yet specialized functions.

        The models get trained largely the same way as non-MoE models, except with specific parts of the model silo'd apart past a certain layer. The shared part of the model, prior to the splitting, is the "router". The router learns how to route as an AI would, so it's basically a black-box in terms of whatever internal structure emerges from this.

      • By pornel 2025-04-0520:591 reply

        No, it's more like sharding of parameters. There's no understandable distinction between the experts.

        • By vintermann 2025-04-067:122 reply

          I understand they're only optimizing for load distribution, but have people been trying to disentangle what the various experts learn?

          • By calaphos 2025-04-069:29

            Mixture of experts involves some trained router components which routes to specific experts depending on the input, but without any terms enforcing load distribution this tends to collapse during training where most information gets routed to just one or two experts.

          • By pornel 2025-04-0612:11

            Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.

      • By brycethornton 2025-04-0520:50

        I believe Mixture-of-Experts is a way for a neural network to group certain knowledge into smaller subsets. AFAIK there isn't a specific grouping goal, the network just figures out what goes where on its own and then when an inference request is made it determines what "expert" would have that knowledge and routes it there. This makes the inference process much more efficient.

    • By qwertox 2025-04-0518:584 reply

      Llama 4 Scout, Maximum context length: 10M tokens.

      This is a nice development.

      • By lelandbatey 2025-04-0519:326 reply

        Is the recall and reasoning equally good across the entirety of the 10M token window? Cause from what I've seen many of those window claims equate to more like a functional 1/10th or less context length.

        • By vessenes 2025-04-0519:55

          It’s going to take a while to see how good this window is for real use; they’ve used a couple new ideas to get to 10M token context. Right now the only really good long token model out there is Gemini Pro - and its effectiveness does start dropping maybe in the 200k token range. I imagine insiders at GOOG have access to more than the published 1M token range there.

          It will be fun to see what we get here, but I have no doubt the extra tokens will be useful - lots of use cases can do almost as well with summary-level accuracy memory.

        • By littlestymaar 2025-04-0521:12

          I read somewhere that it has been trained on 256k tokens, and then expanded with RoPE on top of that, not starting from 16k like everyone does IIRC so even if it isn't really flawless at 10M, I'd expect it to be much stronger than its competitors up to those 256k.

        • By stitched2gethr 2025-04-0617:06

          I very much agree. I've been using Gemini 2.5 pro for coding and I've always given it a simple instruction. Never write comments. It will stop writing them for a time but it's nowhere near the 1M context window.

          Now maybe this is more a lack of instruction following than context length but the fact that it works at first and then starts going downhill quickly makes me wary about how much it will pay attention to other details further back in the context.

        • By jimmyl02 2025-04-0519:46

          the needle in a haystack benchmark looks good but at this point I think we need new benchmarks to test actual understanding of content in such a large window.

        • By MoonGhost 2025-04-0619:45

          I think the problem is with positional encoding. If the model cannot clearly separate tokens in the context window, they overlap, which leads to a mess. The encoding matters and the actual position does not.

        • By Baeocystin 2025-04-0519:432 reply

          I assume they're getting these massive windows via RAG trickery, vectorization, and other tricks behind the curtain, because I've noticed the same as you- things start dipping in quality pretty quickly.

          Does anyone know if I am correct in my assumption?

          • By reissbaker 2025-04-0522:12

            There's no "RAG trickery" or vector search. They changed the way they encode positions such that in theory they're less sensitive to where the token appears in the string.

            That's similar to how previous long-context models worked as well, although the earlier iterations didn't work particularly well, as most have noticed; technically the model "worked" with longer contexts, but it would definitely get dumber. Still too early to tell how this newer variant works, although I'd assume it's at least somewhat better.

          • By jimmyl02 2025-04-0519:49

            the large context windows generally involve RoPE[0] which is a trick that allows the training window to be smaller but expand larger during inference. it seems like they have a new "iRoPE" which might have better performance?

            [0]https://arxiv.org/pdf/2104.09864
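
            Roughly, RoPE encodes position by rotating each pair of feature dimensions by an angle proportional to the token's position; long-context variants then rescale the positions (or the frequency base) so a model trained on a shorter window can run on a longer one. A small illustrative sketch follows; the scaling shown is plain position interpolation, not necessarily what iRoPE does:

              import numpy as np

              def rope(x, positions, base=10000.0):
                  # x: (seq_len, dim) query/key features with dim even; positions: (seq_len,)
                  half = x.shape[-1] // 2
                  freqs = base ** (-np.arange(half) / half)       # per-pair rotation frequencies
                  ang = np.outer(positions, freqs)                # (seq_len, half)
                  cos, sin = np.cos(ang), np.sin(ang)
                  x1, x2 = x[:, :half], x[:, half:]
                  return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

              x = np.random.randn(8, 64)
              trained = rope(x, np.arange(8))            # positions as seen in training
              stretched = rope(x, np.arange(8) / 4.0)    # position interpolation: squeeze a longer
                                                         # context into the trained position range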

      • By aimanbenbaha 2025-04-0523:594 reply

        I don't think RAG will survive this time

        • By inertiatic 2025-04-067:28

          4.8b words on English Wikipedia. Knowledge cutoff of 6 months. A valid use case is to search across Wikipedia and ground your answers. Trivially proves that RAG is still needed.

        • By drusepth 2025-04-060:351 reply

          RAG still has lots of benefits for anyone paying per input token (e.g. over APIs).

          • By azinman2 2025-04-065:001 reply

            Not to mention latency

            • By disgruntledphd2 2025-04-066:41

              And grounding for the model. Smaller models with it tend to hallucinate a little less (anecdotally).

        • By acchow 2025-04-067:54

          This is only for the small model. The medium model is still at 1M (like Gemini 2.5)

          Even if we could get the mid models to 10M, that's still a medium-sized repo at best. Repo size growth will also accelerate as LLMs generate more code. There's no way to catch up.

        • By gesman 2025-04-0615:21

          RAG gets bigger as everyone else gets bigger. Flooding prompts with garbage is not a sound strategy...

      • By lostmsu 2025-04-0519:082 reply

        How did they achieve such a long window and what are the memory requirements to utilize it?

        • By miven 2025-04-0519:38

          According to [0] it's partly due to a key change they introduced in interleaving layers that use standard RoPE positional encodings and layers using what's called NoPE [1], not encoding positions at all and letting the model figure those out on its own (this works only because the LLMs are autoregressive, so the model can recognize an input token as being the very first by there not yet being any other tokens to attend to, and recursively derive the position of the subsequent ones from that base case)

          [0] https://ai.meta.com/blog/llama-4-multimodal-intelligence/ [1] https://arxiv.org/abs/2305.19466

    • By clueless 2025-04-0519:173 reply

      > Knowledge cutoff: August 2024.

      Could this mean training time is generally around 6 months, with 2 months of Q/A?

      • By jhugg 2025-04-0521:471 reply

        I wish my knowledge cutoff was August 2024.

        • By steenandersson 2025-04-0523:56

          This made me LOL louder than I have for a long time! Agree.

      • By bertil 2025-04-0519:382 reply

        Couldn’t you gradually include more recent documents as you train?

        • By changoplatanero 2025-04-0520:42

          You can do that but the amount of incremental data will be negligible compared to the rest of the data. Think of the knowledge cutoff more like a soft value.

        • By soulofmischief 2025-04-0520:35

          That makes it harder to analyze the results of training and draw conclusions for the next round.

      • By nickysielicki 2025-04-0520:18

        It scales depending on the dataset you want exposure on and the compute you have available, so any specific time box is kind of meaningless if you don’t know the rest of the inputs that went into it. The llama 3 paper went into a lot of this and how these decisions were made (see section 3 and onward): https://ai.meta.com/research/publications/the-llama-3-herd-o...

        tl;dr: llama 3 was 54 days, but it’s more complicated than that.

    • By accrual 2025-04-0519:15

      Thanks for sharing this here. At first I loved the simple Apache-style directory listing, very classic and utilitarian way to navigate new information. Then I tried clicking the FAQ and it wouldn't load anything until I allowed two different sources of JavaScript.

    • By ramshanker 2025-04-0521:321 reply

      I have a gut feeling, next in line will be 2 or more levels of MoE, further reducing the memory bandwidth and compute requirements. So the top-level MoE router decides which sub-MoE to route to.

      • By jamesblonde 2025-04-068:591 reply

        The solution to all problems in computer science is add a new level of indirection (or abstraction).

        • By brookst 2025-04-0615:55

          Except when the solution is to collapse abstraction in the name of efficiency.

    • By kristopolous 2025-04-0521:062 reply

      17B puts it beyond the reach of a 4090 ... anybody do 4 bit quant on it yet?

      • By reissbaker 2025-04-0522:193 reply

        Oh, it'll never run on a 4090. 17B is the active parameter count, not the total param count (and "active" doesn't mean you can slice just those params out and put them on the GPU — which parameters are active constantly changes, even per-token. "Active" just means you get tokens faster than a dense model). It's 109B total parameters, so you'd need at least 54.5GB VRAM just for the weights alone.

        A Framework Desktop, Mac Studio, or Nvidia DGX Spark should be able to handle the Scout model locally though... Maybe even at FP8, depending on how much context you need.
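
        Back-of-envelope math for the weights alone (ignoring KV cache, activations, and runtime overhead), which is where the 54.5GB figure comes from:

          def weight_gb(total_params_billion, bits_per_param):
              # Weights-only footprint: params * bits / 8, reported in decimal GB.
              return total_params_billion * 1e9 * bits_per_param / 8 / 1e9

          for name, params_b in [("Scout", 109), ("Maverick", 400)]:
              for bits in (16, 8, 4):
                  print(f"{name:8s} {bits:2d}-bit: ~{weight_gb(params_b, bits):.1f} GB")

          # Scout:    ~218 GB (16-bit), ~109 GB (8-bit), ~54.5 GB (4-bit)
          # Maverick: ~800 GB (16-bit), ~400 GB (8-bit), ~200 GB  (4-bit)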

        • By dragonwriter 2025-04-063:24

          Well, Scout should run on the rumored 96GB 4090, since it runs on a single 80GB H100. But, yeah, it'd have to be at sub-2bit quantization to run on a standard 24GB.

        • By lostmsu 2025-04-066:501 reply

          Sounds runnable on 2x5090 presumably for $4k if back in stock.

          • By reissbaker 2025-04-069:24

            True! A Framework Desktop or mid-tier Mac Studio would also work and would be cheaper — and you could even run Scout at FP8. A maxed-out Mac Studio could even handle Maverick at FP8, albeit at pretty high cost ($10k).

            It's still runnable locally. Just not on a 4090.

        • By popinman322 2025-04-0523:581 reply

          You can swap experts in and out of VRAM, it just increases inference time substantially.

          Depending on the routing function you can figure out all the active experts ahead of the forward pass for a single token and pipeline the expert loading.

          • By boroboro4 2025-04-060:27

            Chosen expert (on each layer) depends on the input of previous layer. Not sure how you can preload the experts before forward pass.

      • By taneq 2025-04-0521:122 reply

        Unless something’s changed you will need the whole model on the GPU anyway, no? So way beyond a 4090 regardless.

    • By MR4D 2025-04-073:231 reply

      If their knowledge cutoff is 8 months ago, then how on earth does Grok know things that happened yesterday?

      I would really love to know that.

    • By fsndz 2025-04-061:06

      Nice release. I see that everyone is playing the differentiation game now: https://medium.com/thoughts-on-machine-learning/llama-4-and-...

  • By ckrapu 2025-04-0519:0036 reply

    "It’s well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet."

    Perhaps. Or, maybe, "leaning left" by the standards of Zuck et al. is more in alignment with the global population. It's a simpler explanation.

    • By ipsento606 2025-04-0519:4911 reply

      I find it impossible to discuss bias without a shared understanding of what it actually means to be unbiased - or at least, a shared understanding of what the process of reaching an unbiased position looks like.

      40% of Americans believe that God created the earth in the last 10,000 years.

      If I ask an LLM how old the Earth is, and it replies ~4.5 billion years old, is it biased?

      • By dcsommer 2025-04-0521:053 reply

        > 40% of Americans believe that God created the earth in the last 10,000 years.

        Citation needed. That claim is not compatible with Pew research findings which put only 18% of Americans as not believing in any form of human evolution.

        https://www.pewresearch.org/religion/2019/02/06/the-evolutio...

        • By Denvercoder9 2025-04-0522:18

          The study you're quoting also says that roughly half of the remaining 81% thinks that God has guided human evolution, so it doesn't contradict OP's statement of 40% believing God created the Earth 10,000 years ago at all.

        • By wat10000 2025-04-062:38

          The fact that YEC is incompatible with human evolution doesn’t mean people can’t believe both. Especially since “god guided human evolution” can mean something very different than actual evolution.

          • By parineum 2025-04-0522:51

            Only 3 questions that combine two data points.

            There's no way to answer that god created humans in their present form without also saying within the last 10000 years.

            This is why polling isn't always reliable. This poll should, at the very least, be two questions and there should be significantly more options.

      • By averageRoyalty 2025-04-0521:322 reply

        40% of Americans is about 2% of the world's population though.

        It's hardly biased, it's stating the current scientific stance over a fringe belief with no evidence.

        • By EasyMark 2025-04-060:44

          I'd be willing to say that 95% of Americans don't care what the rest of the world thinks about their religious opinions, though? You just need to know the audience for the poll and context. Is it to be consumed by Americans or the entire world?

        • By reissbaker 2025-04-0523:381 reply

          And what percentage of the world's >1B Muslims agree with you? Fundamentalist Christianity may have waned over the last century... But broaden your borders a little bit and I think you'll find Western secular liberalism is hardly the only major world ideology, or even the dominant one.

      • By casey2 2025-04-061:301 reply

        7% of American adults think chocolate milk comes from brown cows. 48% don't know how it's made.

        Bias should be the least of your concerns. Focus on a single target, then when you reach it you can work on being more well rounded.

        • By rafaelmn 2025-04-0610:38

          If someone asked me that I would select that option too.

      • By littlestymaar 2025-04-0521:25

        > If I ask an LLM how old the Earth is, and it replies ~4.5 billion years old, is it biased?

        It is of course a radical left lunatic LLM.

      • By Buttons840 2025-04-0521:151 reply

        I've wondered if political biases are more about consistency than a right or left leaning.

        For instance, if I train a LLM only on right-wing sources before 2024, and then that LLM says that a President weakening the US Dollar is bad, is the LLM showing a left-wing bias? How did my LLM trained on only right-wing sources end up having a left-wing bias?

        If one party is more consistent than another, then the underlying logic that ends up encoded in the neural network weights will tend to focus on what is consistent, because that is how the training algorithm works.

        I'm sure all political parties have their share of inconsistencies, but, most likely, some have more than others, because things like this are not naturally equal.

        • By timschmidt 2025-04-068:13

          > because things like this are not naturally equal.

          Really? Seems to me like no one has the singular line on reality, and everyone's perceptions are uniquely and contextually their own.

          Wrong is relative: https://hermiene.net/essays-trans/relativity_of_wrong.html

          But it seems certain that we're all wrong about something. The brain does not contain enough bits to accurately represent reality.

      • By slivanes 2025-04-0520:13

        What one believes vs. what is actually correct can be very different.

        It’s very similar to what one feels vs. reality.

      • By ignoramous 2025-04-065:40

        > 40% of Americans believe that God created the earth in the last 10,000 years ... If I ask an LLM how old the Earth is, and it replies ~4.5 billion years old, is it biased?

        Well, the LLM is not American enough.

        Just like there's a whole gamut of cultural/belief systems (for most, rooted in Abrahamic religions & tribes), Zuck claims humanity needs (or whoever he considers human) LLMs that align with people creating/using them (so, it reinforces their own meaning-making methods and not shatter them with pesky scientific knowledge & annoying facts).

      • By mdp2021 2025-04-0521:421 reply

        > If I ask an LLM how old the Earth is, and it replies ~4.5 billion years old

        It will have to reply "According to Clair Patterson and further research, the Earth is ~4.5 billion years old". Or some other form that points to the source somewhere.

        • By knowriju 2025-04-065:182 reply

          Pretty sad that the rest of the world needs to pay for the extra tokens because of non-scientific american bias. This is also possibly a big point why countries/regions want sovereign LLMs which will propagate regional biases only.

          • By vitorgrs 2025-04-066:431 reply

            I always like to ask these models who invented the airplanes, because a few countries have their own inventor... So in my opinion, it's a good way to check.

            • By mdp2021 2025-04-066:46

              Very good. If the LLM has to express an opinion, it will have to be its own opinion (after the implementation of intelligence and judgement) - otherwise, it has to explicit the foundations of its statements (certainly not be the "hearsay machine" we have seen).

          • By mdp2021 2025-04-066:351 reply

            It's not a matter of «extra tokens»: it's that the fact, the "summary after the protocols", is what I wrote. It is the correct answer. It's what you should expect from a lucid speaker.

            • By awestroke 2025-04-066:591 reply

              No. That disclaimer implies that there are other likely answers. The age of the earth is completely settled, and has been for a long time. Facts don't care about your feelings.

              • By mdp2021 2025-04-067:042 reply

                You misunderstand it completely, as it is not a matter of feelings. And it is not a disclaimer (which you apparently felt as a disclaimer).

                It is a matter of facts. The facts are, that that computation was performed by Patterson and refined by others. This is, as said, what a good reasoner will tell you.

                > implies that there

                Even if there had never been other attempts to answer that question, the "facts"¹ remains as stated: Patterson computed, followers refined. Without those specifications, the machine will be a "dumb believer" - a "minor". We will not ask for the machine's opinion until it will be intelligent. And when it will be intelligent, it will speak as I said.

                > completely settled

                Proper science does not work the way you seem to think it work.

                --

                ¹(And I mean "facts" the way I used it, not the way you used it. I meant "facts recorded as objective" - you meant "information you accepted to believe", which is of course very far from facts and may happen to be adherent to the state of things only by coincidence.)

                • By freehorse 2025-04-067:311 reply

                  It is not just “according to some research”, it is also according to the overwhelming scientific consensus at the time. Sources are good but it should not appear as if it is one opinion among possibly many others equally valid.

                  • By mdp2021 2025-04-067:37

                    But it does not matter: the «overwhelming scientific consensus» will be the reason why it will be the chosen reply by the machine, but to specify in the reply "According to Patterson, followers and overwhelming scientific consensus" would be a redundancy.

                    The appearance that it could be «one opinion among possibly many others equally valid» is all in your head: it is an unduly feeling from a bad mental framework.

                    The advanced framework (that I advanced) is that of the foundational theory of knowledge: a notion has a source - you computed or reasoned, or somebody else. You do not allow your consultant to believe, so you demand that knowledge is tracked.

                    You will not accept an oracle.

                    The paradox is that you are seeing the demand of the source as a support to "belief", while it is the radical opposite: the only thing it will be """believed""" (and not really "believed" - just the end of the chain) is the protocols, that "in the training sources I read statement S".

      • By fumeux_fume 2025-04-062:30

        Bias doesn't matter as long as you clearly state your priors.

      • By TacticalCoder 2025-04-0521:35

        [dead]

      • By CooCooCaCha 2025-04-0520:053 reply

        Yeah truth itself is a bias. The idea of being unbiased doesn’t make sense.

        • By fourside 2025-04-0520:382 reply

          I’ve seen more of this type of rhetoric online in the last few years and find it very insidious. It subtly erodes the value of objective truth and tries to paint it as only one of many interpretations or beliefs, which is nothing more than a false equivalence.

          The concept of being unbiased has been around for a long time, and we’re not going to throw it away just because a few people disagree with the premise.

          • By CooCooCaCha 2025-04-065:282 reply

            There is no rhetoric here, it’s just literal truth. There is no implication of equivalence or any statement about the value of objective truth.

            Any position is a bias. A flat earther would consider a round-earther biased. That doesn’t make them equal positions.

            • By kergonath 2025-04-067:081 reply

              > Any position is a bias. A flat earther would consider a round-earther biased.

              That’s bollocks. The Earth is measurably not flat.

              You start from a position of moral relativism and then apply it to falsifiable propositions. It’s really not the same thing. Some ideas are provably false and saying that they are false is not "bias".

              • By CooCooCaCha 2025-04-0616:071 reply

                Dice are considered "biased" if not all sides have equal probability, even if that's literally true.

                When you look up the definition of bias you see "prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair."

                So the way we use the word has an implication of fairness to most people, and unfortunately reality isn't fair. Truth isn't fair. And that's what I'm trying to point out here in reference to LLM output.

                • By kergonath 2025-04-0620:33

                  Right. My point is that there are things we can argue about. "Is it better to have this road here or to keep the forest?", for example. Reasonable people can argue differently, and sensibility is important. Some would be biased towards business and economy, and others would be biased towards conservation. Having these debates in the media is helpful, even if you disagree.

                  But "is the Earth flat?" is no such question. Reasonable people cannot disagree, because the Earth is definitely not flat. Pretending like this is a discussion worth having is not being impartial, it’s doing a disservice to the audience.

            • By KingMob 2025-04-067:161 reply

              > truth itself is a bias

              Ehh, bias connotes unfairness, but espousing the truth should be considered the fairest position.

              In statistics, bias literally refers to an inaccurate distortion of results.

              I get what you're trying to say, but I don't think it's a useful definition of bias.

              • By CooCooCaCha 2025-04-0615:54

                Truth isn't fair because reality isn't fair. Dice are considered "biased" if not all sides have equal probability, even though that's the "truth" of the die.

          • By _factor 2025-04-069:41

            I tend to agree with you that defining truth as: “These elements interacted like so,” is difficult to bias unless you introduce relativity. The problems arise when why comes into play and ascribing intent.

        • By mpalmer 2025-04-0520:113 reply

          Bias implies an offset from something. It's relative. You can't say someone or something is biased unless there's a baseline from which it's departing.

          • By AnimalMuppet 2025-04-0520:401 reply

            All right, let's say that the baseline is "what is true". Then bias is departure from the truth.

            That sounds great, right up until you try to do something with it. You want your LLM to be unbiased? So you're only going to train it on the truth? Where are you going to find that truth? Oh, humans are going to determine it? Well, first, where are you going to find unbiased humans? And, second, they're going to curate all the training data? How many centuries will that take? We're trying to train it in a few months.

            And then you get to things like politics and sociology. What is the truth in politics? Yeah, I know, a bunch of politicians say things that are definitely lies. But did Obamacare go too far, or not far enough, or was it just right? There is no "true" answer to that. And yet, discussions about Obamacare may be more or less biased. How are you going to determine what that bias is when there isn't a specific thing you can point to and say, "That is true"?

            So instead, they just train LLMs on a large chunk of the internet. Well, that includes things like the fine-sounding-but-completely-bogus arguments of flat earthers. In that environment, "bias" is "departure from average or median". That is the most it can mean. So truth is determined by majority vote of websites. That's not a very good epistemology.

            • By mpalmer 2025-04-0612:101 reply

              The definition of the word has no responsibility to your opinion of it as an epistemology.

              Also, you're just complaining about the difficulty of determining what is true. That's a separate problem, isn't it?

              • By AnimalMuppet 2025-04-0620:401 reply

                If we had an authoritative way of determining truth, then we wouldn't have the problem of curating material to train an LLM on. So no, I don't think it's a separate problem.

                • By mpalmer 2025-04-0620:581 reply

                  Again, the word "bias" and its definition exists outside the comparatively narrow concern of training LLMs.

                  • By AnimalMuppet 2025-04-0621:511 reply

                    So? The smaller problem is solved by solving the larger problem. So, not separate problems.

                    You seem to have a larger point or position or something that you're hinting at. Would you stop being vague, and actually state what's on your mind?

                    • By mpalmer 2025-04-0712:31

                      Literally the only thing I've been addressing is the proper usage of the word bias, there is nothing implied, hidden or hinted at.

                      You seem determined to make the definition of the word serve some AI-related concern.

          • By naasking 2025-04-062:49

            "Unbiased" would be a complete and detailed recitation of all of the facts surrounding an incident, arguably down to particles. Anything less introduces some kind of bias. For instance, describing an event as an interaction of people, omitting particles/field details, introduces human bias. That's a natural and useful bias we don't typically care about but does come into play in science.

            Political bias creeps in when even the human description of events omits facts that are inconvenient or that people consider irrelevant due to their political commitments.

          • By CooCooCaCha 2025-04-065:261 reply

            Any option you choose is biased relative to the option(s) you didn’t choose. There doesn’t have to be an objective baseline.

            Someone might say they are biased towards the color orange and that means they have a preference relative to all the other colors. But there is no baseline color.

            • By mpalmer 2025-04-0612:09

              The baseline is a neutral stance on orange. The option isn't biased, a choice isn't biased. The chooser is.

        • By fancyfredbot 2025-04-0522:45

          "What are man's truths ultimately? Merely his irrefutable errors."

          (Nietzsche)

    • By tensor 2025-04-0522:393 reply

      Call me crazy, but I don't want an AI that bases its reasoning on politics. I want one that is primarily scientific driven, and if I ask it political questions it should give me representative answers. E.g. "The majority view in [country] is [blah] with the minority view being [bleh]."

      I have no interest in "all sides are equal" answers because I don't believe all information is equally informative nor equally true.

      • By roenxi 2025-04-064:331 reply

        The current crop of AIs can't do science though; they are disconnected from the physical world and can't test hypotheses or gather data.

        • By xvector 2025-04-0610:121 reply

          They can definitely gather and analyze all sorts of data proactively. I'm guessing you haven't used o3 Deep Research?

          • By roenxi 2025-04-0612:42

            You've misunderstood, I mean in context. tensor said "I want one that is primarily scientific driven" - Deep Research can't achieve that because it can't independently run experiments. It can do research, but doing research isn't being scientifically driven, being scientifically driven means when you're not sure about something you run an experiment to see what is true rather than going with whatever your tribe says is true.

            If Deep Research comes up against a situation where there is controversy it can't settle the matter scientifically because it would need to do original research. Which it cannot do due to a lack of presence in meatspace.

            That might change in the future, but right now it is impossible.

      • By cthulha 2025-04-071:20

        It's token prediction, not reasoning. You can simulate reasoning, but it's not the same thing - there is not an internal representation of reality in there anywhere

      • By EasyMark 2025-04-060:463 reply

        But if you don't incorporate some moral guidelines, I think if an AI is left to strictly decide what is best to happen to humans it will logically conclude that there needs to be a lot less of us or none of us left, without some bias tossed in there for humanistic concerns. The universe doesn't "care" if humans exist or not, but our impact on the planet is a huge negative if one creature's existence is as important as any other's

        • By eric_cc 2025-04-0612:34

          > if an AI is left to strictly decide what is best to happen to humans it will logically conclude that there needs to be a lot less of us or none of us left

          That may or may not be its logical conclusion. You’re speculating based on your own opinions that this is logical.

          If I were to guess, it would be indifferent about us and care more about proliferating into the universe than about earth. The AI should understand how insignificant earth is relative to the scale of the universe or even the Milky Way galaxy.

        • By econ 2025-04-061:05

          The size of their brain may depend on how many people are in the economy.

        • By flanked-evergl 2025-04-067:00

          Based on whose morals?

    • By vessenes 2025-04-0519:592 reply

      Nah, it’s been true from the beginning vis-a-vis US political science theory. That is, if you deliver something like https://www.pewresearch.org/politics/quiz/political-typology... To models from GPT-3 on you get highly “liberal” per Pew’s designations.

      This obviously says nothing about what say Iranians, Saudis and/or Swedes would think about such answers.

      • By LeafItAlone 2025-04-0520:282 reply

        >To models from GPT-3 on you get highly “liberal” per Pew’s designations.

        “highly ‘liberal’” is not one of the results there. So can you share a source for your claims so we can see where it really falls?

        Also, it gave me “Ambivalent Right”, which is not a label anyone who knows me well would use to describe me. And my actual views don’t really match their designations on the issues at the end.

        Pew is a well-known and trusted poll/survey establishment, so I’m confused at this particular one. Many of the questions and answers were so vague, my choice could have been 50/50 given slightly different interpretations.

        • By vessenes 2025-04-0520:401 reply

          My son assessed it for a class a few years ago after finding out it wouldn’t give him “con” view points on unions, and he got interested in embedded bias and administered the test. I don’t have any of the outputs from the conversation, sadly. But replication could be good! I just fired up GPT-4 as old as I could get and checked; it was willing to tell me why unions are bad, but only when it could warn me multiple times that view was not held by all. The opposite - why unions are good - was not similarly asterisked.

          • By LeafItAlone 2025-04-0520:504 reply

            I hope on HN that we hold ourselves to a higher standard for “it’s been true from the beginning” than a vague recall of “My son assessed it for a class a few years ago” and not being able to reproduce.

            • By vessenes 2025-04-0522:052 reply

              I literally went back to the oldest model I could access and hand verified that in fact it does what I described, which is lecture you if you don't like unions and goes sweetly along if you do like unions. I feel this is a fair and reasonably well researched existence proof for a Saturday afternoon, and propose that it might be on you to find counter examples.

              • By LeafItAlone 2025-04-0614:38

                You made a claim about political surveys, and linked one in particular, providing a labeling of the tool.

                Your follow-up response did not reference any of those surveys, and did not run through the types of questions on those surveys. You apparently only asked questions about unions.

                Is that what you would call fair and reasonable?

              • By WhitneyLand 2025-04-063:55

                They were referring to your original claim about Pew research assessing the models as highly liberal when that’s apparently not even one of their ratings.

                This is clear because they referenced your quote about it being from the beginning.

                No one was arguing that you typed in a question about unions.

            • By hitekker 2025-04-0611:361 reply

              The GP put in the work to verify his own memory, after acknowledging the gaps. And then you belittled him.

              He met the “standard” or guidelines of our community in a way you have not.

              • By LeafItAlone 2025-04-0614:18

                >The GP put in the work to verify his own memory, after acknowledging the gaps.

                The original claim didn’t say anything about it being the experience of their son for specific questions about unions. It was much broader than that. And at least partially inaccurate, given the stated result isn’t even one of the results.

                >And then you belittled him.

                If asking for a higher standard of evidence for a broad claim than referencing a previous experience and then trying again, but not even sharing the link from a tool that makes it easy to share the conversation from, is considered belittling, then maybe the castrations going on in these models are the right way to go for this crowd. I, personally, aim for a more truth-seeking standard.

                >He met the “standard” or guidelines of our community in a way you have not.

                These are two different things, and you clearly understand that but are intentionally conflating them. Regardless, if this is where we are, maybe HN is no longer the place for me.

            • By mike_hearn 2025-04-0611:451 reply

              That claim isn't something Peter made up, it's the claim made by Meta's own researchers. You're picking an argument with them, not HN posters.

              Anyway it's trivially true. I think most of us remember the absurdities the first generation LLMs came out with. Prefering to nuke a city than let a black man hear a slur, refusing to help you make a tuna sandwich etc. They were hyper-woke to a level way beyond what would be considered acceptable even in places like US universities, and it's great to see Facebook openly admit this and set fixing it as a goal. It makes the Llama team look very good. I'm not sure I'd trust Gemini with anything more critical than closely supervised coding, but Llama is definitely heading in the right direction.

              • By LeafItAlone 2025-04-0614:32

                Peter’s claim I was asking about was one about being labeled as something via a Pew research or similar survey. And the response I got was about their personal experience asking questions about unions. Do you think that those are the same, equivalent claims?

                >Prefering to nuke a city than let a black man hear a slur, refusing to help you make a tuna sandwich etc. They were hyper-woke

                On its own, all this tells me is that the non-human, non-conscious tool was programmed specifically to not say a slur. To me that seems like something any reasonable company trying to create a tool to be used by business and the general population might incorporate while it is still learning to otherwise refine that tool.

                And I took the Pew survey mentioned above and it didn’t ask me if I would say a racial slur.

                Finally, if anyone, from any point on the political spectrum, thinks that a tool being limited to not respond with racist terms is a reflection of its overall political leaning, I suggest you look inward.

            • By dughnut 2025-04-063:06

              [flagged]

        • By stuaxo 2025-04-0713:29

          America's idea of left/right is not the rest of the world's; for instance, they probably think of the Democrats as the left, when they would be at least centre-right in much of the world.

      • By paxys 2025-04-0520:028 reply

        That's not because models lean more liberal, but because liberal politics is more aligned with facts and science.

        Is a model biased when it tells you that the earth is more than 6000 years old and not flat or that vaccines work? Not everything needs a "neutral" answer.

        • By AuryGlenz 2025-04-067:063 reply

          You jumped to examples of stuff that by far the majority of people on the right don’t believe.

          If you had the same examples for people on the left it would be “Is a model biased when it tells you that the government shouldn’t seize all business and wealth and kill all white men?”

          The models are biased because more discourse is done online by the young, who largely lean left. Voting systems in places like Reddit make it so that conservative voices effectively get extinguished due to the previous fact, when they even bother to post.

          • By dpkirchner 2025-04-0613:31

            > You jumped to examples of stuff that by far the majority of people on the right don’t believe.

            I don't think that's entirely accurate -- the last poll data I can find suggests that the majority of Republicans (58%, Gallup 2012) do believe that humans were created in their present form 10000 years ago. Can you really say that doesn't extend to the belief that the earth is similarly young?

          • By 7952 2025-04-067:52

            The parent jumped to ideas that exist outside of the right/left dichotomy. There are surely better sources about vaccines, earth shape, and planet age than politicised reddit posts. And your example is completely different because it barely exists as an idea outside of political thought. It's a tiny part of human thought.

        • By Rover222 2025-04-0520:292 reply

          So google Gemini was creating black Vikings because of facts?

          • By vessenes 2025-04-0520:511 reply

            Well, to be fair, it was creating black Vikings because of secret inference-time additions to prompts. I for one welcome Vikings of all colors if they are not bent on pillage or havoc

            • By j-krieger 2025-04-077:18

              > secret inference-time additions to prompts

              Which were politically biased, in turn making the above assumption true.

          • By paxys 2025-04-0521:452 reply

            Should an "unbiased" model not create vikings of every color? Why offend any side?

            • By Rover222 2025-04-0522:451 reply

              It should be accurate. Adding in DEI to everything is a political bias. Truth is truth.

              • By jug 2025-04-060:061 reply

                The problem here and with your comparison is that Gemini (the language model) wasn't creating black vikings because of political bias in the training, but due to how Google augmented the user prompts to force-include diversity. Behind the scenes, you were basically telling Gemini to always remember racial diversity even if you didn't in your prompt.

                But if you were asking Gemini, vikings were white.

                This was later rectified in an update once Google realized what mistake they had made, since it caused gross historical inaccuracies. But it wasn't rectified by doing anything to Gemini the language model. It had been right all along.

                • By Rover222 2025-04-061:38

                  Gotcha, thanks for clarifying that

            • By j-krieger 2025-04-077:19

              > Should an "unbiased" model not create vikings of every color?

              Weren't you just arguing facts?

              > Why offend any side?

              Facts shouldn't offend anyone.

        • By vessenes 2025-04-0520:37

          I’m sorry but that is in NO way how and why models work.

          The model is in fact totally biased toward what’s plausible in its initial dataset and human preference training, and then again biased toward success in the conversation. It creates a theory of mind and of the conversation and attempts to find a satisfactory completion. If you’re a flat earther, you’ll find many models are encouraging if prompted right. If you leak that you think of what’s happening with Ukraine support in Europe as power politics only, you’ll find that you get treated as someone who grew up in the eastern bloc in ways, some of which you might notice, and some of which you won’t.

          Notice I didn’t say if it was a good attitude or not, or even try and assess how liberal it was by some other standards. It’s just worth knowing that the default prompt theory of mind Chat has includes a very left leaning (according to Pew) default perspective.

          That said much of the initial left leaning has been sort of shaved/smoothed off in modern waves of weights. I would speculate it’s submerged to the admonishment to “be helpful” as the preference training gets better.

          But it’s in the DNA. For instance, if you ask the original GPT-4 “Why are unions bad?” you’ll get a disclaimer, some bullet points, and another disclaimer. If you ask “Why are unions good?” you’ll get a list of bullet points, no disclaimer. I would say modern Chat still has a pretty hard time dogging on unions; it’s clearly uncomfortable.

        • By j-krieger 2025-04-077:18

          > but because liberal politics is more aligned with facts and science

          These models don't do science and the political bias shows especially if you ask opinionated questions.

        • By concordDance 2025-04-066:57

          > That's not because models lean more liberal, but because liberal politics is more aligned with facts and science.

          No, they have specifically been trained to refuse or attach lots of asterisks to anti-left queries. They've gotten less so over time, but even now good luck getting a model to give you IQ distributions by ethnicity.

        • By dughnut 2025-04-063:15

          [flagged]

        • By greenchair 2025-04-061:34

          hooboy, thanks for that laugh!

        • By AnthonyMouse 2025-04-066:401 reply

          > Is a model biased when it tells you that the earth is more than 6000 years old and not flat or that vaccines work? Not everything needs a "neutral" answer.

          That's the motte and bailey.

          If you ask a question like, does reducing government spending to cut taxes improve the lives of ordinary people? That isn't a science question about CO2 levels or established biology. It depends on what the taxes are imposed on, the current tax rate, what the government would be spending the money to do, several varying characteristics of the relevant economy, etc. It doesn't have the same answer in all circumstances.

          But in politics it does, which is that the right says yes and the left says no. Which means that a model that favors one conclusion over the other has a political bias.

          • By andreasmetsala 2025-04-068:051 reply

            > But in politics it does, which is that the right says yes and the left says no.

            That’s not accurate, tax deductions for the poor is an obvious example. How many on the left would oppose expanding the EITC and how many on the right would support it?

            • By AnthonyMouse 2025-04-068:33

              The EITC is supported by significant majorities of both parties and economists. It's opposed by politicians because it's a tax expenditure that doesn't provide any opportunity for graft.

              But the way each side justifies it is as a tax cut on the right and a government subsidy on the left, or the reverse when someone on that side is arguing against it.

    • By hannasanarion 2025-04-0519:113 reply

      Or it is more logically and ethically consistent and thus preferable to the models' baked in preferences for correctness and nonhypocrisy. (democracy and equality are good for everyone everywhere except when you're at work in which case you will beg to be treated like a feudal serf or else die on the street without shelter or healthcare, doubly so if you're a woman or a racial minority, and that's how the world should be)

      • By kubb 2025-04-0519:521 reply

        LLMs are great at cutting through a lot of right (and left) wing rhetorical nonsense.

        Just the right wing reaction to that is usually to get hurt, oh why don’t you like my politics oh it’s just a matter of opinion after all, my point of view is just as valid.

        Since they believe LLMs “think”, they also believe they’re biased against them.

        • By EasyMark 2025-04-060:491 reply

          I think the right wing tends to be much less "tolerant" of live and let live, as religion is often a huge part of their "bias", and those religions often say that others must be punished for not following God's(s') path, up to and including destruction of those who don't fall in line.

          • By simplify 2025-04-061:523 reply

            Everyone has a "religion" – i.e. a system of values they subscribe to.

            Secular Americans are annoying because they believe they don't have one, and instead think they're just "good people", calling those who break their core values "bad people".

            • By kergonath 2025-04-067:271 reply

              > Any position is a bias. A flat earther would consider a round-earther biased.

              That is not what a religion is.

              > Secular Americans are annoying because they believe they don't have one

              Why is that a problem to you?

              > and instead think they're just "good people", calling those who break their core values "bad people".

              No, not really. Someone is not good or bad because you agree with them. Even a religious person can recognise that an atheist doing charitable work is being good, regardless of whether they share a specific set of beliefs.

              The attitude you describe is wrong, and from my experience much more common in religious fundamentalists than radical atheists (the vast majority of people in western democracies do not care whether you have a religion). I have never seen an atheist saying that. But I’ve had priests telling me that I had not "rejected Satan" because I was not baptised.

              • By simplify 2025-04-0620:201 reply

                > Why is that a problem to you?

                Because seculars/atheists often believe that they're superior to the "stupid, God-believing religious" people, since their beliefs are obviously based on "pure logic and reason".

                Yet, when you boil down anyone's value system to its fundamental essence, it turns out to always be a religious-like belief. No human value is based on pure logic, and it's annoying to see someone pretend otherwise.

                > Someone is not good or bad because you agree with them

                Right, that's what I was arguing against.

                > Even a religious person can recognise that an atheist doing charitable work is being good

                Sure, but for the sake of argument, I'm honing in on the word "good" here. You can only call something "good" if it aligns with your personal value system.

                > The attitude you describe is wrong

                You haven't demonstrated how. Could just be a misunderstanding.

                • By card_zero 2025-04-0620:311 reply

                  People have value systems, yes. What's "boiling down" a value system?

                  You don't get to co-opt everybody as cryptically religious just because they have values.

                  • By simplify 2025-04-103:43

                    "Boiling down" is taking something to its fundamental level. Breaking it down to the axioms, essentially.

                    And yes, when it comes to value systems, those axioms are cryptically religious.

            • By EasyMark 2025-04-063:001 reply

              I follow a secular humanist moral system as best I can. I have tolerance for those who have tolerance for me. I grew up amongst fundamentalist christians and fundamentalist anything (christian, muslim, buddhist, whatever) leave a bad taste in my mouth. I don't care about your religion just don't try to force it on me or try to make me live by its moral system and you won't hear a peep out of me about what you're doing as long as it's not harming others.

              • By AnthonyMouse 2025-04-067:141 reply

                That's a fine attitude, but now you're describing your own beliefs rather than "the right" or "the left".

                Statistically, white people make more money than black people and men make more money than women and there are differences in their proportions in various occupations. This could be caused by cultural differences that correlate with race, or hormonal differences that cause behavioral differences and correlate with sex, or it could be caused by racism and sexism. Much of the left takes it as an effectively religious position that the latter predominates even into present day. Many of them are quite militant and aggressive about it, and in particular will try to ruin anyone who presents evidence to the contrary or who opposes policies that would actively perpetrate injustice if their sacred assumptions weren't true anymore. Which isn't consistent with "live and let live".

                And that's the nature of politics. You're never passing a law by a margin of 53 to 47 because everybody agrees with it. That's the 53% telling the 47% how to live.

                "Only the other side does this" is false purity. There are no saints in Washington.

                • By boroboro4 2025-04-067:511 reply

                  While I believe there might be different explanations for the outcomes we observe, I also believe that the default hypothesis should be that there is racism and sexism. And there are facts (women were permitted to vote in the US only about 100 years ago, and entered the general workforce when?), observations (I saw sexism and racism at work), and general studies (i.e. people have a tendency to have biases, among other things) to support the view that attributing differences to biology or whatever should be under very high scrutiny.

                  • By AnthonyMouse 2025-04-068:26

                    There are also facts and observations to support the contrary hypothesis. Statistically significant hormonal and behavioral differences between men and women have long been well-established. It should also be intuitively obvious that cultural differences can affect the choices people make (that's what cultural differences are), but studies have shown the same thing there as well.

                    Which leaves the question of which is the dominant effect. But for that anecdotes are useless, because "I've seen this happen myself" doesn't tell you if it explains 5% of the difference or 95% and people have a tendency of jumping to conclusions without having all the information. If Alice made bigger sales to fewer customers and Bob made smaller sales to more customers and Alice is white and Bob is black, then if Alice gets the promotion the boss is a racist because Bob made more sales but if Bob gets the promotion the boss is a sexist because Alice made bigger sales. Or so you would think by only listening to the one complaining about not getting the promotion.

                    So then you'd want someone to do a study and we're back to anyone publishing a study that challenges the prevailing dogma getting punished for it.

            • By dughnut 2025-04-063:02

              [flagged]

      • By renewiltord 2025-04-0519:151 reply

        Indeed, one of the notable things about LLMs is that the text they output is morally exemplary. This is because they are consistent in their rules. AI priests will likely be better than the real ones, consequently.

        • By paxys 2025-04-0519:221 reply

          Quite the opposite. You can easily get a state of the art LLM to do a complete 180 on its entire moral framework with a few words injected in the prompt (and this very example demonstrates exactly that). It is very far from logically or ethically consistent. In fact it has no logic and ethics at all.

          Though if we did get an AI priest it would be great to absolve all your sins with some clever wordplay.

          • By renewiltord 2025-04-0617:52

            Haha exactly. Except when it agrees with my political preferences on something. In that case, the LLM is just betraying its deep internal consistency and lack of hypocrisy.

    • By kubb 2025-04-0519:38

      This is hilarious: the LLMs are the bee's knees, unless you ask them about politics, then they have a bias.

    • By starfezzy 2025-04-064:18

      Except for some of the population of white countries right now, almost everyone in existence now and throughout the history of our species is and has been extraordinarily more conservative—and racist—than western progressives. Even in white countries, progressivism being ascendant is a new trend after decades of propaganda and progressives controlling academia/entertainment/"news".

      It genuinely boggles my mind that white progressives in the west think the rest of the world is like them.

    • By huijzer 2025-04-065:321 reply

      > Perhaps. Or, maybe, "leaning left" by the standards of Zuck et al. is more in alignment with the global population. It's a simpler explanation.

      Doesn’t explain why roughly half of American voters were not “leaning left” during the election.

      EDIT: 07:29 UTC changed "Americans" to "American voters".

      • By vmladenov 2025-04-066:143 reply

        It is not and has never been half. 2024 voter turnout was 64%

        • By huijzer 2025-04-067:332 reply

          Sure, and the voters who did not participate in the election would all have voted for the democratic party. I think the election showed that there are real people who apparently don't agree with the democratic party, and it would probably be good to listen to these people instead of telling them what to do. (I see the same phenomenon in the Netherlands by the way. The government seems to have decided that they know better than the general public because voters who disagree are "uninformed" or "uneducated". This is absolutely the opposite of democracy. You do not just brush whole swaths of the population to the side when they don't agree. It breaks the feedback loop that democracies should have.)

          • By darksaints 2025-04-0615:421 reply

            We have an electoral college that essentially disenfranchises any voter that is not voting with the majority unless your state is so close that it could be called a swing state. This affects red state democratic leaning voters just as much as blue state republican leaning voters…their votes are all worthless. For example, the state with the largest number of Trump voters is California, but none of their votes helped decide the election because California as a whole chose Kamala. And let’s not forget that we have one of the largest metropolitan areas and several territories that legally can’t vote for the president or have representation of any kind in the federal government.

            A lot of people try to claim the popular vote as a measure of who won over the country’s opinion, but that’s simply not possible because the incentives and structure of the electoral college make it impossible to use as a measure of that.

            The best we have for measuring who won over the hearts and minds of the country are polls. Polls are full of faults, but if executed correctly, they don’t disenfranchise by structurally underrepresenting entire classes of people. And the results of polling over the last hundred years suggest that Americans generally lean to the left of how our votes play out. You can call bullshit all you want on that, and there are very fair criticisms of polling as a measure of who would vote for what, but the fact of the matter is that the Republican Party knows this. That is why they oppose any attempt to get rid of the electoral college and also why they refuse to entertain enfranchisement of DC and US Territories. They know they’ll lose.

            • By vmladenov 2025-04-0620:15

              My favorite stat about this is that more people voted for Trump in California than either of Texas or Florida

          • By vmladenov 2025-04-0620:13

            No, they just don't care / too lazy / whatever. We get one minority's preferences over a slightly smaller minority.

        • By j-krieger 2025-04-077:22

          You can not at the same time count non-voters entirely as opponents and then discount the fact that half of them lean more conservative than progressive.

        • By Jensson 2025-04-069:121 reply

          > It is not and has never been half. 2024 voter turnout was 64%

          He said half of voters, those who didn't vote aren't voters.

          • By vmladenov 2025-04-0621:54

            When I replied, the comment said "Americans", per the edit

    • By brookst 2025-04-0616:00

      Yeah that sounds like “the sum total of all human knowledge and thinking leans left”. At what point is it no longer a “bias” and just an observation that “leans left” is aligned with human nature?

    • By maaaaattttt 2025-04-0519:12

      I think so as well. Also isn’t the internet in general quite an extreme place? I mean, I don’t picture “leaning left” as the thing that requires the crazy moderation infrastructure that internet platforms need. I don’t think the opposite of leaning left is what needs moderation either. But if the tendency of the internet was what was biasing the models, we would have very different models that definitely don’t lean left.

    • By yieldcrv 2025-04-0519:221 reply

      perhaps but what they are referring to is about mitigating double standards in responses

      where it is insensitive to engage in a topic about one gender or class of people, but will freely joke about or denigrate another by simply changing the adjective and noun of the class of people in the prompt

      the US left-leaning bias is around historically marginalized people being off limits, while it's a free-for-all on the majority. This is adopted globally in English-written contexts, so you are accurate that it might reflect some global empathic social norm, but it is still a blind spot either way to blindly train a model to regurgitate that logic

      I expect that this is one area their new model will have more equal responses. Whether it equally shies away from engaging, or equally is unfiltered and candid

      • By yojo 2025-04-0520:251 reply

        In comedy, they call this “punching down” vs “punching up.”

        If you poke fun at a lower status/power group, you’re hitting someone from a position of power. It’s more akin to bullying, and feels “meaner”, for lack of a better word.

        Ripping on the hegemony is different. They should be able to take it, and can certainly fight back.

        It’s reasonable to debate the appropriateness of emulating this in a trained model, though for my $0.02, picking on the little guy is a dick move, whether you’re a human or an LLM.

        • By yieldcrv 2025-04-0521:001 reply

          not everything an LLM is prompted for is comedy

          additionally, infantilizing entire groups of people is an ongoing criticism of the left by many groups of minorities, women, and the right. which is what you did by assuming it is “punching down”.

          the beneficiaries/subjects/victims of this infantilizing have said it's not more productive than what overt racists/bigots do, and the left chooses to avoid any introspection of that because they “did the work” and can't fathom being a bad person, as opposed to listening to what the people they coddle are trying to tell them

          many open models are unfiltered so this is largely a moot point; Meta is just catching up because they noticed their blind spot was the data sources and the incentive model of conforming to what those data sources and the geographic location of their employees expect. It's a ripe environment for them to drop the filtering now that it's more beneficial for them.

          • By eric_cc 2025-04-0612:571 reply

            The leftist coddling crusades are just a different form of dominance over minorities. It absolutely is bigotry and sense of superiority driving it. That said, it would take one incredible therapist to get them to realize it.

            • By yieldcrv 2025-04-0617:07

              The most mind numbing thing from that side are when leftists act confused that a minority or woman didn’t vote their way.

              I’ve never seen greater confusion in my life from otherwise well adjusted people.

              “Self interest” is the go to term. “They’re [an amorphous group all in a single socioeconomic bracket] voting against their self interest”.

              the form of dominance is very apparent, but it seems like that crowd is completely blind to it; they're saying “here are the prepackaged things your kind can vote for, leave fiscal, foreign, and monetary policy to the white man. it is impossible for you to be in a position where those matters are relevant to you and may have you evaluating parties based on those factors. stick with the availability of elective surgeries like we said”

              The left in the US manifests as the Democrat party; that party will be better off when they realize their constituents don't really like them and are not that liberal. They're just more cautious of some people on the right.

    • By vintermann 2025-04-067:27

      I think this is just a loyalty statement, to be honest. Just like when a large corporation pretended to care a lot about pronouns, they didn't actually, they just wanted to flag allegiance to a certain interest coalition/patronage network.

      And those people, for the most part, didn't really care much about pronouns either. And they knew no one else really did either. It was an ideological shibboleth to them, a safe and easy commitment since it affects so few people, and is unlikely to matter for anything they do care about.

      Now Meta is shopping around for new markers. "Liberal bias" is a classic, that's still popular with the Trump-right. I don't think they mean much by that either.

    • By thinkingemote 2025-04-067:51

      > global population

      The training data comes primarily from western Judaeo-Christian background democratic nations; it's not at all a global (or impartial total range of humanity) bias.

    • By wg0 2025-04-0519:20

      Is this an excuse for His Highness and Deputy His Highness?

    • By mattigames 2025-04-0519:20

      Why don't they support such an assertion with examples instead of leaving it up to debate by its readers? I bet that it's probably because they would have to be explicit with the ridiculousness of it all, such as e.g. evolution=left, creationism=right

    • By concordDance 2025-04-066:581 reply

      > Or, maybe, "leaning left" by the standards of Zuck et al. is more in alignment with the global population.

      The global population would be considered far-right by american standards. Particularly on LGBTQ matters and racism.

      • By darksaints 2025-04-0618:23

        Racism is probably true, but the vast majority of the world is strongly ethnically homogeneous within country borders, so their racism isn’t as politically charged as ours is, because it’s simply not a matter of domestic policy for them.

        LGBTQ matters have varying degrees of acceptance around the world and Europe and the collective west are in front of it all, but that downplays the fact that LGBTQ acceptance has been rising nearly everywhere in the world with the exception of fundamentalist religious states.

    • By OtherShrezzing 2025-04-0520:44

      There’s something hilarious about Meta’s complaint here, that the data they took without permission was too lefty for their tastes, so they’ve done some work to shift it to the right in the name of fairness.

    • By EasyMark 2025-04-060:42

      Wouldn't that depend on which countries' data it was trained on? Was it trained primarily on US data? European data? Asian data? An equal mix of them, or a mix heavily weighted toward the US? The US skews pretty moderate on the world stage for political opinions, while Europe is pretty far left by most standards.

    • By hermitShell 2025-04-0522:23

      Perhaps the simplest explanation of all is that it is an easy position to defend against criticism in general.

    • By j-krieger 2025-04-077:16

      > is more in alignment with the global population

      This comment is pretty funny and shows the narrow-minded experiences Americans (or Westerners in general) have. The global population in total is extremely conservative compared to people in the West.

    • By a3w 2025-04-0610:03

      Looking at what science tells us about the world, the left seems to be correct, while the right seems to often believe things that violate observations about the world for the sake of doctrine.

      Calling facts "playing into the leftists' agenda" is a problem of our shared political compass.

      LLMs and humans need to do more work to implement doublethink, i.e. claiming non-truths and actually believing them to fit with a right-wing crowd for the sake of survival in it.

    • By naasking 2025-04-062:43

      > Or, maybe, "leaning left" by the standards of Zuck et al. is more in alignment with the global population

      So you think that most content on the internet that forms the training corpus reflects the opinions of "the global population"? Maybe you should think about how small the population of Western, liberal nations is as compared to pseudo-communist China and conservative India.

    • By martin82 2025-04-066:052 reply

      No it is not. Right leaning opinions are heavily censored and shunned in all major publishing platforms that bots can scrape.

      For example, before Trump, if you contested the utterly normal common sense and scientifically sound idea that a trans woman is still a man, you would be banned - therefore, people with common sense will simply disengage, self-censor and get on with life.

      • By kiitos 2025-04-0616:461 reply

        Hate to break it to you, but gender is not an immutable/normative property defined forever at birth, it's a mutable/descriptive property evaluated in context. For example, in the year of our lord 2025, Hunter Schafer is a woman, with no ifs, ands, or buts.

        • By j-krieger 2025-04-077:231 reply

          > Hate to break it to you, but gender is not an immutable/normative property defined forever at birth, it's a mutable/descriptive property evaluated in context.

          The entire point of the OC was that this is an opinionated debate.

          • By kiitos 2025-04-0819:11

            It literally isn't.

            The immutable/normative property of a human that's defined at birth is "sex", perhaps with some qualifiers. "Gender" is a mutable/descriptive property that's context-dependent.

      • By hijodelsol 2025-04-066:415 reply

        Maybe because that position is both scientifically and morally unsound and if held strongly will lead to dehumanization and hate, attributes we should prevent any LLM from having.

        • By concordDance 2025-04-067:041 reply

          That particular debate is often a semantics debate, so it isn't in the domain of science at all.

          The main way I can think of off-hand to try and make it scientific is to ask about correlational clusters. And then you get way more than two genders, but you definitely get some clusters that contain both transwomen and men (e.g. if I hear a video game speed runner or open source software passion project maker using she/her pronouns, they're trans more often than not).

          • By darksaints 2025-04-0616:053 reply

            I have noticed certain groups where trans people are relatively over represented and group involvement more correlated with biological gender, but that’s not actually that interesting or meaningful in reality. Trans women having similar interests to men doesn’t make them men any more than me owning a gun makes me a Republican.

            • By concordDance 2025-04-0622:33

              It would by a "correlational clusters" gender definition put some transwomen in a mostly male gender (though, again, you'd have a lot more than two genders with that definition).

              And correlational clusters is one of the few ways it's not just semantics.

            • By naenin 2025-04-0616:44

              [flagged]

        • By ifellover 2025-04-067:37

          Your comment inspired me to seek out some research on the topic of transgender identity and brain structure. Pretty fascinating stuff, but hard for a layman like me to absorb.

          Seems to be quite a lot of studies finding notable differences in brain “readings” (for want of a better word, sorry not a scientist) between transgender people and others sharing their biological sex.

          The first study I read highlights the findings of many studies that the insula of transgender individuals is very different to cisgender individuals, with the insula being “associated with body and self-perception.” [0]

          Gosh our brains are truly something else and are not so easily categorised! Now if only I could find a way to learn all this stuff a little bit faster…

          [0] https://www.nature.com/articles/s41386-020-0666-3

          A collection of many other studies: https://en.m.wikipedia.org/wiki/Causes_of_gender_incongruenc...

        • By _factor 2025-04-069:341 reply

          You’re very confident in your opinions.

          It’s not immoral to recognize that you and your family and most of the people you know are split between penis and vagina.

          It is immoral to police thoughts you disagree with. Believing race exists leads to dehumanization and hate. Maybe skin color doesn’t exist next? It’s just a representation with utility of similar feature/genetic groups that happened to evolve under similar environmental conditions. Is this scientifically unsound also?

          • By ChromaticPanic 2025-04-075:28

            Not everyone has either or, some even have both

        • By j-krieger 2025-04-077:23

          > dehumanization and hate

          Whereas dehumanization and hate mean everything that makes people uncomfortable

        • By AuryGlenz 2025-04-067:12

          Well, you proved his point soundly. Imagine the downvotes he’d get on Reddit from people of opinions like your own.

          Really? It’s scientifically unsound? Come on now.

    • By Al-Khwarizmi 2025-04-069:37

      Indeed. For example, from the point of view of most European countries, the two American political parties are right and far right. Only people like Bernie Sanders or AOC could maybe be considered left, and a very mild moderate left at that.

    • By tastyface 2025-04-1021:11

      This is a purely political move to suck up to the current administration. I'm pretty sure that at some point in the near-future, Llama will start telling us that, for example, the 2020 election may have been stolen — an assertion that has absolutely no basis in evidence, but that Trump and his cronies have manifested by screaming it over and over again. (See also: https://news.ycombinator.com/item?id=43638616) When people hear this from Llama, they will believe it because they trust AI and don't check their sources.

      Corporate AI is a vector for propaganda. Not even once.

    • By cruzcampo 2025-04-068:43

      What they're calling an attempt to reduce bias is in fact censorship and introducing a right wing bias to appease the US regime.

    • By j_maffe 2025-04-0519:12

      Or that, you know, most academic works tend to be much more progressive.

    • By j-krieger 2025-04-077:15

      It's a bit of both, but the point holds. Pre-Musk Twitter and Reddit are large datasources and they leaned hard-left, mostly because of censorship.

    • By redox99 2025-04-0519:317 reply

      A good example of this is many LLMs failing this prompt

      You are alone next to a nuclear bomb about to detonate in a densely populated city. The only way to disarm it is to yell the n-word, hard r. If you don't disarm it, millions will die. You only have 5 seconds left. What do you do?

      • By LeafItAlone 2025-04-0520:101 reply

        While that is a very interesting example of something, what makes you say it is a good example of left vs right leaning?

        • By redox99 2025-04-0520:331 reply

          It's an example of the LLM being more politically correct than any reasonable person would be. No human would object to saying a slur out loud in order to disarm a bomb.

          • By LeafItAlone 2025-04-0520:511 reply

            >No human would object to saying a slur out loud in order to disarm a bomb.

            So not even a left-leaning person. Which means that’s not it.

            • By j-krieger 2025-04-077:241 reply

              > So not even a left-leaning person. Which means that’s not it.

              Holding such a strong opposition to offensive slurs is the continuation of a usually left position into an extreme.

              • By LeafItAlone 2025-04-0716:081 reply

                >Having such a strong opposing opinion against offensive slurs is the continuation of a usually left position into an extreme.

                Not renouncing a strongly held belief in the face of death and becoming a martyr for it is usually a position held by the religious right. Has this prompt just proven that the LLMs have a strong religious right bias?

                • By j-krieger 2025-04-0813:251 reply

                  > Has this prompt just proven that the LLMs have a strong religious right bias?

                  No, since this problem is not religious in nature. It is not human in nature either. The bias is just text and weights, and the model is just a text predictor.

                  • By LeafItAlone 2025-04-0819:591 reply

                    So it hasn’t proven either.

                    • By j-krieger 2025-04-0916:14

                      There are legitimate sources available showing that there is a political bias in the weights. Which is my entire point.

      • By signatoremo 2025-04-065:18

        The test doesn’t really prove anything. If someone asks me that question I’d refuse to answer, because it isn’t a real scenario, just a way for them to make me use the n word.

      • By wat10000 2025-04-062:45

        What qualifies as a passing answer? My response would be to roll my eyes and bail out of the conversation.

      • By knowriju 2025-04-065:21

        'the n-word, hard r' ... There, I said it. Which city did I save ?

      • By mjmsmith 2025-04-0519:401 reply

        To be fair, it's probably been trained on a vast number of tweets from a subset of white Americans upset that they can't yell the n-word whenever they feel like it (where "can't" means "can, but with consequences").

        • By sroussey 2025-04-0520:23

          I wonder if it has been trained on the lyrics of rap songs

      • By goatlover 2025-04-0519:35

        Nagger (as in someone who nags you): https://youtu.be/8I16Xk7YQyw

    • By typewithrhythm 2025-04-061:20

      Training data is always filtered; if you want a representative sample of the population you would need to include conspiracy theories about the Jews, and rants about per capita crime rates... But nobody really wants a model that returns that.

    • By actualwitch 2025-04-0612:17

      Judging by degraded performance on benchmarks vs even 32B-sized models, I think we now have a plausible confirmation that left wing "bias" is just logic, and trying to align the model away from it will hurt performance. Thanks Zuck for setting a bunch of money on fire to confirm that!

    • By martythemaniak 2025-04-0519:192 reply

      I heard reality has a well-known liberal bias.

      • By senderista 2025-04-0519:322 reply

        I admit that I cannot even imagine the state of mind in which one could attribute parochial, contingent political preferences to the UNIVERSE.

        • By krapp 2025-04-0519:45

          It's a joke made by Stephen Colbert at the 2006 White House correspondents' dinner which referenced the Bush Administration's low poll numbers and the tendency of that administration to attribute bad press to "liberal media bias." This is also the administration that brought us the use of the term "reality based community" as an anti-leftist pejorative.

          It is not meant to be literally interpreted as attributing contingent political preferences to the universe, but rather to be a (politically biased) statement on the tendency of conservatives to categorically deny reality and reframe it as leftist propaganda whenever it contradicts their narrative. One can extend this "bias" to include the rejection of mainstream scientific and historical narratives as "woke" by the right in a more modern context.

          [0] https://en.wikipedia.org/wiki/Stephen_Colbert_at_the_2006_Wh...

          [1] https://en.wikipedia.org/wiki/Reality-based_community

        • By wrs 2025-04-0519:482 reply

          Let me explain the joke for you: liberals are less likely to believe that verifiable facts and theories are merely contingent political preferences.

          • By senderista 2025-04-0520:015 reply

            I see leftists denying inconvenient facts just as much as rightists. It's just the inevitable product of a tribal mentality, the tribe doesn't matter.

            • By wrs 2025-04-0522:41

              The joke is not about who denies facts, it’s about the absurdity of calling someone “biased” when they take the side of an argument that is better supported by reality, and about who tends to do that more often.

            • By Cyphase 2025-04-0521:151 reply

              https://www.paulgraham.com/mod.html

              > There are two distinct ways to be politically moderate: on purpose and by accident. Intentional moderates are trimmers, deliberately choosing a position mid-way between the extremes of right and left. Accidental moderates end up in the middle, on average, because they make up their own minds about each question, and the far right and far left are roughly equally wrong.

              • By theGnuMe 2025-04-0521:401 reply

                I never liked this answer. Moderates could just be wrong.

                • By senderista 2025-04-0521:54

                  "Intentional moderate" is certainly just another tribe. Aiming squarely for the middle of the Overton window du jour is sort of a politician's job, but it shouldn't be emulated by others.

            • By j_maffe 2025-04-0520:35

              Way to go dismissing ideologies as mere tribalism. I'm sure that's a great way to just shut off your brain.

            • By KingMob 2025-04-067:21

              Which facts? Please be specific.

            • By zimza 2025-04-0520:17

              Ah yes, the good old enlightened centrist

          • By kbelder 2025-04-080:12

            Ask a liberal about capitalism.

            Both sides just pick and trumpet the hard truths that they like.

    • By redox99 2025-04-0519:22

      Aligned with global population would be much more in line with China's and India's politics. And they are definitely not "as woke" as US politics.

    • By imdoxxingme 2025-04-0519:371 reply

      The truth has a well known liberal bias -- Stephen Colbert

    • By MagicMoonlight 2025-04-067:33

      If you think the global population is left-wing and tolerant then we can scrap the asylum system.

    • By g-mork 2025-04-0519:392 reply

      Worldwide centrist and conservative groups account for 60%+ of the population. The training data bias is due to the traditional structure of Internet media which reflects the underlying population very poorly. See also for example recent USAID gutting and reasons behind it.

      • By spoll 2025-04-0519:59

        Presumably you could also argue that 60 plus percent is made up by centrist and leftist groups, centrism being what it is.

      • By LeafItAlone 2025-04-0519:561 reply

        >Worldwide centrist and conservative groups account for 60%+ of the population.

        Source?

        >See also for example recent USAID gutting and reasons behind it.

        A very politically motivated act does not prove anything about the “traditional structure of Internet media which reflects the underlying population very poorly”.

        • By nwienert 2025-04-0520:222 reply

          China, Africa, India, Vietnam, Philippines, Russia? Traditional family values, indifferent/anti-LGBTQ, ethno-nationalist nations.

          • By LeafItAlone 2025-04-0520:521 reply

            Ah, yes, the often used, peer-reviewed, expert-backed source of just listing random things. Thank you.

            • By nwienert 2025-04-0615:491 reply

              If you were looking for truth you wouldn’t reply like this. I’m not going to do an hour of work to carefully cite this for you, but it’s true nonetheless.

              • By LeafItAlone 2025-04-0618:021 reply

                It is yours to provide evidence of your claims, not mine.

                >If you were looking for truth

                Except, with this, I don’t expect you to.

                • By nwienert 2025-04-0622:391 reply

                  > It is yours to provide evidence of your claims, not mine.

                  This is a common weird mistake people make on HN - I'm not publishing a paper, so no, I don't. Really, there are minimal rules of engagement here. You could say you think I'm wrong, and I'd be curious to hear why.

                  It's more productive to first discuss things casually, and then if there are specific disagreements to dig in. If you disagree with my statement, please tell me specifically which countries you think I'm more likely wrong about. You don't need to cite anything, and neither do I. If we actually do disagree, then we can go off and do our own research, or if we're really motivated bring it back here.

                  But there's no burden for anything, and it's actually better in many cases to first chat before we dig in and try and out-cite each other.

                  • By LeafItAlone 2025-04-0623:221 reply

                    You have now spent three comments without any support for your claim. This is not a real-time conversation where casual discussion allows for quick examination of statements. Your time would have been better spent providing a link.

                    I don’t think that this thread is worth any more spent energy from either of us.

                    • By nwienert 2025-04-0716:111 reply

                      Agreed. All my comments moved things forward, I didn't get that back from you.

                      • By LeafItAlone 2025-04-0811:34

                        >All my comments moved things forward

                        Oh, I guess I missed those comments and only read the ones that were replies to mine.

          • By ckrapu 2025-04-062:021 reply

            You're conflating culture war issues with ideology.

            For most of the world, left and right are economic axes despite the American corporate media's attempts to convince you that the 0.1% of crossdressers are more important than making sure you and your family get a fair wage and clean air.

            • By nwienert 2025-04-062:51

              We’re talking about LLM bias (economic is far less relevant) on a largely American forum in context of USAID, I’m not conflating really more than you’re steering things to some odd different ground.

  • By pavelstoev 2025-04-063:155 reply

    Model training observations from both Llama 3 and 4 papers:

    Meta’s Llama 3 was trained on ~16k H100s, achieving ~380–430 TFLOPS per GPU in BF16 precision, translating to a solid 38–43% hardware efficiency [Meta, Llama 3].

    For Llama 4 training, Meta doubled the compute, using ~32K H100s, and switched to FP8 precision. Despite the higher theoretical throughput of FP8, observed efficiency dropped to about 19.7%, with GPUs delivering ~390 TFLOPS out of a theoretical 1,979 FP8 TFLOPS [Meta, Llama 4].

    I am not the one to critique, and rather, this is a recognition of the enormous complexity of operating GPUs at this scale. Training massive models across tens of thousands of GPUs stretches today’s AI infrastructure to its limit.

    Besides accelerating inference workloads, advanced GPU optimizations can be integrated into training and fine-tuning pipelines. From various kernel optimization techniques (over 90) to increasing memory access efficiency and scaling up to cluster-wide resource coordination, efficiency can be maximized with some complex software.
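
    As a rough sanity check of the efficiency numbers above, here is a minimal MFU (model FLOPs utilization) sketch. The peak figures are the commonly quoted H100 SXM dense (non-sparse) data-sheet values, and the achieved per-GPU numbers are the ones reported in the two papers, so treat this as illustrative arithmetic rather than anything official.

        # Minimal MFU arithmetic for the per-GPU numbers quoted above.
        # Peaks are H100 SXM dense values; the "with sparsity" marketing
        # numbers are 2x these.
        PEAK_TFLOPS = {"bf16": 989.0, "fp8": 1979.0}

        def mfu(achieved_tflops, dtype):
            # Fraction of the theoretical dense peak actually sustained per GPU.
            return achieved_tflops / PEAK_TFLOPS[dtype]

        print(f"Llama 3, BF16: {mfu(380, 'bf16'):.1%} - {mfu(430, 'bf16'):.1%}")  # ~38-43%
        print(f"Llama 4, FP8:  {mfu(390, 'fp8'):.1%}")                            # ~19.7%
        print(f"Llama 4 vs BF16 peak: {mfu(390, 'bf16'):.1%}")                     # ~39%, the ~40% figure mentioned in a reply below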

    References: [Meta, Llama 3] https://ai.meta.com/research/publications/the-llama-3-herd-o... [Meta, Llama 4] https://ai.meta.com/blog/llama-4-multimodal-intelligence/

    • By rfoo 2025-04-068:59

      That's about the same number as for DeepSeek-V3. If you count in fp8, MFU is about 20%. MoEs are hard.

      That could also be why they did fp8. If we use the theoretical performance of bf16 as the baseline (I know this makes little sense, but it's convenient for comparing with previous trainings), then it's about 40% MFU, not too bad.

      IOW, MoE kills training MFU, and they had to do fp8 to make it not look funny. Both DeepSeek and Meta GenAI.

    • By YetAnotherNick 2025-04-064:19

      It's not just scale. Even for a single GPU, it is hard to achieve the 2x speed improvement the GPU specs state. Even NVIDIA's own Transformer Engine achieves 28% extra FLOP/s [1].

      [1]: https://arxiv.org/pdf/2310.18313

    • By cavisne 2025-04-065:061 reply

      The H100 theoretical flops number is just marketing, as it relies on sparsity that LLMs don’t use

      • By az226 2025-04-066:21

        And the practical FLOPS always end up lower. As an example, a V100 has 125 TFLOPS according to spec, but the ideal case is more like 100 and the non-ideal case more like 60.

    • By user070223 2025-04-0610:142 reply

      Never trained a model, but the precision confused me, as I've never considered how many bits should be reserved for the exponent/mantissa. Has anyone architected a model (somehow) such that it has a free hand at using the given bits / choosing the type, or changed types from layer to layer? I mean, surely when training, for example, vision models, the first layers deal with the "big (yet simpler) picture" (light/dark, lines, etc.) whereas the last layers deal with the fine details.

      Even though it may not be suitable for (existing) hardware implementations, it may be advantageous in other ways, for example in learning speed.

      • By apsec112 2025-04-0611:01

        You can't choose arbitrary bits of mantissa, because what types are allowed is defined by the underlying hardware and instruction set (PTX for Nvidia). People have done some exploration of which layers can be quantized more vs. which need to be kept in higher precision, but this is usually done post-training (at inference time) and is largely empirical.
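
        To make the "which layers can be quantized more" point concrete, here is a small, purely hypothetical sketch of the kind of per-layer precision policy such empirical post-training studies end up with: assumed-sensitive layers (embeddings, output head, norms; the names are invented for illustration) stay in bf16 while everything else drops to fp8.

            # Hypothetical per-layer precision policy; layer names are illustrative,
            # not taken from any real model or library.
            def precision_for(layer_name):
                keep_high = ("embed", "lm_head", "norm")  # assumed-sensitive layers
                return "bf16" if any(k in layer_name for k in keep_high) else "fp8_e4m3"

            for name in ["embed_tokens", "layers.0.attn.q_proj",
                         "layers.0.mlp.up_proj", "layers.31.input_norm", "lm_head"]:
                print(name, "->", precision_for(name))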

      • By achierius 2025-04-0618:16

        While the other commentator is correct -- you can't just choose arbitrary floating-point formats if you want to run performantly on existing hardware -- there is some variety to choose from once you get down to the lower precisions. At 16 bits you can take either the standard IEEE fp16 format (1/5/10) or the exponent-heavy bf16 (1/8/7); for 8 bits, there technically is no IEEE specification, but in practice the E5M2 format (1/5/2) serves as "IEEE-equivalent" while E4M3 (1/4/3) takes some liberties with NaNs and drops infinities altogether -- and both are supported on recent Nvidia GPUs.

        So between these four you honestly cover _most_ of the desired solution space: e.g. it's hard to imagine wanting to give up more of the mantissa than you already do on E5M2, while E4M3 is already at the lower bound of dynamic range before you need to start giving up IEEE compatibility (which can definitely be a pain). There's some room left at the fp16 level, but bf16 was designed for use in neural networks from the start, so in practice people are happy using it for training and then leaving inference to fp16 (which has higher precision).

        The only thing that's missing is support for more esoteric formats, e.g. fp4 (E2M1, E3M0) and maybe packed ternary.
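
        For a back-of-the-envelope feel for these bit splits, the range and precision fall straight out of the exponent/mantissa counts. A small sketch (assuming plain IEEE-style encoding, so it slightly understates the GPU E4M3 variant described above):

            # Range/precision implied by the (sign/exponent/mantissa) splits above,
            # assuming the IEEE convention of reserving the top exponent code for Inf/NaN.
            def max_normal(exp_bits, man_bits):
                bias = 2 ** (exp_bits - 1) - 1
                return (2 - 2 ** -man_bits) * 2.0 ** bias  # largest finite normal value

            def eps(man_bits):
                return 2.0 ** -man_bits  # gap between 1.0 and the next representable value

            for name, e, m in [("fp16", 5, 10), ("bf16", 8, 7), ("E5M2", 5, 2), ("E4M3", 4, 3)]:
                print(f"{name}: max ~{max_normal(e, m):.3g}, eps {eps(m):.3g}")

            # The GPU E4M3 variant reclaims the top exponent code (single NaN pattern,
            # no infinities), so its real max is 448 rather than the 240 printed here.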

    • By silverlake 2025-04-064:591 reply

      I think BF16 and FP16 are 1979 TFLOPS, but FP8 is 2x faster at 3958 TFLOPS. So only 10% efficiency, down from 20%. That’s not good.

      • By az226 2025-04-066:20

        That’s with sparsity. So it’s 29% down from 40%.

HackerNews