gkamradt

2025-06-10 9:26

Commented: "OpenAI o3-pro"

o3-pro is not the same as the o3-preview that was shown in Dec '24. OpenAI confirmed this for us. More on that here: https://x.com/arcprize/status/1932535380865347585

2025-03-24 11:58

Commented: "Arc-AGI-2 and ARC Prize 2025"

Ah yes, two things

1. We had a no-data-retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing.

2. We only tested o3 against the semi-private set. We didn't test it with the private eval.

2025-03-24 11:47

Commented: "Arc-AGI-2 and ARC Prize 2025"

#4 (private test set) doesn't get used for any public model testing. It is only used on the Kaggle leaderboard where no internet access is allowed.

2025-03-24 11:46

Commented: "Arc-AGI-2 and ARC Prize 2025"

Good question! This was one of the main motivations of our "Paper Prize" track. We wanted to reward conceptual progress vs leaderboard chasing. In fact, when we increased the prizes mid-year we awarded more money towards the paper track vs top score.

We had 40 papers submitted last year and 8 were awarded prizes. [1]

One of the main teams, MindsAI, just published their paper on their novel test-time fine-tuning approach. [2]

Jan/Daniel (1st place winners last year) talk all about their progress and journey building out here [3]. Stories like theirs help push the field forward.

[1] https://arcprize.org/blog/arc-prize-2024-winners-technical-r...

[2] https://github.com/MohamedOsman1998/deep-learning-for-arc/bl...

[3] https://www.youtube.com/watch?v=mTX_sAq--zY

2025-03-24 9:00

Commented: "Arc-AGI-2 and ARC Prize 2025"

We have a few sets:

1. Public Train - 1,000 tasks that are public

2. Public Eval - 120 tasks that are public

So for those two we don't have protections.

3. Semi-Private Eval - 120 tasks that are exposed to 3rd parties. We sign data agreements where we can, but we understand this is exposed and not 100% secure. It's a risk we are open to in order to keep testing velocity. In theory it is very difficult to secure this 100%. The cost to create a new semi-private test set is lower than the effort needed to secure it 100%.

4. Private Eval - Only on Kaggle, not exposed to any 3rd parties at all. Very few people have access to this. Our trust vectors are with Kaggle and the internal team only.

Hacker News

gkamradt

307

2015-07-02
