The Download: Rethinking AI benchmarks, and the ethics of AI agents

November 26, 2024

Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company’s latest model in several tests.

The problem is that these benchmarks are poorly designed, the results hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models’ scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model’s success, and those benchmarks already form part of some governments’ plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us, and creating images, videos, and music for us, but they’re not all that good at doing things for us.

AI agents promise to change that. Last week researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people’s personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, it will raise lots of new ethical concerns, but two in particular stand out. Read the full story.

—James O’Donnell
