Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider—including OpenAI, Google, Meta, and others — demonstrated ...
AI models are still easy targets for manipulation and attacks, especially if you ask them nicely. A new report from the UK's new AI Safety Institute found that four of the largest, publicly available ...
OpenAI just released the full version of its new o1 model -- and it's dangerously committed to lying. Apollo Research tested six frontier models for "in-context scheming" -- a model's ability to take ...