News

Anthropic's new Claude Opus 4 and Sonnet 4 AI models deliver state-of-the-art performance in coding and agentic workflows.
LMArena, a popular benchmark for large language models, has been accused of giving preferential treatment to AIs made by big ...
They called the benchmark Elephant, for Evaluation of LLMs as Excessive SycoPHANTs, and found that every large language model (LLM) exhibits some degree of sycophancy. By understanding how sycophantic ...