Can you include GPT 5.5 non-pro (extra high thinking I guess) in your comparison? GPT Pro is the "I am willing to torch cash for a sooometimes slightly better result" option, not the one people are actually expected to use daily. That's probably part of the reason it's not in Codex.
OOM on CUDA GPUs is relatively graceful (the process crashes). However, on macOS if torch MPS tries to allocate too much memory, the whole kernel will simply lock up and the only option is to reboot the computer. I have no idea why Apple doesn’t reserve memory for stuff like the OOM/kernel watchdog, but it seems they either don’t or there is a bug.
Love me some JSD. Here is a problem most people don't consider with generative modeling (e.g., AI text, image, music, video models): basically all standard pre-training algorithms for generative models (i.e., cross entropy, basically all diffusion/flow formulations) are closer to a Forward KL divergence. In other words, given limited capacity the model will try to stretch itself to cover every mode. This gives you a jack of all trades (lots of knowledge and diversity), but a master of none (you get blurry images and text filled with nonsense).
The real magic in generative modeling comes from the post training process that comes after, which usually (e.g., RLHF) approximates Reverse KL (given limited capacity, try to perfectly cover what you can, but it's fine to drop the rest entirely). This gives amazing results, but is also the cause of AI oddities like the "AI Image Pixar Look", many of the verbal tics of LLMs, and all AI music using the same small set of voices. Jensen-Shannon Divergence sits right in the middle of Forward and Reverse KL and is what many GANs are claimed to approximate. Ideally, it is a better trade-off between diversity and fidelity.
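To make the mode-covering vs. mode-seeking distinction concrete, here is a minimal toy sketch in plain Python. Everything in it is an assumption for illustration: the three-outcome "data" distribution `p` and the two candidate models `q_cover` and `q_seek` are made up, and a real model is trained by gradient descent rather than by picking between two fixed candidates.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0 (true for the toy numbers below).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical data distribution: two strong modes with a near-empty gap.
p = [0.495, 0.01, 0.495]

# Two hypothetical limited-capacity models:
q_cover = [1 / 3, 1 / 3, 1 / 3]  # spreads mass over everything ("blurry")
q_seek = [0.98, 0.01, 0.01]      # nails one mode, drops the other

for name, q in [("cover", q_cover), ("seek", q_seek)]:
    print(
        f"{name:>5}: forward KL(p||q)={kl(p, q):.3f}  "
        f"reverse KL(q||p)={kl(q, p):.3f}  JSD={jsd(p, q):.3f}"
    )
```

With these toy numbers, forward KL scores `q_cover` as the better fit, while reverse KL scores `q_seek` as the better fit. JSD is symmetric in its arguments and bounded above by ln 2, so it cannot blow up the way either KL direction can when one side puts almost no mass on an outcome.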
Seems pretty clear: Claude and Codex were getting a lot of free publicity by instructing their models to do the same, and MS wanted similar results. However, a bug caused this to be applied to all commits instead of only Copilot-influenced commits.
I bumped from $20 -> $100 today but the Codex CLI lacking code rewind and "you can change files but ask me every time" mode from Claude Code is quite annoying. Sometimes I want to code, not vibe code lol.
I train music generation models. They are very trivial to detect. In fact, detecting them then training them to evade detection by the detection model is a big part of training them! But the detectors win instantly without some hardcore regularization. Simply turn that off and you've instantly got a perfect classifier.
This isn't like text classification: the signal is many orders of magnitude higher in bitrate, so many more corners need to be cut. It's likely going to be nearly impossible, or at least not remotely worth it, to generate an audio signal that is truly undetectable in the foreseeable future.
You are right, the output of a model that generates music directly is, for now, easy to categorize as AI.
This big influx of AI-generated music online isn't really that. It's a tiny bit of autogenerated stuff and a whole lot of automatically remixed stuff. The reason it cannot be easily classified as AI is that quite a bit of human-produced music is also that, and you'd just shut out real users.
Today. Trying to detect AI is like extracting water from puddles in a lake that is quickly drying up. What is the point in the short term if it's impractical in the long term? It will catch some low-hanging fruit in the best case, and will find false positives in the worst.
My point is you should consider creating truly undetectable audio end to end with AI to be effectively impossible for the foreseeable future (i.e., I would bet money it is still trivially detectable five years from now). It won't be detectable to humans, though, only models.
in the broad strokes of ai generated, i wouldnt be so sure.
if the ai picked a bunch of samples and combined them together and mastered using an mcp to a DAW, how is that particularly distinguishable vs a person doing the same thing badly?
i can see how the llm generating pictures of spectrograms is easy to spot, but much less so with tool following.
even worse if you're using a vla to have it actually play the guitar and use the recording as a sample.
there's some time and setup to make it happen sure, but somebody could put that all in a studio and expose an mcp
What's your reasoning effort set to? Max now uses way more tokens and isn't suggested for most use cases. Even the new default (xhigh) uses more than the old default (medium).
That's what I'm wondering. Is it that people are defaulting to xhigh now and that's why it feels like it's consuming a lot more tokens? If people manually set it to medium, would it be comparable?