GPT-5 Crushes Medical Benchmarks, Surpassing Doctors
Alibaba Drops Qwen-Image-Edit, Claude Learns to Hang Up, & GitHub Copilot Gets a GPT-5 Upgrade
Here’s the latest from this week in AI:
Alibaba Drops Qwen-Image-Edit: AI’s Next Big Move in Image Editing
Claude Learns to ‘Hang Up’ on Harmful Chats
GPT-5 Outperforms Doctors in Medical Reasoning Tests
GitHub Copilot Gets a GPT-5 Upgrade
Alibaba’s Qwen team has released Qwen-Image-Edit, a 20B parameter open-source model that delivers pixel-perfect edits and style transformations while keeping original objects intact. It supports both global changes (like rotations or style transfers) and precise, localized edits, plus bilingual text editing in Chinese and English without breaking fonts or formatting.
The model also allows stacking edits for complex, step-by-step refinements and outperforms rivals like Seedream, GPT Image, and FLUX in benchmarks. With this release, Alibaba signals the next wave of AI: tools that don’t just generate images but let users control every detail with natural language.

Image source: Qwen
Anthropic has added a new safeguard to Claude Opus 4 and 4.1, giving the chatbot the ability to end conversations it deems persistently harmful or abusive. The feature kicks in only as a last resort, when repeated redirections fail on requests such as sexual content involving minors or information that would enable large-scale violence or terrorism. In testing, Opus 4 showed patterns of apparent distress when pushed on such requests and, in simulated interactions, chose to cut off abusive conversations when given the option.
Importantly, users don’t lose access: the “hang up” simply ends that one chat, and they can start a fresh conversation or edit earlier messages right away. Anthropic has also ensured the model won’t terminate chats when users show signs of self-harm or immediate danger. As one of the first moves toward AI wellness, this step highlights Anthropic’s focus on model welfare and hints at a future where chatbot health becomes as important as user safety.

Image source: Mashable
A new study from Emory University shows GPT-5 setting a new bar in medical AI, beating both GPT-4o and human professionals on diagnostic and multimodal reasoning tasks. The model hit 95.84 percent accuracy on MedQA clinical questions, up nearly five points from GPT-4o, and scored 70 percent on tasks combining patient histories with imaging, almost 30 points higher than its predecessor.
GPT-5 also surpassed pre-licensed medical professionals by wide margins, outperforming them by roughly 24 percentage points on reasoning and 29 percentage points on understanding in expert-level evaluations. In complex cases, the system even identified rare conditions like Boerhaave syndrome from lab results and CT scans. With performance now beyond these human benchmarks, experts suggest the real malpractice risk may come from physicians not using AI support at all.

Image source: Digital Watch Observatory
GitHub Copilot has leveled up with GPT-5, bringing smarter code suggestions, instant refactoring, and more precise fixes directly into developer workflows. A new “chat checkpoints” feature in Visual Studio Code lets users rewind coding sessions to any point, restoring both workspace states and Copilot’s chat history for safer, smoother development.
The update also introduces agentic coding sessions that can manage long-running workflows, track multi-step plans, and pull in context from documentation, design assets, and user stories. Now available to all paid Copilot users, GPT-5 transforms the tool into a true coding partner that understands project nuance, reduces friction, and supports deeper collaboration across projects.

Image source: The GitHub Blog
In Other News
Dow Jones futures edged down 0.1% overnight, with S&P 500 and Nasdaq futures also dipping after a rocky session for growth stocks. The Dow managed a tiny gain Tuesday, but the Nasdaq slid 1.5% as high-flyers like Palantir, Credo, AppLovin, Oracle, AMD, and GE Vernova all broke key support levels. Palantir plunged more than 9%, Credo fell over 10%, while Oracle and AMD each shed more than 5%, extending weakness into overnight trading.
Not all sectors struggled; homebuilders, medicals, retail, and financials held firm, while the equal-weight S&P 500 ETF actually rose 0.5%. Still, growth-heavy ETFs like ARKK, IGV, and FFTY posted sharp losses, with ARKK down 4%. With earnings from Target and Lowe’s on deck, and Walmart following Thursday, investors are being urged to cut exposure to stretched growth names and wait for healthier setups before re-entering.

Image source: Investopedia
Cool tools of the week from insidersedge.io
Animated Drawings - Animate characters in children’s drawings
SID Search - Search engine to find your files from any application
Dora - Create stunning websites without coding
Godmode - Get a GUI to chat with ChatGPT
Jobs To Check Out This Week On insidersedge.io
Senior Technical Lead - ChainGPT
EMEA Head AI Fintech - Prospexis.io
ML Engineer Intern - OP3N
Thanks for tuning into today’s edition!
Be brutally honest. DM me or email me back with any suggestions!