Oxen.ai Blog

Welcome to the Oxen.ai blog 🐂

The team at Oxen.ai is dedicated to helping AI practitioners go from research to production. To support this, we host a research paper club on Fridays called ArXiv Dives, where we go over state-of-the-art research and how you can apply it to your own work.

Take a look at our Arxiv Dives and Practical ML Dives, as well as a treasure trove of content on how to go from raw datasets to production-ready AI/ML systems. We cover everything from prompt engineering, fine-tuning, computer vision, natural language understanding, generative AI, and data engineering to best practices for versioning your data. So, dive in and explore – we're excited to share our journey and learnings with you 🚀

Why GRPO is Important and How it Works

Last week on Arxiv Dives we dug into the research behind DeepSeek-R1, and uncovered that one of the techniques they use in their training pipeline is called Group Relative Policy O...

Greg Schoeninger
2/12/2025
12 min read
🧠 GRPO VRAM Requirements For the GPU Poor

Since the release of DeepSeek-R1, Group Relative Policy Optimization (GRPO) has become the talk of the town for Reinforcement Learning in Large Language Models due to its effective...

Greg Schoeninger
2/6/2025
- Practical ML
9 min read
How DeepSeek R1, GRPO, and Previous DeepSeek Models Work

In January 2025, DeepSeek took a shot directly at OpenAI by releasing a suite of models that “Rival OpenAI’s o1.” From their website: In the spirit of Arxiv Dives we are going to...

Greg Schoeninger
2/4/2025
15 min read
No Hype DeepSeek-R1 Reading List

DeepSeek-R1 is a big step forward in the open model ecosystem for AI with their latest model competing with OpenAI's o1 on a variety of metrics. There is a lot of hype, and a lot o...

Greg Schoeninger
1/30/2025
- Arxiv Dives
27 min read
Oxen v0.25.0 Migration

Today we released oxen v0.25.0 🎉 which comes with a few performance optimizations, including how we traverse the Merkle Tree to find files and folders. The main improvement is how...

Greg Schoeninger
1/28/2025
3 min read
🌲 Merkle Tree VNodes

In this post we peel back some of the layers of Oxen.ai’s Merkle Tree and show how we make it suitable for projects with large directories. If you are unfamiliar with Merkle Trees ...

Greg Schoeninger
1/27/2025
8 min read
🌲 Merkle Tree 101

Merkle Trees are important data structures for ensuring integrity, deduplication, and verification of data at scale. They are used heavily in tools such as Git, Bitcoin, IPF...

Greg Schoeninger
1/27/2025
9 min read
arXiv Dive: RAGAS - Retrieval Augmented Generation Assessment

RAGAS is an evaluation framework for Retrieval Augmented Generation (RAG), introduced in a paper released by Exploding Gradients, AMPLYFI, and CardiffNLP. RAGAS gives us a suite of metrics that ...

Greg Schoeninger
1/21/2025
- Arxiv Dives
13 min read
The Best AI Data Version Control Tools [2025]

Data is often seen as static. It's common to just dump your data into S3 buckets in tarballs or upload to Hugging Face and leave it at that. Yet nowadays, data needs to evolve and ...

Greg Schoeninger
12/27/2024
6 min read
OpenCoder: The OPEN Cookbook For Top-Tier Code LLMs

Welcome to the last arXiv Dive of 2024! Every other week we have been diving into interesting research papers in AI/ML. In this blog we’ll be diving into OpenCoder, a paper and co...

Greg Schoeninger
12/24/2024
- Arxiv Dives
14 min read