Arxiv Dives

Arxiv Dive Manifesto

Greg Schoeninger
Nov 5, 2023

Every Friday the team at Oxen.ai gets together and goes over research papers, blog posts, or books that help us stay up to date with the latest in Machine Learning and AI. We call it Arxiv Dives because https://arxiv.org/ is a great resource for the latest research in the field.

In September of 2023, we decided to make it public so that anyone can join. We’ve had amazing minds from hundreds of companies like Amazon, DoorDash, Meta, Google, and Tesla join the conversation, but I thought it would be good to give a little background on why we do it, and why we would love for you to join.

Anyone Can Build It

One of our core principles at Oxen.ai is that anyone with a keyboard and an internet connection can compete, whether that means training state-of-the-art machine learning models or building a scalable, efficient, robust data version control system. Do not be intimidated by large companies or large budgets. Innovation comes from constraints.

Why Do We Believe This?

Back in 2013 when I was a junior engineer just getting started in the field, I was working at a company called AlchemyAPI. We were building APIs that leveraged deep learning to democratize Natural Language Processing (NLP) for the masses.

The holy grail of NLP at the time was IBM Watson, which had gone on Jeopardy and won. IBM had published research papers describing how all the complicated subsystems worked. They poured millions of dollars into the effort, and made it seem like only they had the prowess to do it.

The CEO of the company at the time disagreed. He dumped all the research papers on our desks and asked us if we could reproduce their results. Unaware of the adventure we were about to embark on, we read paper after paper. Bouncing back and forth between the NLP teams, the backend infra teams, and the research papers, we pieced the system together subsystem by subsystem. Collectively, as we built more and more, we hit benchmark after benchmark.

Fast forward a year or two, we got approached by IBM Watson itself. All the hard work from this small team culminated in an acquisition by IBM as they saw the value of what we had built. We got the opportunity to merge our deep learning techniques with the core Watson platform, augmenting their core services with ours.

As a small startup, we were able to not only compete with the giant at the time, but eventually join forces with them.

The Arxiv Dive Edge

This experience affirmed that with dedication, any engineer, from junior developer to advanced researcher, can contribute meaningfully to state-of-the-art innovation.

With AI moving faster than it ever has, it is worth taking a second to slow down, read the research papers, and integrate the key takeaways into our own products. It's the slight variation on existing ideas that can sometimes unlock brand new innovation.

Arxiv Dives are how we stay on the bleeding edge.

I truly believe it is not only the job of advanced researchers at large companies, but the job of anyone with access to a keyboard and the internet to take these ideas and make them their own.

Come join us every Friday to go over everything from the fundamentals to the latest in AI.

Arxiv Dives with Oxen.ai · Luma
Hey Nerd, join the Herd!... for a little book/paper review. Make sure to also join our Discord here (https://discord.gg/s3tBEn7Ptg) to share recommendations for future reads and more…


You can find all our past dives on YouTube

Oxen
Each week we dive deep into a topic in machine learning or general artificial intelligence research. The sessions are live with a group of smart Oxen every Friday. Join the discussion: https://lu.ma/oxenbookclub

and in blog format

Arxiv Dives - Oxen.ai
Each week we dive deep into a topic in machine learning, data management, or general artificial intelligence research. These are notes from a live reading group we do every Friday. Captured for future reference.

Best and Moo,

~ The Herd at Oxen.ai

Who is Oxen.ai?

Oxen.ai is an open source project aimed at solving some of the challenges with iterating on and curating machine learning datasets. At its core, Oxen is a lightning-fast data version control tool optimized for large unstructured datasets. We are currently working on collaboration workflows to enable high-quality, curated public and private data repositories that advance the field of AI while keeping all the data accessible and auditable.

If you would like to learn more, star us on GitHub or head to Oxen.ai and create an account.