Everyone at Hiro cares deeply about building a great product - one that solves real problems while maintaining an extremely high quality bar. We take inspiration from other software products like Linear (the first issue-tracking product that has actually given me joy) where every interaction feels intentional and polished. This obsession with quality isn't just about aesthetics - we want Hiro to be something that people can rely upon, software that doesn't break in unexpected ways. In an age of AI-driven development, this has become both more achievable and, paradoxically, more challenging.
In constant tension with that quality bar is the desire to move fast by cutting corners. This is usually a false tradeoff - cutting corners feels fast, but is often followed by weeks of "fixes" to make the thing really work. That has not stopped teams, including ours, from falling into the trap of pushing something out before it is properly engineered and ready.
Enter LLMs: tools like Claude, ChatGPT, Cursor and VSCode/Copilot have transformed the software development workflow as we know it. 2021 (and before) feels like the dark ages of coding. As a company at the cutting edge of LLMs and software development, we want to use the latest, greatest tools to their fullest extent. But this sets up a fundamental tension: LLM coding tools today can create a lot of slop, and slop isn't the only problem.
Among the issues we have encountered firsthand:
- LLMs tend to be verbose and want to write oodles of code that looks right but can hide subtle bugs. This is especially true in “agent-mode” where the LLM is doing multi-file edits.
- Misdiagnoses of bugs can result in a large volume of code and entire rewrites when a surgical fix, or a completely different approach, is appropriate.
- LLMs prefer to follow commonly accepted styles (based on their training set) rather than the styles and frameworks of your codebase. For example, we use Tailwind but have specific components that carry our own styles. LLMs would prefer to write such a component from scratch.
- Even when the code is reasonable, the overall architecture can be inelegant. LLM code can be repetitive and miss similar code in other files.
The combination of obvious LLM coding superpowers and the issues above creates problems that must be addressed through a mix of tooling and culture. Through discussions with engineers and leaders at other AI startups, we know that we are not alone here. Everyone faces the question of "how much AI coding is okay". Tab completions are obviously fine, but what about letting agents run amok? What about vibe coding entire features?
Our goal is to move quickly while maintaining excellent software quality. To achieve this, we want to lean into agents and LLM coding tools as much as possible. Everyone at Hiro can, and is encouraged to, write code. Ethan (my co-founder and co-CEO) regularly writes PRs for minor changes and will even ship larger features. Contrary to traditional thinking, we want everyone to write code. That's the whole point: with LLMs the barrier is significantly lower, so we need to lean into it.
To enable this degree of code democracy while still allowing us to sleep at night, we are very rigorous about how we structure, write and maintain our code. Here is a list of some of the things that have worked for us:
1. You own your vibed code. Vibe coding makes writing code feel a lot easier, but reviewing vibed code is painful. Throwing vibed stuff over the fence for another engineer to review is lazy and rude. We treat vibed code as our own and carefully review it before sending it out for formal review. There are cases where we're not sure - for example, I'm no Terraform expert and often have to rely on the LLM to generate the right things. In those cases, we make it really clear in the PR which bits were vibed and need extra eyes and help.
2. Monorepo forever. This can be a religious lightning rod, but we have taken a very firm stance on putting all our code in one repo. This makes it really easy for the LLM to grep for files across the frontend and backend and to make changes that span different layers of the stack. Some of our frontend code is shared between web and mobile (React Native), and splitting things into different repos would make life difficult (a hypothetical layout is sketched after this list).
3. Explicit STYLEGUIDE.md. Our styleguide is a file that is checked into our repo and referenced via CLAUDE.md (see below). Any updates to the styleguide necessarily warrant a PR and a discussion.
4. Leaning into type systems. Type systems are back in vogue (thanks to TypeScript and Rust) and we couldn't be happier. Our codebase is primarily Python/Django and TypeScript, and we lean heavily into enforcing types in both. As an example of how maniacal we've gotten, we turned on the no-explicit-any ESLint rule, effectively barring people from getting around the type checker by casting to any (see the type-guard sketch after this list). On the Python side, we treat all mypy issues as errors that must be fixed before merging.
5. Tons of (fast) unit tests. Unit testing used to be a huge chore until LLMs essentially took that pain away. In 2025, there is pretty much no excuse for not having tons of unit tests checking all aspects of your code. We don't mandate that every new line of code be unit tested, but culturally, almost every new PR is (see the test sketch after this list). Frontend components are the one place where we don't unit test, relying on other tools like Storybook instead. The net result of leaning extremely heavily into type checkers and unit tests is that the LLM can quickly loop on itself to fix these problems, eliminating a whole slew of issues with vibe-coded output.
6. Create abstractions and tell the LLM about them. When using LLMs to vibe frontend code, we ran into problems where the LLM would produce correct React/TS code but build straight from Figma, resulting in lots of component duplication and subtle differences. After all, it's really easy to change a px-4 to a px-5 in that one instance without caring about everything else. We changed our codebase to create abstractions, like our component library, based on our design system (a sketch follows this list). Then we add something like "look in @/components/ for the right components" to our prompt to get the LLM to do the right thing.
7. Share your CLAUDE.md. We use Claude Code pretty heavily and have shared CLAUDE.md files that we check in and update periodically (an illustrative excerpt follows this list). "How Anthropic teams use Claude Code" is a pretty great whitepaper with tips on using Claude Code together as a team.
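To make a few of these concrete, here are some sketches; everything below is illustrative rather than lifted from our codebase. First, a hypothetical monorepo layout along the lines described above, with frontend, backend, infrastructure, and the shared LLM-facing files all in one tree:

```text
repo/
├── backend/        # Python/Django services
├── frontend/
│   ├── web/        # TypeScript web app
│   ├── mobile/     # React Native app
│   └── shared/     # components and logic shared by web and mobile
├── infra/          # Terraform
├── CLAUDE.md
└── STYLEGUIDE.md
```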
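Next, the kind of change the no-explicit-any rule forces. Instead of casting to any, code has to accept unknown and narrow it explicitly; the Transaction shape and function names here are invented for the example:

```typescript
// With no-explicit-any on, this would fail lint:
//   function parseAmountCents(raw: any) { return raw.amount * 100; }

interface Transaction {
  amount: number; // dollars
}

// A type guard narrows `unknown` to `Transaction` explicitly.
function isTransaction(value: unknown): value is Transaction {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { amount?: unknown }).amount === "number"
  );
}

function parseAmountCents(raw: unknown): number {
  if (!isTransaction(raw)) {
    throw new Error("expected a transaction");
  }
  return Math.round(raw.amount * 100);
}
```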
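Third, the style of unit test we lean on: small, pure, and fast, so the LLM (and CI) can loop on hundreds of them in seconds. The sketch uses Vitest syntax, and the function under test is made up for illustration:

```typescript
import { describe, expect, it } from "vitest";

// A small pure function: format integer cents as a dollar string.
function formatCents(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  return `${sign}$${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, "0")}`;
}

describe("formatCents", () => {
  it("formats whole dollars", () => {
    expect(formatCents(500)).toBe("$5.00");
  });

  it("pads single-digit cents", () => {
    expect(formatCents(507)).toBe("$5.07");
  });

  it("handles negative amounts", () => {
    expect(formatCents(-42)).toBe("-$0.42");
  });
});
```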
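Fourth, the kind of design-system abstraction that stops Figma-to-code duplication. One shared component owns the Tailwind classes, so a px-4 to px-5 change happens in exactly one place (component and class names invented for the sketch):

```tsx
import React from "react";

type ButtonVariant = "primary" | "secondary";

// Spacing and colors live here, once, derived from the design system.
const VARIANT_CLASSES: Record<ButtonVariant, string> = {
  primary: "bg-indigo-600 text-white hover:bg-indigo-500",
  secondary: "bg-gray-100 text-gray-900 hover:bg-gray-200",
};

interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: ButtonVariant;
}

// Every call site uses <Button>, so changing px-4 to px-5 happens
// here instead of in dozens of vibed copies.
export function Button({ variant = "primary", className = "", ...rest }: ButtonProps) {
  return (
    <button
      className={`rounded-md px-4 py-2 text-sm font-medium ${VARIANT_CLASSES[variant]} ${className}`}
      {...rest}
    />
  );
}
```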
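Finally, a trimmed, illustrative excerpt of the kind of guidance a shared CLAUDE.md carries (a real one would be longer and more specific):

```markdown
- Read STYLEGUIDE.md before writing code and follow it.
- Look in @/components/ for existing components before creating new ones.
- TypeScript must pass eslint (no-explicit-any is on); Python must pass
  mypy with zero errors before merging.
- Prefer a surgical fix over a rewrite; ask before restructuring files.
```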
I’d love to hear your thoughts on effective ways you’re using LLMs to develop code. I’m particularly interested in cultural and tooling changes you’ve made to accommodate and accelerate LLM adoption.
If you're an engineer who likes nerding out about this stuff and cares about personal finance, we'd love to work with you. We are a small, fully remote team (another post on that coming up), obsessed with building the future of personal finance. We are looking for senior full stack software engineers proficient in at least one of Python or TypeScript (ideally both), who can learn quickly, are very curious and interested in AI and LLMs, and have a passion for personal finance. If that sounds like you, please reach out. Note: we cannot sponsor visas at this time, and are only looking for engineers who live in the Western United States.