I spent three weeks working closely with Claude 3.7 Sonnet and Claude Code, and they changed how I approach both development and strategy work. This is what I learned about using them in practice, and how I now balance experimentation against production readiness.
When my AI assistant started to actually work with me
I have used AI assistants since the early releases, but Claude 3.7 Sonnet was the first time I felt like I was genuinely collaborating with a system that could plan, refine, then execute, rather than just respond.
Last week I was integrating a modern tiered customer service web app into an identity infrastructure platform. There were multiple service endpoints and API connections that all needed to be authenticated and authorised properly. With previous AI systems, I would have broken this down into separate questions and carefully shepherded the model through each step of the integration.
Instead, I gave Claude 3.7 the entire problem and turned on its extended thinking mode. The response surprised me. Rather than a quick, surface-level answer, it walked through several approaches, weighed the tradeoffs, and proposed a solution that handled edge cases I had not even thought of.
Watching the dual-mode system work
The thing I value most about Claude 3.7 Sonnet is that it runs in two distinct modes. Standard mode handles quick questions and routine tasks. Extended thinking mode is the one I reach for when a problem needs deeper analysis.
That flexibility reshaped my workflow. For routine coding questions or data formatting, I stay in standard mode and get an immediate answer. For architectural decisions or strategy questions, I switch to extended thinking and let the system explore the problem more thoroughly.
Turning on extended thinking is simple. I found that starting a prompt with phrases like “Think about…” naturally pushes Claude Code into deeper reasoning. Ask it to “Think about the security implications of this authentication approach” and you get an analysis that considers several attack vectors and their mitigations. The phrasing signals that I want depth, not brevity.
Asking for structured plans works the same way. “Create a plan that can be worked through to…” produces a step-by-step roadmap. When I asked Claude to “Create a plan that can be worked through to refactor the notification system,” it gave me a sequential approach with clear dependencies and validation checkpoints at each stage. Those plans become frameworks I can work through methodically with the model’s help.
You can also specify how many tokens (up to 128K) the model may spend on reasoning, which lets you trade speed and cost against answer quality depending on how much the task matters. To be honest, I rarely set this consciously.
The week AI changed my development process
Three months ago my development process looked very different. I spent hours navigating codebases, writing tests, and managing GitHub operations. Necessary work, but it ate into the time I had for creative problem-solving.
My first day with Claude Code shifted that. I hit a cryptic browser console error during testing, normally the start of a long debugging session across the stack. I pasted the error message into Claude Code, and the next few minutes genuinely accelerated me. Across a handful of prompts it:
- Identified the error and traced it through both frontend and backend components
- Located the root cause in the data validation layer, and noted how it surfaced in the UI
- Mapped every touchpoint across the services that would need addressing
- Proposed a fix that handled edge cases in both client and server code
- Generated tests to verify the solution across all the affected systems
What used to take half a day of digging through several points in the stack was done in under an hour for less than $3. More to the point, I could focus on the architectural implications instead of getting lost in the mechanics of tracing the error from one system to another.
Keeping architectural control and vision
Claude Code speeds up development, but I quickly saw how much oversight the generated code and architectural decisions still need. It is like working with a talented but junior developer: I had to set guardrails to keep quality consistent and aligned with where I wanted the project to go.
I learned to open each significant piece of work by stating the architectural principles and design constraints up front. When I was refactoring a notification system, for example, I spelled out a clean separation between business logic and delivery mechanisms, the move towards an event-driven architecture, the performance constraints specific to that environment, the existing patterns I wanted maintained and reused for consistency, and the need to keep things simple and stick to the essentials.
That upfront investment paid off by avoiding a common trap: solutions that are technically sound but far too complex. Left unconstrained, the AI sometimes reached for elaborate approaches when something simpler would do. Capability has to be balanced against what you can actually maintain.
Breaking down complex tasks
One of my more useful discoveries was breaking larger tasks into smaller, well-defined steps. Instead of asking Claude Code to “implement a new authentication system,” I got better results by first asking it to outline the components as a step-by-step plan, then having it implement the core authentication logic, then the user management interfaces, and finally the integration with the existing permission system.
This mirrors how good project management works: define the scope clearly, sequence the work logically, and make the transitions between phases explicit. It also gave me natural checkpoints to confirm the work matched my architectural intent and to correct course where it did not.
Checkpointing progress through code commits turned out to be essential. Each commit insulated a step so that if an approach became too convoluted or got stuck, I could roll back without losing everything. Building, evaluating, refining, and committing in that loop let me try different approaches without putting the whole project at risk. It also let me see, at low cost, where the AI excelled and where it needed more guidance.
The parallel to running a complex project got clearer as I went. I was acting as both architect and technical lead, supplying context and direction while the AI handled the implementation details.
Project roadmaps for keeping context
Working on larger initiatives exposed another problem: holding context across multiple sessions. I started writing a project roadmap at the outset of any significant work. Each one outlined the problem space and the business objectives, broke the overall task into discrete sequenced components, documented the key design decisions and constraints, and linked to the relevant existing code and documentation.
I saved these roadmaps and pulled them up at the start of each new Claude Code session. That kept things continuous and worked around the context limits these systems have.
The roadmap acted as a shared memory between sessions, the same way good documentation keeps a project coherent across different work periods. Writing them as markdown documents in a Todo “[ ] task” format let me track completions across sessions simply by marking them “[X]”.
The learning curve was not always smooth sailing
It has not all been smooth. During the first week I overestimated Claude 3.7 in some areas and underestimated it in others. What I learned was that it excels at software engineering tasks that involve tracing through complex codebases, and it is surprisingly good at understanding business logic embedded in code. It does sometimes need guidance with project-specific frameworks, and its reasoning is most reliable when I give it clear constraints and evaluation criteria.
The most useful lesson was knowing when to use each capability. Extended thinking mode is worth a lot for architecture reviews but is overkill for routine code formatting.
Balancing experimentation and production use
Exploring these tools dropped me straight into the classic innovation dilemma: how fast should I fold them into my production workflow?
My approach changed after a few false starts. I built a simple framework to sort AI capabilities into three buckets. Experimental ones are promising but unproven, so I keep them in a sandbox. Augmentation ones are reliable enough to assist my work but still need human verification. Production ones are tested thoroughly enough to run with minimal oversight.
For each capability I set clear metrics for moving it between buckets. That let me capture value from the mature capabilities while still experimenting with the emerging ones.
Building a continuous evaluation system
Because the capabilities were changing so fast, I wanted a regular evaluation cadence. I pick three representative tasks from my actual workload, test new model versions against them, document the specific improvements and limitations, and then update my capability map and implementation guidelines.
The discipline guards against both premature excitement and late adoption. When Claude 3.7 Sonnet shipped, I quickly saw that its reasoning had crossed my threshold for code review assistance, while other capabilities stayed in the experimental bucket.
The role of human expertise
After all of this, I have come to see AI coding assistants as amplifiers of human judgement rather than replacements for it. The pattern that works best looks like a senior developer guiding junior ones. The senior sets the technical vision and architectural boundaries, decomposes complex problems into manageable pieces, reviews the output against good practice and business needs, supplies the domain context that never made it into documentation, and makes the strategic calls about technical tradeoffs.
That split plays to each side’s strengths. The AI is fast at implementation, recalls syntax details, and stays consistent across repetitive work. The human engineer brings domain expertise, business context, and judgement about how much complexity is appropriate.
Practical lessons for implementation
Trial and error left me with a handful of guidelines for anyone weighing up these tools. Start with bounded problems where the success criteria are objective and measurable. Set architectural guardrails early by stating your design principles and constraints, which keeps the AI from over-engineering. Break work into discrete steps, structuring larger tasks as sequences of smaller components with clear handoffs. Keep project roadmaps so context survives across sessions. Hold on to architectural ownership and treat the AI as implementing your vision rather than setting it. Build feedback loops that capture where the assistance helps most and least. And evaluate against your own context: generic benchmarks help, but testing capabilities against your specific business needs matters more.
Looking ahead: the strategic view
What excites me is not just what these tools do today but how fast they keep improving. Tasks that were out of reach six months ago are routine now, and the gap between “experimental technology” and “business necessity” keeps closing.
In practice this has meant running two tracks at once: putting mature capabilities into production while keeping an experimental sandbox for the emerging ones. That has let me capture value now while staying positioned for what comes next.
The developers who get the most out of this will not be the ones who adopt everything at once, nor the ones who wait for perfect maturity. They will be the ones who build evaluation frameworks suited to their own needs, so they can deploy the right capability at the right time while keeping clear human oversight of the architecture and the strategic direction.
In my experience, finding that balance, between automation and oversight, and between adopting too early and adopting too late, is the central strategic challenge facing technology leaders right now.