The Day I Became an AI "Babysitter" (And Why I'm Not Ashamed of It)

How helping transform traditional QA test cases into AI-assisted ones taught me that the future of testing isn't about replacing humans—it's about humans and AI working together

📖 Reading time: ~12-15 minutes




Last month, I was scrolling through Reddit when I came across a discussion that made me pause. Someone was talking about how "vibe coding" was turning senior developers into "AI babysitters." The term stung a bit, but it also made me laugh—because that's exactly what I'd been doing for the past few months.

You see, at Bizom where I work, I was asked to help transform our traditional QA test cases into AI-assisted ones. What started as a routine project became something that completely changed how I think about my role as a developer. I went from writing test cases to... well, babysitting AI.

But here's the thing—it was one of the most eye-opening experiences of my career. Let me tell you what it was really like.

The Project That Changed Everything

So here's what happened. At Bizom, we handle millions of transactions every day through our APIs. When I was asked to help build a new testing framework for the QA team, I thought it would be a straightforward project. A few weeks of coding, some test cases, and we'd be done.

The catch? This was an add-on project I'd be working on in my free time after work.

But then my mentor, Bhupendra Pandey, approached me with an interesting proposition: "I know you've been working on some AI projects here at Bizom. Why don't you try using AI for this? I've been hearing good things about AI-assisted development. Let's see if we can transform our traditional QA test cases into AI-assisted ones."

I was intrigued. I had indeed been working on some AI projects at Bizom, but I'd never used AI tools for a testing framework before. Plus, I had to admit something: I'm a developer, and this was my first time writing production-ready QA test cases. I was stepping into unfamiliar territory.

Still, I figured it was worth a try. I used multiple AI tools—Playwright MCP, Postman MCP, MySQL MCP—all integrated with Cursor as my IDE to help me build most of the framework. What I didn't realize was that this would turn me into something I never expected to be: an AI babysitter.

The project was supposed to take 8 to 12 weeks. With AI help, development has been roughly 4x faster, and we're still working on it in my spare time. But the experience so far has been unlike anything else in my career.

My First Day as an AI Babysitter

I remember the first day clearly. My mentor Bhupendra had given me the green light to experiment with AI, building on my previous AI experience at Bizom. I opened up Cursor with all the MCP tools integrated and asked it to create a simple test for our authentication API. I was excited—this was going to be so much faster than writing test cases manually.

The AI generated something that looked perfect at first glance. Clean, well-structured test code. But when I ran it, it failed immediately. The AI had assumed our API returned JSON, but this particular endpoint returns XML. Our API actually supports both formats depending on the endpoint, which made things even more confusing for the AI.
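The fix itself was small; the test just needed to parse the body according to the content type instead of assuming JSON. The sketch below is a simplified illustration with hypothetical names and an off-the-shelf XML parser, not the actual framework code.

import { APIResponse } from '@playwright/test';
import { XMLParser } from 'fast-xml-parser';

// Hypothetical helper: parse a response body as JSON or XML based on its Content-Type header.
async function parseResponse(response: APIResponse): Promise<unknown> {
  const contentType = response.headers()['content-type'] ?? '';
  const body = await response.text();
  if (contentType.includes('application/json')) {
    return JSON.parse(body);
  }
  // Endpoints like authentication return XML, so fall back to an XML parser.
  return new XMLParser().parse(body);
}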

That's when I realized what "babysitting" really meant.

The Daily Routine of an AI Babysitter

Every evening after work, I'd sit down with my laptop and start by explaining the same things to the AI:

"Good evening, AI. Remember, our API supports both XML and JSON depending on the endpoint. The authentication endpoint returns XML, but the orders endpoint returns JSON. And the authentication flow works like this..."

It was like having a brilliant intern who had amnesia every single day. I'd spend the first hour of each evening just getting the AI up to speed on what we were building.

That was until I discovered Cursor rules. Once I set up a .cursorrules file with all our project context, every new session started with the AI already knowing our standards and patterns. It was like giving the intern a comprehensive handbook they could always reference.

The Frustration Moments

There were days when I wanted to throw my laptop out the window. I'd ask the AI to create a test for order creation, and it would generate something that looked right but was completely wrong:

test('should create order', async () => {
  const response = await api.createOrder(orderData);
  expect(response.status).toBe(200);
});

Simple, right? But our API doesn't work that way. You need to authenticate first, then create a test product, then create the order. The AI just didn't understand the business logic.
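Here's roughly what the test actually needed. The endpoints and payloads below are assumptions for illustration, and the real framework wraps these calls in service classes rather than hitting the API directly.

import { test, expect } from '@playwright/test';

// Sketch of the required flow with hypothetical endpoints and payloads.
test('should create order', async ({ request }) => {
  // 1. Authenticate first (this endpoint happens to return XML).
  const auth = await request.post('/api/auth/login', {
    data: { username: 'qa_user', password: 'secret' },
  });
  expect(auth.ok()).toBeTruthy();
  // Token extraction from the XML body is omitted here for brevity.

  // 2. Create a test product for the order to reference.
  const product = await request.post('/api/products', {
    data: { name: 'Test Product', price: 10 },
  });
  expect(product.ok()).toBeTruthy();
  const productId = (await product.json()).id;

  // 3. Only then create the order.
  const order = await request.post('/api/orders', {
    data: { productId, quantity: 1 },
  });
  expect(order.status()).toBe(200);
});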

But here's the thing—I was struggling too. As a developer, I knew how to write code, but I didn't really understand what makes a good test case. What should I be testing? How do I structure it? What edge cases should I consider? I was learning QA concepts on the fly while also trying to teach the AI.

I found myself constantly explaining things that seemed obvious to me but were completely foreign to the AI, while also figuring out what I should be testing in the first place.

The "Aha" Moments

But then there were moments when the AI would surprise me. It would suggest test patterns I hadn't thought of, or generate helper functions that were more elegant than what I would have written.

One day, I was struggling with error handling in our test cases, and the AI suggested a retry mechanism that was actually quite clever. It was like having a testing partner who sometimes had brilliant ideas, even if they didn't always understand the context.
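The idea was roughly this: retry a flaky call a few times with a growing delay before letting the test fail. What follows is a simplified reconstruction, not the AI's exact output.

// Simplified reconstruction of the retry idea:
// retry a flaky async call a few times with a growing delay before giving up.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait a little longer after each failed attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
  throw lastError;
}

In practice it wrapped calls like withRetry(() => api.createOrder(orderData)), so a transient network hiccup no longer failed the whole run on the first try.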

The AI also taught me about QA best practices I didn't know. It would suggest testing scenarios I hadn't considered, like testing with invalid data, testing timeout scenarios, or testing concurrent requests. As a developer, I was focused on making things work, but the AI helped me think about how things could break.

The Emotional Rollercoaster

Working with AI tools was an emotional journey. Some days I felt like a genius—watching the AI generate complex test cases in seconds that would have taken me hours to write. Other days I felt like a frustrated teacher trying to explain basic testing concepts to a student who just wasn't getting it.

The Highs

There were moments when I felt like I was working with a superhuman testing partner. The AI would generate 20 test files in the time it took me to write one. It would suggest test patterns I'd never thought of, or create helper functions that were more elegant than anything I could have written.

I remember one particularly satisfying moment when the AI generated a complete authentication test flow that handled all the edge cases I'd been struggling with. It was like watching magic happen.

The Lows

But then there were the frustrating moments. Days when I spent more time explaining basic testing concepts than actually building test cases. Times when the AI would generate test code that looked perfect but was completely wrong for our use case.

I started to question whether I was actually being productive or just spinning my wheels. Was I really building something, or was I just teaching a machine to do what I could do myself?

The Breakthrough

About halfway through the project, something clicked. I realized that the AI wasn't replacing me—it was amplifying me. I was still the one making the architectural decisions, understanding the business requirements, and ensuring test quality. The AI was just a very capable tool.

My mentor Bhupendra was right to suggest this approach. It was like the difference between using a calculator and doing math by hand. The calculator doesn't replace your understanding of math—it just makes you faster at the parts that are tedious.

Why I Kept Going (And Why I'm Glad I Did)

Despite all the frustration, there were moments that made me realize this was something special. Let me share what kept me going:

The Speed Was Unreal

I remember the day I asked the AI to generate tests for our payment processing API. In the time it took me to grab a coffee, it had created 15 different test scenarios covering everything from successful payments to network failures.

What would have taken me weeks to build manually, I completed in days. The AI generated over 100 test cases, complete API service classes, and helper functions. We've achieved 4x faster development, and the results are still unfolding. I'm not exaggerating when I say this would have been a massive undertaking without AI assistance.

The Consistency Was Perfect

One thing that really impressed me was how consistent the AI was. Every test file followed the same structure, every error handling pattern was uniform, every naming convention was consistent. When you're writing test cases manually, it's easy to get inconsistent—especially over a long project. The AI doesn't have that problem.

The Edge Cases Were Eye-Opening

The AI generated tests for scenarios I might not have thought of:

  • API error scenarios - What happens when the API returns an error?
  • Response format variations - How do we handle different response formats?
  • Authentication edge cases - Token expiration, invalid credentials, etc.
  • Network failure scenarios - Timeouts, connection drops, retry logic
  • Data validation edge cases - Invalid inputs, boundary conditions
  • Concurrent request handling - Race conditions, load testing scenarios

It was like having a paranoid colleague who thinks of every possible failure mode.
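To make one of these concrete, a data-validation edge case in this style looks something like the sketch below; the endpoint, payload, and expected status code are assumptions for illustration.

import { test, expect } from '@playwright/test';

// Illustrative negative test; the endpoint and expected status code are assumptions.
test('should reject an order with an invalid quantity', async ({ request }) => {
  const response = await request.post('/api/orders', {
    data: { productId: 'prod-123', quantity: -1 },
  });
  // A well-behaved API should return a client error, not a silent success.
  expect(response.status()).toBe(400);
});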

I Learned Things I Never Expected

Working with the AI taught me things I wouldn't have discovered on my own:

  • Modern TypeScript features I hadn't used before
  • Playwright best practices I wasn't aware of
  • API testing methodologies I hadn't considered
  • Advanced error handling patterns for test automation
  • Database validation techniques for comprehensive testing

It was like pair programming with someone who's read every testing book ever written.

The Hard Truth: It Wasn't Always Easy

I'd be lying if I said working with AI was all smooth sailing. There were some really tough moments:

The Mental Exhaustion

Every single evening started with me explaining the same things to the AI: "Our API supports both XML and JSON responses depending on the endpoint. Authentication requires this specific header format. We can't use that endpoint in tests—it modifies production data."

It was mentally exhausting, especially after a full day of work. Like being a teacher who has to explain the same lesson to a student who forgets everything overnight. I'd find myself repeating the same explanations over and over, wondering if I was actually making progress or just going in circles.

The Quality Control Nightmare

Here's something I didn't expect: I spent more time reviewing AI-generated test code than I would have spent writing it myself. Every single line needed careful review:

"This test is missing error handling." "This API call doesn't include required headers." "This validation logic is wrong."

It was like having a junior developer who was brilliant but made rookie mistakes. I had to catch every single one.

The Identity Crisis

There were moments when I questioned everything. Was I still a developer, or was I just a glorified code reviewer? Had I become dependent on AI tools? What if the AI service went down—would I even remember how to write code the old way?

I started to feel like I was losing touch with the craft I'd spent years mastering. It was scary.

The Debugging Hell

When tests failed, debugging became a nightmare. I had to trace through test code I didn't write, understand logic that sometimes made no sense, and figure out where the AI had misunderstood the requirements.

It's like debugging test code written by someone who thinks completely differently than you do—because they literally do.

How I Changed (And How I Didn't)

The Evolution

Here's what surprised me most: my job didn't disappear; it evolved. Instead of writing every line of test code, I became more like a cricket captain leading the team. I was still making all the important decisions; I just wasn't batting every single ball myself.

I found myself:

  • Designing test systems that AI could work within
  • Crafting instructions that got good results
  • Reviewing and refining AI output
  • Breaking down complex testing problems into smaller pieces
  • Maintaining context across sessions

The skills that mattered most were the ones I'd always had:

  • Understanding how systems fit together
  • Knowing what the business actually needed
  • Maintaining quality and security standards
  • Communicating clearly
  • Thinking critically about solutions

The "Vibe Coding" Experience

The Reddit discussion mentioned "vibe coding"—quickly generating code with AI. That's exactly what this project felt like:

"Hey AI, create a test for order creation." "Now add error handling for invalid data." "Make sure it handles both JSON and XML responses depending on the endpoint." "Add retry logic for network failures."

I was generating test code faster than I could think about it. It was exhilarating and terrifying at the same time.

The Trade-offs

The speed was incredible, but there were real trade-offs:

  • I generated more test code than I could have written manually
  • But I had to review and refine everything constantly
  • Sometimes I lost track of what the test code was actually doing
  • I had to maintain a deeper understanding of the system than ever before

It was like having a super-fast typist who didn't understand what they were typing. I had to be the brain, they were just the hands.

The Hard-Won Lessons

After months of working with AI tools, here's what I wish I knew from the start:

Building for Handoff

One of the biggest challenges I didn't anticipate was building something that the QA team could actually use. I had to think about:

  • Documentation: How do I explain the AI-assisted workflow to QA engineers who might not be familiar with it?
  • Simplicity: How do I make the framework intuitive enough for them to adopt?
  • Training: What do they need to know to maintain and extend the test cases?
  • Handoff Process: How do I transition from being the "AI babysitter" to them taking over?

It's like building a house and then teaching someone else how to live in it. The framework had to be both powerful and accessible.

Start with the Big Picture

Before I even touched the AI, I should have spent more time setting up the project structure. The AI works much better when it has a clear framework to follow. I learned this the hard way when I had to constantly correct the AI's assumptions about how our test code should be organized.

Context is Everything

The AI doesn't know your business, your APIs, or your testing preferences. I learned to treat it like onboarding a new employee—the more context you provide, the better the results. I started creating a "context document" that I could reference in every AI session. It saved me hours of re-explaining things.

Expect to Iterate

Don't expect perfect test code on the first try. I had to review every piece of generated code, test and validate before committing, and constantly refine my prompts. The AI got better as I got better at working with it.

Stay Human

This was the most important lesson: never stop being the human in the loop. AI tools are powerful, but they're not perfect. I always had to review generated test code for security issues, make sure I understood the business logic, and maintain my test quality standards.

You're still the senior developer—the AI is just a very capable junior.

Document Everything

AI doesn't remember context between sessions, so I learned to document everything: architectural decisions, business rules, API behavior quirks, testing strategies. It was tedious, but it saved me from constantly re-explaining the same things.

Cursor Rules Were a Game Changer

One thing that made a huge difference was setting up Cursor rules. I created a .cursorrules file that contained all the project-specific context, coding standards, and patterns. This meant that every time I started a new session with the AI, it already knew:

  • Our API supports both XML and JSON depending on the endpoint
  • Our authentication flow requirements
  • The specific testing patterns we use
  • Our naming conventions and code structure

Instead of explaining the same things over and over, the AI would reference the rules and maintain consistency across sessions. It was like having a permanent context document that the AI could always access.
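A .cursorrules file is just plain text in the project root. Here's a trimmed-down sketch of the kind of rules I mean; the real file is longer and far more specific:

Project: QA automation test framework (abridged example)
- Tests are written in TypeScript with Playwright.
- The API returns XML or JSON depending on the endpoint: authentication returns XML, orders return JSON.
- Always authenticate before calling business endpoints.
- Never call endpoints that modify production data from tests.
- Follow the existing service-class pattern for API calls and the shared naming conventions for test files.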

What I Realized About the Future

It's Not About Replacing Humans

The term "babysitting" makes it sound like we're just watching AI, but that's not what's happening. We're actually mentoring, guiding, and leading. Teaching AI to understand our business domain, directing it toward better solutions, ensuring quality, and using it to explore new possibilities.

It's more like being a senior developer who's mentoring a brilliant but inexperienced junior developer.

The Hybrid Future is Already Here

The future isn't about AI replacing humans or humans replacing AI—it's about humans and AI working together. Humans provide vision, context, and quality control. AI generates test code, suggests solutions, and handles routine tasks. Together, they create more sophisticated testing systems than either could alone.

I've seen this in action. The testing framework we built is more comprehensive and consistent than anything I could have created alone, but it also required constant human oversight to ensure quality and security.

The Skills That Matter

If you want to thrive in this new world, focus on developing these skills:

  • Learning to craft effective prompts
  • Designing test systems that AI can work within
  • Ensuring AI output meets your standards
  • Maintaining consistency across AI interactions
  • Evaluating and improving AI suggestions

The developers who master these skills will be the ones who succeed in the AI era.

Looking Back: Was It Worth It?

The Reddit discussion asked whether being an "AI babysitter" is worth it. After this experience, I can say with confidence: Absolutely, yes.

What We Actually Achieved

We've built a testing framework with remarkable results:

  • 4x faster development than traditional approaches
  • Comprehensive test coverage across multiple API modules
  • Established patterns and practices for future development
  • Thousands of lines of well-structured, documented test code
  • Complete automation of test case generation and execution
  • Database validation integrated seamlessly
  • Allure reporting implemented without manual intervention

The project is still ongoing, and the full results are yet to be seen, but the initial progress has been remarkable. My mentor Bhupendra was impressed with the results so far. The transformation from traditional QA test cases to AI-assisted ones was exactly what he had envisioned.

But here's the thing—I'm not just building this for myself. I need to show the QA engineers how to take it up next. I'm essentially creating a bridge between development and QA, making sure the framework is intuitive enough for them to use while being powerful enough to handle their testing needs.

The real validation came when I shared the AI-assisted approach with one of our QA engineers, Lokesh Devamuthu. After just one week of using the framework, he reported incredible results:

The Speed Was Unreal

"Yesterday In 2 hours of time completed 75 test cases with Database validation Using playwright MCP agent."

The Reliability Was Impressive

"80% of the test cases were working without any intervention out of these 75 cases fixed those and all are working now."

Minimal Manual Intervention Required

"Only change made manually was in DB validation to select the correct parameters and matching the API negative response for few cases."

Complete Automation Achieved

"Implemented Allure report using MCP AI without any manual intervention. now report is getting generated now."

Near Complete Coverage

"Completed skunits module and only 1 API is pending in orders module total 107 test cases are working with DB validation will Complete Order module today EOD."

75 test cases in 2 hours. That's the kind of productivity that makes you realize you're not just "babysitting" AI—you're orchestrating something revolutionary.

The real value wasn't just in the test code—it was in learning how to work effectively with AI tools and creating something that others could build upon.

What I Learned About the Future

AI is incredibly powerful, but it needs human guidance. The role of senior developers is evolving, not disappearing. "Babysitting" is actually sophisticated orchestration. The future belongs to humans and AI working together.

The Real Value

The real value isn't in the test code we generated—it's in the systems we designed, the patterns we established, and the knowledge we gained about how to work effectively with AI tools.

The term "AI babysitter" might sound dismissive, but it actually describes a crucial role in the future of software testing. We're not just watching AI; we're guiding it, teaching it, and using it to build things that neither humans nor AI could create alone.

Looking Ahead

As we move forward into an increasingly AI-assisted world, senior developers who embrace this role—who learn to work effectively with AI tools while maintaining human oversight and quality control—will be the ones who thrive.

The future of testing isn't about choosing between humans and AI. It's about humans and AI working together to create something greater than the sum of their parts.

And that's a future worth "babysitting" for.


This article was written as a reflection on helping build a comprehensive QA automation framework using Playwright MCP and working with Large Language Models. The project demonstrates the evolving role of senior developers in AI-assisted development and the value of human-AI collaboration in modern software testing.

Special Thanks: To Bhupendra Pandey for guiding me through the AI-assisted approach and supporting the transformation from traditional QA test cases to AI-assisted ones.
