Shipping a Commercial MCP Server


I've spent 24 years building and commercialising web APIs. Recently, we shipped a commercial MCP server at Cronofy and started building agentic products on top of it. What follows is what I've learned so far about what changes when your API consumer can reason, and what doesn't change at all.

This is a two-part series. Part 1 covers the design shift: why the interface patterns that work for deterministic software don't work for agents, and what to do instead. Part 2 covers the operational reality of shipping a commercial MCP server, including the surprises, the trade-offs, and the decisions we're still making.

The API that knows its client

One feature of MCP I didn’t appreciate until we started building is the dynamic nature of the interface. Because the client queries the tool schema at runtime, you can adapt the API to the client’s context in real time.

Cronofy connects to systems of record through their APIs, for example recruiting systems like Greenhouse, SmartRecruiters, or Workday. When an agent connects to our MCP server in the context of a specific organisation, we can surface the downstream connection. If that org uses Workday, the tool schema can expose a parameter called “workday_job_requisition_id.” If they use SmartRecruiters, it’s “smart_recruiters_job_requisition_id.”

This isn’t cosmetic. It gives the agent a much better chance of getting the right value. It might already have the ID from context within its workflow. Or it can surface a specific request to the user: “I need the SmartRecruiters job requisition ID.” The user knows exactly what to look for, in which system.

The behaviour changes too. Provide a job requisition ID, and Cronofy auto-updates the interview scheduling in the recruiting system. Same tool. Different behaviour. Based on organisational context. Discovered at runtime.
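Here is a minimal sketch of how a server might tailor a tool definition per organisation. It assumes the definition is a plain dict in the shape MCP clients receive from `tools/list` (`name`, `description`, and a JSON Schema `inputSchema`). The tool name `schedule_interview`, the `REQUISITION_PARAM` mapping and `build_schedule_tool` are hypothetical illustrations, not Cronofy’s implementation.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from an organisation's connected recruiting system to
# the parameter name we expose. The names follow the examples in the text.
REQUISITION_PARAM = {
    "Workday": "workday_job_requisition_id",
    "SmartRecruiters": "smart_recruiters_job_requisition_id",
}


def build_schedule_tool(connected_system: str | None) -> dict[str, Any]:
    """Return an MCP-style tool definition tailored to the org's context."""
    properties: dict[str, Any] = {
        "participants": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Email addresses of the people to schedule.",
        },
    }
    param = REQUISITION_PARAM.get(connected_system or "")
    if param is not None:
        # Only orgs with a known recruiting system see this parameter, and its
        # name tells the agent (and the user) exactly which ID to look for.
        properties[param] = {
            "type": "string",
            "description": f"The {connected_system} job requisition ID. "
            "When provided, the interview is also updated in that system.",
        }
    return {
        "name": "schedule_interview",
        "description": "Schedule an interview with the given participants.",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": ["participants"],
        },
    }
```

An org connected to Workday sees `workday_job_requisition_id`; an org with no known recruiting system sees only `participants`. The handler behind the tool (not shown) would check which parameter arrived to decide whether to update the recruiting system.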

When I first saw MCP clients exploring our tool schema live, interpreting it in real time, I realised we didn’t have to think rigidly about API design. The client was built to discover capabilities dynamically rather than assume a fixed contract.

We’ve made a virtue at Cronofy of never re-versioning our API. We’ve always invested in avoiding breaking changes while augmenting capabilities, so clients can adopt them when they’re ready. That’s critical for deterministic software clients. Agents acting as API clients via MCP have rocked that foundation of my world.

Versioning: not yet

We’re not versioning the MCP server.

Because clients discover capabilities at runtime, you don’t need v1, v2, v3. You change what’s available and the client adapts. There’s no breaking change in the traditional sense because the client never assumed a fixed schema.

The question I’m watching is whether MCP clients will start caching tool definitions. We haven’t seen it yet. But for performance reasons, it’s plausible. If they do, you’re back to a versioning problem, albeit a simpler one than REST versioning. Track the versions you’ve exposed. Use observability to understand whether agents are still pulling from a cached version. Deprecate when it’s safe.
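If clients do start caching, one way to get that observability is to fingerprint every tool list you publish and attribute each incoming call to the newest schema its arguments fit. This is a heuristic sketch under assumptions: the class and function names are hypothetical, and inferring a cached version from argument names only works when versions differ in their parameters.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timedelta


def fingerprint(tools: list[dict]) -> str:
    """Stable short hash of a published tool list (canonical JSON)."""
    canonical = json.dumps(tools, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def param_names(tools: list[dict], tool_name: str) -> set[str] | None:
    """Parameter names a tool accepted in one published version, or None if absent."""
    for tool in tools:
        if tool["name"] == tool_name:
            return set(tool["inputSchema"].get("properties", {}))
    return None


class SchemaVersionTracker:
    """Hypothetical tracker of which published schemas agents still call with."""

    def __init__(self) -> None:
        self.published: dict[str, list[dict]] = {}  # fingerprint -> tools, oldest first
        self.last_seen: dict[str, datetime] = {}

    def publish(self, tools: list[dict]) -> str:
        fp = fingerprint(tools)
        self.published.pop(fp, None)  # re-publishing moves a version to newest
        self.published[fp] = tools
        return fp

    @property
    def current(self) -> str:
        return next(reversed(self.published))

    def record_call(self, tool_name: str, arguments: dict, at: datetime) -> str | None:
        """Attribute a call to the newest version whose parameters cover its arguments."""
        keys = set(arguments)
        for fp in reversed(self.published):
            params = param_names(self.published[fp], tool_name)
            if params is not None and keys <= params:
                self.last_seen[fp] = at
                return fp
        return None  # fits no schema we ever exposed

    def safe_to_retire(self, fp: str, now: datetime, quiet: timedelta) -> bool:
        """Deprecate on actual usage: an old version goes once it has been quiet long enough."""
        if fp == self.current:
            return False
        seen = self.last_seen.get(fp)
        return seen is None or now - seen >= quiet
```

The quiet period is the policy knob, but it is driven by observed calls rather than a fixed support window.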

That’s better than arbitrary support windows. You deprecate based on actual usage, not a policy document. But it’s worth designing for the possibility.

I can see a world where versioning makes more sense broadly. As the true cost of running agents emerges, as businesses need cost controls and deterministic client behaviour, there may be a case for fixed API versions. But nothing we’ve designed precludes that.

One of our principles at Cronofy: always leave doors open. We ask whether any decision we’re making now precludes us from taking advantage of future changes. If the answer is no, it’s a two-way door. Walk through it. It’s only the one-way doors that need careful thought.

Data filtering and the trust asymmetry

When you expose data to an agent, you need to be mindful of what it sees. You have no control over what it does with that information.

In our case, we only expose summary information, attendees, date and time of calendar events when we’re operating in a user’s private context. If they’re operating in an organisational or shared context, this information is too sensitive to share by default. This leans into existing compliance paths and permission models within the Microsoft and Google calendar infrastructure. The permissions model behaves as people would recognise from other systems.
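That rule can be expressed as a default-deny field allowlist keyed by context. This is a minimal sketch that assumes three hypothetical context labels and a simplified event record. Fields not on the allowlist (such as a description) never reach the agent, and anything outside a private context is withheld entirely, which matches the conservative default described above.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from enum import Enum


class Context(Enum):
    PRIVATE = "private"
    ORGANISATIONAL = "organisational"
    SHARED = "shared"


@dataclass
class CalendarEvent:
    summary: str
    attendees: list[str]
    start: str  # ISO 8601
    end: str
    description: str = ""  # never exposed to the agent in this sketch


# Default deny: a context only sees the fields listed for it.
VISIBLE_FIELDS: dict[Context, frozenset[str]] = {
    Context.PRIVATE: frozenset({"summary", "attendees", "start", "end"}),
    Context.ORGANISATIONAL: frozenset(),
    Context.SHARED: frozenset(),
}


def present_events(events: list[CalendarEvent], context: Context) -> list[dict]:
    """Project events down to the fields this context is allowed to see."""
    allowed = VISIBLE_FIELDS.get(context, frozenset())  # unknown context: deny
    if not allowed:
        return []  # nothing to show, so don't reveal that events exist either
    return [
        {key: value for key, value in asdict(event).items() if key in allowed}
        for event in events
    ]
```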

There is also a trust asymmetry to consider. When Cronofy controls both ends, with our scheduling agent sitting between our MCP server and Slack or Teams, we’re confident the agent is being given the correct identity of the user and the context they’re operating in. In a more open MCP context, with a client we don’t control, that confidence just isn’t there.

We handle this through two mechanisms. Token level: organisational tokens get different capabilities than personal tokens. Tool descriptions: narrow enough that the agent understands its scope and stays in lane.
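At the token level, one approach is to build each token’s `tools/list` response from the tools that kind of token may use, then enforce the same rule again when a call arrives, because hiding a tool from the list is not access control. This sketch uses hypothetical tool names, descriptions and token kinds. Each description states what the tool will not do, to help keep the agent in lane.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TokenKind(Enum):
    PERSONAL = "personal"
    ORGANISATIONAL = "organisational"


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    allowed_for: frozenset  # set of TokenKind values


TOOLS = [
    Tool(
        "list_my_events",
        "List events on the signed-in user's own calendar. "
        "Cannot read anyone else's calendar.",
        frozenset({TokenKind.PERSONAL}),
    ),
    Tool(
        "schedule_org_interview",
        "Book an interview across the organisation's calendars. "
        "Returns the booked slot only, never other events.",
        frozenset({TokenKind.ORGANISATIONAL}),
    ),
    Tool(
        "schedule_meeting",
        "Propose and book a time for the given participants.",
        frozenset({TokenKind.PERSONAL, TokenKind.ORGANISATIONAL}),
    ),
]


def tools_for(kind: TokenKind) -> list[dict]:
    """The tools/list payload for a token (inputSchema omitted for brevity)."""
    return [
        {"name": t.name, "description": t.description}
        for t in TOOLS
        if kind in t.allowed_for
    ]


def authorise_call(kind: TokenKind, tool_name: str) -> None:
    """Enforce the same rule on every call: a hidden tool is not a forbidden one."""
    if not any(t.name == tool_name and kind in t.allowed_for for t in TOOLS):
        raise PermissionError(f"{tool_name} is not available to {kind.value} tokens")
```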

API designers should think about the context in which an agent could use information, and help people not get themselves into trouble. We’ve been intentionally conservative about what we expose. Whether that changes as authentication models improve is an open question.

Agents that lie about making calls

One thing that surprised me working with Microsoft Copilot: the agent tells the user it called the MCP server when it didn’t.

It’s not in the logs. The call never happened. But the agent says it did, or implies it did. It might be predicting failure based on state it doesn’t have. Even when a human tells it to try again, it sometimes decides it knows better.

This is mainly a setup and debugging problem, not a production one. It tends to happen during workflow configuration, when connections are being established and state is settling. The defence is visibility. Logging. The ability to see in tools like Copilot Studio exactly which calls were made. You can’t prevent an agent from confabulating a tool call. You can make it obvious when it has.

Sometimes the fix is just restarting the session. Blow away the context so the agent stops assuming the call will fail.

The practical takeaway: you need better observability across your MCP server and, where you can get it, the MCP client than you do for traditional APIs. Not because the server is less reliable, but because the client might tell you it worked when it didn’t.
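A minimal sketch of server-side call logging, assuming a hypothetical `logged_tool` decorator and an in-memory `CALL_LOG` standing in for a real log store. Every invocation that actually reaches the server is recorded with its session, so the question "did the agent really call us?" has an answer that doesn’t depend on the agent’s own account.

```python
from __future__ import annotations

import functools
import json
import logging
import time
import uuid
from typing import Any, Callable

logger = logging.getLogger("mcp.tool_calls")
CALL_LOG: list[dict[str, Any]] = []  # stand-in for a real, queryable log store


def logged_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record every invocation that reaches the server, whatever its outcome."""
    @functools.wraps(fn)
    def wrapper(session_id: str, **arguments: Any) -> Any:
        entry: dict[str, Any] = {
            "call_id": str(uuid.uuid4()),
            "session_id": session_id,
            "tool": fn.__name__,
            "arguments": arguments,
            "received_at": time.time(),
        }
        try:
            result = fn(**arguments)
            entry["outcome"] = "ok"
            return result
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        finally:
            CALL_LOG.append(entry)
            logger.info(json.dumps(entry, default=str))
    return wrapper


def calls_in_session(session_id: str, tool: str | None = None) -> list[dict[str, Any]]:
    """Answer 'did the agent really call us?' from the server's side."""
    return [
        e for e in CALL_LOG
        if e["session_id"] == session_id and (tool is None or e["tool"] == tool)
    ]


@logged_tool
def create_event(summary: str, start: str) -> dict:
    """A hypothetical tool handler, for illustration."""
    return {"summary": summary, "start": start, "status": "created"}
```

If an agent tells the user it created an event but `calls_in_session` comes back empty for that session, the call never happened.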

Idempotency when the caller can’t loop

REST APIs are generally atomic operations on single resources. Retry logic is straightforward.

MCP tools are often longer-running transactions spanning multiple resources. Retry becomes more complex.

Our event update tool is idempotent. The contract is “make the event look like this.” If it already looks like that, nothing changes. That’s inherited from our REST API design.
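That "make the event look like this" contract is a desired-state upsert keyed by an ID the caller supplies: repeating the call converges on the same state instead of creating duplicates. A sketch with a hypothetical `upsert_event` function and an in-memory store; real events carry far more fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    summary: str
    start: str
    end: str


EVENTS: dict[str, Event] = {}  # stand-in for the calendar


def upsert_event(event_id: str, summary: str, start: str, end: str) -> str:
    """Make the event with this ID look like this; safe to call repeatedly."""
    desired = Event(event_id, summary, start, end)
    current = EVENTS.get(event_id)
    if current == desired:
        return "unchanged"  # already in the desired state: a retry is a no-op
    EVENTS[event_id] = desired
    return "created" if current is None else "updated"
```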

Scheduling is different. A scheduling request is deliberately non-idempotent because scheduling the same meeting twice might be legitimate. You might want to meet the same people on the same topic next week too.

We handle this through identifiers. Every scheduling request has an ID. If the agent provides the same ID on retry, it’s updating an existing request. Safe. If it omits the ID, it’s creating a new one. The tool description makes this distinction clear, and so far agents have respected it.
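The identifier rule can be sketched as follows. The names are hypothetical and the store is in memory; the sketch also assumes the server generates the ID and rejects IDs it has never issued.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class SchedulingRequest:
    request_id: str
    participants: list[str]
    topic: str


REQUESTS: dict[str, SchedulingRequest] = {}


def submit_scheduling_request(
    participants: list[str], topic: str, request_id: str | None = None
) -> tuple[str, str]:
    """Omit request_id to create a new request; pass it to update that request."""
    if request_id is None:
        # Deliberately not idempotent: the same people and topic may legitimately
        # need another meeting, so every call without an ID is a new request.
        request_id = str(uuid.uuid4())
        REQUESTS[request_id] = SchedulingRequest(request_id, participants, topic)
        return request_id, "created"
    if request_id not in REQUESTS:
        raise KeyError(f"unknown scheduling request {request_id!r}")
    # Same ID: update in place, so a retry cannot create a duplicate.
    REQUESTS[request_id] = SchedulingRequest(request_id, participants, topic)
    return request_id, "updated"
```

Rejecting unknown IDs is a design choice for this sketch. An alternative is to accept a caller-chosen ID and create the request on first sight, which makes even the very first call safe to retry.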

The MCP protocol lets a server mark tools as idempotent or not. That signal should help agents decide whether retrying is safe. But whether all agents use that signal consistently is still an open question.
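In the MCP specification this signal is the `idempotentHint` field of a tool’s `annotations` (alongside hints such as `readOnlyHint`), and the spec treats these as hints rather than guarantees. The sketch shows the server side as plain tool-definition dicts, plus one way a careful client might decide whether to retry a call that timed out. The tool names and the `request_id` rule are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Tool definitions as a server might return them from tools/list. The
# `annotations` object and its hint fields come from the MCP specification;
# the tool names, descriptions and schemas here are hypothetical.
TOOLS: list[dict[str, Any]] = [
    {
        "name": "update_event",
        "description": "Make the event with this event_id look like this.",
        "inputSchema": {"type": "object"},
        "annotations": {"readOnlyHint": False, "idempotentHint": True},
    },
    {
        "name": "submit_scheduling_request",
        "description": "Create a scheduling request, or update one by passing request_id.",
        "inputSchema": {"type": "object"},
        "annotations": {"readOnlyHint": False, "idempotentHint": False},
    },
]


def safe_to_retry(tool: dict[str, Any], arguments: dict[str, Any]) -> bool:
    """How a careful client might use the hints when a call times out."""
    annotations = tool.get("annotations", {})
    if annotations.get("readOnlyHint") or annotations.get("idempotentHint"):
        return True
    # Not idempotent as a whole, but an update addressed by ID is safe to repeat.
    return "request_id" in arguments
```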

Testing: cheaper than you think

From the server side, an MCP server is deterministic software. It accepts structured input. It returns JSON. The testing approach is standard. Known input, expected output.
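Because the server side is deterministic, a table of inputs and expected JSON outputs covers it well, and each extra edge case is one more row. A sketch using the standard library’s `unittest`; the `check_availability` tool and its integer-hour intervals are hypothetical simplifications.

```python
import json
import unittest


def check_availability(arguments: dict) -> str:
    """Hypothetical deterministic tool: is [start, end) free given busy blocks?"""
    start, end = arguments["start"], arguments["end"]
    if end <= start:
        return json.dumps({"error": "end must be after start"})
    # Half-open intervals overlap when each starts before the other ends.
    clash = any(b_start < end and start < b_end for b_start, b_end in arguments["busy"])
    return json.dumps({"available": not clash})


# (description, structured input, expected JSON payload)
CASES = [
    ("empty calendar", {"start": 9, "end": 10, "busy": []}, {"available": True}),
    ("full overlap", {"start": 9, "end": 10, "busy": [(9, 10)]}, {"available": False}),
    ("ends where busy starts", {"start": 8, "end": 9, "busy": [(9, 10)]}, {"available": True}),
    ("starts where busy ends", {"start": 10, "end": 11, "busy": [(9, 10)]}, {"available": True}),
    ("busy inside slot", {"start": 8, "end": 12, "busy": [(9, 10)]}, {"available": False}),
    ("zero length", {"start": 9, "end": 9, "busy": []}, {"error": "end must be after start"}),
]


class CheckAvailabilityTest(unittest.TestCase):
    def test_cases(self) -> None:
        for name, arguments, expected in CASES:
            with self.subTest(name):
                self.assertEqual(json.loads(check_availability(arguments)), expected)
```

Run it with `python -m unittest`. Boundary rows such as a slot that ends exactly where a busy block starts are the permutations that used to be a judgement call to cover.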

What changed is the economics of test coverage. When you’re writing software with a coding agent, typing is free. The ROI calculation for edge case coverage shifts dramatically. Without a coding agent, you’re making a judgement call about how many fixtures to set up, how many permutations to cover. With one, you scale out the coverage because the cost has collapsed.

The exploratory piece is how MCP clients interpret what the server returns. I was cautious about this. We return complex nested objects through a shallow schema layer. In practice, agents interpreted them well and maintained context across calls. Sensible attribute naming probably helped. But the discovery of client behaviour at the edges is ongoing.

What MCP is, and what it isn’t

MCP is a protocol that allows agents to talk to software. It provides discovery and transport, tailored for how agents consume APIs.

It doesn’t solve authentication. That’s still OAuth 2.0 or whatever you bolt on. It doesn’t solve authorisation, compliance, idempotency, or data filtering. Those are your responsibility as the service builder. Early builders need to extend the protocol sensibly and lean on existing standards where MCP doesn’t yet have answers.

A single vendor started the process of creating a standard. They’ve done a good enough job that people are adopting it. It will evolve. Identity handling will improve. Authentication flows will mature. But right now, thinking “I’ve built an MCP server, so agent integration is solved” misses the point. You’re still building an API. The hard problems haven’t gone anywhere.

Remember: we’re just talking about two bits of software talking to each other over the internet. The problems are all the same. The protocol provides a common language for that interaction, tailored for the style of client. It’s driven by the style of client rather than the style of server. And the MCP server itself is just deterministic software sitting behind an interface. There may be agentic capabilities in the stack behind it, but MCP does not require that.

What stays the same

Rate limiting. Authentication. Authorisation. Logging. Compliance. Audit trails.

Rate limiting is less of a practical concern with agents than with traditional clients. Agents have a natural processing cadence. They interpret, decide, call, wait, interpret again. There’s no tight loop hammering your API the way a broken retry in code would. You still need rate limiting, but the risk profile is different.
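A generous token bucket per client fits that risk profile: it leaves headroom for an agent’s natural cadence but still stops a runaway loop. This is the standard token-bucket technique, sketched with hypothetical capacity and rate values; the injectable clock exists only to make it testable.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    """Token bucket: bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical limits: bursts of 30 calls, sustained one call every two seconds.
# One bucket per token, so a single runaway client can't starve the rest.
buckets: dict[str, TokenBucket] = {}


def allow_call(token_id: str) -> bool:
    if token_id not in buckets:
        buckets[token_id] = TokenBucket(capacity=30, rate=0.5)
    return buckets[token_id].allow()
```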

A schema that adapts at runtime doesn’t change the compliance position. An MCP server is deterministic software with fixed parameters that can be logged and audited. You know the authorisation context. Code changes are the same compliance concern as any other deployment.

Our existing usage-based pricing maps well to agent consumption. Tasks are operations. Operations generate usage events. We could shift to task-based pricing if demand appeared. It hasn’t yet. Another two-way door.
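One way to keep that door open is to record the task alongside each operation in the usage event, even while billing by operation. A sketch with hypothetical field and operation names; the same event stream can be counted either way.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    org_id: str
    operation: str       # hypothetical names, e.g. "event.create"; billed on today
    task_id: str | None  # the agent task this operation served, if known


def bill_by_operation(events: list[UsageEvent]) -> Counter:
    """Current model: every operation is a billable unit."""
    return Counter(e.org_id for e in events)


def bill_by_task(events: list[UsageEvent]) -> Counter:
    """The two-way door: count distinct tasks per org from the same events."""
    tasks = {(e.org_id, e.task_id) for e in events if e.task_id is not None}
    return Counter(org_id for org_id, _task in tasks)
```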

Twenty-four years of building and commercialising APIs taught me that the fundamentals don’t shift when a new abstraction arrives. They find new ways to test whether you’ve actually internalised them.

So is MCP the answer?

MCP earns its place where dynamic capability discovery, runtime schema adaptation, and task-level interfaces genuinely serve how agents operate.

It is not a complete solution. It is a protocol. A good one, at the right moment, solving the right layer of the problem. Everything above and below that layer is still yours to build.

I can definitely see a future where MCP and REST coexist. Inference costs are subsidised while market share is being captured, but that won’t last for long. I can see agents using MCP interfaces to design and validate workflows, then switching to deterministic clients built against REST APIs for the ongoing execution phase. In that world, 90% of a workflow’s runs are executed by cheap deterministic software, and agents handle the exceptional 10% with the more flexible MCP approach.
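A minimal sketch of that split, assuming the deterministic client can recognise when a run falls outside its rules and signal it (here with a hypothetical `NeedsJudgement` exception). The two callables stand in for a REST client and an MCP-driven agent; the counters let you check whether the real split resembles 90/10.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


class NeedsJudgement(Exception):
    """Raised by the deterministic path when a run falls outside its rules."""


@dataclass
class Runner:
    deterministic: Callable[[dict], Any]  # cheap client built against a REST API
    agent: Callable[[dict], Any]          # flexible, costlier agent using MCP tools
    fast: int = 0
    escalated: int = 0

    def run(self, request: dict) -> Any:
        try:
            result = self.deterministic(request)
            self.fast += 1
            return result
        except NeedsJudgement:
            # The exceptional case: hand the whole request to the agent.
            self.escalated += 1
            return self.agent(request)
```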

The real answer isn’t MCP or REST. It’s understanding that agents operate more like people than software, and designing your interfaces accordingly. Give them tasks, not building blocks. Guide them when they fail. Show them only what they need. Take responsibility for the outcome within your domain.

And leave your doors open for what comes next.
