Creating a Roadmap to Bring AI to Invoice Processing
A research-driven initiative to uncover how AI can enhance back-office work, culminating in a clear product direction focused on AI invoice processing.




Overview
Heading into late 2024, large language models (LLMs) continued to advance rapidly, with reasoning models capable of handling more complex, multi-step tasks starting to emerge. Tofu, a payroll software company, saw an opportunity to apply these advancements across a broader range of back-office functions, including payroll, accounting, HR, and payments.
The vision was simple: to eliminate repetitive back-office work through an autonomous system, allowing humans to focus on higher-value tasks. Drawing from the company’s experience in the payroll space, it was clear how tedious and time-consuming back-office operations could be. We wanted to let AI handle predictable, repetitive work—so that back-office specialists could focus on areas where their expertise truly shines.
Goals
To bring this vision to life, we set out with the following goals:
- Understand the day-to-day realities of back-office work – including payroll, accounting, HR, and payments.
- Discover pain points and identify opportunities where AI can meaningfully reduce friction or manual effort.
- Prototype and validate concepts with domain specialists to inform a clear and actionable product direction.
To make this possible, we partnered closely with GoGlobal, a sister company and back-office provider with deep operational expertise. Their teams became both our domain specialists and early testers, helping us ground ideas in real-world workflows and validate concepts.
Role and Process
I joined the initiative as a technical product manager, working closely with a small, fast-moving team led by the CEO and CTO. Our core group included a customer success manager (CSM) and three engineers – a mix of AI specialists and full-stack developers.
My role focused on three main areas:
- Understand the needs of stakeholders across various back-office domains.
- Leverage my technical insights to spot opportunities where new advancements (especially in LLMs and reasoning models) could meaningfully enhance workflows.
- Collaborate cross-functionally to shape and align the long-term product direction.
Key Outcomes
Early Concepts


Kicking Off Product Discovery
We began with a discovery phase to understand real-world back-office operations at GoGlobal, a business process outsourcing (BPO) provider. Our research objectives focused on four key areas:
Working with GoGlobal gave us quick access to a variety of BPO roles. While we knew this might skew our perspective toward one company, their depth of experience made them the best partner for early research.
Contextual Inquiries
We started our discovery with contextual inquiries, directly observing back-office specialists as they worked. This approach allowed us to see not just the tasks themselves, but also the surrounding context — workflow triggers, handoffs, and ad hoc processes that often don’t surface in interviews.
We chose to focus on payroll for this study. While most of GoGlobal’s operations are based in Malaysia, their Japan-specific payroll and HR work is handled by a local team — in the same office as Tofu. This made in-person observation both accessible and far more insightful than remote sessions.
As a payroll BPO, GoGlobal manages payroll processing on behalf of client companies. Each month, they receive updated employment details like salary agreements, time sheets, and health insurance deductions. Using this information, they calculate payroll — factoring in overtime, taxes, and insurance — and make payments directly to the client’s employees.
Before diving in, I familiarized myself with Japanese payroll laws and practices to better understand the workflows. Over the course of two weeks, I observed payroll specialists in their day-to-day work, with each session typically lasting around two to four hours.
To make sense of the observations, I mapped out workflow diagrams that captured steps, tools, and decisions involved in each stage of the payroll process.

By analyzing these workflows alongside my field notes, I was able to surface the following key insights.
User Interviews
After conducting contextual inquiries with payroll specialists, we shifted our focus to remote user interviews with BPO operators across Malaysia and other GoGlobal locations. Our goal was to broaden our understanding of responsibilities, pain points, and goals across a wider range of back-office roles and varying levels of experience.
We prepared open-ended questions and conducted semi-structured interviews. This allowed us to follow a consistent line of inquiry while also diving deeper into individual experiences through follow-up questions. Over the course of two weeks, we interviewed 21 back-office operators working in accounting, payments, tax, payroll, entity management, and HR support.
We recognized that user interviews don’t offer the same depth as contextual inquiries. For example, we couldn’t observe how participants interacted with their software, and the level of detail varied depending on what they could recall. However, interviews allowed us to efficiently expand our reach across more roles and locations, providing rich insights within a tight timeline.
The interviews were analyzed using a combination of flowcharting and thematic analysis. A swimlane flowchart helped us visualize end-to-end workflows and understand how different teams collaborate across functions. Thematic analysis allowed us to identify recurring pain points, as well as surface participants’ goals and underlying values.

Through thematic analysis, we synthesized patterns across interviews to uncover recurring challenges, values, and behaviors. This helped us surface four key insights:
Aligning on Product Focus
With a detailed understanding of real-world back-office operations, we came together as a team to narrow our focus. While the engineering team had been actively experimenting with AI models and building foundational infrastructure in parallel, we knew that having a sharp focus was essential to move from exploration to execution.
The main decision was whether to focus horizontally or vertically:
- Horizontal focus: Solve problems that cut across multiple back-office domains, most notably knowledge management and communication friction. This approach would allow us to tackle pain points across various functions (accounting, payroll, HR, payments).
- Vertical focus: Go deep into a single back-office function (e.g., just accounting or just payroll) and build a best-in-class, specialized solution optimized for the nuances of that domain.
We also considered our target customer:
- End clients: Enterprises and SMBs with internal back-office teams looking to improve their operations.
- Processors: BPO firms like GoGlobal, who manage back-office operations for other companies and operate under stricter accuracy, timeliness, and compliance expectations.

We brainstormed potential solutions to address the pain points we uncovered. These solutions fell roughly into four distinct options. Each option came with its own strengths and trade-offs across technical viability, competitiveness, and go-to-market strategy.
After weighing the trade-offs, we chose to focus on automating a specific BPO workflow. While there was some risk of being too niche, we could mitigate it by targeting a workflow with both high pain and high impact.
We considered several workflows that consumed a significant portion of time and played a critical role: AP invoice processing, bank reconciliation, payroll processing, and payments. We ultimately decided on AP invoice processing, after systematically eliminating the other options.
- Bank reconciliation required handling a wide range of messy edge cases, such as timing mismatches, partial payments, and foreign currency adjustments. In addition, many accounting platforms already offer partial auto-reconciliation features, making the value of a standalone AI solution less compelling.
- Payroll processing was highly country-specific, with complex, localized compliance requirements. Unlike accounting, which is somewhat standardized through frameworks like GAAP, payroll rules vary significantly across regions, increasing complexity.
- Payments involved significant security risks, as automating payment preparation would require integrations with sensitive bank portals or APIs—raising concerns around compliance and liability.
AP invoice processing, on the other hand, presented a large, repetitive workload with relatively standardized processes and clear, measurable ROI—making it the strongest candidate to focus our efforts.
Refining the Product Direction
While our discovery work had already surfaced high-level pain points around invoice processing, we knew that designing an effective AI solution required a much more detailed and nuanced understanding of how invoices are actually handled in practice.
Working closely with GoGlobal’s accounting team, the CSM and I conducted in-depth interviews and workshops to unpack each step of the invoice processing workflow. We documented:
- Typical document types (e.g., invoices, purchase orders, receipts)
- Required validations (e.g., matching invoice line items to supplier agreements, verifying vendor bank details)
- Approval chains
- Common exception handling processes
- Systems and tools involved at each stage
To organize and synthesize our findings, we created a spreadsheet that mapped the end-to-end process. This process map included not just the standard flow, but also the common variations and points of failure that accounting specialists routinely encounter.

By visualizing the workflow this way, we were able to:
- Identify the steps most amenable to AI automation (e.g., information extraction, validation checks)
- Highlight where human review was still critical (e.g., resolving ambiguous cases, sudden fluctuations in amounts)
- Build a clearer understanding of how we could measure and demonstrate value—such as time saved per invoice or reduction in error rates
This detailed process mapping became a foundational artifact for the team and directly guided our early prototyping efforts.
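As a rough illustration of the value metrics mentioned above (time saved per invoice and reduction in error rate), the arithmetic could look like the sketch below. The function names and all numbers are made up for the example; they are not measurements from GoGlobal or from the product.

```python
def time_saved_per_invoice(baseline_minutes: float, assisted_minutes: float) -> float:
    """Minutes saved on one invoice when the AI-assisted flow replaces the manual one."""
    return baseline_minutes - assisted_minutes


def error_rate(errors: int, invoices: int) -> float:
    """Share of processed invoices that contained at least one error."""
    if invoices <= 0:
        raise ValueError("invoices must be positive")
    return errors / invoices


def relative_error_reduction(baseline_rate: float, assisted_rate: float) -> float:
    """Fraction of the baseline error rate that was eliminated."""
    if baseline_rate == 0:
        return 0.0
    return (baseline_rate - assisted_rate) / baseline_rate


# Hypothetical figures, for illustration only.
saved = time_saved_per_invoice(baseline_minutes=6.0, assisted_minutes=1.5)
before = error_rate(errors=40, invoices=1000)
after = error_rate(errors=10, invoices=1000)
reduction = relative_error_reduction(before, after)
print(f"{saved} min saved per invoice, {reduction:.0%} fewer errors")
```

Tracking both numbers matters because a faster flow that introduces more errors would not be an improvement for a processor held to strict accuracy expectations.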
At this stage, I created early concept wireframes to help align our team, spark discussions around how the product should work, and give GoGlobal accounting specialists a more concrete picture to respond to when gathering early feedback.
The engineering team had been prototyping AI agents capable of extracting invoice data, seeking approvals from accounting specialists, and entering the finalized data into Xero. In parallel, I helped identify and scope the key components of the invoice processing workflow where AI could meaningfully drive automation:
- Data extraction and formatting
  - PDF/image extraction: Extracting invoice information using vision models to handle a wide range of file types and layouts.
  - Field identification: Identifying key fields such as invoice number, vendor name, due date, total amount, and tax type from extracted data.
  - Data standardization: Normalizing inconsistent data formats (e.g., date formats, string formats) to create clean, structured invoice entries.
- Knowledge and inference
  - Vendor recognition: Matching extracted vendor details with existing records.
  - COA mapping: Determining which account to assign based on invoice contents.
  - Tax rate: Identifying and applying the appropriate VAT/GST.
  - Withholding tax: Checking whether an invoice is subject to withholding tax by referencing local accounting law.
  - Prepayments and accrued expenses: Detecting invoices subject to accrual accounting practices and prompting the user to check them.
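To make two of these components more concrete, here is a minimal sketch of data standardization and vendor recognition. It is not the engineering team's implementation: the `InvoiceEntry` shape, the list of date layouts, the fuzzy-matching approach (Python's standard `difflib`), and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical set of date layouts; a real system would extend this per client.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%B %d, %Y")


@dataclass
class InvoiceEntry:
    """Hypothetical structured record produced after standardization."""
    invoice_number: str
    vendor_name: str
    due_date: date
    total: Decimal


def normalize_date(raw: str) -> date:
    """Try each known layout in turn until one parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Drop currency codes and thousands separators.

    Assumes '.' is the decimal mark and the currency label contains no dots.
    """
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    return Decimal(cleaned)


def match_vendor(name: str, known_vendors: list[str],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the most similar known vendor, or None if none is close enough."""
    def score(candidate: str) -> float:
        return SequenceMatcher(None, name.lower().strip(), candidate.lower()).ratio()

    best = max(known_vendors, key=score, default=None)
    if best is not None and score(best) >= threshold:
        return best
    return None


# Example: raw extracted strings turned into a clean entry.
vendors = ["ACME Sdn. Bhd.", "Globex Ltd"]
entry = InvoiceEntry(
    invoice_number="INV-001",
    vendor_name=match_vendor("Acme Sdn Bhd", vendors) or "UNMATCHED",
    due_date=normalize_date("31/12/2024"),
    total=normalize_amount("RM 1,234.50"),
)
```

Returning `None` for an unmatched vendor, rather than guessing, leaves room for the human check that the workflow relies on for ambiguous cases.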
Breaking down the workflow into these distinct components helped me to shape the roadmap into four technical and product milestones:

Reaching Level 2 would require significant refinement of our prompts and fine-tuning of models. At this stage, we could fully leverage GoGlobal’s historical invoice data to improve extraction accuracy, inference quality, and overall system reliability.
The transition from Level 2 to Level 3 represents the most significant leap in value. In Level 2, although AI could populate invoice entries with high accuracy, human specialists would still need to review every entry before finalization to catch potential errors. In contrast, Level 3 introduces selective human review: the AI agent not only improves in accuracy but also flags invoices that require human attention, allowing low-risk entries to be processed autonomously.
This shift would dramatically expand the capacity of BPO operations—freeing specialists to focus only on edge cases and complex judgments while allowing AI to confidently handle the majority of routine invoice processing.
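The selective review described for Level 3 can be pictured as a routing step that sends only flagged invoices to a specialist. The sketch below is one possible shape under stated assumptions: the per-field confidence scores, the `MIN_CONFIDENCE` and `MAX_AUTO_AMOUNT` thresholds, and the `ProcessedInvoice` structure are hypothetical and do not describe the actual system.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; real values would be tuned against reviewed historical invoices.
MIN_CONFIDENCE = 0.95
MAX_AUTO_AMOUNT = 10_000.0


@dataclass
class ProcessedInvoice:
    invoice_number: str
    total: float
    # Assumed: the extraction step reports a confidence score per field.
    field_confidence: dict[str, float]
    vendor_matched: bool = True
    flags: list[str] = field(default_factory=list)


def needs_human_review(inv: ProcessedInvoice) -> bool:
    """Record every reason for review on inv.flags; True if any were found."""
    inv.flags.clear()
    for name, conf in inv.field_confidence.items():
        if conf < MIN_CONFIDENCE:
            inv.flags.append(f"low confidence on {name} ({conf:.2f})")
    if not inv.vendor_matched:
        inv.flags.append("vendor not found in existing records")
    if inv.total > MAX_AUTO_AMOUNT:
        inv.flags.append("amount above auto-approval limit")
    return bool(inv.flags)


def route(invoices: list[ProcessedInvoice]):
    """Split invoices into (auto-processed, sent to a specialist)."""
    auto, review = [], []
    for inv in invoices:
        (review if needs_human_review(inv) else auto).append(inv)
    return auto, review


batch = [
    ProcessedInvoice("INV-1", 500.0, {"total": 0.99, "due_date": 0.97}),
    ProcessedInvoice("INV-2", 500.0, {"total": 0.80}),
    ProcessedInvoice("INV-3", 50_000.0, {"total": 0.99}, vendor_matched=False),
]
auto, review = route(batch)
```

In Level 2 terms, every invoice would land in `review`; the Level 3 gain comes from how many invoices can safely stay in `auto`, and storing the reasons on `flags` tells the specialist where to look.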
After helping define the initial product direction and roadmap, I wrapped up my involvement. Since then, the team has continued to iterate and, as of March 2025, is onboarding early adopters to further validate and improve the system.