Types of software testing are categories that describe at what level, for what purpose, and in what way we verify how a system works. They are most often classified along four dimensions: level (from unit to acceptance), knowledge of the code (white-box vs black-box), purpose (functional vs non-functional), and execution method (manual vs automated). These dimensions are not mutually exclusive: a single test can be described by several categories at once. Below we go through each dimension and explain when to use the test types it contains.

Why is it worth understanding the classification of tests?

Classifying tests is not academic trivia — it is a tool for making decisions. When you know that a given test is “unit, white-box, functional, and automated,” you immediately understand its role: it is cheap, fast, checks the logic of a single piece of code, and can run in the CI loop on every commit.

In practice, teams that do not understand these distinctions make two typical mistakes. The first is over-investing in a single level — for example, hundreds of slow system tests where cheap unit tests would have been enough. The second is gaps in coverage — the system is excellently tested functionally, but it collapses under production load because nobody thought about non-functional tests. Mapping tests onto categories helps you see where the holes really are.

A further benefit is communication. When a developer, a tester, and the person responsible for the business all use the same vocabulary, a large share of misunderstandings disappears. “Add regression tests to this change” means something specific, rather than “test it somehow.” A shared classification turns vague expectations into measurable tasks that can be planned, estimated, and accounted for.

Classification by level: from unit to acceptance

The classification by level answers the question: how large a piece of the system are we testing at once? This is the most fundamental dimension, often presented as the test pyramid — a wide, cheap base at the bottom and a narrow, costly peak at the top.

  • Unit tests — check the smallest isolated element of code: a function, a method, a class. They are the fastest and cheapest to maintain, which is why they should make up the largest part of the test suite. They are usually written by the developer, alongside the code. Use them everywhere there is logic worth checking.
  • Integration tests — verify cooperation between modules: whether the payment module talks correctly to the database, whether the API returns what the front end expects. This is where bugs at the seams between components reveal themselves — bugs invisible at the unit level.
  • System tests — check the complete, integrated system as a whole, in an environment close to production. They cover full end-to-end business scenarios.
  • Acceptance tests (UAT) — answer the business question: does the system do what the user or client needs? They are often performed with the participation of the end recipient and decide whether a version is approved for deployment.

The golden rule: the lower the level, the more tests there should be and the faster the feedback they give. The higher you go, the more expensive and slower the tests become, and the fewer of them there should be.
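The difference between the two lowest levels can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: apply_discount, OrderRepository, and the orders table are hypothetical names invented for the example, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3


def apply_discount(total: float, percent: float) -> float:
    """Pure business logic: the kind of code a unit test targets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


class OrderRepository:
    """Thin persistence layer over SQLite (hypothetical schema)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)"
        )

    def save(self, total: float) -> int:
        cursor = self.connection.execute(
            "INSERT INTO orders (total) VALUES (?)", (total,)
        )
        self.connection.commit()
        return cursor.lastrowid

    def get_total(self, order_id: int) -> float:
        row = self.connection.execute(
            "SELECT total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0]


def test_unit_apply_discount() -> None:
    # Unit level: one function, no I/O, nothing to set up.
    assert apply_discount(200.0, 25) == 150.0


def test_integration_discounted_order_is_persisted() -> None:
    # Integration level: business logic and a database working together.
    repository = OrderRepository(sqlite3.connect(":memory:"))
    order_id = repository.save(apply_discount(200.0, 25))
    assert repository.get_total(order_id) == 150.0
```

The unit test needs nothing but the function itself, while the integration test already has to create a connection and a schema. That extra setup is where the cost at higher levels tends to come from.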

Classification by knowledge of the code: white-box vs black-box

This division concerns how much we know about the internal structure of the tested system.

White-box tests assume full knowledge of the source code. The tester (usually a developer) designs cases so as to walk through specific execution paths, condition branches, and loops. The goal is code coverage: making sure that every significant statement and every decision outcome is exercised by at least one test. This approach dominates at the unit level.
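As a rough sketch of white-box case design, the cases below were chosen by reading the code and picking one input per branch outcome. The shipping_cost function and its prices are hypothetical and exist only to give the test something with branches.

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical pricing rule with a guard clause and two decisions."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    cost = 5.0 if weight_kg <= 2 else 9.0  # decision 1: weight band
    if express:                            # decision 2: delivery speed
        cost += 4.0
    return cost


# One case per combination of the two decisions, picked from the code itself.
WHITE_BOX_CASES = [
    (1.0, False, 5.0),   # light parcel, standard delivery
    (1.0, True, 9.0),    # light parcel, express delivery
    (3.0, False, 9.0),   # heavy parcel, standard delivery
    (3.0, True, 13.0),   # heavy parcel, express delivery
]


def test_every_branch_is_exercised() -> None:
    for weight_kg, express, expected in WHITE_BOX_CASES:
        assert shipping_cost(weight_kg, express) == expected

    # The guard clause is a branch too, so it gets its own case.
    try:
        shipping_cost(0, False)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive weight")
```

A coverage tool (for Python, coverage.py with branch measurement enabled) can then confirm whether any branch was left unexecuted.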

Black-box tests treat the system as a closed box: we only know the inputs and the expected outputs, and we do not look inside. Cases are designed based on requirements and specifications, not on the structure of the code. This approach dominates at the system and acceptance levels, where we look at behavior from the user’s perspective.
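A black-box test can be sketched the other way round: the cases come from a written requirement, here the assumed rule "a password must be 8 to 64 characters long and contain at least one digit", and are placed on the boundaries of that rule. The is_valid_password function is a hypothetical stand-in included only so the sketch runs; the cases would stay the same whatever the implementation looked like.

```python
def is_valid_password(password: str) -> bool:
    """Stand-in implementation; the test below never looks inside it."""
    return 8 <= len(password) <= 64 and any(ch.isdigit() for ch in password)


# Cases derived from the requirement only: values on and around each boundary.
SPEC_CASES = [
    ("a" * 6 + "1", False),   # 7 characters: just below the minimum
    ("a" * 7 + "1", True),    # 8 characters: the minimum
    ("a" * 63 + "1", True),   # 64 characters: the maximum
    ("a" * 64 + "1", False),  # 65 characters: just above the maximum
    ("a" * 10, False),        # valid length but no digit
]


def test_password_rule_from_specification() -> None:
    for password, expected in SPEC_CASES:
        assert is_valid_password(password) is expected
```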

There is also an intermediate approach — grey-box — combining partial knowledge of the architecture with an external perspective, useful for example in integration and security testing.

Classification by purpose: functional vs non-functional

This is the division that is most often the source of production surprises. Functional tests check WHAT the system does — whether the business logic works as required. Will the user log in after entering a correct username and password? Does the cart calculate the discount correctly? This is verification of behaviors and rules.

Non-functional tests check HOW the system works — that is, its quality attributes. These include, among others, performance and load tests, security tests, usability tests, accessibility tests (WCAG compliance), and compatibility tests. A system may be 100% functionally correct and still be unfit for use because it falls over at 200 concurrent users or leaks data.

In short, the most important non-functional categories are:

  • Performance and load — how the system behaves under normal and peak traffic, where the threshold lies beyond which response times grow or the service falls over.
  • Security — whether data is protected, whether there are vulnerabilities allowing unauthorized access or a leak.
  • Usability and accessibility — whether the interface is understandable to the user and whether it meets accessibility standards (e.g. WCAG).
  • Compatibility — whether the application works correctly across different browsers, devices, and system versions.
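A load check can be sketched with nothing but the standard library. This is a simplified illustration under stated assumptions: handle_request is a hypothetical stand-in that just sleeps for 10 ms, and real load testing normally uses a dedicated tool against a production-like environment.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def handle_request(_: int) -> float:
    """Stand-in for a real call (e.g. an HTTP request); returns latency in seconds."""
    started = time.perf_counter()
    time.sleep(0.01)  # simulated work; replace with the call under test
    return time.perf_counter() - started


def run_load(concurrent_users: int, requests_total: int) -> List[float]:
    """Issue requests_total calls using concurrent_users worker threads."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(handle_request, range(requests_total)))


def percentile(values: List[float], fraction: float) -> float:
    """Nearest-rank percentile, e.g. fraction=0.95 for p95."""
    ordered = sorted(values)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[index]


def within_budget(concurrent_users: int, requests_total: int, budget_s: float) -> bool:
    """True if the 95th-percentile latency stays within the given budget."""
    latencies = run_load(concurrent_users, requests_total)
    return percentile(latencies, 0.95) <= budget_s
```

Calling within_budget(20, 100, 0.5), for instance, asks whether 95% of 100 requests issued by 20 concurrent workers finish within half a second. Raising concurrent_users step by step is one simple way to look for the threshold at which response times start to grow.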

This is precisely the area that tends to be neglected most often — mainly because non-functional bugs do not surface during “normal clicking,” only under extreme conditions or not until production. We break it down into its component parts in greater detail in a separate piece on non-functional tests — it is worth treating as a complement to this classification.

Classification by execution method: manual vs automated

The next dimension concerns who performs the test — a human or a machine.

Manual tests are performed by a tester by hand, step by step. They are irreplaceable wherever human judgment matters: exploratory testing, usability, and testing of new and frequently changing features where automation does not yet pay off. A human will catch the “something is off here” cases that a script would not notice.

Automated tests are scripts that execute cases without human involvement. They excel at repeatable, stable scenarios that are run often — above all in regression and in CI/CD pipelines. The investment in automation pays off where the same test has to be run hundreds of times. We write more about what is worth automating and when in our guide to QA test automation.

This is not an either-or choice. A mature team combines both approaches: it automates stable regression and smoke tests, and reserves testers’ time for exploration and quality assessment where a machine cannot help.
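The kind of repeatable, stable scenario that suits automation can be sketched as a table of inputs and expected outputs that a script replays on every run. The slugify function and its cases are hypothetical; in a real project each row would typically be added when a bug is fixed or a behavior is agreed.

```python
import re
from typing import List


def slugify(title: str) -> str:
    """Hypothetical function under test: turns a title into a URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# Input -> expected output, replayed unchanged on every run.
AUTOMATED_CASES = {
    "Hello World": "hello-world",
    "  Leading and trailing  ": "leading-and-trailing",
    "Price: 100%!": "price-100",
    "already-a-slug": "already-a-slug",
}


def run_cases() -> List[str]:
    """Return a description of every failing case; an empty list means success."""
    failures = []
    for title, expected in AUTOMATED_CASES.items():
        actual = slugify(title)
        if actual != expected:
            failures.append(f"{title!r}: expected {expected!r}, got {actual!r}")
    return failures
```

A CI job could call run_cases() and fail the build whenever the returned list is not empty. Adding a scenario costs one line in the table, which is why this style pays off for tests that run hundreds of times.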

Regression, smoke, and sanity tests — when to use them?

These three types are often confused, even though they serve different roles and are run at different moments.

  • Smoke tests: a very narrow set checking whether the most important functions work at all and whether the build is fit for further testing. The question is “does the application even start, and can a user log in?” They are run immediately after a new build, before anyone invests time in deeper testing.
  • Sanity tests: a narrow but deep check of a specific function or fix after a small change. They answer the question: “does this one repaired module behave sensibly?” They do not cover the whole system, only the area affected by the change.
  • Regression tests: check whether a new change has broken something that worked before. This is one of the most important and most frequently automated types of test, because with each version of the system the regression set grows. We describe in detail when and how to run them, and what to automate first, in our piece on regression testing.

A simplified distinction: smoke asks “is it even worth testing further?”, sanity asks “does this fix make sense?”, and regression asks “did anything else break?”
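One way to picture the three suites is as labels on the same pool of checks, with a different subset run at each moment. The sketch below assumes a tiny hypothetical application (login, cart_total) and a home-made suite decorator; it illustrates the idea rather than recommending a test runner.

```python
from typing import Callable, Dict, List

# Suite name -> checks registered under it.
SUITES: Dict[str, List[Callable[[], None]]] = {
    "smoke": [],
    "sanity": [],
    "regression": [],
}


def suite(*names: str):
    """Decorator that registers a check in one or more suites."""
    def register(check: Callable[[], None]) -> Callable[[], None]:
        for name in names:
            SUITES[name].append(check)
        return check
    return register


# Hypothetical application under test.
USERS = {"anna": "s3cret"}


def login(username: str, password: str) -> bool:
    return USERS.get(username) == password


def cart_total(prices: List[float], discount_percent: float) -> float:
    return round(sum(prices) * (1 - discount_percent / 100), 2)


@suite("smoke", "regression")
def check_login_works() -> None:
    assert login("anna", "s3cret")


@suite("sanity", "regression")
def check_discount_fix() -> None:
    # Targets only the area touched by a (hypothetical) rounding fix.
    assert cart_total([19.99, 5.01], 10) == 22.5


@suite("regression")
def check_wrong_password_rejected() -> None:
    assert not login("anna", "wrong")


@suite("regression")
def check_empty_cart_is_free() -> None:
    assert cart_total([], 10) == 0


def run_suite(name: str) -> List[str]:
    """Run every check in the named suite; return the names of failed checks."""
    failed = []
    for check in SUITES[name]:
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    return failed
```

Here run_suite("smoke") executes one check right after a build, run_suite("sanity") executes only the check for the repaired area, and run_suite("regression") executes everything. In pytest the same idea is usually expressed with markers and selected with pytest -m smoke.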

Comparison table of the most important types of tests

The table below shows how the key types of tests break down by who performs them, when, and for what purpose.

Type of test     | Classification dimension | Who performs it   | When to use it                   | Typical level of automation
-----------------+--------------------------+-------------------+----------------------------------+----------------------------
Unit             | level                    | Developer         | On every commit                  | High
Integration      | level                    | Developer / QA    | After connecting modules         | High
System           | level                    | QA                | Before release                   | Medium
Acceptance (UAT) | level                    | Client / business | Before deployment                | Low
Functional       | purpose                  | QA                | Throughout the development cycle | Medium / high
Non-functional   | purpose                  | QA specialists    | Before major releases            | Medium
Smoke            | scope / purpose          | QA / CI           | After a new build                | High
Regression       | scope / purpose          | QA / CI           | After every change               | High

The table is deliberately simplified — in a real project the level of automation depends on the maturity of the team and the nature of the product. Treat it as a starting point, not a rigid rule.

How do these dimensions combine in practice?

The most important thing to remember: these classifications overlap rather than exclude one another. One specific test can be described by several categories at the same time. Example: an automated test checking whether the registration form returns an error for a taken email is at once a functional test (it checks the logic), black-box (we look only at the input and output), at the system level (the whole registration flow), and automated (a script in CI runs it).
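The registration example can be written down as a small data structure in which each dimension is a separate field, which makes it clear that the categories are independent axes rather than competing labels. The class and field names below are illustrative assumptions, not an established API.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    UNIT = "unit"
    INTEGRATION = "integration"
    SYSTEM = "system"
    ACCEPTANCE = "acceptance"


class Knowledge(Enum):
    WHITE_BOX = "white-box"
    GREY_BOX = "grey-box"
    BLACK_BOX = "black-box"


class Purpose(Enum):
    FUNCTIONAL = "functional"
    NON_FUNCTIONAL = "non-functional"


class Execution(Enum):
    MANUAL = "manual"
    AUTOMATED = "automated"


@dataclass(frozen=True)
class Classification:
    """One test described along all four dimensions at once."""
    name: str
    level: Level
    knowledge: Knowledge
    purpose: Purpose
    execution: Execution

    def describe(self) -> str:
        parts = (self.level, self.knowledge, self.purpose, self.execution)
        return ", ".join(part.value for part in parts)


duplicate_email = Classification(
    name="registration rejects an already taken email",
    level=Level.SYSTEM,
    knowledge=Knowledge.BLACK_BOX,
    purpose=Purpose.FUNCTIONAL,
    execution=Execution.AUTOMATED,
)
```

Calling duplicate_email.describe() returns "system, black-box, functional, automated", the same four-part description given in the paragraph above.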

That is why, when designing a test strategy, we do not ask “which type to choose,” but “how to distribute the effort across all the dimensions.” A good strategy has a solid foundation of unit tests, a reasonable layer of integration and system tests, deliberately planned non-functional tests, and automated regression. The human element — exploratory and acceptance tests — stays where it brings the most value.
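Checking how effort is distributed can start with a simple inventory. The sketch below uses made-up numbers (INVENTORY is entirely hypothetical) and two rough checks: whether any higher level has more tests than the level beneath it, and whether any purpose has no tests at all.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

LEVELS = ["unit", "integration", "system", "acceptance"]  # pyramid, bottom to top
PURPOSES = {"functional", "non-functional"}

# Hypothetical inventory: one (level, purpose) pair per existing test.
INVENTORY: List[Tuple[str, str]] = (
    [("unit", "functional")] * 40
    + [("integration", "functional")] * 12
    + [("system", "functional")] * 15
    + [("acceptance", "functional")] * 3
)


def count_by_level(inventory: List[Tuple[str, str]]) -> Dict[str, int]:
    counts = Counter(level for level, _ in inventory)
    return {level: counts.get(level, 0) for level in LEVELS}


def pyramid_violations(counts: Dict[str, int]) -> List[str]:
    """Adjacent levels where the higher, costlier level has more tests."""
    return [
        f"{higher} ({counts[higher]}) outnumbers {lower} ({counts[lower]})"
        for lower, higher in zip(LEVELS, LEVELS[1:])
        if counts[higher] > counts[lower]
    ]


def missing_purposes(inventory: List[Tuple[str, str]]) -> Set[str]:
    """Purposes that no test in the inventory covers."""
    return PURPOSES - {purpose for _, purpose in inventory}
```

For this sample inventory, pyramid_violations(count_by_level(INVENTORY)) reports that system tests outnumber integration tests, and missing_purposes(INVENTORY) returns {"non-functional"}. These are the two typical mistakes described earlier: over-investment in one level and a gap in coverage. The thresholds are deliberately crude; a real strategy would weigh risk, not just counts.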

It is also worth deciding consciously when we start testing. Shifting tests earlier in the cycle (and monitoring after deployment) is a separate, important topic — we describe the differences between the approaches in our shift-left vs shift-right comparison.

How does ARDURA Consulting support testing and QA?

At ARDURA Consulting we treat testing not as a stage “at the end,” but as an element of quality built into the entire software development cycle. We support clients in two complementary ways.

The first is testing services — we take responsibility for designing and running the QA process: from the test strategy and the selection of the right types for a given product, through regression automation, to non-functional tests. You will find the full offering on the ARDURA Consulting testing services page.

The second is tester staff augmentation, for when you need to strengthen your own team with experienced QA specialists. From a pool of over 500 senior specialists we select testers with the right competencies, and onboarding usually takes about two weeks. This solution works well when you already have a process but lack hands or a specific specialization, for example automation or performance testing.

Regardless of the cooperation model, we act as a partner, not a supplier of hours: we help select the types of tests that genuinely reduce risk in your project, and avoid over-investing where it brings no value.

Want to put testing in order in your project? Get in touch — we will help you assess where the biggest gaps in coverage are and how to close them.