Should I use machine learning or rule-based logic for this problem?

By Andrew

Decision tree: data, machine learning, ai, analytics, software design

Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.

Open interactive version →

Overview

Type: Decision tree
Tags: data, machine learning, ai, analytics, software design
Entry: Q1
Questions: 5
Outcomes: 5
Author: Andrew
Last updated: 2026-05-12

Decision Tree

Start: Does a well-maintained pre-trained model or API already exist for this exact domain?

yes -> Outcome: Use a Pre-built API
no  -> Continues to question: Do you have at least a few thousand labelled training examples available?

Machine-Readable JSON (Canonical Model)


{
  "_meta": {
    "schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
    "source": "https://www.drawdecisiontree.com",
    "description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
    "links": {
      "interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules.html",
      "embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/data-ml-vs-rules",
      "dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
      "guides": "https://www.drawdecisiontree.com/guides",
      "schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
      "author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
    },
    "generated_at": "2026-05-29T12:05:39.269Z"
  },
  "author": {
    "handle": "drawdecisiontree",
    "first_name": "Andrew",
    "last_name": null,
    "avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
    "display_name": "Andrew"
  },
  "file": {
    "id": "10d92182-b53d-4d69-bf83-8436c0f3f9b3",
    "name": "Should I use machine learning or rule-based logic for this problem?",
    "public_slug": "data-ml-vs-rules",
    "updated_at": "2026-05-12T16:53:43.587978+00:00",
    "url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules.html",
    "json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules/tree.json",
    "dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules/tree.dag"
  },
  "meta": {
    "description": "Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.",
    "mode": "decision",
    "entry": "Q1",
    "tags": [
      "data",
      "machine learning",
      "ai",
      "analytics",
      "software design"
    ],
    "image": "https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80"
  },
  "questions": [
    {
      "id": "Q1",
      "text": "Does a well-maintained pre-trained model or API already exist for this exact domain?"
    },
    {
      "id": "Q2",
      "text": "Do you have at least a few thousand labelled training examples available?"
    },
    {
      "id": "Q3",
      "text": "Is the model's decision-making process required to be explainable to stakeholders or regulators?"
    },
    {
      "id": "Q4",
      "text": "Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?"
    },
    {
      "id": "Q5",
      "text": "Does the input data involve unstructured content such as images, audio, or raw text at scale?"
    }
  ],
  "outcomes": [
    {
      "id": "SIMPLE_RULES",
      "label": "Simple Rules (if/then Logic)"
    },
    {
      "id": "DECISION_TREE",
      "label": "Decision Tree / Scorecard"
    },
    {
      "id": "TRAD_ML",
      "label": "Traditional ML Model"
    },
    {
      "id": "DEEP_LEARNING",
      "label": "Deep Learning / Neural Network"
    },
    {
      "id": "PREBUILT_API",
      "label": "Use a Pre-built API"
    }
  ],
  "dsl": "dag: Should I use machine learning or rule-based logic for this problem?\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80\ndescription: Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.\ntags: data, machine learning, ai, analytics, software design\nentry: Q1\n\nQ1: Does a well-maintained pre-trained model or API already exist for this exact domain?\n  hint: Examples include OpenAI for text generation and summarisation, AWS Rekognition for image classification, Google Cloud Vision for OCR, or Azure Cognitive Services for sentiment analysis. If a vendor already solved this problem at scale, building from scratch is rarely justified. Consider the API route only if you can tolerate the vendor's latency, pricing, and data-sharing terms. Check whether the vendor's model has been evaluated on data similar to yours before committing — generic APIs can perform poorly on domain-specific vocabulary or image types.\n  yes -> [PREBUILT_API]\n  no  -> Q2\n\nQ2: Do you have at least a few thousand labelled training examples available?\n  hint: Rule-of-thumb thresholds vary by problem, but classical ML typically needs hundreds to low thousands of labelled rows; deep learning needs tens of thousands or more. If your labelled data is sparse or expensive to collect, a rules-based or scorecard approach may outperform a poorly-trained model and will certainly be faster to deploy. 
Consider whether you can grow the dataset over time before committing to an ML-first architecture — active learning and human-in-the-loop labelling pipelines can accelerate data collection significantly.\n  yes -> Q3\n  no  -> [SIMPLE_RULES]\n\nQ3: Is the model's decision-making process required to be explainable to stakeholders or regulators?\n  hint: Regulated industries — credit scoring, healthcare triage, insurance underwriting — often require that every individual decision can be justified in plain language. If your legal or compliance team needs to audit model outputs, black-box approaches create significant risk. Gradient-boosted trees, logistic regression, and scorecards offer good accuracy while remaining interpretable; neural networks require post-hoc methods like SHAP which are harder to defend in a regulatory context.\n  yes -> Q4\n  no  -> Q5\n\nQ4: Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?\n  hint: A problem with many interacting features — e.g. credit risk involving income, debt, employment type, and payment history simultaneously — quickly becomes intractable as a hand-authored ruleset. Decision tree-based scorecards capture these interactions automatically while still producing a path of human-readable splits. If the domain expert can write the decision logic on a single whiteboard without contradictions, stick with simple rules rather than introducing model governance overhead.\n  yes -> [DECISION_TREE]\n  no  -> [SIMPLE_RULES]\n\nQ5: Does the input data involve unstructured content such as images, audio, or raw text at scale?\n  hint: Deep learning excels at tasks where features must be learned from raw signals — computer vision, speech recognition, large-scale NLP beyond simple keyword matching. If your inputs are structured tabular rows with engineered features, classical ML will usually match deep learning accuracy at a fraction of the infrastructure and training cost. 
Only invest in neural networks when you have the GPU budget, data volume, and MLOps maturity to support long training runs, versioned model artefacts, and continuous monitoring pipelines.\n  yes -> [DEEP_LEARNING]\n  no  -> [TRAD_ML]\n\n[SIMPLE_RULES]: Simple Rules (if/then Logic)\n  color: #48BB78\n  description: A hand-authored ruleset is the fastest path to a production decision system when domain knowledge is well-understood and labelled data is scarce. Document every rule in a version-controlled YAML or spreadsheet so that changes are auditable, and implement a monitoring dashboard to track rule trigger rates over time. Start with the highest-impact rules and extend incrementally as edge cases emerge. Re-evaluate after six months whether growing labelled data volume now justifies a machine learning upgrade — keep track of cases the rules get wrong to bootstrap a future training set.\n  code: ML_SIMPLE_RULES\n\n[DECISION_TREE]: Decision Tree / Scorecard\n  color: #4F86C6\n  description: A trained decision tree or logistic scorecard delivers the interpretability of rules with the accuracy of a learned model. Use gradient-boosted shallow trees (XGBoost or LightGBM with max_depth ≤ 4) or a logistic regression with binned features to produce a scorecard that can be reviewed by a credit officer or clinician. Store feature importance scores alongside the model artefact and schedule quarterly re-training as the data distribution evolves. Document the feature engineering pipeline thoroughly so that any analyst can reproduce the score, and version the model in MLflow or a similar registry.\n  code: ML_DECISION_TREE\n\n[TRAD_ML]: Traditional ML Model\n  color: #ED8936\n  description: Classical supervised learning — gradient-boosted trees, random forests, or regularised linear models — is the right default for structured tabular data with adequate labelled examples. Start with a baseline (logistic regression or XGBoost) before reaching for more complex architectures. 
Track model performance metrics in an experiment registry such as MLflow or Weights & Biases, and set up automated drift detection so that degrading accuracy triggers a re-training alert. Define a clear champion/challenger process for promoting new model versions to production and document your feature engineering logic in a reproducible pipeline.\n  code: ML_TRAD_ML\n\n[DEEP_LEARNING]: Deep Learning / Neural Network\n  color: #9F7AEA\n  description: Deep learning is warranted when the input modality is unstructured (images, audio, text) and you have sufficient labelled data and GPU infrastructure to support training. Begin with transfer learning from a pre-trained foundation model — a fine-tuned BERT for NLP or a ResNet backbone for vision — rather than training from scratch, as this dramatically reduces data requirements. Invest in robust MLOps: containerised training jobs, model versioning, A/B testing infrastructure, and automated re-training pipelines. Monitor for data drift and fairness metrics continuously in production, and document compute costs so that the business can make an informed ROI assessment.\n  code: ML_DEEP_LEARNING\n\n[PREBUILT_API]: Use a Pre-built API\n  color: #38B2AC\n  description: Leveraging a managed AI API is almost always the fastest and most cost-effective route when a reputable vendor has already solved your domain problem at scale. Integrate via the vendor's SDK, implement retry logic and rate-limit handling from day one, and log every request/response pair for audit and debugging. Evaluate vendor lock-in risk by abstracting the integration behind an internal interface so you can swap providers without rewriting business logic. Review the vendor's data retention and privacy policies carefully before sending any sensitive or regulated data, and benchmark latency and cost at your expected production request volume.\n  code: ML_PREBUILT_API\n"
}
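
A minimal sketch of reading this export in Python, assuming only the keys visible above (`meta.entry`, `questions`, `outcomes`). `SAMPLE` is a trimmed stand-in for the full file and `index_by_id` is a hypothetical helper, not part of any DrawDecisionTree library. In the export as shown, the yes/no edges appear only inside the `dsl` string, so this lookup resolves IDs to text but does not traverse the tree.

```python
import json

# Trimmed stand-in for the export above; only the keys used below are kept.
SAMPLE = """
{
  "meta": {"mode": "decision", "entry": "Q1"},
  "questions": [
    {"id": "Q1", "text": "Does a well-maintained pre-trained model or API already exist for this exact domain?"},
    {"id": "Q2", "text": "Do you have at least a few thousand labelled training examples available?"}
  ],
  "outcomes": [
    {"id": "SIMPLE_RULES", "label": "Simple Rules (if/then Logic)"},
    {"id": "PREBUILT_API", "label": "Use a Pre-built API"}
  ]
}
"""


def index_by_id(items, value_key):
    """Map each item's "id" to the chosen field (hypothetical helper)."""
    return {item["id"]: item[value_key] for item in items}


tree = json.loads(SAMPLE)
questions = index_by_id(tree["questions"], "text")
outcomes = index_by_id(tree["outcomes"], "label")

# The entry ID from `meta` points at the first question to ask.
entry_text = questions[tree["meta"]["entry"]]
print(entry_text)
```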

DSL Representation

dag: Should I use machine learning or rule-based logic for this problem?
version: 1.0.0
image: https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80
description: Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.
tags: data, machine learning, ai, analytics, software design
entry: Q1

Q1: Does a well-maintained pre-trained model or API already exist for this exact domain?
  hint: Examples include OpenAI for text generation and summarisation, AWS Rekognition for image classification, Google Cloud Vision for OCR, or Azure Cognitive Services for sentiment analysis. If a vendor already solved this problem at scale, building from scratch is rarely justified. Consider the API route only if you can tolerate the vendor's latency, pricing, and data-sharing terms. Check whether the vendor's model has been evaluated on data similar to yours before committing — generic APIs can perform poorly on domain-specific vocabulary or image types.
  yes -> [PREBUILT_API]
  no  -> Q2

Q2: Do you have at least a few thousand labelled training examples available?
  hint: Rule-of-thumb thresholds vary by problem, but classical ML typically needs hundreds to low thousands of labelled rows; deep learning needs tens of thousands or more. If your labelled data is sparse or expensive to collect, a rules-based or scorecard approach may outperform a poorly-trained model and will certainly be faster to deploy. Consider whether you can grow the dataset over time before committing to an ML-first architecture — active learning and human-in-the-loop labelling pipelines can accelerate data collection significantly.
  yes -> Q3
  no  -> [SIMPLE_RULES]

Q3: Is the model's decision-making process required to be explainable to stakeholders or regulators?
  hint: Regulated industries — credit scoring, healthcare triage, insurance underwriting — often require that every individual decision can be justified in plain language. If your legal or compliance team needs to audit model outputs, black-box approaches create significant risk. Gradient-boosted trees, logistic regression, and scorecards offer good accuracy while remaining interpretable; neural networks require post-hoc methods like SHAP which are harder to defend in a regulatory context.
  yes -> Q4
  no  -> Q5

Q4: Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?
  hint: A problem with many interacting features — e.g. credit risk involving income, debt, employment type, and payment history simultaneously — quickly becomes intractable as a hand-authored ruleset. Decision tree-based scorecards capture these interactions automatically while still producing a path of human-readable splits. If the domain expert can write the decision logic on a single whiteboard without contradictions, stick with simple rules rather than introducing model governance overhead.
  yes -> [DECISION_TREE]
  no  -> [SIMPLE_RULES]

Q5: Does the input data involve unstructured content such as images, audio, or raw text at scale?
  hint: Deep learning excels at tasks where features must be learned from raw signals — computer vision, speech recognition, large-scale NLP beyond simple keyword matching. If your inputs are structured tabular rows with engineered features, classical ML will usually match deep learning accuracy at a fraction of the infrastructure and training cost. Only invest in neural networks when you have the GPU budget, data volume, and MLOps maturity to support long training runs, versioned model artefacts, and continuous monitoring pipelines.
  yes -> [DEEP_LEARNING]
  no  -> [TRAD_ML]

[SIMPLE_RULES]: Simple Rules (if/then Logic)
  color: #48BB78
  description: A hand-authored ruleset is the fastest path to a production decision system when domain knowledge is well-understood and labelled data is scarce. Document every rule in a version-controlled YAML or spreadsheet so that changes are auditable, and implement a monitoring dashboard to track rule trigger rates over time. Start with the highest-impact rules and extend incrementally as edge cases emerge. Re-evaluate after six months whether growing labelled data volume now justifies a machine learning upgrade — keep track of cases the rules get wrong to bootstrap a future training set.
  code: ML_SIMPLE_RULES

[DECISION_TREE]: Decision Tree / Scorecard
  color: #4F86C6
  description: A trained decision tree or logistic scorecard delivers the interpretability of rules with the accuracy of a learned model. Use gradient-boosted shallow trees (XGBoost or LightGBM with max_depth ≤ 4) or a logistic regression with binned features to produce a scorecard that can be reviewed by a credit officer or clinician. Store feature importance scores alongside the model artefact and schedule quarterly re-training as the data distribution evolves. Document the feature engineering pipeline thoroughly so that any analyst can reproduce the score, and version the model in MLflow or a similar registry.
  code: ML_DECISION_TREE

[TRAD_ML]: Traditional ML Model
  color: #ED8936
  description: Classical supervised learning — gradient-boosted trees, random forests, or regularised linear models — is the right default for structured tabular data with adequate labelled examples. Start with a baseline (logistic regression or XGBoost) before reaching for more complex architectures. Track model performance metrics in an experiment registry such as MLflow or Weights & Biases, and set up automated drift detection so that degrading accuracy triggers a re-training alert. Define a clear champion/challenger process for promoting new model versions to production and document your feature engineering logic in a reproducible pipeline.
  code: ML_TRAD_ML

[DEEP_LEARNING]: Deep Learning / Neural Network
  color: #9F7AEA
  description: Deep learning is warranted when the input modality is unstructured (images, audio, text) and you have sufficient labelled data and GPU infrastructure to support training. Begin with transfer learning from a pre-trained foundation model — a fine-tuned BERT for NLP or a ResNet backbone for vision — rather than training from scratch, as this dramatically reduces data requirements. Invest in robust MLOps: containerised training jobs, model versioning, A/B testing infrastructure, and automated re-training pipelines. Monitor for data drift and fairness metrics continuously in production, and document compute costs so that the business can make an informed ROI assessment.
  code: ML_DEEP_LEARNING

[PREBUILT_API]: Use a Pre-built API
  color: #38B2AC
  description: Leveraging a managed AI API is almost always the fastest and most cost-effective route when a reputable vendor has already solved your domain problem at scale. Integrate via the vendor's SDK, implement retry logic and rate-limit handling from day one, and log every request/response pair for audit and debugging. Evaluate vendor lock-in risk by abstracting the integration behind an internal interface so you can swap providers without rewriting business logic. Review the vendor's data retention and privacy policies carefully before sending any sensitive or regulated data, and benchmark latency and cost at your expected production request volume.
  code: ML_PREBUILT_API
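
The routing in the DSL above can be followed programmatically. The sketch below handles only the constructs visible in this file (`entry:`, `Qn:` headers, and `yes`/`no` arrows); it is not an implementation of the full grammar described in the DSL reference. `parse_edges` and `walk` are hypothetical names, and `DSL_EDGES` repeats the edges from above with hints, colors, and descriptions omitted.

```python
import re

# Edges copied from the DSL above; hints, colors, and descriptions are omitted.
DSL_EDGES = """\
entry: Q1

Q1: Does a well-maintained pre-trained model or API already exist for this exact domain?
  yes -> [PREBUILT_API]
  no  -> Q2

Q2: Do you have at least a few thousand labelled training examples available?
  yes -> Q3
  no  -> [SIMPLE_RULES]

Q3: Is the model's decision-making process required to be explainable to stakeholders or regulators?
  yes -> Q4
  no  -> Q5

Q4: Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?
  yes -> [DECISION_TREE]
  no  -> [SIMPLE_RULES]

Q5: Does the input data involve unstructured content such as images, audio, or raw text at scale?
  yes -> [DEEP_LEARNING]
  no  -> [TRAD_ML]
"""

ENTRY_RE = re.compile(r"^entry:\s*(\w+)\s*$")
NODE_RE = re.compile(r"^(Q\d+):")
EDGE_RE = re.compile(r"^\s+(yes|no)\s*->\s*\[?(\w+)\]?\s*$")


def parse_edges(dsl):
    """Return (entry_id, {question_id: {"yes": target, "no": target}})."""
    entry, current, edges = None, None, {}
    for line in dsl.splitlines():
        if m := ENTRY_RE.match(line):
            entry = m.group(1)
        elif m := NODE_RE.match(line):
            current = m.group(1)
            edges[current] = {}
        elif (m := EDGE_RE.match(line)) and current:
            edges[current][m.group(1)] = m.group(2)
    return entry, edges


def walk(entry, edges, answers):
    """Follow yes/no answers from the entry until a non-question ID is reached."""
    node = entry
    while node in edges:
        node = edges[node][answers[node]]
    return node


entry, edges = parse_edges(DSL_EDGES)
outcome = walk(entry, edges, {"Q1": "no", "Q2": "yes", "Q3": "yes", "Q4": "yes"})
print(outcome)
```

The parser strips the square brackets from outcome targets and treats any ID without outgoing edges as an outcome, which is enough for this tree but is an assumption about the DSL in general.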

Machine Access

Static JSON: /t/drawdecisiontree/data-ml-vs-rules/tree.json
Live JSON (SPA): /json/drawdecisiontree/data-ml-vs-rules
Raw DSL: /t/drawdecisiontree/data-ml-vs-rules/tree.dag
Canonical HTML: /t/drawdecisiontree/data-ml-vs-rules.html
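
The relative paths above resolve against the site host given in the export's `source` field. The sketch below builds absolute URLs with the standard library; `machine_urls` and `fetch_tree` are hypothetical helpers, and the assumption that the same path pattern holds for other handles and slugs is not confirmed by this page. `fetch_tree` is defined but not called here, since it needs network access.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Host taken from the export's `source` field.
BASE = "https://www.drawdecisiontree.com"


def machine_urls(handle, slug, base=BASE):
    """Build absolute URLs for the four machine-access paths listed above."""
    tree_path = f"/t/{handle}/{slug}"
    return {
        "static_json": urljoin(base, f"{tree_path}/tree.json"),
        "live_json": urljoin(base, f"/json/{handle}/{slug}"),
        "raw_dsl": urljoin(base, f"{tree_path}/tree.dag"),
        "canonical_html": urljoin(base, f"{tree_path}.html"),
    }


def fetch_tree(url, timeout=10):
    """Download and decode a JSON export (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


urls = machine_urls("drawdecisiontree", "data-ml-vs-rules")
for name, url in urls.items():
    print(f"{name}: {url}")
```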

Questions in this decision tree

Does a well-maintained pre-trained model or API already exist for this exact domain?
Do you have at least a few thousand labelled training examples available?
Is the model's decision-making process required to be explainable to stakeholders or regulators?
Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?
Does the input data involve unstructured content such as images, audio, or raw text at scale?

Possible outcomes

Simple Rules (if/then Logic)
Decision Tree / Scorecard
Traditional ML Model
Deep Learning / Neural Network
Use a Pre-built API
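
The Simple Rules outcome recommends keeping rules auditable and tracking how often each one fires. A minimal sketch of that idea follows, assuming a first-match-wins ruleset. The rule names, fields, and thresholds are hypothetical and chosen only for illustration; a real ruleset would come from the domain expert.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[dict], bool]
    decision: str


# Hypothetical rules and thresholds, for illustration only.
# Order matters: the first rule that applies decides the outcome.
RULES = [
    Rule("missing_or_zero_income", lambda r: not r.get("income"), "refer"),
    Rule("high_debt_ratio", lambda r: r["debt"] / r["income"] > 0.5, "decline"),
    Rule("default", lambda r: True, "approve"),
]

trigger_counts = Counter()


def decide(record):
    """Return the decision of the first matching rule and count the trigger."""
    for rule in RULES:
        if rule.applies(record):
            trigger_counts[rule.name] += 1
            return rule.decision
    raise ValueError("no rule matched")


def trigger_rates():
    """Share of decisions attributed to each rule so far."""
    total = sum(trigger_counts.values())
    return {name: n / total for name, n in trigger_counts.items()} if total else {}


records = [
    {"income": 50000, "debt": 10000},
    {"income": 40000, "debt": 30000},
    {"income": None, "debt": 0},
    {"income": 60000, "debt": 5000},
]
decisions = [decide(r) for r in records]
print(decisions)
print(trigger_rates())
```

Logging the records a rule gets wrong alongside these counts would give the future training set that the outcome description mentions.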

How to use this decision tree

Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.