Should I use machine learning or rule-based logic for this problem?
Decision tree
data, machine learning, ai, analytics, software design
Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.
Overview
Decision Tree
Start: Does a well-maintained pre-trained model or API already exist for this exact domain?
yes
- Outcome: Use a Pre-built API
no
- Continues to question: Do you have at least a few thousand labelled training examples available?
Machine-Readable JSON (Canonical Model)
{
"_meta": {
"schema": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"source": "https://www.drawdecisiontree.com",
"description": "DrawDecisionTree.com is a free tool for building, sharing, and embedding interactive decision trees. This file is the machine-readable export of a published decision tree. The `dsl` field contains the original source in the Decision DAG DSL; the `dag` schema is documented at the URL in `schema` above.",
"links": {
"interactive": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules.html",
"embed": "https://www.drawdecisiontree.com/embed/path/drawdecisiontree/data-ml-vs-rules",
"dsl_reference": "https://www.drawdecisiontree.com/decision-tree-dsl-reference.html",
"guides": "https://www.drawdecisiontree.com/guides",
"schema_docs": "https://www.drawdecisiontree.com/decision-dag.schema.json",
"author_trees": "https://www.drawdecisiontree.com/trees/drawdecisiontree"
},
"generated_at": "2026-05-29T12:05:39.269Z"
},
"author": {
"handle": "drawdecisiontree",
"first_name": "Andrew",
"last_name": null,
"avatar_url": "1d32d828-b6ca-40ec-bdd7-771fe7b9c36a/avatar-1778531481027.svg",
"display_name": "Andrew"
},
"file": {
"id": "10d92182-b53d-4d69-bf83-8436c0f3f9b3",
"name": "Should I use machine learning or rule-based logic for this problem?",
"public_slug": "data-ml-vs-rules",
"updated_at": "2026-05-12T16:53:43.587978+00:00",
"url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules.html",
"json_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules/tree.json",
"dsl_url": "https://www.drawdecisiontree.com/t/drawdecisiontree/data-ml-vs-rules/tree.dag"
},
"meta": {
"description": "Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.",
"mode": "decision",
"entry": "Q1",
"tags": [
"data",
"machine learning",
"ai",
"analytics",
"software design"
],
"image": "https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80"
},
"questions": [
{
"id": "Q1",
"text": "Does a well-maintained pre-trained model or API already exist for this exact domain?"
},
{
"id": "Q2",
"text": "Do you have at least a few thousand labelled training examples available?"
},
{
"id": "Q3",
"text": "Is the model's decision-making process required to be explainable to stakeholders or regulators?"
},
{
"id": "Q4",
"text": "Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?"
},
{
"id": "Q5",
"text": "Does the input data involve unstructured content such as images, audio, or raw text at scale?"
}
],
"outcomes": [
{
"id": "SIMPLE_RULES",
"label": "Simple Rules (if/then Logic)"
},
{
"id": "DECISION_TREE",
"label": "Decision Tree / Scorecard"
},
{
"id": "TRAD_ML",
"label": "Traditional ML Model"
},
{
"id": "DEEP_LEARNING",
"label": "Deep Learning / Neural Network"
},
{
"id": "PREBUILT_API",
"label": "Use a Pre-built API"
}
],
"dsl": "dag: Should I use machine learning or rule-based logic for this problem?\nversion: 1.0.0\nimage: https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80\ndescription: Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.\ntags: data, machine learning, ai, analytics, software design\nentry: Q1\n\nQ1: Does a well-maintained pre-trained model or API already exist for this exact domain?\n hint: Examples include OpenAI for text generation and summarisation, AWS Rekognition for image classification, Google Cloud Vision for OCR, or Azure Cognitive Services for sentiment analysis. If a vendor already solved this problem at scale, building from scratch is rarely justified. Consider the API route only if you can tolerate the vendor's latency, pricing, and data-sharing terms. Check whether the vendor's model has been evaluated on data similar to yours before committing — generic APIs can perform poorly on domain-specific vocabulary or image types.\n yes -> [PREBUILT_API]\n no -> Q2\n\nQ2: Do you have at least a few thousand labelled training examples available?\n hint: Rule-of-thumb thresholds vary by problem, but classical ML typically needs hundreds to low thousands of labelled rows; deep learning needs tens of thousands or more. If your labelled data is sparse or expensive to collect, a rules-based or scorecard approach may outperform a poorly-trained model and will certainly be faster to deploy. 
Consider whether you can grow the dataset over time before committing to an ML-first architecture — active learning and human-in-the-loop labelling pipelines can accelerate data collection significantly.\n yes -> Q3\n no -> [SIMPLE_RULES]\n\nQ3: Is the model's decision-making process required to be explainable to stakeholders or regulators?\n hint: Regulated industries — credit scoring, healthcare triage, insurance underwriting — often require that every individual decision can be justified in plain language. If your legal or compliance team needs to audit model outputs, black-box approaches create significant risk. Gradient-boosted trees, logistic regression, and scorecards offer good accuracy while remaining interpretable; neural networks require post-hoc methods like SHAP which are harder to defend in a regulatory context.\n yes -> Q4\n no -> Q5\n\nQ4: Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?\n hint: A problem with many interacting features — e.g. credit risk involving income, debt, employment type, and payment history simultaneously — quickly becomes intractable as a hand-authored ruleset. Decision tree-based scorecards capture these interactions automatically while still producing a path of human-readable splits. If the domain expert can write the decision logic on a single whiteboard without contradictions, stick with simple rules rather than introducing model governance overhead.\n yes -> [DECISION_TREE]\n no -> [SIMPLE_RULES]\n\nQ5: Does the input data involve unstructured content such as images, audio, or raw text at scale?\n hint: Deep learning excels at tasks where features must be learned from raw signals — computer vision, speech recognition, large-scale NLP beyond simple keyword matching. If your inputs are structured tabular rows with engineered features, classical ML will usually match deep learning accuracy at a fraction of the infrastructure and training cost. 
Only invest in neural networks when you have the GPU budget, data volume, and MLOps maturity to support long training runs, versioned model artefacts, and continuous monitoring pipelines.\n yes -> [DEEP_LEARNING]\n no -> [TRAD_ML]\n\n[SIMPLE_RULES]: Simple Rules (if/then Logic)\n color: #48BB78\n description: A hand-authored ruleset is the fastest path to a production decision system when domain knowledge is well-understood and labelled data is scarce. Document every rule in a version-controlled YAML or spreadsheet so that changes are auditable, and implement a monitoring dashboard to track rule trigger rates over time. Start with the highest-impact rules and extend incrementally as edge cases emerge. Re-evaluate after six months whether growing labelled data volume now justifies a machine learning upgrade — keep track of cases the rules get wrong to bootstrap a future training set.\n code: ML_SIMPLE_RULES\n\n[DECISION_TREE]: Decision Tree / Scorecard\n color: #4F86C6\n description: A trained decision tree or logistic scorecard delivers the interpretability of rules with the accuracy of a learned model. Use gradient-boosted shallow trees (XGBoost or LightGBM with max_depth ≤ 4) or a logistic regression with binned features to produce a scorecard that can be reviewed by a credit officer or clinician. Store feature importance scores alongside the model artefact and schedule quarterly re-training as the data distribution evolves. Document the feature engineering pipeline thoroughly so that any analyst can reproduce the score, and version the model in MLflow or a similar registry.\n code: ML_DECISION_TREE\n\n[TRAD_ML]: Traditional ML Model\n color: #ED8936\n description: Classical supervised learning — gradient-boosted trees, random forests, or regularised linear models — is the right default for structured tabular data with adequate labelled examples. Start with a baseline (logistic regression or XGBoost) before reaching for more complex architectures. 
Track model performance metrics in an experiment registry such as MLflow or Weights & Biases, and set up automated drift detection so that degrading accuracy triggers a re-training alert. Define a clear champion/challenger process for promoting new model versions to production and document your feature engineering logic in a reproducible pipeline.\n code: ML_TRAD_ML\n\n[DEEP_LEARNING]: Deep Learning / Neural Network\n color: #9F7AEA\n description: Deep learning is warranted when the input modality is unstructured (images, audio, text) and you have sufficient labelled data and GPU infrastructure to support training. Begin with transfer learning from a pre-trained foundation model — a fine-tuned BERT for NLP or a ResNet backbone for vision — rather than training from scratch, as this dramatically reduces data requirements. Invest in robust MLOps: containerised training jobs, model versioning, A/B testing infrastructure, and automated re-training pipelines. Monitor for data drift and fairness metrics continuously in production, and document compute costs so that the business can make an informed ROI assessment.\n code: ML_DEEP_LEARNING\n\n[PREBUILT_API]: Use a Pre-built API\n color: #38B2AC\n description: Leveraging a managed AI API is almost always the fastest and most cost-effective route when a reputable vendor has already solved your domain problem at scale. Integrate via the vendor's SDK, implement retry logic and rate-limit handling from day one, and log every request/response pair for audit and debugging. Evaluate vendor lock-in risk by abstracting the integration behind an internal interface so you can swap providers without rewriting business logic. Review the vendor's data retention and privacy policies carefully before sending any sensitive or regulated data, and benchmark latency and cost at your expected production request volume.\n code: ML_PREBUILT_API\n"
}
DSL Representation
dag: Should I use machine learning or rule-based logic for this problem?
version: 1.0.0
image: https://images.unsplash.com/photo-1677442135703-1787eea5ce01?w=1200&q=80
description: Decide whether your prediction or classification problem calls for hand-crafted rules, a classical ML model, deep learning, or a pre-built AI API. Answering questions about your data volume, explainability needs, and problem complexity will surface the approach with the best effort-to-value ratio.
tags: data, machine learning, ai, analytics, software design
entry: Q1
Q1: Does a well-maintained pre-trained model or API already exist for this exact domain?
hint: Examples include OpenAI for text generation and summarisation, AWS Rekognition for image classification, Google Cloud Vision for OCR, or Azure Cognitive Services for sentiment analysis. If a vendor already solved this problem at scale, building from scratch is rarely justified. Consider the API route only if you can tolerate the vendor's latency, pricing, and data-sharing terms. Check whether the vendor's model has been evaluated on data similar to yours before committing — generic APIs can perform poorly on domain-specific vocabulary or image types.
yes -> [PREBUILT_API]
no -> Q2
Q2: Do you have at least a few thousand labelled training examples available?
hint: Rule-of-thumb thresholds vary by problem, but classical ML typically needs hundreds to low thousands of labelled rows; deep learning needs tens of thousands or more. If your labelled data is sparse or expensive to collect, a rules-based or scorecard approach may outperform a poorly-trained model and will certainly be faster to deploy. Consider whether you can grow the dataset over time before committing to an ML-first architecture — active learning and human-in-the-loop labelling pipelines can accelerate data collection significantly.
yes -> Q3
no -> [SIMPLE_RULES]
Q3: Is the model's decision-making process required to be explainable to stakeholders or regulators?
hint: Regulated industries — credit scoring, healthcare triage, insurance underwriting — often require that every individual decision can be justified in plain language. If your legal or compliance team needs to audit model outputs, black-box approaches create significant risk. Gradient-boosted trees, logistic regression, and scorecards offer good accuracy while remaining interpretable; neural networks require post-hoc methods like SHAP which are harder to defend in a regulatory context.
yes -> Q4
no -> Q5
Q4: Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?
hint: A problem with many interacting features — e.g. credit risk involving income, debt, employment type, and payment history simultaneously — quickly becomes intractable as a hand-authored ruleset. Decision tree-based scorecards capture these interactions automatically while still producing a path of human-readable splits. If the domain expert can write the decision logic on a single whiteboard without contradictions, stick with simple rules rather than introducing model governance overhead.
yes -> [DECISION_TREE]
no -> [SIMPLE_RULES]
Q5: Does the input data involve unstructured content such as images, audio, or raw text at scale?
hint: Deep learning excels at tasks where features must be learned from raw signals — computer vision, speech recognition, large-scale NLP beyond simple keyword matching. If your inputs are structured tabular rows with engineered features, classical ML will usually match deep learning accuracy at a fraction of the infrastructure and training cost. Only invest in neural networks when you have the GPU budget, data volume, and MLOps maturity to support long training runs, versioned model artefacts, and continuous monitoring pipelines.
yes -> [DEEP_LEARNING]
no -> [TRAD_ML]
[SIMPLE_RULES]: Simple Rules (if/then Logic)
color: #48BB78
description: A hand-authored ruleset is the fastest path to a production decision system when domain knowledge is well-understood and labelled data is scarce. Document every rule in a version-controlled YAML or spreadsheet so that changes are auditable, and implement a monitoring dashboard to track rule trigger rates over time. Start with the highest-impact rules and extend incrementally as edge cases emerge. Re-evaluate after six months whether growing labelled data volume now justifies a machine learning upgrade — keep track of cases the rules get wrong to bootstrap a future training set.
code: ML_SIMPLE_RULES
[DECISION_TREE]: Decision Tree / Scorecard
color: #4F86C6
description: A trained decision tree or logistic scorecard delivers the interpretability of rules with the accuracy of a learned model. Use gradient-boosted shallow trees (XGBoost or LightGBM with max_depth ≤ 4) or a logistic regression with binned features to produce a scorecard that can be reviewed by a credit officer or clinician. Store feature importance scores alongside the model artefact and schedule quarterly re-training as the data distribution evolves. Document the feature engineering pipeline thoroughly so that any analyst can reproduce the score, and version the model in MLflow or a similar registry.
code: ML_DECISION_TREE
[TRAD_ML]: Traditional ML Model
color: #ED8936
description: Classical supervised learning — gradient-boosted trees, random forests, or regularised linear models — is the right default for structured tabular data with adequate labelled examples. Start with a baseline (logistic regression or XGBoost) before reaching for more complex architectures. Track model performance metrics in an experiment registry such as MLflow or Weights & Biases, and set up automated drift detection so that degrading accuracy triggers a re-training alert. Define a clear champion/challenger process for promoting new model versions to production and document your feature engineering logic in a reproducible pipeline.
code: ML_TRAD_ML
[DEEP_LEARNING]: Deep Learning / Neural Network
color: #9F7AEA
description: Deep learning is warranted when the input modality is unstructured (images, audio, text) and you have sufficient labelled data and GPU infrastructure to support training. Begin with transfer learning from a pre-trained foundation model — a fine-tuned BERT for NLP or a ResNet backbone for vision — rather than training from scratch, as this dramatically reduces data requirements. Invest in robust MLOps: containerised training jobs, model versioning, A/B testing infrastructure, and automated re-training pipelines. Monitor for data drift and fairness metrics continuously in production, and document compute costs so that the business can make an informed ROI assessment.
code: ML_DEEP_LEARNING
[PREBUILT_API]: Use a Pre-built API
color: #38B2AC
description: Leveraging a managed AI API is almost always the fastest and most cost-effective route when a reputable vendor has already solved your domain problem at scale. Integrate via the vendor's SDK, implement retry logic and rate-limit handling from day one, and log every request/response pair for audit and debugging. Evaluate vendor lock-in risk by abstracting the integration behind an internal interface so you can swap providers without rewriting business logic. Review the vendor's data retention and privacy policies carefully before sending any sensitive or regulated data, and benchmark latency and cost at your expected production request volume.
code: ML_PREBUILT_API
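The yes/no edges in the DSL above follow a regular line pattern, so they can be extracted with a small parser. The sketch below is an illustration only: the regular expressions and the name `parse_transitions` are assumptions inferred from this one file rather than taken from the DSL reference, and the parser ignores `hint`, `color`, `description`, and `code` lines.

```python
import re

# Line patterns inferred from the DSL listing above (an assumption, not the
# official grammar): "Q1: question text" and "yes -> [OUTCOME]" / "no -> Q2".
QUESTION_RE = re.compile(r"^(Q\d+):\s*(.+)$")
EDGE_RE = re.compile(r"^(yes|no)\s*->\s*\[?(\w+)\]?$")


def parse_transitions(dsl_text: str) -> dict:
    """Return {question_id: {"yes": target, "no": target}} from DSL text."""
    transitions = {}
    current = None  # ID of the question block being read, if any
    for raw_line in dsl_text.splitlines():
        line = raw_line.strip()
        question = QUESTION_RE.match(line)
        if question:
            current = question.group(1)
            transitions[current] = {}
            continue
        edge = EDGE_RE.match(line)
        if edge and current is not None:
            # Targets keep their bare ID; the brackets that mark outcomes are dropped.
            transitions[current][edge.group(1)] = edge.group(2)
        elif line.startswith("["):
            current = None  # an outcome block starts; stop attaching edges
    return transitions


# Two question blocks copied from the DSL above, with hints omitted.
SAMPLE = """\
Q1: Does a well-maintained pre-trained model or API already exist for this exact domain?
  yes -> [PREBUILT_API]
  no -> Q2
Q2: Do you have at least a few thousand labelled training examples available?
  yes -> Q3
  no -> [SIMPLE_RULES]
"""

if __name__ == "__main__":
    print(parse_transitions(SAMPLE))
```

Because outcome IDs lose their brackets, a caller that needs to tell questions from outcomes can check whether a target is itself a key of the returned mapping.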
Machine Access
- Static JSON: /t/drawdecisiontree/data-ml-vs-rules/tree.json
- Live JSON (SPA): /json/drawdecisiontree/data-ml-vs-rules
- Raw DSL: /t/drawdecisiontree/data-ml-vs-rules/tree.dag
- Canonical HTML: /t/drawdecisiontree/data-ml-vs-rules.html
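As a rough sketch of consuming the static JSON export, the code below downloads the file and pulls out the entry question, the question texts, and the outcome labels. It assumes the served file has the same top-level shape as the JSON shown earlier on this page (`meta.entry`, `questions`, `outcomes`). The function names `fetch_tree` and `summarise` are hypothetical, and the host is taken from the `json_url` field of the export.

```python
import json
from urllib.request import urlopen

# Host and path as they appear in the export's `json_url` field.
BASE_URL = "https://www.drawdecisiontree.com"
STATIC_JSON_PATH = "/t/drawdecisiontree/data-ml-vs-rules/tree.json"


def fetch_tree(url: str = BASE_URL + STATIC_JSON_PATH) -> dict:
    """Download and decode the export (requires network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def summarise(tree: dict) -> dict:
    """Extract the entry question ID, question texts, and outcome labels."""
    return {
        "entry": tree["meta"]["entry"],
        "questions": {q["id"]: q["text"] for q in tree["questions"]},
        "outcomes": {o["id"]: o["label"] for o in tree["outcomes"]},
    }


if __name__ == "__main__":
    summary = summarise(fetch_tree())
    print(summary["entry"], len(summary["questions"]), len(summary["outcomes"]))
```

Keeping the download separate from `summarise` means the summary step can be exercised against a saved copy of the file without touching the network.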
Questions in this decision tree
- Does a well-maintained pre-trained model or API already exist for this exact domain?
- Do you have at least a few thousand labelled training examples available?
- Is the model's decision-making process required to be explainable to stakeholders or regulators?
- Is the problem complex enough that a flat if/then ruleset would require more than ~20 rules to cover common cases?
- Does the input data involve unstructured content such as images, audio, or raw text at scale?
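The five questions above, combined with the yes/no transitions from the DSL representation, can be walked programmatically. The following is a minimal sketch: the transition table is copied from the DSL, while the function name `recommend` and the answer format (a mapping from question ID to a boolean) are assumptions made for illustration.

```python
# Transitions copied from the DSL representation; any target that is not a
# key of this table is treated as an outcome ID.
TRANSITIONS = {
    "Q1": {"yes": "PREBUILT_API", "no": "Q2"},
    "Q2": {"yes": "Q3", "no": "SIMPLE_RULES"},
    "Q3": {"yes": "Q4", "no": "Q5"},
    "Q4": {"yes": "DECISION_TREE", "no": "SIMPLE_RULES"},
    "Q5": {"yes": "DEEP_LEARNING", "no": "TRAD_ML"},
}


def recommend(answers: dict, entry: str = "Q1") -> str:
    """Follow yes/no answers from the entry question until an outcome is reached."""
    node = entry
    while node in TRANSITIONS:
        if node not in answers:
            raise KeyError(f"no answer supplied for {node}")
        node = TRANSITIONS[node]["yes" if answers[node] else "no"]
    return node


if __name__ == "__main__":
    # No suitable API, enough labelled data, explainability required, many rules needed.
    print(recommend({"Q1": False, "Q2": True, "Q3": True, "Q4": True}))
```

Only the questions on the chosen path need answers, so answering yes to Q1 alone is enough to reach the pre-built API outcome.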
Possible outcomes
- Simple Rules (if/then Logic)
- Decision Tree / Scorecard
- Traditional ML Model
- Deep Learning / Neural Network
- Use a Pre-built API
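For the Simple Rules outcome, the tree's description recommends keeping rules auditable and tracking how often each one triggers. One possible shape for that is sketched below. The order-review rules, field names, and thresholds are invented purely for illustration, and an in-memory counter stands in for whatever monitoring dashboard a real system would use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str                        # stable identifier, useful in audit logs
    applies: Callable[[dict], bool]  # predicate over one input record
    decision: str                    # decision returned when the rule fires


# Hypothetical rules for an invented order-review example.
RULES = [
    Rule("high_value", lambda order: order["amount"] > 1000, "manual_review"),
    Rule("new_customer", lambda order: order["account_age_days"] < 7, "manual_review"),
]
DEFAULT_DECISION = "approve"

# Per-rule trigger counts; a real system would export these to monitoring.
trigger_counts = Counter()


def decide(order: dict) -> str:
    """Return the first matching rule's decision and record which rule fired."""
    for rule in RULES:
        if rule.applies(order):
            trigger_counts[rule.name] += 1
            return rule.decision
    trigger_counts["default"] += 1
    return DEFAULT_DECISION


if __name__ == "__main__":
    print(decide({"amount": 1500, "account_age_days": 100}))
    print(dict(trigger_counts))
```

Rules are evaluated in list order and the first match wins, so the ordering itself is part of the logic and belongs under version control along with the rules.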
How to use this decision tree
Click "Open interactive version" to step through the questions. Your answers narrow the tree until a recommended outcome is reached. You can also embed this tree on your own site.