Chapter 8 of 20 — DevOps Fundamentals

Ansible Configuration Management — Playbooks, Roles & Automation

By Vikas Swami, CCIE #22239 | Updated Mar 2026 | Free Course

What Ansible Configuration Management Is and Why It Matters in 2026

Ansible configuration management is an agentless automation framework that uses SSH and Python to enforce desired state across servers, network devices, and cloud infrastructure. Unlike traditional configuration tools that require client software on every managed node, Ansible operates from a single control machine, executing declarative YAML playbooks that describe what your infrastructure should look like rather than how to build it. In 2026, as Indian enterprises migrate to hybrid cloud architectures and DevOps adoption accelerates across Cisco India, HCL, Aryaka, and Akamai India, Ansible has become the de facto standard for configuration drift prevention, compliance automation, and infrastructure-as-code workflows.

Configuration management solves the core problem of infrastructure entropy. When you manually configure 50 routers or 200 Linux servers, each system gradually diverges from its intended state through ad-hoc changes, security patches, and human error. Ansible enforces idempotency—running the same playbook multiple times produces identical results without unintended side effects. This matters critically for network engineers transitioning into DevOps roles at Bangalore-based service providers where a single misconfigured BGP peer or firewall rule can cascade into customer-impacting outages.

The tool's architecture separates control logic from execution. Your laptop or a dedicated Ansible Tower server acts as the control node, storing playbooks in Git repositories. Managed nodes—whether Cisco ASA firewalls, Ubuntu web servers, or AWS EC2 instances—require only SSH access and Python (current ansible-core releases require Python 3 on managed nodes). Ansible connects via SSH, transfers small Python modules to a temporary directory on the target, executes them, retrieves results, and cleans up. No persistent agents consume memory or create attack surface. For organizations pursuing CERT-In compliance or RBI cybersecurity frameworks, this agentless model simplifies audit trails and reduces the number of listening services on production systems.

In our HSR Layout lab, we maintain 24×7 rack access where students configure Ansible to manage Cisco routers, Palo Alto firewalls, and Kubernetes clusters simultaneously. The same playbook syntax that configures a VLAN on a Catalyst switch can provision an S3 bucket in AWS or deploy a Docker container—this polyglot capability makes Ansible the Swiss Army knife for infrastructure teams. Our AWS DevOps course in Bangalore dedicates four weeks to Ansible automation because every hiring partner from Wipro to Accenture now lists "Ansible experience" in job descriptions for roles paying ₹6-12 LPA.

How Ansible Executes Playbooks Under the Hood

Ansible's execution model follows a push-based architecture where the control node initiates all operations. When you run ansible-playbook site.yml, the Ansible engine parses YAML syntax, builds an internal representation of tasks, and generates a dependency graph. It then connects to inventory hosts in parallel (default 5 forks, configurable to 50+ for large deployments), transfers Python modules via SFTP or SCP, executes them in a temporary directory, and streams JSON-formatted results back to the control node.

Each task in a playbook invokes a module—a self-contained Python script that performs one specific action. The yum module installs packages on Red Hat systems, ios_config pushes commands to Cisco IOS devices, ec2_instance provisions AWS virtual machines. Modules are idempotent by design: the file module checks if /etc/app/config.json already exists with the correct permissions before attempting to create it. This check-then-act pattern prevents unnecessary changes and makes playbooks safe to run repeatedly.

Ansible uses Jinja2 templating to inject variables into playbooks and configuration files. A single playbook can configure 100 routers with unique hostnames, IP addresses, and BGP AS numbers by reading variables from inventory files or external databases. The template engine evaluates conditionals, loops, and filters at runtime. For example, this snippet generates router configurations dynamically:

- name: Configure BGP on Cisco routers
  ios_config:
    lines:
      - router bgp {{ bgp_asn }}
      - neighbor {{ peer_ip }} remote-as {{ peer_asn }}
      - network {{ advertised_network }} mask {{ subnet_mask }}
  when: device_role == "edge_router"

The when conditional ensures BGP configuration only applies to edge routers, not core switches. Variables like bgp_asn and peer_ip come from host_vars files or group_vars directories, allowing you to maintain one playbook for diverse network topologies. In production environments at Cisco India's Bangalore office, network architects use this pattern to manage 500+ branch routers with a single Git repository.

Ansible's fact-gathering phase precedes task execution. Before running any tasks, Ansible connects to each host and executes the setup module, which collects system information—OS version, network interfaces, disk space, installed packages—and stores it as variables. Playbooks reference these facts to make intelligent decisions: "Install Apache only if the OS is Ubuntu 20.04 or newer" or "Configure eth0 only if it exists." You can disable fact gathering with gather_facts: no to speed up playbooks that don't need system information, a common optimization for network device automation where facts are irrelevant.
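
A minimal sketch of that fact-driven logic (the package choice and version threshold are illustrative; the ansible_facts keys are standard setup-module output):

- name: Install Apache only on Ubuntu 20.04 or newer
  apt:
    name: apache2
    state: present
  when:
    - ansible_facts['distribution'] == "Ubuntu"
    - ansible_facts['distribution_version'] is version('20.04', '>=')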

Error handling uses the block, rescue, and always keywords, similar to try-catch-finally in programming languages. If a task fails within a block, Ansible executes rescue tasks (rollback procedures, notifications), then always tasks (cleanup operations) regardless of success or failure. This pattern is essential for production deployments where a failed database migration must trigger automatic rollback and alert the on-call engineer.
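
A sketch of that pattern, assuming a hypothetical migration tool and lock-file path:

- name: Run database migration with automatic rollback
  block:
    - name: Apply migration
      command: /opt/app/bin/migrate --up   # hypothetical migration tool
  rescue:
    - name: Roll back the failed migration
      command: /opt/app/bin/migrate --down
    - name: Flag the failure for the on-call engineer
      debug:
        msg: "Migration failed on {{ inventory_hostname }}; rollback executed"
  always:
    - name: Remove the migration lock file
      file:
        path: /var/run/app-migrate.lock
        state: absent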

Playbooks vs Roles vs Collections: Ansible's Organizational Hierarchy

Ansible provides three levels of code organization, each suited to different complexity scales. Understanding when to use playbooks, roles, or collections determines whether your automation codebase remains maintainable or devolves into spaghetti YAML that no one dares modify.

Construct  | Use Case                                                                  | Structure                                                      | Reusability
Playbook   | Single-purpose automation (deploy one app, configure one service)         | Single YAML file with tasks, variables, handlers               | Low: copy-paste to reuse
Role       | Reusable component (web server setup, database installation)              | Directory with tasks/, handlers/, templates/, defaults/, vars/ | High: import into any playbook
Collection | Vendor-specific modules (Cisco IOS, AWS, VMware) or enterprise standards  | Namespace with roles, modules, plugins, documentation          | Ecosystem-wide: publish to Ansible Galaxy

A playbook is the entry point for Ansible execution. It defines which hosts to target, which roles to apply, and in what order. A typical playbook for deploying a three-tier web application might look like this:

---
- name: Deploy production web stack
  hosts: webservers
  become: yes
  roles:
    - common
    - nginx
    - php-fpm
    - monitoring

- name: Configure database tier
  hosts: dbservers
  become: yes
  roles:
    - common
    - postgresql
    - backup

Each role encapsulates all logic for one component. The nginx role contains tasks to install the package, templates for nginx.conf, handlers to reload the service, and default variables for worker processes and buffer sizes. Roles follow a standardized directory structure that Ansible recognizes automatically. When you reference roles: nginx, Ansible searches for roles/nginx/tasks/main.yml and executes those tasks in sequence.

Roles promote DRY (Don't Repeat Yourself) principles. If you need to deploy Nginx on 10 different projects, you write the role once and import it everywhere. Variables customize behavior: development environments might set nginx_worker_processes: 2 while production uses nginx_worker_processes: 16. Role dependencies allow you to declare "the nginx role requires the common role to run first," ensuring prerequisites like firewall rules and user accounts exist before Nginx installation begins.
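
Dependencies are declared in the role's meta/main.yml. A minimal sketch (the firewall role and its variable are hypothetical):

# roles/nginx/meta/main.yml
dependencies:
  - role: common
  - role: firewall
    vars:
      allowed_ports: [80, 443]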

Collections bundle roles, modules, and plugins into distributable packages. Cisco maintains cisco.ios and cisco.nxos collections with modules for configuring IOS and NX-OS devices. Amazon publishes amazon.aws with 100+ modules for EC2, S3, RDS, and Lambda. Collections use namespaced naming (cisco.ios.ios_config) to avoid conflicts when multiple vendors provide modules with similar names. You install collections from Ansible Galaxy or private Automation Hub servers using ansible-galaxy collection install cisco.ios.

In our AWS DevOps training program, students build a capstone project that uses roles for application deployment and the amazon.aws collection for infrastructure provisioning. This mirrors real-world workflows at Aryaka and Akamai India where network automation teams maintain internal role libraries for standard configurations (OSPF templates, ACL baselines, logging collectors) and consume vendor collections for device-specific operations.

Inventory Management: Static Files, Dynamic Sources, and Host Patterns

Ansible's inventory defines what hosts exist and how to group them. The simplest inventory is an INI-format file listing hostnames or IP addresses:

[webservers]
web1.example.com
web2.example.com

[dbservers]
db1.example.com
db2.example.com

[loadbalancers]
lb1.example.com ansible_host=203.0.113.10 ansible_user=admin

Groups organize hosts by function, location, or environment. You can nest groups to create hierarchies: [production:children] contains webservers and dbservers groups, allowing you to target all production hosts with ansible-playbook -i inventory site.yml --limit production. Host variables override group variables, which override global variables, giving you fine-grained control over configuration values.
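
The nesting described above looks like this in INI format:

[production:children]
webservers
dbservers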

Dynamic inventories query external sources to build host lists at runtime. Cloud environments where instances launch and terminate hourly make static inventory files obsolete. Ansible supports inventory plugins for AWS EC2, Azure, GCP, VMware vCenter, and OpenStack. The AWS EC2 plugin queries the EC2 API, retrieves all running instances in your account, and automatically creates groups based on tags, regions, and instance types. A playbook targeting tag_Environment_production automatically includes any EC2 instance tagged with Environment=production, even if it launched five minutes ago.
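
A minimal sketch of that plugin configuration (the region and filters are illustrative; the file name must end in aws_ec2.yml for the plugin to load it):

# inventory/prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - ap-south-1
filters:
  instance-state-name: running
keyed_groups:
  # builds groups like tag_Environment_production from instance tags
  - key: tags
    prefix: tag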

For network device inventories, organizations often integrate with IPAM systems (Infoblox, NetBox) or CMDB platforms (ServiceNow). A custom inventory script queries the CMDB API, retrieves all Cisco routers in the Bangalore region, and outputs JSON that Ansible consumes. This ensures your automation always targets the current production topology without manual inventory updates. At HCL's network operations center, dynamic inventories pull device lists from NetBox, which serves as the single source of truth for IP addressing and device roles.

Host patterns in playbook headers and ad-hoc commands use wildcards and boolean logic. hosts: web* targets all hosts starting with "web". hosts: webservers:&dbservers targets hosts in both groups (intersection). hosts: all:!production targets everything except the production group. These patterns enable surgical automation: "Apply security patches to all development web servers in the Mumbai datacenter" translates to hosts: webservers:&development:&mumbai.

Inventory variables define connection parameters and host-specific data. ansible_host specifies the IP address when the inventory name differs from DNS. ansible_user and ansible_ssh_private_key_file configure authentication. ansible_network_os=ios tells Ansible to use IOS-specific modules for Cisco devices. Group variables stored in group_vars/webservers.yml apply to all web servers, while host variables in host_vars/web1.example.com.yml apply to one specific host. This layered variable system allows you to define company-wide defaults, override them per environment, and further override them per host.
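
A sketch of that layering (the variable values are illustrative):

# group_vars/webservers.yml -- applies to every host in the group
nginx_worker_processes: 4

# host_vars/web1.example.com.yml -- overrides the group value for this host
nginx_worker_processes: 8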

Writing Idempotent Tasks and Handling State Drift

Idempotency means running a playbook multiple times produces the same result as running it once. This property is fundamental to configuration management—you should be able to execute your entire infrastructure playbook every hour without causing outages or accumulating duplicate configurations. Ansible modules are designed to be idempotent, but you must use them correctly to preserve this guarantee.

The command and shell modules are not idempotent by default because Ansible cannot predict what arbitrary commands do. Running shell: echo "log_level=debug" >> /etc/app.conf (the >> redirection requires shell; the command module doesn't pass commands through a shell) appends a line on every execution, creating duplicates. The correct approach uses the lineinfile module, which checks if the line exists before adding it:

- name: Set application log level
  lineinfile:
    path: /etc/app.conf
    regexp: '^log_level='
    line: 'log_level=debug'
    state: present

This task searches for any line starting with log_level=, replaces it with log_level=debug, or adds the line if it doesn't exist. Running it 100 times results in exactly one log_level=debug line in the file. The regexp parameter makes the task idempotent by defining what constitutes "the line already exists."

State drift occurs when manual changes or external processes modify systems between Ansible runs. An engineer SSH's into a server and edits /etc/hosts to troubleshoot a DNS issue, forgetting to update the Ansible playbook. The next playbook run doesn't touch /etc/hosts because that file isn't managed by any task, so the manual change persists. Over time, production systems accumulate dozens of undocumented modifications that exist nowhere in source control.

Ansible Tower (now part of Red Hat Ansible Automation Platform) addresses drift through scheduled playbook runs and compliance reporting. You configure Tower to execute your infrastructure playbook every 4 hours. If a task reports "changed" status, Tower logs the change and can trigger alerts. Persistent "changed" status on the same task indicates someone is manually modifying that resource between runs. Tower's job output shows exactly what changed, helping you identify drift sources.

The check_mode feature (invoked with --check) performs a dry run, reporting what would change without actually changing it. This is invaluable before running playbooks in production: ansible-playbook site.yml --check --diff shows a unified diff of every file that would be modified. If the diff reveals unexpected changes, you can investigate before applying them. In our 4-month paid internship at the Network Security Operations Division, interns must run check mode and get approval from senior engineers before executing playbooks against customer environments.

Handlers implement change-triggered actions. A handler is a task that only runs if notified by another task that reports "changed" status. Restarting a service after configuration changes is the canonical use case:

- name: Update Nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: restart nginx

handlers:
  - name: restart nginx
    service:
      name: nginx
      state: restarted

If the template task changes nginx.conf, it notifies the handler. Ansible queues the handler and executes it once at the end of the playbook, even if multiple tasks notify it. This prevents restarting Nginx five times when five configuration files change—it restarts once after all changes complete. Gating the restart behind notify is what preserves idempotency: state: restarted always restarts the service when it runs, so the handler fires only when a configuration change actually makes the restart necessary.

Network Automation with Ansible: IOS, NX-OS, and Multi-Vendor Environments

Ansible's network automation capabilities extend beyond Linux servers to routers, switches, firewalls, and load balancers. The cisco.ios, cisco.nxos, arista.eos, and junipernetworks.junos collections provide modules that abstract vendor CLI differences behind a consistent YAML interface. A network engineer who knows how to configure VLANs on Cisco IOS can use the same Ansible syntax to configure VLANs on Arista EOS or Juniper Junos without learning each vendor's CLI.

Network modules use SSH or NETCONF to connect to devices. Unlike server automation where Ansible transfers Python modules to the target, network devices don't run Python. Instead, Ansible executes modules on the control node, which generate CLI commands or NETCONF XML and send them to the device over SSH. The module parses the device's response, determines if the desired state was achieved, and reports success or failure.

The ios_config module pushes configuration commands to Cisco IOS devices. This playbook configures OSPF on a router:

- name: Configure OSPF on Cisco router
  hosts: routers
  gather_facts: no
  tasks:
    - name: Enable OSPF process
      cisco.ios.ios_config:
        lines:
          - router ospf 100
          - network 10.0.0.0 0.255.255.255 area 0
          - network 192.168.1.0 0.0.0.255 area 1
        save_when: modified

The lines parameter lists commands to execute. Ansible compares these commands against the running configuration, only sending commands that aren't already present. The save_when: modified parameter copies the running config to startup-config whenever the two differ, ensuring configurations survive reboots (save_when: changed saves only when the task itself applied commands). This idempotent behavior prevents configuration bloat from repeated playbook runs.

For complex configurations, the ios_config module supports hierarchical commands using the parents parameter. Configuring an interface requires entering interface configuration mode first:

- name: Configure GigabitEthernet0/1
  cisco.ios.ios_config:
    lines:
      - description Uplink to Core Switch
      - ip address 10.0.1.1 255.255.255.0
      - no shutdown
    parents: interface GigabitEthernet0/1

Ansible generates the command sequence interface GigabitEthernet0/1, then the three sub-commands, then exit. The module parses the existing interface configuration and only sends commands that differ from the current state. If the interface already has IP address 10.0.1.1/24 and is administratively up, Ansible reports "ok" without making changes.

The ios_facts module retrieves device information—software version, serial number, interface status, routing table—and stores it as variables. You can use these facts to make conditional decisions: "Only configure BGP if IOS version is 15.2 or newer" or "Skip this router if it has fewer than 4 interfaces." Fact gathering for network devices is disabled by default (gather_facts: no) because it adds latency; you explicitly invoke ios_facts when needed.
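
A sketch of that fact-driven device logic (the AS number is illustrative; ansible_net_version is set by ios_facts, though IOS version strings with trains such as 15.2(4)M may need extra parsing in practice):

- name: Gather device facts
  cisco.ios.ios_facts:

- name: Configure BGP only on IOS 15.2 or newer
  cisco.ios.ios_config:
    lines:
      - router bgp 65001
  when: ansible_net_version is version('15.2', '>=')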

Multi-vendor environments require careful inventory organization. Group devices by vendor and use ansible_network_os to specify the operating system:

[cisco_routers]
router1.example.com ansible_network_os=ios
router2.example.com ansible_network_os=ios

[arista_switches]
switch1.example.com ansible_network_os=eos
switch2.example.com ansible_network_os=eos

[all:vars]
ansible_connection=network_cli
ansible_user=admin

Vault-encrypted values cannot be embedded in an INI inventory; the !vault tag only works in YAML files, so the password belongs in a group variables file instead:

# group_vars/all.yml
ansible_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          ...encrypted password...

Playbooks use vendor-specific modules based on ansible_network_os. A single playbook can configure Cisco, Arista, and Juniper devices by conditionally selecting the appropriate module. In practice, most organizations standardize on one or two vendors to simplify automation, but Ansible's multi-vendor support provides an escape hatch for acquisitions or legacy equipment.

At Cisco India's Bangalore development center, network architects use Ansible to manage lab environments with 200+ routers and switches. A single playbook resets all devices to baseline configurations, provisions VLANs and VRFs for new test scenarios, and tears down environments when testing completes. This automation reduced lab provisioning time from 4 hours of manual CLI work to 15 minutes of unattended execution, allowing more test iterations per day.

Ansible Vault: Encrypting Secrets and Managing Credentials

Ansible playbooks stored in Git repositories often contain sensitive data—database passwords, API keys, SSH private keys, SNMP community strings. Committing these secrets in plaintext violates security policies and creates audit findings. Ansible Vault encrypts files or individual variables using AES256, allowing you to safely commit encrypted secrets to version control while keeping the decryption password separate.

You encrypt an entire file with ansible-vault encrypt secrets.yml. Ansible prompts for a password, encrypts the file, and replaces its contents with a ciphertext block whose header reads $ANSIBLE_VAULT;1.1;AES256. To edit the file later, use ansible-vault edit secrets.yml, which decrypts it into a temporary file, opens your editor, and re-encrypts when you save. Running a playbook that references encrypted files requires the --ask-vault-pass flag or a vault password file.

Encrypting individual variables within a YAML file uses ansible-vault encrypt_string. This command encrypts a single value and outputs the encrypted string, which you paste into your playbook:

db_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          66386439653765386662306134636436643266656433323330613036353235
          ...truncated...

The !vault tag tells Ansible to decrypt this value at runtime. This approach allows you to encrypt only sensitive values while leaving non-sensitive configuration readable in plaintext, improving playbook maintainability. You can mix encrypted and unencrypted variables in the same file.

Vault password files automate decryption in CI/CD pipelines. Instead of prompting for a password interactively, you store the password in a file (protected by filesystem permissions) and reference it with --vault-password-file ~/.vault_pass. Jenkins or GitLab CI jobs retrieve the vault password from a secrets manager (HashiCorp Vault, AWS Secrets Manager) at runtime, write it to a temporary file, execute the playbook, and delete the file. This enables automated deployments without embedding passwords in CI configuration.
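
A sketch of that CI sequence, assuming the vault password lives in AWS Secrets Manager under a hypothetical secret name:

# fetch the vault password, run the playbook, then remove the temporary file
aws secretsmanager get-secret-value --secret-id ci/ansible-vault \
    --query SecretString --output text > /tmp/.vault_pass
chmod 600 /tmp/.vault_pass
ansible-playbook site.yml --vault-password-file /tmp/.vault_pass
rm -f /tmp/.vault_pass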

Multiple vault IDs allow different teams to encrypt different secrets with different passwords. The network team encrypts router credentials with one password, the database team encrypts PostgreSQL passwords with another. The vault ID label is recorded in the ciphertext header itself (format 1.2, for example $ANSIBLE_VAULT;1.2;AES256;db), so playbooks reference the variable normally with no special syntax. Running the playbook requires providing each vault password: ansible-playbook site.yml --vault-id network@prompt --vault-id db@~/.db_vault_pass.
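
Creating such a value uses encrypt_string with the team's vault ID (the secret shown is a placeholder):

ansible-vault encrypt_string --vault-id db@prompt 'S3cretPassw0rd' --name 'db_password'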

In our HSR Layout lab, students practice vault workflows by encrypting AWS access keys and SSH private keys in their capstone projects. We enforce a policy that any playbook containing credentials in plaintext fails code review. This mirrors real-world practices at Wipro and TCS where security audits flag unencrypted secrets as critical findings. Our DevOps fundamentals course includes a dedicated module on secrets management covering Ansible Vault, HashiCorp Vault integration, and AWS Systems Manager Parameter Store.

Ansible Tower and AWX: Enterprise Automation Platforms

Ansible Tower (the commercial product) and AWX (the upstream open-source project) provide a web UI, REST API, and RBAC system on top of Ansible. While the ansible-playbook CLI suffices for individual engineers, enterprise environments with 50+ playbooks and 20+ team members need centralized execution, audit logging, and access control. Tower transforms Ansible from a command-line tool into a self-service automation platform.

Tower's job templates define playbook execution parameters—which playbook to run, which inventory to use, which credentials to inject, which variables to prompt for. A network engineer with read-only Tower access can click "Launch" on a pre-approved job template to provision a new branch office network, even if they don't have SSH access to the routers or knowledge of Ansible syntax. The job template encapsulates the playbook, credentials, and safety checks (survey questions, approval workflows), democratizing automation across the organization.

Role-based access control restricts who can execute which playbooks against which inventories. The "network-ops" team can run network playbooks against production routers but not database playbooks. The "app-dev" team can run application deployment playbooks against development environments but not production. Tower's audit log records every playbook execution—who ran it, when, what changed, what failed—providing the compliance trail that CERT-In and ISO 27001 audits require.

Scheduled jobs execute playbooks on a cron-like schedule. A compliance playbook runs every night at 2 AM, checking that all servers have the latest security patches, correct firewall rules, and no unauthorized user accounts. If the playbook detects drift, Tower sends a Slack notification to the on-call engineer. This continuous compliance model replaces quarterly manual audits with automated daily verification.

Tower's credential management integrates with external secret stores. Instead of storing SSH keys and API tokens in Tower's database, you configure Tower to retrieve them from HashiCorp Vault or CyberArk at runtime. When a job executes, Tower requests the credential from Vault, injects it into the Ansible environment, runs the playbook, and discards the credential. The credential never touches disk in plaintext, and Tower's audit log shows which job used which credential without revealing the credential value.

Workflows chain multiple job templates into a directed graph. A deployment workflow might run "provision-infrastructure" (creates EC2 instances), then "configure-servers" (installs packages), then "deploy-application" (copies code), then "run-tests" (executes smoke tests). If any job fails, the workflow can execute rollback jobs or send notifications. Conditional branching allows "if tests pass, promote to production; if tests fail, destroy the environment and alert the team."

At Aryaka's Bangalore office, Tower manages SD-WAN edge device provisioning. When a customer orders a new site, the sales system creates a Tower job via REST API, passing site details as extra variables. Tower executes a workflow that provisions a virtual edge appliance in AWS, configures IPsec tunnels to the nearest PoP, updates the CMDB, and emails the customer with connection details. The entire process completes in 8 minutes without human intervention, compared to 2-3 hours of manual work previously.

Common Pitfalls and Interview Questions for Ansible Roles

Ansible interviews at Cisco India, Akamai, and Barracuda Networks probe your understanding of idempotency, error handling, and performance optimization. Interviewers present scenarios where naive playbook design causes outages or configuration drift, expecting you to identify the flaw and propose a fix.

Pitfall: Using command module for tasks that have dedicated modules. Candidates often write command: systemctl restart nginx instead of using the service module. The command module doesn't check if the service is already running, doesn't report "changed" status accurately, and doesn't handle errors gracefully. The correct approach uses the service module (service: name=nginx state=restarted for a deliberate one-off restart, or state=started plus a handler for idempotent behavior), which reports status accurately and surfaces failures clearly.

Pitfall: Not using become for privilege escalation. Tasks that modify system files or restart services require root privileges. Forgetting become: yes causes "permission denied" errors. Worse, some candidates add sudo to shell commands (shell: sudo systemctl restart nginx), which bypasses Ansible's privilege escalation logging and breaks in environments where sudo requires a password. The correct pattern is become: yes at the play or task level, with ansible_become_pass in inventory for password-based sudo.

Pitfall: Ignoring changed_when and failed_when for shell tasks. The shell module always reports "changed" status, even if the command didn't modify anything. This pollutes Tower logs and triggers unnecessary handler notifications. Use changed_when to define what constitutes a change: changed_when: result.stdout.find('already exists') == -1. Similarly, failed_when customizes failure conditions for commands that use non-zero exit codes for non-error conditions.
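
A sketch combining both overrides (the schema tool and its output messages are hypothetical):

- name: Create database schema if it does not already exist
  shell: /opt/app/bin/create-schema
  register: schema_result
  # report "changed" only when the schema was actually created
  changed_when: "'already exists' not in schema_result.stdout"
  # a non-zero exit is acceptable when it only means the schema exists
  failed_when:
    - schema_result.rc != 0
    - "'already exists' not in schema_result.stderr"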

Pitfall: Not testing playbooks with --check and --diff before production runs. Candidates who skip dry runs discover errors in production, causing outages. Interviewers ask "How do you validate a playbook before running it against 500 routers?" The expected answer includes check mode, limiting execution to a subset of hosts (--limit test-router), and reviewing diff output for unexpected changes.

Interview question: "How do you handle secrets in Ansible playbooks stored in Git?" The answer must mention Ansible Vault for encryption, vault password files for automation, and integration with external secrets managers (HashiCorp Vault, AWS Secrets Manager) for enterprise environments. Bonus points for discussing vault IDs for multi-team scenarios and the security implications of storing vault passwords in CI/CD systems.

Interview question: "Explain how Ansible achieves idempotency and why it matters." Candidates must define idempotency (same result regardless of execution count), explain that modules check current state before making changes, and provide an example of a non-idempotent task (shell: echo "line" >> file) versus an idempotent one (lineinfile). The "why it matters" part should mention configuration drift prevention, safe repeated execution, and compliance automation.

Interview question: "You have 1000 servers and a playbook takes 2 hours to complete. How do you optimize it?" Expected optimizations include increasing fork count (forks=50 in ansible.cfg), disabling fact gathering for tasks that don't need it, using async and poll for long-running tasks, enabling pipelining to reduce SSH round trips, and using strategy: free to allow faster hosts to proceed without waiting for slower ones. Advanced candidates mention splitting the playbook into smaller targeted playbooks and using Tower workflows to parallelize independent jobs.

In our 4-month paid internship, students encounter these pitfalls in controlled lab scenarios before facing them in production. We simulate a "playbook that accidentally deletes customer data" scenario where improper use of the file module with state: absent and a typo in the path variable wipes /var/www instead of /var/www/old-backup. Students learn to use --check, validate variable values with assert tasks, and implement approval gates in Tower workflows.

Real-World Deployment Scenarios Across Indian Enterprises

Ansible adoption in India spans telecom providers, financial services, e-commerce platforms, and IT service companies. Each sector uses Ansible differently based on regulatory requirements, scale, and existing toolchains. Understanding these deployment patterns helps you design automation that fits organizational constraints rather than imposing textbook architectures.

Telecom: Zero-touch provisioning for 5G edge nodes. Bharti Airtel and Reliance Jio deploy thousands of edge compute nodes for 5G services. When a new cell tower activates, an Ansible playbook provisions the edge server—installs Kubernetes, configures SR-IOV for network acceleration, deploys containerized VNFs (virtual network functions), and registers the node with the orchestration platform. The playbook runs from a central Tower cluster, triggered by the OSS/BSS system via REST API. Ansible's network automation modules configure the underlying Cisco or Nokia routers to establish IPsec tunnels back to regional data centers. This zero-touch model reduces deployment time from days to hours and eliminates manual configuration errors that cause service outages.

Banking: Compliance automation for RBI cybersecurity framework. HDFC Bank and ICICI Bank use Ansible to enforce RBI's cybersecurity guidelines across thousands of branch servers and ATM controllers. Playbooks run nightly to verify that SSH uses key-based authentication, firewall rules block unauthorized ports, and security patches are current. Any deviation triggers an incident ticket in ServiceNow and alerts the security operations center. Ansible Tower's audit logs provide evidence for RBI inspections, showing exactly when each compliance check ran and what it found. The bank's change advisory board pre-approves playbooks, and Tower's RBAC ensures only authorized personnel can execute them against production systems.

E-commerce: Continuous deployment for flash sales. Flipkart and Amazon India deploy application updates dozens of times per day during high-traffic events like Big Billion Days. Ansible playbooks pull the latest code from Git, run database migrations, update configuration files, and perform rolling restarts of application servers—all without downtime. The playbook uses serial: 10% to update 10% of servers at a time, running health checks after each batch. If health checks fail, the playbook aborts and rolls back to the previous version. This blue-green deployment pattern, orchestrated by Ansible, allows rapid iteration while maintaining 99.99% availability. Tower workflows integrate with Jenkins, triggering deployments automatically when CI tests pass.
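
A sketch of that rolling pattern (the release path, port, and health endpoint are hypothetical):

- name: Rolling deploy of application servers
  hosts: webservers
  serial: "10%"            # update one tenth of the group per batch
  max_fail_percentage: 0   # abort the play if any batch fails
  tasks:
    - name: Deploy the application release
      unarchive:
        src: /releases/app-{{ app_version }}.tar.gz
        dest: /opt/app
    - name: Verify the health endpoint before the next batch
      uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health
      retries: 5
      delay: 10
      until: health.status == 200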

IT services: Multi-tenant infrastructure for client projects. TCS and Infosys manage infrastructure for hundreds of clients, each with unique requirements. Ansible roles define standard components (web server, database, monitoring agent), and playbooks compose these roles with client-specific variables. A single Tower instance serves all clients, with inventories and credentials segregated by organization. When a new client onboards, the account team fills out a survey in Tower (number of servers, AWS region, compliance requirements), and a workflow provisions the entire stack—VPC, subnets, EC2 instances, RDS databases, CloudWatch alarms. The client receives a fully configured environment in 30 minutes, compared to 2-3 weeks of manual provisioning previously.

At Movate (formerly CSS Corp), where many of our graduates work, Ansible manages the global network infrastructure for a Fortune 500 client. Playbooks configure 800+ Cisco routers across 40 countries, ensuring consistent OSPF, BGP, and QoS policies. The network team maintains a Git repository with role-based configurations (edge router, core router, branch router) and uses Tower to execute playbooks during maintenance windows. Tower's scheduling feature staggers execution across time zones, ensuring changes happen during local off-peak hours. The audit trail satisfies SOC 2 requirements, and the version-controlled playbooks provide disaster recovery—if a router fails, they provision a replacement with the exact same configuration in minutes.

Frequently Asked Questions About Ansible Configuration Management

What is the difference between Ansible and Terraform?

Ansible and Terraform both automate infrastructure, but they serve different purposes. Terraform specializes in provisioning—creating cloud resources like EC2 instances, S3 buckets, and VPCs. It uses a declarative language (HCL) to define desired state and maintains a state file to track what exists. Ansible specializes in configuration management—installing packages, editing files, restarting services on existing servers. Its YAML playbooks run tasks in order (procedural flow, with individual modules declaring desired state) and it doesn't maintain a state file. In practice, teams use both: Terraform provisions infrastructure, then hands off to Ansible for configuration. For example, Terraform creates 10 EC2 instances, outputs their IP addresses, and Ansible configures those instances with web servers and applications. Some organizations use Ansible for both provisioning (via cloud modules) and configuration, accepting the trade-off that Ansible's stateless model makes drift detection harder.

How does Ansible connect to network devices that don't support SSH?

Older network devices use Telnet instead of SSH, and some use proprietary APIs. The network_cli connection plugin itself is SSH-based; for Telnet-only devices, the ansible.netcommon.telnet module can send commands over a Telnet session, though it lacks the idempotent configuration handling of the SSH-based modules. For devices with REST APIs (Cisco ACI, Palo Alto Panorama), Ansible uses HTTP-based modules that send API requests instead of CLI commands. The uri module makes arbitrary REST calls, while vendor-specific modules (like panos_security_rule for Palo Alto) abstract API details behind Ansible's YAML syntax. For devices with no SSH, Telnet, or API support, you can use Ansible to configure a jump host that has access to the device, then use delegate_to to execute commands on the jump host that proxy to the target device. This pattern is common for serial console access to out-of-band management interfaces.

Can Ansible manage Windows servers?

Yes, Ansible manages Windows servers using WinRM (Windows Remote Management) instead of SSH. You configure Windows hosts with ansible_connection=winrm and install the pywinrm Python library on the control node. Ansible provides Windows-specific modules: win_feature installs Windows features, win_service manages services, win_copy transfers files, win_regedit modifies the registry. Playbooks for Windows look similar to Linux playbooks but use different module names. Authentication supports NTLM, Kerberos, and CredSSP. For domain-joined servers, Kerberos provides single sign-on. For workgroup servers, you specify username and password in inventory. Windows automation is less mature than Linux—some modules lack features, and error messages are less informative—but it's sufficient for common tasks like installing IIS, deploying .NET applications, and configuring Windows Firewall.
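
A minimal inventory sketch for a WinRM-managed group (hostnames illustrative):

[windows]
win1.example.com

[windows:vars]
ansible_connection=winrm
ansible_winrm_transport=ntlm
ansible_port=5986
# define ansible_user and a vault-encrypted ansible_password in
# group_vars/windows.yml rather than in this INI file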

How do you test Ansible roles before using them in production?

Ansible roles should be tested in isolated environments using tools like Molecule, which automates the create-test-destroy cycle. Molecule provisions a test instance (Docker container, Vagrant VM, or cloud instance), applies the role, runs verification tests (using Testinfra or Ansible's own assert module), and destroys the instance. A typical Molecule workflow: molecule create spins up a container, molecule converge applies the role, molecule verify runs tests to confirm the role worked (e.g., "Nginx is installed and listening on port 80"), and molecule destroy cleans up. You integrate Molecule into CI pipelines so every Git commit triggers automated role testing. For network device roles, testing is harder because you can't easily spin up virtual routers. Organizations use GNS3 or Cisco VIRL to create virtual lab topologies, or they maintain a dedicated physical test lab. At Networkers Home, our HSR Layout lab includes a test rack where students can safely experiment with playbooks before running them against the production training environment.

What is the maximum number of hosts Ansible can manage simultaneously?

Ansible's scalability depends on control node resources and network latency. A single control node with 8 CPU cores and 16 GB RAM can manage 500-1000 hosts with default settings (5 forks). Increasing forks to 50-100 allows managing 5000+ hosts, but you'll hit control node CPU limits. For larger deployments (10,000+ hosts), use Ansible Tower with multiple execution nodes—Tower distributes playbook execution across a cluster of control nodes. Each execution node handles a subset of hosts, and Tower aggregates results. Network latency matters more than host count: managing 1000 hosts on a LAN is faster than managing 100 hosts over satellite links. For very large deployments, partition your infrastructure into regions and run separate Tower clusters per region, or use Ansible's strategy: free to allow fast hosts to proceed without waiting for slow ones. Cloud providers like AWS recommend batching operations—instead of configuring 10,000 EC2 instances with Ansible, use Auto Scaling Groups with launch templates that bake configuration into AMIs.

How does Ansible handle playbook failures and rollbacks?

Ansible stops executing tasks on a host when a task fails, but continues on other hosts. The any_errors_fatal: yes directive aborts the entire playbook if any host fails, useful for deployments where partial success is worse than no change. The ignore_errors: yes directive allows a task to fail without stopping the playbook, useful for optional tasks like "delete old log files" where failure is acceptable. For rollback, Ansible doesn't have built-in rollback like database transactions. You must design playbooks to be reversible: if a deployment playbook installs version 2.0, the rollback playbook installs version 1.9. Tower workflows can automate this: if the deployment job fails, Tower automatically runs the rollback job. The block/rescue/always pattern provides try-catch-finally logic within a playbook. Tasks in the block section execute normally; if any fail, tasks in the rescue section execute (rollback logic); tasks in the always section execute regardless of success or failure (cleanup logic). This pattern is essential for production deployments where you must guarantee the system is left in a consistent state even if the playbook fails midway.

What is the difference between include and import in Ansible?

Both include and import allow you to reuse tasks from external files, but they differ in when the inclusion happens. import_tasks is static—Ansible loads the included file at parse time, before execution begins. This means you can't use variables in the filename, and conditionals on the import statement apply to every task in the included file. include_tasks is dynamic—Ansible loads the file at runtime, during execution. You can use variables in the filename (include_tasks: "{{ ansible_os_family }}.yml") and conditionals only affect whether the include happens, not the tasks inside. Use import_tasks for static task lists that don't change based on runtime conditions. Use include_tasks when you need to conditionally include different task files or loop over a list of files. The same distinction applies to import_role vs include_role; playbooks themselves can only be imported statically with import_playbook (there is no dynamic include for playbooks). Dynamic includes add flexibility but make playbooks harder to debug because you can't see the full task list until runtime.
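
A side-by-side sketch (the OS-specific file names are hypothetical and would match values of ansible_os_family such as Debian.yml or RedHat.yml):

# dynamic: file name resolved at runtime from a fact
- name: Include OS-specific tasks
  include_tasks: "{{ ansible_os_family }}.yml"

# static: resolved at parse time, before execution begins
- name: Import common tasks
  import_tasks: common.yml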

How do you manage Ansible playbook dependencies and versioning?

Ansible playbooks and roles should be stored in Git repositories with semantic versioning tags. When you reference a role from Ansible Galaxy or a private Git repository, specify the version: ansible-galaxy install geerlingguy.nginx,2.8.0. This ensures your playbook uses a known-good version rather than the latest (potentially broken) version. For internal roles, use Git submodules or a requirements.yml file that lists role dependencies with version pins. Tower and AWX support project syncing from Git, automatically pulling the latest commit from a specified branch. For production playbooks, use a stable branch (e.g., production) that only receives tested changes, while development happens on main. Collections introduce formal dependency management: a collection's galaxy.yml file declares dependencies on other collections, and ansible-galaxy collection install resolves and installs them. For Python dependencies (modules that require specific libraries), use a virtual environment or container image with pinned versions in requirements.txt. This prevents "it works on my laptop" issues where playbooks fail in production because the control node has a different Python library version.
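
A requirements.yml sketch (the collection version ranges are illustrative), installed with ansible-galaxy role install -r requirements.yml and ansible-galaxy collection install -r requirements.yml:

# requirements.yml
roles:
  - name: geerlingguy.nginx
    version: "2.8.0"

collections:
  - name: cisco.ios
    version: ">=5.0.0,<6.0.0"
  - name: amazon.aws
    version: ">=6.0.0"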

Ready to Master DevOps Fundamentals?

Join 45,000+ students at Networkers Home. CCIE-certified trainers, 24x7 real lab access, and 100% placement support.
