Service Errors
This guide covers common service errors encountered with EPMware Agent, including diagnostic procedures and resolution steps.
Common Service Errors
Connection Errors
Error: Connection Refused
Symptoms:
Causes:
- EPMware server is down
- Firewall blocking connection
- Incorrect server URL or port
- Network connectivity issues
Resolution:
- Verify server is accessible
- Check firewall rules
- Verify agent configuration
- Test from different network location
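The accessibility check can be scripted so that it runs the same way from the agent host and from a second network location. This is a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command. The function names parse_host_port and check_port are hypothetical, and the URL in the usage comment is a placeholder, not a real EPMware endpoint.

```shell
#!/bin/bash
# Split a server URL into "host port", defaulting to 443 for https and 80 otherwise.
parse_host_port() {
    local url="$1" scheme rest hostport host port
    scheme="${url%%://*}"
    rest="${url#*://}"
    hostport="${rest%%/*}"      # drop any path after the authority
    host="${hostport%%:*}"
    if [ "$hostport" != "$host" ]; then
        port="${hostport##*:}"  # explicit port in the URL
    elif [ "$scheme" = "https" ]; then
        port=443
    else
        port=80
    fi
    echo "$host $port"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
check_port() {
    local host="$1" port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (placeholder URL):
# read -r host port <<< "$(parse_host_port "https://epmware.example.com:8443/service")"
# check_port "$host" "$port" && echo "reachable" || echo "connection failed"
```

A failure from the agent host combined with a success from another location points at a firewall or routing problem rather than a server outage.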
Error: Connection Timeout
Symptoms:
Resolution:
- Increase timeout in agent.properties
- Check for proxy requirements
- Optimize network route
Authentication Errors
Error: 401 Unauthorized
Symptoms:
Resolution:
- Verify token is correct
- Regenerate token:
  - Log into EPMware
  - Navigate to Security → Users
  - Generate new token for agent user
- Update agent.properties
- Check user permissions:
  - Verify agent user is active
  - Confirm user has required roles
  - Check no IP restrictions
Error: 403 Forbidden
Symptoms:
Resolution:
- Verify user permissions in EPMware:
  - Agent user needs appropriate security class
  - Check application access rights
- Verify server configuration access
- Check SSL certificate issues
Java/JVM Errors
Error: OutOfMemoryError
Symptoms:
Resolution:
- Increase heap size in ew_target_service.sh
- Monitor memory usage
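For the monitoring step, the resident memory of each agent process can be sampled from /proc. This is a minimal sketch, assuming a Linux host. The function name rss_mb is hypothetical, and the epmware-agent match pattern in the usage comment is the one used by the health check script in this guide.

```shell
#!/bin/bash
# Print the resident set size (RSS) of a process in whole megabytes.
# Returns 1 if the process does not exist or exposes no VmRSS value.
rss_mb() {
    local pid="$1" rss_kb
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    [ -n "$rss_kb" ] || return 1
    echo $(( rss_kb / 1024 ))
}

# Example usage: report memory for every matching agent process.
# for pid in $(pgrep -f epmware-agent); do
#     echo "PID $pid: $(rss_mb "$pid") MB resident"
# done
```

RSS covers the whole JVM process, not only the Java heap, so it is a trend indicator rather than an exact heap measurement. If a JDK is installed, jstat -gc with the process ID reports heap occupancy directly.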
Error: ClassNotFoundException
Symptoms:
Resolution:
- Verify JAR file integrity
- Check Java classpath
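Because a JAR is a zip archive, its integrity can be tested with unzip. This is a minimal sketch, assuming unzip is installed. The function name verify_jar is hypothetical, and the lib directory in the usage comment is an assumed location, not a documented one.

```shell
#!/bin/bash
# Test a JAR's zip structure and CRCs; print a one-line verdict.
# Returns 0 if the archive tests clean, 1 if it is corrupt, 2 if it is unreadable.
verify_jar() {
    local jar="$1"
    if [ ! -r "$jar" ]; then
        echo "MISSING: $jar"
        return 2
    fi
    if unzip -tq "$jar" > /dev/null 2>&1; then
        echo "OK: $jar"
    else
        echo "CORRUPT: $jar"
        return 1
    fi
}

# Example usage (assumed directory layout):
# for jar in lib/*.jar; do verify_jar "$jar"; done
```

A truncated download or an interrupted copy typically shows up here as CORRUPT, in which case the file should be copied again from the installation media.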
File System Errors
Error: Permission Denied
Symptoms:
ERROR: Permission denied: /home/user/logs/agent.log
java.io.FileNotFoundException: Permission denied
Resolution:
- Fix file permissions
- Check SELinux (Linux)
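Before changing permissions, it helps to confirm which directory the agent account cannot write to. This is a minimal sketch; the function name check_writable is hypothetical, and the path in the usage comment is the one from the sample log lines above.

```shell
#!/bin/bash
# Report whether the current user can create files in a directory.
# Returns 0 if writable, 1 if not writable, 2 if the directory is missing.
check_writable() {
    local dir="$1" probe
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    probe="$dir/.write-test.$$"
    # Try to create an empty probe file, then remove it again
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable by $(id -un): $dir"
        return 1
    fi
}

# Example usage:
# check_writable /home/user/logs
```

Run the check as the account that owns the agent process, since a test run as root or as an administrator succeeds regardless of the directory's permissions.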
Error: No Space Left on Device
Symptoms:
Resolution:
- Check disk space
- Clean up logs
- Implement log rotation
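The cleanup and rotation steps can be combined into one scheduled job. This is a minimal sketch, assuming gzip and a find that supports -maxdepth and -delete (GNU and BSD find both do), and assuming that log files end in .log as in the scripts in this guide. The function name prune_logs and the retention periods are hypothetical.

```shell
#!/bin/bash
# Compress *.log files not modified for more than $2 days (default 7), then
# delete compressed logs older than $3 days (default 30). gzip keeps the
# original modification time, so age is always measured from the last write.
prune_logs() {
    local dir="$1" compress_days="${2:-7}" delete_days="${3:-30}"
    find "$dir" -maxdepth 1 -type f -name '*.log' \
        -mtime +"$compress_days" -exec gzip -f {} +
    find "$dir" -maxdepth 1 -type f -name '*.log.gz' \
        -mtime +"$delete_days" -delete
}

# Example usage:
# prune_logs logs 7 30
```

Files that are still being written have a recent modification time and are left untouched. On Linux, logrotate is an alternative that expresses the same policy declaratively.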
Service Management Errors
Error: Service Fails to Start
Symptoms:
- Scheduled task shows "Failed"
- systemd service won't start
- No process created
Resolution:
- Check for existing process
- Review system logs
- Verify prerequisites
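The prerequisite check lends itself to a small preflight function that can be run before each start attempt. This is a minimal sketch, assuming that agent.properties and the logs directory sit directly under the agent directory; that layout and the function name preflight are assumptions, so adjust the paths to the actual installation.

```shell
#!/bin/bash
# Print one line per failed prerequisite; return non-zero if any check fails.
preflight() {
    local agent_dir="$1" failed=0
    if ! command -v java > /dev/null 2>&1; then
        echo "FAIL: java not found on PATH"
        failed=1
    fi
    if [ ! -r "$agent_dir/agent.properties" ]; then
        echo "FAIL: cannot read $agent_dir/agent.properties"
        failed=1
    fi
    if [ ! -w "$agent_dir/logs" ]; then
        echo "FAIL: cannot write to $agent_dir/logs"
        failed=1
    fi
    return "$failed"
}

# Example usage:
# preflight "$PWD" && echo "prerequisites OK"
```

Running it as the service account matters, because a scheduled task or systemd unit often has a different PATH and different permissions than an interactive login.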
Error: Service Keeps Restarting
Symptoms:
- Service restarts every few minutes
- Multiple PIDs in short time
- Logs show repeated startup attempts
Resolution:
- Check restart configuration
- Investigate crash cause
Application-Specific Errors
Error: EPMLCM-13000 (Planning)
Symptoms:
Resolution:
- Verify Planning application provisioning:
  - Check user is provisioned to Planning application
  - Verify application is running
- Test with EPM Automate or Workspace
- Check Planning services
Error: HFM Registry Not Found
Symptoms:
Resolution:
- Copy reg.properties to correct location
- Verify file permissions
Error Diagnosis Tools
Log Analysis Script
Create a script to analyze agent logs for errors:
analyze-errors.sh:
#!/bin/bash
# Summarize ERROR entries from the agent logs into a dated report file.
LOG_DIR="logs"
REPORT_FILE="error-report-$(date +%Y%m%d).txt"

{
    echo "EPMware Agent Error Analysis Report"
    echo "Generated: $(date)"
    echo "=========================================="
    echo

    # Count errors by message text (assumes the message starts at field 5)
    echo "Error Summary:"
    echo "--------------"
    grep -h ERROR "$LOG_DIR"/*.log | cut -d' ' -f5- | sort | uniq -c | sort -rn | head -20
    echo

    # Only log files modified within the last 24 hours are scanned
    echo "Recent Errors (Last 24 hours):"
    echo "-------------------------------"
    find "$LOG_DIR" -name "*.log" -mtime -1 -exec grep -h ERROR {} + | tail -20
    echo

    echo "Connection Issues:"
    echo "------------------"
    grep -h "Connection\|Timeout\|refused" "$LOG_DIR"/*.log | tail -10
    echo

    echo "Memory Issues:"
    echo "--------------"
    grep -h "OutOfMemory\|heap" "$LOG_DIR"/*.log | tail -10
} > "$REPORT_FILE"

echo "Report saved to: $REPORT_FILE"
cat "$REPORT_FILE"
Health Check Script
health-check.sh:
#!/bin/bash
# Quick health check for the EPMware agent. Run it from the agent directory,
# because the logs/ paths below are relative.
echo "=== EPMware Agent Health Check ==="
echo

# Look up the agent PIDs once; pgrep may return more than one
PIDS=$(pgrep -f epmware-agent | tr '\n' ' ')

# Check 1: Process Running
echo -n "Agent Process: "
if [ -n "$PIDS" ]; then
    echo "✓ Running (PID: ${PIDS% })"
else
    echo "✗ Not Running"
fi

# Check 2: Recent Polls
echo -n "Recent Polls: "
LAST_POLL=$(tail -1 logs/agent-poll.log 2>/dev/null | awk '{print $1, $2}')
if [ -n "$LAST_POLL" ]; then
    echo "✓ Last poll: $LAST_POLL"
else
    echo "✗ No recent polls"
fi

# Check 3: Error Count (ERROR lines in log files modified within the last hour)
echo -n "Recent Errors: "
ERROR_COUNT=$(find logs -name "*.log" -mmin -60 -exec grep -h ERROR {} + 2>/dev/null | wc -l)
if [ "$ERROR_COUNT" -eq 0 ]; then
    echo "✓ No errors in last hour"
else
    echo "⚠ $ERROR_COUNT errors in last hour"
fi

# Check 4: Disk Space (-P keeps each filesystem on a single output line)
echo -n "Disk Space: "
DISK_USE=$(df -P . | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USE" -lt 80 ]; then
    echo "✓ ${DISK_USE}% used"
else
    echo "⚠ ${DISK_USE}% used (cleanup recommended)"
fi

# Check 5: Memory Usage (one line per agent process)
echo -n "Memory Usage: "
if [ -n "$PIDS" ]; then
    for PID in $PIDS; do
        MEM=$(ps -o %mem= -p "$PID" | tr -d ' ')
        echo "✓ PID $PID: ${MEM}% of system memory"
    done
else
    echo "✗ Not Running"
fi

echo
echo "=== Health Check Complete ==="
Recovery Procedures
Emergency Restart Procedure
- Kill all agent processes
- Clean up temporary files
- Archive problematic logs
- Start with debug mode
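The archiving step can be sketched as a function that moves the current logs into a timestamped tarball, so that the next start writes to clean files while the evidence is preserved. This is a minimal sketch, assuming tar with gzip support; the function name archive_logs and the archive subdirectory are hypothetical.

```shell
#!/bin/bash
# Move every *.log file in $1 into $1/archive/logs-<timestamp>.tar.gz and
# print the archive path. Returns 1 if there is nothing to archive or tar fails.
archive_logs() {
    local log_dir="$1" name
    name="logs-$(date +%Y%m%d-%H%M%S).tar.gz"
    # Nothing to do when the directory holds no *.log files
    ls "$log_dir"/*.log > /dev/null 2>&1 || return 1
    mkdir -p "$log_dir/archive" || return 1
    # Remove the originals only if tar succeeded
    ( cd "$log_dir" && tar -czf "archive/$name" -- *.log && rm -f -- *.log ) || return 1
    echo "$log_dir/archive/$name"
}

# Example usage:
# archive_logs logs
```

Run it only after the agent processes have been stopped; a process that is still running keeps writing to the removed files and those lines are lost.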
Rollback Procedure
If a new agent version causes issues:
- Stop the current agent
- Back up the current configuration
- Restore the previous version
- Restart the agent
- Monitor for stability
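The backup step is easy to skip under pressure, so it is worth wrapping in a function. This is a minimal sketch that copies a configuration file to a timestamped backup beside it; the function name backup_config and the .bak naming scheme are hypothetical.

```shell
#!/bin/bash
# Copy a config file to <file>.<timestamp>.bak, preserving mode and mtime,
# and print the backup path. Returns 1 if the source file does not exist.
backup_config() {
    local config="$1" backup
    if [ ! -f "$config" ]; then
        echo "no such file: $config" >&2
        return 1
    fi
    backup="$config.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$config" "$backup" && echo "$backup"
}

# Example usage:
# backup_config agent.properties
```

Keeping the backup next to the original makes it straightforward to compare the two with diff if the restored version behaves differently.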
Monitoring for Errors
Automated Error Detection
Set up automated monitoring for critical errors:
#!/bin/bash
# monitor-errors.sh - Run via cron every 5 minutes
# Cron starts jobs in the user's home directory, so change to the agent
# directory first or replace logs/agent.log with an absolute path.
CRITICAL_PATTERNS="OutOfMemory|Connection refused|Authentication failed|FATAL"
ALERT_EMAIL="admin@company.com"

# Check for critical errors (the five most recent matches in the log)
ERRORS=$(grep -E "$CRITICAL_PATTERNS" logs/agent.log | tail -5)
if [ -n "$ERRORS" ]; then
    echo "$ERRORS" | mail -s "EPMware Agent Critical Error on $(hostname)" "$ALERT_EMAIL"
fi
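One caveat with the script above: it scans the whole log on every run, so the same critical lines trigger an email every five minutes until the log rotates. A possible refinement is to remember how far the log has been scanned and report only new lines. This is a minimal sketch; the function name new_critical_errors and the state file are hypothetical, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
#!/bin/bash
# Print lines matching the pattern $3 that were appended to log $1 since the
# previous call. The byte offset already scanned is kept in state file $2.
new_critical_errors() {
    local log="$1" state="$2" patterns="$3" size last
    [ -r "$log" ] || return 0
    size=$(( $(wc -c < "$log") ))
    last=$(cat "$state" 2>/dev/null)
    last=${last:-0}
    # A log smaller than the saved offset was rotated or truncated: rescan it
    [ "$size" -ge "$last" ] || last=0
    echo "$size" > "$state"
    # Read exactly the bytes between the old and new offsets
    tail -c +"$(( last + 1 ))" "$log" | head -c "$(( size - last ))" | grep -E "$patterns"
}

# Example usage inside monitor-errors.sh (state file name is hypothetical):
# ERRORS=$(new_critical_errors logs/agent.log logs/.monitor-offset "$CRITICAL_PATTERNS")
```

The function returns grep's exit status, so a non-zero status simply means no new critical lines were found since the last run.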
Best Practices
- Implement comprehensive logging at appropriate levels
- Set up proactive monitoring for common errors
- Document error patterns specific to your environment
- Create runbooks for common error scenarios
- Test recovery procedures regularly
- Maintain an error knowledge base for your team
- Review logs regularly even when no issues are reported