A System Administrator's Hardware and Software Maintenance Checklist

 2022-10-23

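With the long Spring Festival holiday approaching, some system administrators are being asked by their managers to write a hardware and software maintenance checklist for the company. For operations staff who have never written this kind of document, that can be a frustrating task.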

How should a system maintenance checklist be written?

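In fact, this is not only a matter for the days before and after a long holiday. System administrators should make a habit of carrying out hardware and software maintenance against a checklist on a regular schedule (daily, weekly, and monthly, for example).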

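In brief, system maintenance covers the following areas:

  1. Keep software and systems up to date. Software updates usually include bug fixes and security vulnerability fixes, so applying them protects you.
  2. Keep antivirus software updated and scan for viruses regularly.
  3. Check that your system monitoring data, from every kind of monitor, is being saved intact.
  4. Check that system backups are saved intact. The importance of backups needs no further emphasis.
  5. Check the physical environment of the server room, such as temperature and humidity.
  6. Check the state of hard disks and RAID arrays: disk usage and whether there are bad sectors.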
  7. ……

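    In a sense, a system maintenance checklist should be an iron rule that every system administrator follows. The editors recommend the feature series "Sharing the Secrets of System Operations", which includes:

    1. Six iron rules every system administrator must know
    2. Nine things system administrators should do regularly
    3. Ten security rules operations staff should always keep in mind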

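      As for concrete system maintenance checklists, quite a few vendors (Microsoft and IBM in particular) provide reference documents for hardware and software maintenance. Unfortunately, most of them have not been translated into Chinese (which is also why good English matters for technical people: so many manuals are English only). Excerpts from some of these documents follow for reference.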

      Microsoft BizTalk Server maintenance checklist reference

      Daily checklist

      Step, followed by its reference:

      Check for failed disks in the hardware RAID (reliability check).

      "View Disk Properties" in the Windows Server 2003 product Help at http://go.microsoft.com/fwlink/?linkid=104161

      Check for messages requiring manual intervention such as suspended messages (reliability check).

      For information about manually checking for suspended messages see "Investigating Orchestration, Port, and Message Failures" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?linkid=104169

      For information about performing automated monitoring using Microsoft Operations Manager 2005 see "Suspended Message Alerts" at http://go.microsoft.com/fwlink/?linkid=105059

      Check the event logs for errors and warnings (administration check).

      BizTalk Server 2006 R2 error and warning events are saved in the application log. The event source is "BizTalk Server 2006". We recommend that you monitor the event log using an automated solution such as Microsoft System Center Operations Manager. For more information, see Monitoring with MOM 2005 or Operations Manager 2007.

      Weekly checklist

      Step, followed by its reference:

      Ensure that each host has an instance running on at least two physical BizTalk servers (reliability check).

      High Availability for BizTalk Hosts

      Ensure that each receive location is redundant (reliability check).

      Scaling Out Receiving Hosts

      Ensure that the SQL Server Agent service is running on the SQL server (administration check).

      • Monitoring SQL Server Agent Jobs
      • How to Start the SQL Server Agent
      • Monitoring SQL Server Agent Jobs and Databases
      • "SQL Server Agent" in SQL Server 2005 Books Online at http://go.microsoft.com/fwlink/?LinkId=106728

      Ensure that all SQL Server jobs related to BizTalk Server are working properly (administration check).

      • Monitoring SQL Server Agent Jobs and Databases
      • Monitoring SQL Server Agent Jobs

      Ensure that the SQL Server jobs responsible for backing up BizTalk Server databases are running normally (administration check).

      • How to Schedule a Backup BizTalk Server Job
      • How to Configure a Backup BizTalk Server Job

      Ensure that the latest security updates are installed (security check).

      Microsoft Update site at http://update.microsoft.com/microsoftupdate/v6/default.aspx

      Analyze weekly performance monitoring logs against baseline and thresholds (performance check).

      • Monitoring Throttling Using Performance Threshold Rules
      • Using the Performance Analysis of Logs (PAL) Tool
      • Identifying and Mitigating Performance Issues
      • Troubleshooting Performance Issues

      Ensure that the system is not experiencing frequent auto-growth of BizTalk Server databases (performance check).

      • Defining Auto-Growth Settings for Databases
      • Guidelines for Sizing the Tracking Database
      • Identifying Bottlenecks in the Database Tier
      • "Database File Initialization" in the SQL Server 2005 Books Online at http://go.microsoft.com/fwlink/?LinkID=101579.
      • "SQL Server Maintenance" in Best Practices for Configuring SQL Server

      Run SQL Server Profiler during high load to check for long response times and high resource usage (performance check).

      "Using SQL Server Profiler" in the SQL Server 2005 Books Online at http://go.microsoft.com/fwlink/?LinkID=106720

      Ensure that message batching for all adapters is appropriate for resource consumption or latency (performance check).

      • Configuring Batching to Improve Adapter Performance
      • "How to Design a Performant Adapter" in the BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106720

      Ensure that the large message threshold is appropriate for resource consumption (performance check).

      • How to Adjust the Message Size Threshold
      • "How BizTalk Server Processes Large Messages" in the BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=82351
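The weekly performance check above asks for logs to be compared against a baseline and thresholds. A minimal Python sketch of that comparison follows. The counter names, the sample values, and the 20% tolerance are illustrative assumptions, not values taken from the BizTalk Server documentation.

```python
from statistics import mean

def find_threshold_breaches(samples, baseline, tolerance=0.20):
    """Return counters whose weekly mean exceeds baseline * (1 + tolerance)."""
    breaches = {}
    for counter, values in samples.items():
        if counter not in baseline or not values:
            continue  # no baseline or no data, so nothing to compare
        weekly_mean = mean(values)
        limit = baseline[counter] * (1 + tolerance)
        if weekly_mean > limit:
            breaches[counter] = (weekly_mean, limit)
    return breaches

# Hypothetical counters and readings collected during the week.
weekly = {"cpu_percent": [55, 72, 80, 91], "disk_queue_length": [1.0, 1.2, 0.9]}
baseline = {"cpu_percent": 60, "disk_queue_length": 2.0}

for name, (observed, limit) in find_threshold_breaches(weekly, baseline).items():
    print(f"{name}: weekly mean {observed:.1f} exceeds limit {limit:.1f}")
```

Comparing a weekly mean rather than single readings keeps one short spike from raising an alarm; a real deployment might also want to track peaks separately.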

      Monthly checklist

      Step, followed by its reference:

      Ensure the master secret key is backed up and readily available on offline storage (reliability check).

      How to Back Up the Master Secret

      Ensure that failover for all clustered services has been tested (reliability check).

      How to Test Group Failover

      Ensure that the Enterprise SSO service is clustered (reliability check).

      Clustering the Master Secret Server

      Ensure that the BizTalk Server databases are clustered under SQL Server services (reliability check).

      Clustering the BizTalk Server Databases

      Ensure that at least two physical BizTalk servers are part of the BizTalk group (reliability check).

      How to Ensure Multiple Servers Are Part of a BizTalk Group

      Determine whether any unstable code is being used, and if so, use separate hosts (reliability check).

      High Availability for BizTalk Hosts

      Perform functional testing of all new BizTalk applications (reliability check).

      • Testing an Application
      • "Staging Tasks for BizTalk Application Deployment" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkID=103092.

      Determine whether there are any unnecessary BizTalk applications, artifacts, and configurations (administration check).

      • Remove all unnecessary BizTalk applications, artifacts, and configurations.
      • For more information about removing a BizTalk application or artifact using the BTSTask command-line tool see "RemoveApp Command" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkID=106721.
      • For more information about removing an artifact from an application using either the BizTalk Server Administration console or the BTSTask command-line tool, see "How to Remove an Artifact from an Application" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106722.

      Check the BizTalk Server Administration console for any non-approved changes (administration check).

      "Using the BizTalk Server Administration Console" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106723.

      Check BTSNTSvc.exe.config for any non-approved modifications (administration check).

      "BTSNTSvc.exe.config File" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106724.

      Check the BizTalk Server-related registry keys for any non-approved modifications (administration check).

      "Windows registry information for advanced users" article at http://support.microsoft.com/kb/256986

      Run the Best Practices Analyzer for BizTalk Server (administration check).

      "BizTalk Server 2006 Best Practices Analyzer" article at http://go.microsoft.com/fwlink/?LinkId=83317

      Ensure that the latest service packs and updates are installed (administration and security check).

      Microsoft Update site at http://update.microsoft.com/microsoftupdate/v6/default.aspx

      Ensure that the artifacts for different trading partners are not installed on the same host (security check).

      Configuring Hosts and Host Instances

      Ensure that BizTalk Server is using only domain-level users and groups (security check).

      "Domain Groups" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106725.

      Ensure that the MSDTC Security Configuration is enabled (security check).

      "Set the appropriate MSDTC Security Configuration options on Windows Server 2003 SP1 and Windows XP SP2" entry in "Troubleshooting Problems with MSDTC" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkID=101609.

      Determine whether the BizTalk Server cache refresh interval needs to be increased (performance check).

      How to Adjust the Cache Refresh Interval

      Determine whether the throttling options of each host need to be adjusted (performance check).

      Inbound Host Throttling

      Outbound Host Throttling

      Determine whether unnecessary tracking is enabled, such as orchestration, shape, and Business Rule Engine (BRE) event tracking (performance check).

      • How to Disable Tracking.
      • Planning for Tracking
      • Best Practices for Maintaining Performance (under "Monthly Performance Checks")
      • "Best Practices for Health and Activity Tracking" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106726.

      Determine whether you are using a dedicated host for tracking maintenance (performance check).

      How to Use a Dedicated Host for Tracking Maintenance

      Determine whether the default XML send pipeline is being used instead of the PassThrough send pipeline (performance check).

      "Managing Send Ports Using BizTalk Explorer" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106727.

      Check the BizTalk Server database sizes for an increasing trend (performance check).

      • For more information about sizing the tracking database, see Guidelines for Sizing the Tracking Database.
      • For more information about sizing the MessageBox, BizTalkDTADb, and BAMPrimaryImport databases, see Identifying Bottlenecks in the Database Tier.
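Checking database sizes for an increasing trend can be reduced to fitting a straight line through periodic size samples. The Python sketch below computes a least-squares slope. The sample sizes are made up, and collecting the real sizes (from SQL Server, for instance) is outside the scope of this sketch.

```python
def growth_per_sample(sizes_mb):
    """Least-squares slope of size against sample index, in MB per sample."""
    n = len(sizes_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(sizes_mb) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(sizes_mb))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance

# Hypothetical database sizes recorded at four consecutive monthly checks.
monthly_sizes_mb = [1000, 1100, 1200, 1300]
slope = growth_per_sample(monthly_sizes_mb)
print(f"Average growth: {slope:.1f} MB per sample")  # prints 100.0 MB per sample
```

A positive slope that persists over several checks suggests steady growth; a slope near zero suggests the size is stable.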

      Determine whether the system is encountering database contention (performance check).

      For more information about avoiding contention in the MessageBox database, see Avoiding Disk Contention.
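One of the monthly administration checks above is to look for non-approved modifications to BTSNTSvc.exe.config. A common general-purpose way to detect such changes, not something the BizTalk documentation prescribes, is to record a SHA-256 hash after each approved change and compare it later. The Python sketch below uses a stand-in file in a temporary directory; the file contents are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, approved_hash):
    """True if the file still matches the hash recorded at approval time."""
    return sha256_of_file(path) == approved_hash.lower()

# Demo with a stand-in file; in practice the path would be the real config file.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "BTSNTSvc.exe.config"
    config.write_bytes(b"<configuration />")
    approved = sha256_of_file(config)  # record this right after an approved change
    unchanged_before = is_unchanged(config, approved)
    config.write_bytes(b"<configuration><!-- edited --></configuration>")
    unchanged_after = is_unchanged(config, approved)

print(unchanged_before, unchanged_after)  # True False
```

The approved hash should be stored somewhere the server's administrators cannot silently overwrite, otherwise the comparison proves little.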

      IBM Lotus Domino server maintenance checklist

      | Task | Frequency |
      | --- | --- |
      | Back up the server | Daily, weekly, monthly |
      | Monitor mail routing | Daily |
      | Run Fixup to fix any corrupted databases * | At server startup and as needed |
      | Monitor Administration Requests database (ADMIN4.NSF) | Weekly |
      | Monitor databases that need maintenance | Weekly |
      | Monitor replication | Daily |
      | Monitor modem communications | Daily |
      | Monitor memory | Monthly |
      | Monitor disk space | Daily, weekly, monthly |
      | Monitor server load | Monthly |
      | Monitor server performance | Monthly |
      | Monitor Web server requests | Monthly |
      | Monitor server first domino servers | Daily |

      There are also unofficial documents in which other system administrators share their experience:

      SQL Server hardware checklist

      The Basics
      Hardware Manufacturer:
      Model Number:
      Serial Number:
      Tower/Rack/Blade:
      Physical Location of Server:
      Purchase Date:
      Warranty/Service Contract Number:
      Warranty/Service Telephone Number:
      Date Warranty Expires:
      CPU
      Number of CPU Sockets:
      Number of Installed CPUs:
      CPU Model:
      CPU Speed (GHz):
      Number of Cores per CPU:
      Type of Hyperthreading:
      Is Hyperthreading on or off:
      CPU L2 Cache Size:
      CPU Bus Speed:
      Motherboard BIOS Version:
      Is BIOS Version Current:
      Memory
      Current Amount of RAM:
      Additional RAM Capacity Available:
      Number of Memory Slots Used:
      Number of Memory Slots Available:
      ECC Memory:
      Network Adapter
      Hardware Manufacturer:
      Model Number:
      Speed:
      Number of Ports per Card:
      Number of Cards:
      BIOS Version Number:
      Is BIOS Version Current:
      NIC Speed/Duplex Setting:
      Is the NIC Power Saving Feature Off:
      Storage
      Type: Local, DAS, SAN, Combo:
      Local/Integrated RAID Controller
      Number of Local RAID Controllers:
      Type: SCSI, SAS, etc.
      Controller Hardware Manufacturer:
      Number of Ports:
      Controller Model Number:
      Controller Cache Size:
      Is There a Cache Battery:
      Is Write Back Caching On:
      Controller BIOS Version Number:
      Is Controller BIOS Version Current:
      External RAID Controllers
      Number of External RAID Controllers:
      Type: SCSI, SAS, etc.
      Controller Hardware Manufacturer:
      Controller Model Number:
      Number of External Ports:
      Controller Cache Size:
      Is There a Cache Battery:
      Is Write Back Caching On:
      Controller BIOS Version Number:
      Is Controller BIOS Version Current:
      Local Disk Configuration
      RAID Configuration:
      Number of Physical Drives:
      Physical Dimension of Drives:
      Drive Capacity:
      Drive Speed/RPM:
      Total Available Disk Space:
      HBAs for External Storage
      Number of HBAs:
      Type: iSCSI, Fibre Channel, etc:
      Type of Connectors:
      HBA Hardware Manufacturer:
      HBA Model Number:
      HBA BIOS Version Number:
      Is HBA BIOS Version Current:
      DAS Disk Configuration
      RAID Configuration:
      Number of Drives:
      Physical Dimension of Drives:
      Drive Capacity:
      Drive Speed/RPM:
      Total Available Disk Space:
      SAN Disk Configuration
      SAN Manufacturer:
      SAN Model:
      iSCSI, Fibre Channel, etc:
      SAN Cache Capacity:
      SAN Software Version:
      Is SAN Software Current:
      Number of Attached LUNs:
      RAID Configuration per LUN:
      Number of Drives Used per LUN:
      Capacity of Drives Used in LUNs:
      Speed of Drives Used in LUNs:
      Available Disk Space per LUN:
      Are LUNs Shared or Dedicated:
      High Availability
      Redundant Power Supplies:
      Redundant NICs:
      Redundant Controllers:
      All Components Connected to UPS:
      Is Server Physically Secure:
      If Cooling Required, is it Redundant:
      Clustering
      Number of Cluster Nodes:
      Number of Active Nodes:
      Number of Passive Nodes:
      Type of Quorum:
      Type of Shared Storage:
      Are HBAs Redundant:
      Are Storage Switches Redundant:
      Are NIC Switches Redundant:
      Are NICs Redundant:
      Backup
      Tape Drive: Internal/External:
      Tape Drive Manufacturer:
      Tape Drive Model:
      Local Disk:
      DAS Disk:
      SAN Disk:

      Windows Server 2003 system maintenance checklist

      Daily Operations Checklist

      Checklist: Performing Physical Environmental Checks

      Use this checklist to ensure that physical environment checks are completed.

      Task:

      ·Verify that environmental conditions are tracked and maintained.

      ·Check temperature and humidity to ensure that environmental systems such as heating and air conditioning settings are within acceptable conditions, and that they function within the hardware manufacturer’s specifications.

      ·Verify that physical security measures such as locks, dongles, and access codes have not been breached and that they function correctly.

      ·Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.

      Checklist: Check Backups

      Task:

      ·Make sure that the recommended minimum backup strategy of a daily online backup is completed.

      ·Verify that the previous backup operation completed.

      ·Analyze and respond to errors and warnings during the backup operation.

      ·Follow the established procedure for tape rotation, labeling, and storage.

      ·Verify that the transaction logs were successfully purged (if your backup type is purging logs).

      ·Make sure that backups complete under service level agreements (SLA).
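Verifying that the previous backup completed can start with a simple freshness test: is the newest file in the backup location recent enough? The Python sketch below assumes backups land as files in a single directory and that 24 hours is the acceptable age; both are assumptions for illustration. It checks recency only, not whether the backup can actually be restored.

```python
import os
import tempfile
import time
from pathlib import Path

def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600

def backup_is_fresh(backup_dir, max_age_hours=24, now=None):
    """True if at least one backup file exists and is no older than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours

# Demo in a temporary directory with a stand-in backup file.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = backup_is_fresh(tmp)              # no files yet
    backup = Path(tmp) / "nightly.bak"
    backup.write_bytes(b"stand-in backup data")
    fresh_ok = backup_is_fresh(tmp)              # just written
    two_days_ago = time.time() - 48 * 3600
    os.utime(backup, (two_days_ago, two_days_ago))
    stale_ok = backup_is_fresh(tmp)              # now looks 48 hours old

print(empty_ok, fresh_ok, stale_ok)  # False True False
```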

      Checklist: Check CPU and Memory Use

      Use this checklist to record the sampling time of each counter.

      Checklist: Check Disk Use

      Follow the checklist and record the drive letter, designation, and available disk space.

      Task

      ·Create a list of all drives and label them in three categories: drives with transaction logs, drives with queues, and other drives.

      ·Check disks with transaction log files.

      ·Check disks with SMTP queues.

      ·Check other disks.

      ·Use server monitors to check free disk space.

      ·Check performance on disks.

      Drive Letter

      Designation (drives with transaction logs, drives with queues, and other drives)

      Available space MB

      Available % free

      Your data here

      Your data here

      Your data here
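The available space in MB and the percentage free for the table above can be gathered with Python's standard `shutil.disk_usage`. The sketch below is one possible approach; the 15% low-space threshold and the path list are assumptions, and on Windows the paths would typically be drive roots such as `C:\`.

```python
import shutil

def disk_report(paths):
    """Collect available space (MB) and percent free for each path."""
    rows = []
    for path in paths:
        usage = shutil.disk_usage(path)
        rows.append({
            "path": path,
            "available_mb": usage.free // (1024 * 1024),
            "percent_free": round(usage.free / usage.total * 100, 1),
        })
    return rows

def low_space(rows, min_percent_free=15.0):
    """Return the rows whose free space is below the assumed threshold."""
    return [row for row in rows if row["percent_free"] < min_percent_free]

# "." reports on the volume holding the current directory.
for row in disk_report(["."]):
    print(f'{row["path"]:<10} {row["available_mb"]:>12} MB {row["percent_free"]:>6}% free')
```

Recording these rows at every daily check gives the history needed for the weekly disk usage report.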

      Checklist: Event Logs

      Check event logs using the following checklist.

      Task

      ·Check application and system logs on the server to see all errors.

      ·Check application and system logs on the Exchange server to see all warnings.

      ·Note repetitive warning and error logs.

      ·Respond to discovered failures and problems.
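Noting repetitive warnings and errors is easier once events are grouped and counted. The Python sketch below assumes the log entries have already been exported into dictionaries with `source`, `event_id`, and `level` keys; that record shape, the sample events, and the threshold of three occurrences are all illustrative assumptions.

```python
from collections import Counter

def repeated_events(events, min_count=3):
    """Return (source, event_id, level) keys seen at least min_count times."""
    counts = Counter(
        (event["source"], event["event_id"], event["level"])
        for event in events
        if event["level"] in ("Error", "Warning")
    )
    return [(key, count) for key, count in counts.most_common() if count >= min_count]

# Illustrative records only; real ones would come from an exported event log.
sample_events = (
    [{"source": "Disk", "event_id": 7, "level": "Error"}] * 3
    + [{"source": "W32Time", "event_id": 36, "level": "Warning"}]
    + [{"source": "Service Control Manager", "event_id": 7036, "level": "Information"}] * 5
)

for (source, event_id, level), count in repeated_events(sample_events):
    print(f"{level} {source}/{event_id} occurred {count} times")
```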

      Weekly Maintenance Checklist

      Checklist: Create Reports

      Use this checklist to create status reports to help with capacity planning, service level agreement (SLA) reviews, and performance analysis.

      Task:

      ·Use daily data from event log and System Monitor to create reports.

      ·Report on disk usage.

      ·Create reports on memory and CPU usage.

      ·Generate uptime and availability reports.
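An availability figure for the weekly report is the share of the period during which the server was up. The Python sketch below shows that arithmetic; the outage durations are invented, and how outages are recorded in the first place is left open.

```python
def availability_percent(period_hours, outage_hours):
    """Percentage of the period that was not lost to the listed outages."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    downtime = sum(outage_hours)
    if downtime > period_hours:
        raise ValueError("total downtime exceeds the reporting period")
    return (period_hours - downtime) / period_hours * 100

# One week is 168 hours; assume two outages of 30 and 60 minutes.
weekly_availability = availability_percent(168, [0.5, 1.0])
print(f"Availability this week: {weekly_availability:.3f}%")  # 99.107%
```

The same function can be pointed at a monthly period to compare the result against a service level agreement (SLA) target.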

      Checklist: Incident Reports

      Use this checklist to create incident reports.

      Task

      ·List the top generated, resolved, and pending incidents.

      ·Create solutions for unresolved incidents.

      ·Update reports to include new trouble tickets.

      ·Create a document repository for troubleshooting guides and post-mortems about outages.

      Checklist: Antivirus Defense

      Use this checklist to perform your antivirus defense.

      Task

      ·Perform a virus scan on each computer.

      ·Check for antivirus definition updates regularly.

      Checklist: Status Meeting

      Use this checklist to conduct weekly status meetings during which the tasks are reviewed.

      Task

      ·Server and network status for the overall organization and segments.

      ·Organizational performance and availability.

      ·Overview reports and incidents.

      ·Risk analysis and evaluation including upcoming changes.

      ·Capacity, availability, and performance reviews.

      ·Service level agreement (SLA) performance, and review items that have not met target objectives.

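      Unfortunately, we have not yet had time to compile this material into Chinese. Chinese versions will be added over time, so please keep following the Systems channel!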

Original link: https://77isp.com/post/8988.html
