HPE Performance Cluster Management Administration

Course Details

Course Code: H8PE9S

Duration: 3 Days

Prerequisites: 

•    H8PE8S: HPE Performance Cluster Management Foundations 

Course Description:

The HPE Performance Cluster Manager (HPCM) administration course provides knowledge and practice in installing HPCM, managing data networks, provisioning servers, creating and modifying server images, working with software repositories and image version control, automating post-installation tasks, configuring services, reviewing security features, and troubleshooting.

Course Objectives: 

•    Install HPCM  
•    Add servers to the cluster  
•    Manage data networks  
•    Provision nodes  
•    Create and modify images and software repositories  
•    Use image version control  
•    Automate post-installation tasks
•    Configure shared filesystem, user accounts, applications, and updates
•    Troubleshoot cluster services  
•    Review cluster security features

Intended Audience:

•    Anyone who needs to learn to install, configure, and administer clusters managed with the HPE Performance Cluster Manager (HPCM)
•    Experienced Linux system administrators

Course Outline:

Module 1: Install Cluster 

•    Describe HPCM features  
•    Define operating system slots  
•    Build cluster from ground up  
•    Provision node with GUI 
•    Provision node with command line  
•    Add nodes to the cluster  
•    Explore auto installation tools 

Module 2: Discover 

•    Discover nodes  
•    Interpret cluster configuration files  
•    Review cluster services 

Module 3: Data Networks 

•    Describe technologies  
•    Describe InfiniBand configuration  
•    Describe Intel Omni-Path configuration 
•    Describe software components  
•    Use diagnostic commands

Module 4: Manage Images 

•    Manage software repositories  
•    List software repositories  
•    Add software repositories  
•    Remove software repositories  
•    Create repository groups  
•    Customize an image by using RPM lists  
•    Create a compute node image  
•    Create an ICE-compute node image  
•    Manage image version control  
•    Check an image into version control
•    Compare differences between two versions of an image  
•    List the versions of an image 
•    Deploy a specific version of an image 
•    Push an ICE-compute image to a rack  
•    Use parallel tools and built-in functionality to check differences between nodes
•    Enable hyperthreading  
•    Disable hyperthreading  
•    Configure array services  
•    Install batch scheduler server on a compute node  
•    Install batch scheduler client on a compute node and on an ICE compute node
•    Configure HPCM connectors to job schedulers  
•    Capture an image from a node (golden)
•    Add RPMs to, remove RPMs from, and version control compute images
•    Add and remove RPMs from running compute nodes 
•    Clone an ICE-compute image  
•    Clean up old images on the lead node  
•    Add RPMs to an ICE-compute image
•    Compare when and when not to use tmpfs root
•    Determine which nodes use tmpfs root  
•    Configure nodes to use tmpfs root  
•    List tmpfs quota difference (rack leader quotas do not apply when ICE-compute nodes are in tmpfs)  
•    Set tmpfs mode  
•    Set disk mode  
•    Show which mode a node has booted with  
•    Show which mode a node is scheduled to boot into 
•    Perform a clone operating system slot operation 

Module 5: Automate Post Installation Tasks 

•    Review conf.d scripts  
•    Exclude a conf.d script  
•    Use pre_reconf.sh  
•    Use reconfig.sh  
•    Develop post-install and per-host customization scripts

Module 6: Configure Shared Filesystem, User Accounts, Applications, and Updates 

•    Export a filesystem over NFS from a compute node
•    Mount an NFS filesystem and create a user on an ICE compute node 
•    Manage user accounts  
•    Synchronize UIDs and GIDs, LDAP, etc.  
•    Run an application on compute and ICE compute nodes  
•    Display BIOS settings  
•    Upgrade firmware  
•    Update kernel  
•    Update distribution 
•    Update HPCM 

Module 7: Troubleshoot Cluster 

•    Back up cluster configuration
•    Back up managed network switch configuration
•    Use the central log repository  
•    Investigate log files  
•    Gather system information  
•    Interrogate iLOs and BMCs
•    Confirm resources  
•    Create pdsh groups  
•    Investigate bond devices  
•    Inspect VLAN devices  
•    Capture a node crash dump  
•    Transfer an image from another slot or another system and confirm that the image can be used
•    Inject faults 

Module 8: Review Cluster Security 

•    Describe system administrator configurable security tasks  
•    Describe what makes cluster security different from standalone security (for example, how a given change could break the cluster)
•    List ports used for each node role and for which interfaces  
•    List components with passwords 
o    Admin node
o    Flat compute nodes  
o    Rack leader nodes 
o    ICE compute nodes 
o    BMCs 
o    CMCs 
o    Ethernet network switches 
o    InfiniBand and Omni-Path switches 
o    IB/OPA switch BMCs  
o    Storage controllers 
•    List components that can have passwords applied