Hardware Management/FMFM: Difference between revisions
Revision as of 14:36, 18 July 2023
Welcome to the OCP Fleetscale Memory Fault Management (FMFM) WIKI
Fleetscale Memory Fault Management is a Workstream within the Hardware Management Project.
Leadership
Scope
The FMFM workstream aims to standardize Fleetscale Memory Fault Management.
- Proposed topics:
  - Standardize a vendor-agnostic architecture for memory error handling
    - Modularization of inputs from different hardware vendors
    - APIs and connections between modules from different vendors
    - Define the output of each module (failure cause, health information, RAS actions, etc.)
  - Standardize memory error telemetry
    - Format content for better fleet-scale RAS management
    - Troubleshooting, FRU replacement policies, etc.
  - Coordinate with the broader OCP group to make sure there is a path for this general architecture
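To make the "output of each module" topic concrete, here is a minimal Python sketch of what a vendor-agnostic module output record might look like. Everything in it is an assumption for illustration only: the names `ModuleOutput` and `Health`, the field names, and the JSON encoding are hypothetical and are not defined by the FMFM workstream.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class Health(Enum):
    # Hypothetical health levels, chosen for illustration only.
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"


@dataclass
class ModuleOutput:
    """Hypothetical output record of one fault-management module."""
    vendor: str                 # which vendor's module produced the record
    dimm_locator: str           # illustrative identifier for the memory device
    failure_cause: str          # free-text cause in this sketch
    health: Health
    ras_actions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to a common format so that downstream fleet tooling
        # does not need vendor-specific parsing.
        record = asdict(self)
        record["health"] = self.health.value  # Enum is not JSON-serializable
        return json.dumps(record, sort_keys=True)


example = ModuleOutput(
    vendor="vendor-a",
    dimm_locator="DIMM_A1",
    failure_cause="row fault",
    health=Health.DEGRADED,
    ras_actions=["page_offline"],
)
print(example.to_json())
```

The record covers the three kinds of output named above (failure cause, health information, RAS actions); a real schema would come out of the workstream's own discussions.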
Documents
- Coming Soon
Past Fleetscale Memory Fault Management Events
- Coming Soon