CCAMP Working Group                               Richard Rabbat, Ed. (FLA)
Internet Draft                                Vishal Sharma, Ed. (Metanoia)
Expires: August 2003                                Norihiko Shinomiya (FLL)
                                                        Ching-Fong Su (FLA)
                                                       Peter Czezowski (FLA)
                                                              February 2003


          Fault Notification Protocol for GMPLS-Based Recovery
            draft-rabbat-fault-notification-protocol-02.txt


Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This draft presents generic mechanisms for a fault notification
protocol to be used in a GMPLS-based failure recovery scheme. The
mechanisms achieve bounded protection path activation times in the
event of single failures, based on constrained routing of the
protection/restoration paths and on requirements on the nodes'
physical capabilities and control-plane delay characteristics. We
justify the choices made for the notification method and the
extensions required to current algorithms and protocols.

Table of Contents

1. Overview
2. Terminology
3. Glossary of Terms Used
4. Requirements at Recovery Path Setup Time
5. Protocol Steps in Failure Notification and Service Recovery
   5.1 T1: Fault Detection Time
   5.2 T2: Hold-Off Time
   5.3 T3: Fault Notification and Completion of Recovery Operation
       5.3.1 Delays Incurred by Messages
       5.3.2 Notification Message Data
   5.4 T4: Traffic Recovery Time
6. Reversion (Normalization)
7. Security Considerations
8. Conclusion
Appendix A. Fault Notification Message Delays on Path
   A.1 Delays Associated with Link Traversal
   A.2 Delays Incurred at the Nodes
References
Acknowledgments
Authors' Addresses

1. Overview

The issue of time-constrained recovery (protection and restoration) in
optical switching networks is critical to meeting high-availability and
service-level guarantees. Several mechanisms have been devised for
recovery in mesh and ring topologies, and several Internet Drafts
currently address recovery in networks featuring a GMPLS control plane.
The terminology for GMPLS-based recovery is presented in [1].
Another draft [2], by the protection and restoration design team,
examines the differences between protection and restoration, and
between path-based, link-based, and span-based approaches. Requirements
for failure recovery are listed in [3]. A fault notification protocol
must address those recovery requirements, which fall into three main
categories:

o Meeting timing requirements
o Efficient usage of control plane resources
o Supporting flexible design of recovery schemes

Protection and restoration algorithms can be used for local repair
(link-based or node-based), span protection, and path protection. This
document presents generic mechanisms for a fault notification protocol
and recovery scheme designed to ensure bounded recovery times (e.g.,
50 ms), comparable to the recovery times of ring-based SONET/SDH
networks that implement 1+1 or 1:1 protection schemes.

Link-based recovery can handle faults such as fiber link failures and
transponder failures. In the case of a node failure, however, the
control plane uses either node-based or path-based recovery. The
advantage of span-based and path-based protection lies in their ability
to reduce wavelength redundancy (wavelengths that are reserved for
possible failures); their disadvantage is the potentially lengthy delay
incurred in notifying all nodes along the protection path of the
failure of a remote resource. In some applications, protection paths
need to be chosen carefully to meet a given restoration time
requirement (e.g., 50 ms).

This document presents a fault notification protocol that is both
technology and topology agnostic, and that applies to intra-domain
protection. Multi-domain recovery is not within the scope of this
draft. In addition, this proposal focuses on scalability, an important
issue that arises when signaling is used for fault notification.

We assume unidirectional traffic through Label Switched Paths (LSPs).
For the purpose of illustration, we also assume a mesh Wavelength
Division Multiplexing (WDM) network; the protocol applies directly to
ring-topology networks, though it adds some overhead not needed there.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [4].

3. Glossary of Terms Used

In addition to the terminology for GMPLS-based recovery documented in
[1], this draft uses the following acronyms:

o AIS: Alarm Indication Signal, a signal at the SONET/SDH transport
  layer
o BDI: Backward Defect Indication, a signal at the transport layer
  sent upstream
o LSP: Label Switched Path
o MEMS: Micro-Electro Mechanical Systems
o PXC: Photonic Cross-Connect, a cross-connect that switches
  wavelengths transparently, by means of a switching fabric such as
  MEMS
o WDM: Wavelength Division Multiplexing

4. Requirements at Recovery Path Setup Time

As a request for a working path is signaled into the network, it
indicates what type of protection or restoration it requires and,
optionally, a recovery priority value. After the recovery route
computation algorithm calculates the protection or restoration path,
the link resources (wavelengths, wavebands, etc.) along that path are
reserved and possibly activated.
When the recovery path is not activated, these link resources may be
used to carry preemptible best-effort traffic to increase network
utilization. Alternatively, the same link resource may be reserved by
multiple protection paths for different link failures, as long as these
protection paths do not need to be activated simultaneously (e.g., M:N
shared protection). In either case, the proper link resources need to
be activated upon notification of the failure.

When a label for a protection LSP is set up on a node A through RSVP-TE
or CR-LDP, node A SHOULD be aware of the network resource this LSP is
protecting. In the case of RSVP-TE, for example, the protection PATH
message may carry this information to all nodes on the protection path
at path setup time, as proposed in [5]. This allows node A to bundle
(group together) the labels (as well as link resources) that protect a
particular network resource. For example, if two labels j and k
correspond to two LSPs used to protect working paths from the failure
of link (X,Y), then they belong to the bundle L(X,Y). Node A can then
process in its control plane the joint event of the two LSP failures
and jointly activate/cross-connect both LSPs referenced by labels j and
k when it receives notification of the failure of link (X,Y).

This document proposes a method for per-failure fault notification (as
opposed to per-LSP fault notification), so such bundled label
information is essential. The main difference between "per-failure" and
"per-LSP" notification is the number of notification mechanisms that
need to be started. Per-failure fault notification engages a single
mechanism to notify all relevant nodes of the fault. Per-LSP
notification, on the other hand, requires activating as many mechanisms
as there are failed LSPs (for example, all LSPs that failed due to a
link failure). In an optical network carrying hundreds of wavelengths
per fiber, per-LSP notification is needlessly taxing.
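As an informal illustration of the bundling described above, the
following Python sketch shows one way a node's control plane might
index its protection labels by the resource they protect, so that a
single per-failure notification yields every label to activate. The
class and method names are hypothetical and purely illustrative; they
are not defined by this protocol.

   # Hypothetical node-local bookkeeping for per-failure activation.
   from collections import defaultdict

   class ProtectionBundles:
       """Maps a protected resource (e.g., a link) to the labels of the
       protection LSPs that guard against its failure."""

       def __init__(self):
           self._bundles = defaultdict(set)   # (X,Y) -> {label j, label k, ...}

       def add_protection_label(self, protected_link, label):
           # Recorded at protection path setup time (Section 4).
           self._bundles[protected_link].add(label)

       def labels_for_failure(self, failed_link):
           # Looked up once per failure notification, not once per LSP.
           return self._bundles.get(failed_link, set())

   # Labels j and k both protect working paths against the failure of
   # link (X,Y), so they form the bundle L(X,Y).
   bundles = ProtectionBundles()
   bundles.add_protection_label(("X", "Y"), "j")
   bundles.add_protection_label(("X", "Y"), "k")
   assert bundles.labels_for_failure(("X", "Y")) == {"j", "k"}

With such a table, one notification of the failure of link (X,Y) is
sufficient for node A to cross-connect both protection LSPs in a single
control-plane operation.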
5. Protocol Steps in Failure Notification and Service Recovery

This section presents a control-plane-based recovery scheme and its
fault notification protocol. It details the process used to notify
nodes of a resource failure and to activate the recovery lightpaths.
The failure sequence is based on the timing sequence of ITU-T draft
Recommendation G.gps [6], applied to WDM networks; the timing diagram
is reproduced in Figure 1 for clarity. The critical component in
guaranteeing time constraints on service recovery is the fault
notification process. The following sequence of events MUST be followed
to ensure that the recovery process completes within a specific amount
of time, as is the case in SONET/SDH-based networks.

   +-Network Impairment
   |    +-Fault Detected
   |    |    +-Start of Fault Notification
   |    |    |    +-Recovery Operation Complete
   |    |    |    |    +-Traffic Recovered
   |    |    |    |    |
   v    v    v    v    v
   ------------------------------------------------>
   | T1 | T2 | T3 | T4 |                        time

            Figure 1. Recovery Temporal Model

5.1 T1: Fault Detection Time

This is the period of time between the network impairment and its
detection at the control plane. An example of such a network impairment
is a fiber cut. Layer 1 at a given node detects the fault and passes it
to the control plane. This document assumes that equipment in the
optical network can detect such failures. This time is not included in
the calculation of the recovery time.

In general, if a bi-directional link is cut, both its upstream and
downstream nodes will detect the fault. If the downstream node detects
a unidirectional link failure, it sends, at the transport layer, a
signal such as the Backward Defect Indication (BDI) defined in ITU-T
G.709 to the upstream node, which then also acts as a detecting node.
We assume that the time difference between detection and inference
based on the BDI is negligible. Other transport plane technologies MUST
offer the same capability to be used in this context. In either case,
both the upstream and downstream nodes detect the failure.

To support this failure detection requirement, nodes MUST implement
per-channel monitoring that can pinpoint the failure and report it to
the detecting entity.

5.2 T2: Hold-Off Time

This is the period of time that the reporting entity waits before
starting the fault recovery process. It allows the fault recovery
process at a given layer to wait for recovery to occur at a lower
layer. In the case of WDM-based recovery, this time should be 0 s,
since there is no underlying layer recovery. In the case of a
GMPLS-enabled IP network over SONET, T2 may be set to 50 ms so that the
SONET protection scheme can activate before any IP (MPLS) layer
recovery is triggered. For GMPLS-enabled SONET over WDM, the choice of
T2 is more involved. Mechanisms such as SONET/SDH protection could be
used in the same environment in conjunction with WDM-based protection,
by picking either protection mechanism or no protection at all.
Allowing redundant protection mechanisms for the same lightpath may
increase the recovery time. The SONET/SDH layer, if it exists, decides
whether to request a protected or unprotected lightpath from the WDM
layer to connect the SONET equipment.

5.3 T3: Fault Notification and Completion of Recovery Operation

T3 is the period between the time when the detecting entity starts
sending out a fault notification message and the time when every node,
including the ingress nodes and the intermediate nodes on the
corresponding recovery paths, has been notified of the failure and has
finished reconfiguring itself to carry the restored traffic. For
link-based recovery, the ingress node of the recovery LSP is the
upstream detecting node. If the recovery time is strictly constrained,
the ingress node SHOULD be as close to the link failure as possible.
This reduces the recovery time, since no messages have to be relayed to
a remote or centralized authority to initiate recovery.

Some ingress or egress nodes may themselves detect a failure, for
example a Loss of Light (LoL) event. The fault notification message
MUST nevertheless be initiated by the detecting entity, even if the
ingress and egress nodes have other indications of failure. This allows
the fault notification mechanism to handle the worst-case scenarios and
provides timely notification to all concerned nodes on the recovery
path(s). For the purpose of this draft, transport plane signals such as
the AIS (Alarm Indication Signal) and the BDI are disregarded by all
OXCs except the detecting nodes. Note that fault notification occurs at
the control plane, to minimize layer interaction.
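The following Python sketch is a non-normative illustration of how the
T1/T2/T3 stages above might be sequenced at a detecting node: the fault
is reported by layer 1, a configured hold-off timer (Section 5.2) is
allowed to expire, and, if no lower layer has recovered the service, a
single per-failure notification is initiated (the notification method
itself is discussed below). The hold-off table, the function names, and
the notify_failure and lower_layer_recovered callbacks are illustrative
assumptions, not elements of the protocol.

   import threading

   # Assumed per-layer hold-off times (T2), in seconds.
   HOLD_OFF_T2 = {
       "wdm": 0.0,              # no lower-layer recovery to wait for
       "ip_over_sonet": 0.050,  # let SONET protection act first
   }

   def on_fault_detected(failed_link_id, layer,
                         lower_layer_recovered, notify_failure):
       """Run when layer 1 reports a failure to the control plane."""

       def after_hold_off():
           if lower_layer_recovered():
               return               # a lower layer already restored service
           # One notification per failure, however many LSPs it disrupted.
           notify_failure(failed_link_id)

       t2 = HOLD_OFF_T2.get(layer, 0.0)
       if t2 == 0.0:
           after_hold_off()         # T2 = 0: notify immediately
       else:
           threading.Timer(t2, after_hold_off).start()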
The detecting entity MAY use one of several fault notification methods
to notify other nodes of the failure, including GMPLS-based signaling
and flooding. With GMPLS-based signaling, there is generally one fault
notification message per disrupted Label Switched Path. Hence,
signaling does not scale well with the number of connections; in
addition, its message processing delay is less predictable. For details
about the notification methods and the choice of flooding for this
draft, the reader is referred to [7]. This document specifies a
notification protocol based on message flooding.

In the case of flooding, the message sent from the detecting entity to
all nodes on the various protection paths should reach them within the
specified recovery time (T-rec) minus the reconfiguration time (T-cfg)
needed at each node after fault notification. We define this as the
fault notification time (T-ntf = T-rec - T-cfg). The method for
assigning each node's T-ntf is out of scope for this document.

Nodes on a recovery path (including the ingress node) are aware that
they are protecting against the failure of a particular resource. All
nodes notified of the failure activate the recovery path by performing
any required hardware reconfiguration (e.g., moving mirrors in the case
of a MEMS-based switching fabric). The approach outlined in this draft
supports node reconfiguration applied sequentially (e.g., when parallel
movement of the mirrors is not available) or in parallel (e.g., with an
electronic switching fabric). The ingress node starts sending data on
the protection path at the start time S(I) specified below. If one of
the detecting entities at the ingress or egress node detects, at the
data plane, a failure in the protection lightpath to be activated, it
MUST raise an alarm that may be dealt with at the management plane. The
management plane will take appropriate remediation action. Alarms and
remediation are outside the scope of this draft.

The nodes on the protection paths receive the fault notification within
a deterministic time. This time delay is calculated by each node as
explained in Appendix A. To avoid complex clock synchronization, an
ingress node I that receives the notification from a detecting node J
calculates the start time S(I) at which it switches traffic to the
protection path as follows:

   S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec

where

o time-of-notification(I) returns the clock time at node I;
o min-delay-between(J,I) returns the minimum time needed for the
  notification from node J to reach node I.

Note that (time-of-notification(I) - min-delay-between(J,I)) gives the
time at which the failure was detected at J, and T-rec is the recovery
time requirement. For simplicity, this example assumes that the
hardware reconfiguration time and the fault notification time were
taken into account during protection path setup. Hence, at S(I), every
node on the protection paths should have been notified of the failure
and finished its reconfiguration.
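A small numeric sketch of the start-time rule above, in Python; the
clock reading, the J-to-I delay, and the 50 ms recovery target are
made-up values chosen only to illustrate the arithmetic.

   def protection_switch_start_time(time_of_notification_i,
                                    min_delay_j_to_i, t_rec):
       """S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec.

       The first two terms recover, on node I's own clock, the instant
       at which node J detected the failure, so no clock synchronization
       between I and J is required."""
       return time_of_notification_i - min_delay_j_to_i + t_rec

   # Notification arrives at node I at t = 12.000 s on I's clock, the
   # minimum J->I notification delay is 2 ms, and T-rec = 50 ms.
   s_i = protection_switch_start_time(12.000, 0.002, 0.050)
   assert abs(s_i - 12.048) < 1e-9   # I switches traffic at t = 12.048 s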
Fault notification is done via flooding, as follows. The detecting
entity sends a notification packet to its neighbors on all outgoing
links. The notification packet is a high-priority packet and contains
the unique identifier of the link at fault. Each node that receives
such a packet sends an acknowledgement to the sender and transmits
duplicates of the notification to all other neighboring nodes. To
reduce the amount of fault notification traffic that is flooded, the
nodes avoid re-broadcasting packets about the same fault and decrement
a time-to-live field in the packets as they are received.

When the recovery type is restoration with dynamic routing, the ingress
node for the recovery path, on receiving the fault notification
message, must begin the process of computing and signaling the
restoration paths in an order determined by the relative recovery
priorities of the working paths for which it is responsible.

5.3.1 Delays Incurred by Messages

The above discussion implies that, for the protection algorithm to meet
the T-rec recovery requirement, it needs to be either:

1. aware of timing issues, so that it can select a proper path, or
2. passed a set of nodes and links that satisfy the timing constraints.

Due to the complexity of the first method, we believe that the second
method will be easier to develop and implement. For example, a pruned
topology may be used for protection path computation, in which links
and nodes that violate the strict recovery time requirements are
excluded. A database of link information should hold the physical fiber
length and the capacity of each link (or channel), as well as the
notification message processing time.

The total time needed by a notification packet to travel from source to
destination can be broken into two delay components: the time needed to
traverse each link and the time needed to go through each node. The
delay calculations are discussed in Appendix A; the algorithm for
computing the protection paths is out of scope for this document.

5.3.2 Notification Message Data

Two types of messages are needed for reliable communication of fault
notifications: a Fault Notify message, to carry the information about
the failure to the neighboring nodes, and a Fault Notify Acknowledge
message, to indicate that a notification message was properly received.
Aside from implementation-dependent constructs, the data to be carried
in these messages is presented in Table 1 below.

   Table 1. Required and Optional Data for Fault Notifications
   --------------------------------------------------------------------
   Data Object      Fault    Fault Notify   Description
                    Notify   Acknowledge
   --------------------------------------------------------------------
   Message ID         R          R          Identifies notification
                                            messages
   Fault Link ID      R          -          Identifies the failed link
   Fault ID           R          -          Identifies sequence of
                                            failure
   Channel Status     O          -          Indication of link fault
                                            status
   Local Node ID      O          -          Identifies the original node
                                            that is reporting the
                                            failure
   TTL                O          -          Time To Live field
   --------------------------------------------------------------------
   R: required, O: optional, -: not applicable

A node keeps sending Fault Notify messages at intervals until it
receives a Fault Notify Acknowledge response or the control channel
connectivity is declared lost.
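A minimal sketch, in Python, of how the Table 1 fields and the flooding
rules above might be realized in a node's control plane. The in-memory
classes, the send and activate_protection callbacks, and the handler
name are illustrative assumptions; the actual wire encoding is
implementation-dependent and not specified by this draft, and the
periodic retransmission of unacknowledged Fault Notify messages is
omitted for brevity.

   import dataclasses
   from typing import Optional

   @dataclasses.dataclass
   class FaultNotify:
       message_id: int                       # Required
       fault_link_id: str                    # Required: the failed link
       fault_id: int                         # Required: failure sequence
       channel_status: Optional[str] = None  # Optional
       local_node_id: Optional[str] = None   # Optional: original reporter
       ttl: Optional[int] = None             # Optional: time-to-live

   @dataclasses.dataclass
   class FaultNotifyAck:
       message_id: int                       # Echoes the acknowledged message

   def handle_fault_notify(msg, sender, neighbors, seen_faults,
                           send, activate_protection):
       """Receive-side flooding rules: acknowledge the sender, act once
       per fault, and relay to all other neighbors with the TTL
       decremented."""
       send(sender, FaultNotifyAck(message_id=msg.message_id))

       key = (msg.fault_link_id, msg.fault_id)
       if key in seen_faults:
           return                            # already seen: do not re-broadcast
       seen_faults.add(key)

       activate_protection(msg.fault_link_id)  # per-failure activation

       if msg.ttl is not None and msg.ttl <= 1:
           return                            # TTL exhausted: stop the flood
       new_ttl = None if msg.ttl is None else msg.ttl - 1
       relay = dataclasses.replace(msg, ttl=new_ttl)
       for node in neighbors:
           if node != sender:
               send(node, relay)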
5.4 T4: Traffic Recovery Time

This is the time between the last recovery action and the time that the
traffic (if present) is completely recovered. This interval is intended
to account for the time required for traffic to once again arrive at
the point in the network that experienced disrupted or degraded service
due to the occurrence of the fault, i.e., the egress node.

6. Reversion (Normalization)

Most of the current literature recommends that, for resource
efficiency, the traffic be moved back to the original path when the
failed link or node is back online. Although reversion is an optional
step, it is typically employed. If reversion is not used, the
"orphaned" bandwidth on the failed working paths should be reclaimed as
those paths are repaired. The signaling of fault repair notifications
is similar to that of fault notifications. However, the reversion phase
does not have strict time constraints.

7. Security Considerations

This draft makes use of several existing protocols; it therefore does
not introduce any new security issues beyond those that arise in the
use of these protocols.

8. Conclusion

This draft presents generic mechanisms for a fault notification and
service recovery protocol for GMPLS-enabled optical networks. It
describes the steps required in the notification process, leading to
recovery of lightpath service within specific time bounds. A
"per-failure" approach (as opposed to "per-LSP") to fault notification
is proposed for its scalability.

Appendix A. Fault Notification Message Delays on Path

This appendix describes the delays incurred on the path between any two
nodes. Two types of delays occur: delays incurred during traversal of
the links on that path, and delays that occur at the nodes along the
path. The following presents the computations and expected values for
the different delays.

A.1 Delays Associated with Link Traversal

The time needed to traverse each link is the sum of the transmission
time and the link propagation delay:

1. The transmission time is determined by the link capacity:
   D_trans = (packet size) / (link speed).
2. The link propagation delay is due to the physical length of the
   link: D_prop = length / (light propagation speed in fiber).

The length of a notification packet is expected to be on the order of a
hundred bytes (about 10^3 bits). As an example, for a link speed of
1 Gbps,

   D_trans ~= 10^3 / 10^9 = 10^-6 s = 1 microsecond.

This value can therefore safely be ignored in calculating delays. On
the other hand, the link propagation delay in metropolitan area and
long-haul networks does affect the total delay. For a distance of
100 km, with the speed of light in fiber at about 2/3 of its free-space
value (about 200,000 km/s),

   D_prop ~= 10^2 / (2 * 10^5) = 0.5 * 10^-3 s = 500 microseconds.
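The A.1 arithmetic, restated as a small Python sketch using the same
illustrative figures from the text (a ~10^3-bit packet, a 1 Gbps link,
a 100 km span, and light propagating at roughly 2 * 10^5 km/s in
fiber); the function names are ours, not part of the draft.

   def transmission_delay(packet_size_bits, link_speed_bps):
       """D_trans = (packet size) / (link speed)."""
       return packet_size_bits / link_speed_bps

   def propagation_delay(length_km, fiber_speed_km_per_s=2.0e5):
       """D_prop = length / (light propagation speed in fiber)."""
       return length_km / fiber_speed_km_per_s

   d_trans = transmission_delay(1e3, 1e9)  # 1e-06 s: negligible
   d_prop = propagation_delay(100.0)       # 5e-04 s: dominates the link delay
   print(f"D_trans = {d_trans:.0e} s, D_prop = {d_prop:.0e} s")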
A.2 Delays Incurred at the Nodes

At each node, two delays matter: the queuing delay and the processing
time. The processing time D_proc has been identified in the literature
as a few tenths of a millisecond in the case of an RSVP object. This
value is smaller for a simpler IP packet requesting the activation of
an LSP.

The queuing delay is important at all intermediate nodes. Fault
notification messages should be queued at the front of the buffer that
holds other control packets in order to avoid queuing delays (these
messages do not contend with data packets, since no data are sent over
the control channel). A queuing discipline such as priority queuing
would allow these packets to be admitted at the head of the queue, by
setting the packet's priority. A simple mechanism such as setting the
priority bits in the IP header, e.g., the IP precedence bits or the
DSCP code points of the TOS (Type of Service) byte, would be
appropriate. Using priority queuing for fault notification messages
ensures that the queuing delay is bounded.

In the case of flooding for fault notification, D_queue(A) = 0 s. If
other fault notification messages are in the queue as well, this
implies either multiple failures, in which case the recovery time
guarantee does not apply, or multiple messages traveling on different
protection paths to report the same link failure, as happens when a
signaling protocol is used for fault notification. In the case of
per-LSP fault notification, as when a signaling protocol is used, the
maximum queuing delay at node A is:

   D_queue_max(A) = (number of protection paths) * (packet size)
                    / (link bandwidth).

This quantifies the argument against using a signaling protocol for
fault notification; flooding keeps that value at 0 s.

In the absence of priority queuing, the maximum queuing delay at node A
can be calculated as follows, assuming fair queuing of the FIFO buffers
of all control channels and assuming input buffers only:

   D_queue_max(A) = (number of queues) * (queue size) / (link bandwidth).

This value is an upper bound and depends on the hardware buffer
implementation.

References

[1] Mannie, E., et al., "Recovery (Protection and Restoration)
    Terminology for GMPLS", Internet Draft, work in progress,
    draft-ietf-ccamp-gmpls-recovery-terminology-01.txt, November 2002.

[2] Papadimitriou, D., et al., "Analysis of Generalized MPLS-based
    Recovery Mechanisms (including Protection and Restoration)",
    Internet Draft, work in progress,
    draft-ietf-ccamp-gmpls-recovery-analysis-00.txt, January 2003.

[3] Czezowski, P., and T. Soumiya (Eds.), "Optical network failure
    recovery requirements", Internet Draft, work in progress,
    draft-czezowski-optical-recovery-reqs-01.txt, February 2003.

[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", BCP 14, RFC 2119, March 1997.

[5] Li, G., Yates, J., et al., "Experiments in Fast Restoration using
    GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers
    Digest, OFC 2001, Anaheim, CA, March 2001.

[6] ITU-T Draft Recommendation G.gps, "Generic Protection Switching",
    work in progress, April 2002.

[7] Rabbat, R., et al., "Fault Notification and Service Recovery in WDM
    Networks", white paper, available at:
    http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf

Acknowledgments

The following individuals provided valuable input to this draft:
Takafumi Chujo of Fujitsu Labs of America, and Akira Chugo of Fujitsu
Laboratories, Ltd.

Authors' Addresses

Richard Rabbat
Fujitsu Labs of America, Inc.
595 Lawrence Expressway
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4537
Email: rabbat@fla.fujitsu.com

Vishal Sharma
Metanoia, Inc.
305 Elan Village Lane, Unit 121
San Jose, CA 95134-2545
United States of America
Phone: +1-408-955-0910
Email: v.sharma@ieee.org
Norihiko Shinomiya
Fujitsu Laboratories Ltd.
1-1, Kamikodanaka 4-Chome
Nakahara-ku, Kawasaki
211-8588, Japan
Phone: +81-44-754-2635
Email: shinomi@jp.fujitsu.com

Ching-Fong Su
Fujitsu Labs of America, Inc.
595 Lawrence Expressway
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4572
Email: csu@fla.fujitsu.com

Peter Czezowski
Fujitsu Labs of America
595 Lawrence Expressway
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4516
Email: peterc@fla.fujitsu.com