nanog mailing list archives
Towards an RPKI-rich Internet (and the appropriate allocation of responsibility in the event of an RIR RPKI CA outage)
From: John Curran <jcurran () arin net>
Date: Sun, 30 Sep 2018 23:21:50 +0000
Folks -
Perhaps it would be helpful to confirm that we have common goals in the network operator community regarding RPKI, and
then work from those goals on the necessary plans to achieve them.
It appears that many network operators would like to improve the integrity of their network routing via RPKI
deployment. The Regional Internet Registries (RIRs) have all worked to support RPKI services, and while there are
different opinions among operators regarding the cost/benefit tradeoffs of RPKI Route Origin Validation (ROV), it is
clear that we have to collectively work together now if we are ever to have overall RPKI deployment sufficient to
create the network effects that will ensure compelling long-term value for its deployment.
Let’s presume that we’ve achieved that very outcome at some point in the future; i.e., we have an Internet where nearly
all network operators are publishing Route Origin Authorizations (ROAs) via RIR RPKI services and are using RPKI data
for route validation. It is reasonable to presume that over the next decade the Internet will become even more
pervasive in everyday life, including being essential for many connected devices to function, and relied upon for
everything from daily personal communication and conducting business to even more innovative uses such as payment &
sale systems, delivery of medical care, etc.
Recognizing that the purpose of RPKI is to improve the integrity of routing, and not to add undue fragility to the network, it is
reasonable to expect that many network operators will take due care with the introduction of route validation into
their network routing, including best practices such as falling back successfully in the event of unavailability of an
RIR RPKI Certificate Authority (CA) and resulting cache timeouts. It is also reasonable to expect that RIR RPKI CA
services are provisioned with appropriate robustness of systems and controls that befit the highly network-critical
nature of these services.
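To make the fallback behavior concrete, here is a minimal sketch of route origin validation that fails open when the local RPKI cache has expired. It is only an illustration under stated assumptions: the names VRP, usable_vrps, accept_route, and the max_age_s and require_valid parameters are hypothetical and do not correspond to any particular router or validator. The classification into valid, invalid, and not-found follows the general approach of RFC 6811.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """One validated ROA payload: prefix, maximum length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix, origin_asn, vrps):
    """Classify a route as 'valid', 'invalid', or 'not-found' (RFC 6811 style)."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ip_network(vrp.prefix)
        # Skip VRPs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"


def usable_vrps(vrps, cache_age_s, max_age_s):
    """Discard cached RPKI data once it is older than max_age_s (hypothetical knob)."""
    return vrps if cache_age_s <= max_age_s else []


def accept_route(route_prefix, origin_asn, vrps, cache_age_s,
                 max_age_s=86400, require_valid=False):
    """Return True if the route should be accepted under the chosen policy."""
    state = validate(route_prefix, origin_asn,
                     usable_vrps(vrps, cache_age_s, max_age_s))
    if require_valid:
        # Fragile policy: with an expired cache nothing is 'valid', so all routes drop.
        return state == "valid"
    # Fail-open policy: only routes proven 'invalid' are rejected.
    return state != "invalid"
```

In this sketch, once the cache expires every route evaluates to not-found. The fail-open policy then keeps accepting routes as if validation were not deployed, while a policy that insists on valid routes would reject everything, which is the kind of configuration that turns a CA outage into a network outage.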
Presuming we all share this common goal, the question that arises is whether we have a common vision regarding what
should happen when something goes wrong in this wonderful RPKI-rich Internet of the future… More than anyone, network
operators realize that even with excellent systems, procedures, and redundancy, outages can (and do) still occur.
Hopefully, these are quite rare, and limited to occasions where Murphy’s Law has somehow resulted in nearly
unimaginable patterns of coincident failures, but it would be irresponsible not to consider the “what if” scenarios for
RPKI failure and whether there is a shared vision of the resulting consequences.
In particular, it would be good to consider the case of an RIR RPKI CA system failure, one sufficient to result in
widespread cache expirations for relying parties. Ideally, we will never have to see this scenario when RPKI is widely
deployed, but it is also not completely inconceivable that an RIR RPKI CA could experience such an outage [1]. For network
operators following reasonable deployment practices, an RIR RPKI CA outage should result in a fallback to unvalidated
network routing data and no significant network impacts. However, it’s likely not a reasonable assumption that all
network operators will have properly designed and implemented best practices in this regard, so there will very likely
be some networks that experience significant impacts consequential to any RIR RPKI CA outage. Even if this is only 1
or 2 percent of network operators with such configuration issues, it will mean hundreds of ISP outages occurring
simultaneously throughout the Internet and millions of customers (individuals and businesses) affected globally. While
the Internet is the world’s largest cooperative endeavor, there inevitably will be many folks impacted by an RIR RPKI CA
outage, including some asking (appropriately) the question of “who should bear responsibility” for the harm that they
suffered.
It is worth understanding what the network community believes is the most appropriate answer to this question, since a
common outlook on this question can be used to guide implementation details to match. Additionally, a common
understanding on this question will provide real insight into how the network community intends the risk of the system to
be distributed among the participants.
There are several possible options worth considering:
A) The most obvious answer for the party that should be held liable for the impacts that result from an RPKI CA
failure would be the respective RIR that experienced the outage. This seems rather straightforward until one considers
that the RIRs are providing these services while specifically noting that they may not be (despite all precautions) available
100% of the time, and with clearly documented expectations that those relying on RPKI CA information for route
origin validation should fall back to routing with not validated state [2]. The impacted parties are those
customers of ISPs that improperly handled the unavailability of RPKI data, thus escalating the situation into a
network-affecting outage. Under these circumstances, directing the claims from customers of all the
improperly-configured ISPs to the RIR completely ignores the responsibility of these ISPs to prepare for this precise
eventuality, as was done by their fellow network operators.
B) One of the more interesting theories on who should be held liable is that those who are publishing ROAs are
the appropriate responsible parties in the event of RPKI CA failure; one can arrive at such a position on the logic that
they consciously decided to use RPKI CA services and thus asserted globally that they would henceforth have validated
routes – an RPKI CA failure is a case of their “vendor” (the RIR) letting them down on the publication. This also has
equity issues, since those publishing ROA information don’t have a clear contributory role, and the damages accruing to
them are coming from customers of those operators who failed in their duty.
C) Another potential answer for the party that should be responsible is that each of the ISPs that failed to
appropriately configure their route validation and thus experienced a network outage should be responsible for their own
customers impacted as a result. In addition to keeping the liability proportional to the customers served, this
encourages each such ISP to consider appropriate corrective measures.
It is possible to architect the various legalities surrounding RPKI to support any of the above outcomes, but it first
requires a shared understanding of what the network community believes is the correct outcome. There are likely some
on the nanog mailing list who have a view on this matter, so I pose the question of "who should be responsible" for
the consequences of an RIR RPKI CA failure to this list for further discussion.
Thanks!
/John
John Curran
President and CEO
American Registry for Internet Numbers (ARIN)
[1] https://www.ietf.org/mail-archive/web/sidr/current/msg05621.html
[2] https://www.rfc-editor.org/rfc/rfc7115.txt
