RE: IPV4LL and EXPIRE
David Hauck
Tue Oct 21 16:00:33 2014
Hi Roy,
On Tuesday, October 21, 2014 4:24 AM, Roy Marples wrote:
> Hi David
>
> On 20/10/2014 20:46, David Hauck wrote:
>> On Monday, October 20, 2014 12:27 PM, Roy Marples wrote:
>>> On 2014-10-18 01:16, Roy Marples wrote:
>>>> So that's the reason maybe? Defending the IPv4LL address triggered
>>>> an expiry?
>>>> I may have to setup a reverse ARP proxy to test this.
>>>
>>> Fixed here:
>>>
>>> http://roy.marples.name/projects/dhcpcd/ci/77cc5e6fefbda2e0d03790a2cd1447df385c2d18?sbs=0
>>>
>>> Hopefully that fixes the state engine, can you test it please? :)
>>
>> Well I see the EXPIRE state cycling now/again (is this what you mean?) ;).
>
> Yes!
>
>>> So the last question is how do we handle the ARP table?
>>
>> Before this, and at the point where C2 is plugged back into eth1, the
>> eth1 interface has already released its original IPV4LL address. Why
>> doesn't the client transition back into a state where it attempts to
>> contact a DHCP server (before actually trying to defend the previously
>> assigned IPV4LL address)?
>
> dhcpcd didn't transition back because it's stuck in a loop because
> eth0 (I assume) is sending faulty ARP packets.
I think clarification on this point is important (regardless of what might be odd semantics during the subsequent attempts to [re]claim an IPV4LL address). In particular, I'm still not clear on why it doesn't transition back to DHCP INIT: the link has already been brought down (via the NO CARRIER event), and at that point the state machine should go back to DHCP INIT, no? The distinction I'm making is that the lease here is an IPV4LL lease rather than a standard (fixed IP) lease, and that the implementation should recognize this and therefore first attempt to re-acquire a normal lease before attempting to re-acquire (or really just configure) the stale IPV4LL lease.
As an aside, what is the expiry timeout associated with a normally (i.e., initially) configured IPV4LL lease/address?
> Here's what happens
>
> eth1 -> who owns 169.254.1.1?
> eth1 -> who owns 169.254.1.1?
> eth1 -> who owns 169.254.1.1?
> no replies, great I'll assign the address and announce it.
> eth1 -> I own 169.254.1.1
> eth0 -> no, I own 169.254.1.1
> eth1 -> I own 169.254.1.1
> eth0 -> no, I own 169.254.1.1
> eth1 has now failed to defend its IPv4LL address and must discard it
> and start over
>
> Normally, if eth0 really did have 169.254.1.1 it would say this for
> each of the initial 3 requests eth1 made. But it didn't.
Are you positive that this is how the ARP PROBE and ARP ANNOUNCE semantics are meant to work? I can imagine (one would need to fully read all the associated RFCs/guidance on this to be sure - e.g., RFCs 826, 5227) that the ARP PROBE sequence is meant to query whether anyone else actually OWNS the address, i.e., *not* whether anyone else has the address in its ARP cache - these are different things. And also, subsequently, that the ARP ANNOUNCE is further (more strongly) saying *I am now using this address* and, like any gratuitous ARP, would result in a reply if anyone else had this identical entry in their ARP *cache*. So, I can see the potential for reading this as being incorrect/confusing.
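To make the probe/announce distinction concrete, here is a minimal Python sketch of how RFC 5227 (section 2.1.1) tells the two packet types apart and what it counts as a conflict while probing. The names (Arp, make_probe, make_announce, conflicts_with_probe) are hypothetical and this is not dhcpcd's code; it only models the fields that matter for the argument.

```python
from dataclasses import dataclass

ZERO_IP = "0.0.0.0"


@dataclass(frozen=True)
class Arp:
    op: str   # "request" or "reply"
    sha: str  # sender hardware (MAC) address
    spa: str  # sender protocol (IP) address
    tpa: str  # target protocol (IP) address


def make_probe(mac: str, candidate: str) -> Arp:
    # A probe asks about the candidate without claiming it:
    # the sender IP is all zeroes so no ARP cache is polluted.
    return Arp("request", mac, ZERO_IP, candidate)


def make_announce(mac: str, addr: str) -> Arp:
    # An announcement claims the address: sender and target IP
    # are both the address now in use.
    return Arp("request", mac, addr, addr)


def conflicts_with_probe(pkt: Arp, candidate: str, own_macs: set) -> bool:
    # Case 1: any ARP packet whose sender IP is the candidate means
    # some host is already using (not merely caching) the address.
    if pkt.spa == candidate:
        return True
    # Case 2: another host is probing for the same candidate.
    # Probes sent from our own interfaces are not conflicts.
    if (pkt.op == "request" and pkt.spa == ZERO_IP
            and pkt.tpa == candidate and pkt.sha not in own_macs):
        return True
    return False
```

Under this reading, a host that only holds 169.254.1.1 in its ARP cache has no reason to answer a probe; only a host that has the address configured would.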
> Because dhcpcd did actually assign the address, the IPv4LL conflict
> counter was reset and started the process over.
>
> This is fixed here:
> http://roy.marples.name/projects/dhcpcd/ci/36e83c64395acafc9030cadef3b5fd3082edccbd?sbs=0
>
> Basically we now only reset the conflict counter after the announcing
> is complete or we bind a non IPv4LL address. If we hit the conflict
> counter dhcpcd now transitions back to DHCP discovery.
What is the conflict "counter"?
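For reference, RFC 3927 (section 9) defines MAX_CONFLICTS as 10 and asks a host to slow down once it has seen that many address conflicts. Below is a minimal Python sketch of a counter that follows the reset rule described above. It is an assumption about the behaviour, not dhcpcd's actual code; the class, method, and state names are hypothetical.

```python
MAX_CONFLICTS = 10  # constant from RFC 3927 section 9


class Ipv4llConflicts:
    """Hypothetical model of the IPv4LL conflict counter."""

    def __init__(self) -> None:
        self.conflicts = 0

    def on_conflict(self) -> str:
        # Called when a probe or a failed defence detects a conflict.
        self.conflicts += 1
        if self.conflicts >= MAX_CONFLICTS:
            # Assumed from the description above: give up on IPv4LL
            # for now and go back to DHCP discovery.
            return "dhcp_discover"
        return "probe_new_address"

    def on_announce_complete(self) -> None:
        # Reset only once announcing has finished, not at the moment
        # the address is assigned.
        self.conflicts = 0

    def on_bind_non_ipv4ll(self) -> None:
        # Binding a non-IPv4LL address also clears the counter.
        self.conflicts = 0
```

With the reset moved to after the announcements, a defence that fails during announcing keeps incrementing the same counter instead of starting again from zero.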
> Another minor related issue is fixed here:
> http://roy.marples.name/projects/dhcpcd/ci/528f024cddbc23e7589eaf9f2e8c54ad7408277e?sbs=0
I'll integrate both of these and let you know what I find.
Would either of these explain why the EXPIRE cycle doesn't ultimately end with the observation that a newly chosen random IPV4LL address is unique on the network? That is, doesn't the state machine continue to try to find a new IPV4LL address, and shouldn't this eventually succeed?
>>> From your earlier statements, it looks like you run dhcpcd per
>>> interface rather than one dhcpcd process for all interfaces. Is this true?
>>
>> I don't think so. The network configuration has two interfaces
>> defined - the first one is static and the second is configured for
>> DHCP. Nothing special is done with dhcpcd aside from the
>> configuration I sent earlier.
>
> It's about how it's started
> dhcpcd eth1
> vs
> dhcpcd
>
> In the first case, the OS will define when to start dhcpcd on a
> per-interface basis via ifup/ifdown or equivalent.
>
> In the latter case, dhcpcd will be started as a generic system service
> and interface configurations will be done in dhcpcd.conf. This has the
> added benefit that dhcpcd knows the hw address of each interface and
> will ignore ARPs from them.
> It's also possible to assign a static IPv4 non-DHCP configuration
> there as well, so this could be a workable solution to the ARP table
> issue.
>
>>> Also, what kernel are you running?
>>
>> 3.10.47-rt50
>>
>>> I thought that deleting the IP
>>> address would clear the ARP entry for it?
>>
>> I don't know what the proper semantics are for this in the locally
>> configured multihome scenario (i.e., the interface's ARP entries are
>> likely cleared but I'm not sure about the semantics in relation to
>> the other local interface and its entries that identify the downed
>> IPV4LL interface's hw address).
>
> I'll see if I can reproduce the non clearing of ARP tables on my
> multihomed Linux machine later tonight.
>
>>> If this is the case, how did
>>> the table fill up, unless with IPv4LL addresses from other machines?
>>
>> As you can see from what I've sent, the entries are all for IPV4LL
>> addresses *that match the hw address* of the local, IPV4LL interface.
>> So these aren't entries for other machines, *just the other, local
>> IPV4LL interface*.
>
> Ah, so you mean you have two interfaces doing IPv4LL and they really
> are in conflict?
No, one (the first one - eth0) is static and the other one (eth1) is DHCP.
> If not then I think the kernel is making incorrect ARP announcements.
As indicated above I'm not so sure about this. I think a full and complete reading of the various RFCs would be needed to determine this. The one thing I would say is that I wonder whether the kernel (the stack) could be smarter about managing the ARP cache for the multihomed case and local interface addresses/entries when these entries go up/down (i.e., clear the eth1 entries when this interface is brought down).
Thanks,
-David
> Roy