So after having implemented our BYOD network as described in this post… I began thinking about how great it would be if we could remove the Microsoft RADIUS server from the equation and have Smoothwall perform the RADIUS AAA. Smoothwall would then know which user was connected simply from the account they authenticated with against RADIUS, instead of the user having to log into the Smoothwall again through a web page with the same credentials they had just passed to RADIUS to get on the WiFi in the first place!
Then back in May 2013 Smoothwall released Main Update 60 which in essence turned the Smoothwall device into a RADIUS server. Perfect! Just what I had wanted.
So I began to play around with the Smoothwall RADIUS offering and our Netgear Managed WiFi setup.
Following the Smoothwall online documentation for the WPA Enterprise setup proved easy enough, and pretty soon we had a test WiFi network running on its own segmented VLAN in the same way that the BYOD setup was; this time the Smoothwall was providing the DHCP and RADIUS services, whilst a Windows Server 2008 R2 DNS server remained in place to handle DNS resolution back into our internal LAN for local services.
So far so good, so we took the upcoming school summer holiday as a chance to move the BYOD setup over to the new WPA Enterprise setup. However, once the girls returned from their holiday we soon started to notice some issues. We were getting multiple reports from people saying that they were connected to the WiFi but were being blocked from sites they would normally be allowed to visit (for the most part it was people complaining because they could not get onto Facebook!). We quickly realised that a short-term fix was for the user to turn off their WiFi and then turn it back on, but this was never going to work long-term!
So we logged the issues with Smoothwall, who began investigating on our behalf. After many log files had been sent over, they requested a Wireshark capture of what was happening. I also began trying to replicate the issue myself, since whenever a user brought their device to our offices it worked fine!
I finally managed to replicate the issue on a laptop whilst running Wireshark. To replicate the issue being experienced by users I manually roamed from one WAP to another, whilst having Facebook open on my browser. As soon as I roamed to the other WAP my WiFi stayed connected but I was all of a sudden blocked from viewing Facebook by the Smoothwall filter. Now that I could replicate the issue, I got back in touch with Smoothwall and talked through the steps I had carried out and sent them the Wireshark capture. They went away and looked into the logs and the issue, whilst I waited for an update!
I then spent a long time on the phone / remote support with a very helpful man at Smoothwall support, who ran tcpdump on the Smoothwall network interfaces to capture the RADIUS traffic being passed from the WAPs to the Smoothwall. What we found was that, because the Netgear wireless solution has each WAP send the RADIUS login / log out requests rather than the central controller, the sequence went like this: the device logs in successfully at WAP1 and then roams to WAP2. WAP2 sees the connection as a roam and not a new connection, so it does not send a login request to the Smoothwall. WAP1 then sends a log out request because the device has roamed away, and the Smoothwall responds by logging the connection off the web filter.
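To make that sequence concrete, here is a minimal Python sketch of a filter that drops a user the moment any log out (Accounting-Stop) arrives. It is only an illustration of the behaviour we observed, not Smoothwall's actual code: the class name, method names, IP address and username are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveFilterSessions:
    """Hypothetical filter session table: client IP -> username."""
    sessions: dict = field(default_factory=dict)

    def accounting_start(self, client_ip, username, nas):
        # A WAP (the NAS) reports that the user has logged in.
        self.sessions[client_ip] = username

    def accounting_stop(self, client_ip, nas):
        # Naive behaviour: any stop removes the user immediately,
        # regardless of which WAP sent it.
        self.sessions.pop(client_ip, None)

    def is_logged_in(self, client_ip):
        return client_ip in self.sessions


def simulate_roam():
    """Replay the roam described above and report the filter state."""
    table = NaiveFilterSessions()
    ip = "10.0.50.23"  # assumed BYOD client address

    # 1. Device authenticates at WAP1; WAP1 sends the login.
    table.accounting_start(ip, "student1", nas="WAP1")

    # 2. Device roams to WAP2. WAP2 treats it as a roam and sends nothing.

    # 3. WAP1 notices the device has left and sends the log out.
    table.accounting_stop(ip, nas="WAP1")

    # The device still has WiFi, but the filter no longer knows the user.
    return table.is_logged_in(ip)


if __name__ == "__main__":
    print("Still logged into filter after roam:", simulate_roam())
```

The device never lost its WiFi connection in this sequence, which is why toggling WiFi off and on (forcing a fresh login at the new WAP) worked as a stopgap.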
At this point the only suggestions that I had from Smoothwall were to log a case with Netgear to see if we could get their controller to handle the RADIUS requests, and in the meantime to implement a custom block page so that when users experienced the roaming log off with Smoothwall, they were given instructions on what to do; i.e. turn their WiFi off and on again.
Not quite the resolution to the case I had hoped for, but we implemented it nonetheless.
So now I contacted Netgear and opened a support case with them. The case was swiftly elevated to Level 3 support! After many emails and calls back and forth, as well as sending logs from the Netgear system plus the earlier Wireshark and tcpdump captures from the Smoothwall, we finally started to get somewhere. Here is the response from Netgear when the penny started to drop:
Hi Jon,
Thanks for the extra information, so the actual roaming is working, but it’s getting logged out of the filter only, it’s all starting to make more sense.
The internet filter should probably do something like cisco is doing (a countdown timer for roaming) when the accounting stop packet is received: http://www.cisco.com/en/US/docs/wireless/csg2/2.0/installation/guide/csg22rad.html#wp1024516
“When RADIUS handoff is configured, and a RADIUS Accounting Stop is received, the CSG2 starts a handoff timer instead of immediately deleting the CSG2 User Table entry for the roaming subscriber.”
CSG2 should be substituted by the internet filter in your case I guess, does that make sense? Maybe you could send this info to Smoothwall, for their consideration?
I’ll check with our Engineering team if there is any other way.
Kind Regards,
Upon further investigation of my Wireshark packet capture and a few more questions from myself, Netgear came back with this:
Hi Jon,
In reply to both:
“I think that packet 145 is me leaving the new AP after tripping the break in the filter; but why are we seeing the Accounting stops in packet 121 and 123? There is no reauth evident after these stops, so is this when the auth fails?”
Accounting stop, shouldn’t be seen as a complete logout as far as I can gather from the RFC and cisco doc, but the filter is (and the radius login server isn’t it seems).
As regards to the packets 121-125, that’s exactly what we’re talking about, it’s sending accounting stop (packet 123, which should start a “cooldown” timer) and if it sees a start (packet 125) within the cooldown period, it considers it the same session.
Kind Regards,
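The "cooldown" idea Netgear describes can be sketched in a few lines of Python. This is my own illustration of the general technique, not how Smoothwall or the Cisco CSG2 actually implement it; the class name, the 30-second default and the explicit `now` timestamps (passed in so the example is deterministic) are all assumptions.

```python
class HandoffSessions:
    """Hypothetical filter session table with a roaming cooldown timer."""

    def __init__(self, cooldown_seconds=30.0):
        self.cooldown = cooldown_seconds  # assumed value, not a vendor default
        self.sessions = {}       # client IP -> username
        self.pending_stop = {}   # client IP -> time at which to log out

    def _expire(self, now):
        # Log out any client whose cooldown has run out with no new start.
        for ip, deadline in list(self.pending_stop.items()):
            if now >= deadline:
                del self.pending_stop[ip]
                self.sessions.pop(ip, None)

    def accounting_start(self, client_ip, username, now):
        self._expire(now)
        # A start inside the cooldown cancels the pending log out,
        # so the roam is treated as the same session.
        self.pending_stop.pop(client_ip, None)
        self.sessions[client_ip] = username

    def accounting_stop(self, client_ip, now):
        # Do not delete straight away; start the cooldown timer instead.
        if client_ip in self.sessions:
            self.pending_stop[client_ip] = now + self.cooldown

    def is_logged_in(self, client_ip, now):
        self._expire(now)
        return client_ip in self.sessions


if __name__ == "__main__":
    table = HandoffSessions(cooldown_seconds=30)
    ip = "10.0.50.23"  # assumed BYOD client address
    table.accounting_start(ip, "student1", now=0)
    table.accounting_stop(ip, now=10)            # stop from the old WAP
    table.accounting_start(ip, "student1", now=12)  # start from the new WAP
    print("Logged in after roam:", table.is_logged_in(ip, now=100))
```

With this approach a stop followed quickly by a start keeps the user logged into the filter, while a stop with no follow-up still logs them out once the cooldown has passed.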
So it appears that the Smoothwall RADIUS implementation was not correctly handling Netgear’s distributed AP RADIUS roaming model, which Netgear Engineering were kind enough to highlight as valid and to back up with links to the appropriate section of the RADIUS RFC(!).
So the case was reopened with Smoothwall, now armed with the detailed information from Netgear…
Smoothwall then came back and said that there was a known issue with roaming on distributed WiFi systems and it would be fixed in an upcoming update. So I then moved the main system back to the SSL Login we were using here, whilst we began discussing with Smoothwall how to set up a VM to test this upcoming fix on. Oh, and one other issue we noticed in all of this: the dramatically increased demand for CPU and RAM on the Smoothwall UTM whilst it ran these services; even more reason to look at the Smoothwall VM, where we could throw more CPU and RAM at the device with ease!
So the WPA Enterprise requests were closed whilst we waited for the bug fix to be released, and in the meantime more support requests were opened so that we could discuss getting a Smoothwall VM set up to resolve the load issues we had seen, provide more separation between the school LAN and BYOD, and test the fix when it became available; but that is the next part!