Time to do a little doomsday prepping for the folks on the IT floor. Cyberattacks happen constantly and, eventually, one will succeed against your organization. When that day comes, there are things that will be absolutely great to have on hand – and if you don’t have them, you’ll wish real hard that you did. So what should be in the IT doomsday prepper’s bug-out kit?
First on that list has to be an external hard drive. I specify an external drive because a PC off the network is too easily left unpatched and could accidentally be connected to a hot network, whereupon all its information gets compromised by the ransomware. A stack of papers in a sealed envelope wouldn’t be a bad way of storing vital information, either.
What would I put on the hard drive? I would start with current copies of network diagrams. Even relatively recent copies will do the job. If a ransomware worm gets into the network share where these are kept, it’s game over as far as sharing intel quickly with first responders.
Likewise, information on which SNMP communities exist and which devices they work with; SNMPv3 credentials and which devices accept them; working TACACS accounts that are not connected to AD; which network devices still have local accounts, and those credentials; which devices do SSH with keys of length 1024 bits or greater; and which devices are still stuck on telnet. Knowing all this does two things: it helps you get access to determine whether the network devices are compromised, and it lets you make an educated guess about which devices and credentials are most likely compromised.
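As a sketch of how that inventory might be kept on the offline drive – device names and fields here are entirely hypothetical – a small script can triage devices into tiers of "most likely compromised first":

```python
# Hypothetical inventory of network-device access methods, the kind of
# data worth keeping on the offline drive. Names and fields are illustrative.
DEVICES = [
    {"name": "core-sw-1", "mgmt": "ssh", "key_bits": 2048, "local_account": False},
    {"name": "edge-rtr-3", "mgmt": "ssh", "key_bits": 1024, "local_account": True},
    {"name": "old-sw-9",  "mgmt": "telnet", "key_bits": None, "local_account": True},
]

def risk_tier(dev):
    """Rough triage: telnet or weak SSH keys make the best first guesses."""
    if dev["mgmt"] == "telnet":
        return "high"           # cleartext credentials on the wire
    if dev["key_bits"] is not None and dev["key_bits"] < 2048:
        return "medium"         # weak keys, plausible target
    return "low"

def triage(devices):
    """Group device names by risk tier for the first responders."""
    tiers = {"high": [], "medium": [], "low": []}
    for dev in devices:
        tiers[risk_tier(dev)].append(dev["name"])
    return tiers
```

Nothing fancy – the point is that the data exists offline in a form a first responder can sort through in minutes, not that any particular script does the sorting.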
What else… how about a client installed on each PC that can monitor activity on the PC and also run scripts with local admin or system privileges? This client should be able to access the system independently of AD, which could be compromised in such a situation. Enterprise software distribution tools can be damaged in a major outage, so having the scripting ability means you can invoke a software install from a known clean network share. Granted, the client isn’t on the external drive or in the sealed envelope, but it’s something I’d want in place as part of my IT doomsday prepping.
I want it because monitoring activity on endpoints is critical. Anything and everything that provides information for reporting is excellent. If it can provide spreadsheets that can be further analyzed, even better. If the client or AD account can reach most machines, but is cut off on a segment of the population, then it’s a good bet that the ones where it has been cut off are compromised. Those monitors might be able to find dual-homed devices that can serve as vectors of contagion. You’ll want to know where those are and maybe shut them down as part of the prepping.
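The dual-homed check itself is simple once the monitoring client reports interface data. A minimal sketch – the per-endpoint interface reports below are made up for illustration – just flags machines whose addresses sit on more than one network:

```python
# Flag dual-homed endpoints from hypothetical per-endpoint interface
# reports, as a monitoring client might collect them. A machine with
# interfaces on two networks is a potential vector of contagion.
from ipaddress import ip_interface

ENDPOINTS = {
    "pc-042": ["10.1.4.20/24"],
    "pc-117": ["10.1.9.33/24", "192.168.1.50/24"],   # corporate + home network
    "pc-260": ["10.2.2.7/24"],
}

def dual_homed(endpoints):
    """Return names of endpoints whose addresses span more than one network."""
    flagged = []
    for name, addrs in endpoints.items():
        networks = {ip_interface(a).network for a in addrs}
        if len(networks) > 1:
            flagged.append(name)
    return flagged
```

In practice the interface data would come from whatever endpoint agent is deployed; the logic for spotting the contagion vectors stays this simple.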
But I’m just a network guy who does a lot of NAC work. I’d like to know: what else would be good to have on that external hard drive? What would be good to have in a sealed envelope? Is there a way to securely store application code in the event that app servers are compromised? And speaking of servers, shouldn’t we ensure that the server networks are properly segmented from the rest of the network?
In short, what are the things you would put into place if you were brought in to get an organization as prepared as possible for The Big One?
Here’s the scenario: a firm purchases a security solution. The firm skimps on professional services, and/or rushes the implementation schedule, and/or neglects to maintain the product properly.
Do not be surprised when, one day, that security solution does something that results in a system-wide outage:
Fig. 1: System-wide outage
Why were those decisions made? Because professional services, longer timelines, and proper staffing/coordination are all costs, and we demand better return on investment!
The problem is that many security systems have the capability to shut down the entire network, or kill access to PCs, or other stuff that, well, keeps devices completely safe from threats by denying any access to them whatsoever. And while an enraged executive can satisfy his need to offer up a sacrifice to the shareholders in his firm by kicking out the vendor closest to the outage, there’s still the problem of cleaning up the after-effects. The vendor typically survives to roll out product another day, but the firm is left with the same problem as before – and will now have to go to another vendor whose product can be just as destructive as the first, if implemented incorrectly.
Fig. 2: Vendor making an exit from firm after system-wide outage
Worse, the firm may choose to reject all vendors of a particular solution and instead seek to eliminate all technology that requires such a solution with a Bold Move. “We’re going to get rid of all our Windows workstations and switch over to thin clients that run on burner phones, so we don’t need firewalls anymore.” Yeah. Good luck with that. This much I know: whatever product is mentioned as part of a Bold Move Strategy definitely has an amazing salesperson in that region. Chances are, that Bold Move is going to involve a purchase order that skimps on professional services, compresses timelines, and lacks proper staffing and coordination, which may result not in a system-wide outage, but an undesired result after a lofty promise.
Fig. 3: Undesired result after a lofty promise
This, in turn, can result in the executive that oversaw a failed vendor implementation and a failed Bold Move taking an opportunity at another company. This makes way for a new executive to step in and try his hand at choosing between doing things on the cheap or doing things correctly. Because RoI is much easier to measure than the chance that a botched implementation results in a DoS, my money’s on the cheap.
Fig. 4: Another botched implementation of a security product…
Here’s the situation: there’s a company that has handed all operational running of its network over to a third-party integrator. At first, the thought was that the company would save loads of money, but now the truth is known: this integrator charges by the line of code. The only way to save money is to never issue changes to the switch configs.
Along comes an auditor, and the auditor makes a finding that the company needs more network security. His recommended change involves adding just one line of code to each switch config.
The customer does the multiplication and comes to the conclusion this auditor *has* to be getting a kickback from the integrator.
It doesn’t matter if the integrator makes the changes by hand or if it automates them: the contract spells it out clearly, each line of code involves a charge.
It may come out cheaper to just fire the CISO every year, pay a fine, and never really fix the problem.
What are some other tacked-on monetary cost barriers that integrators add that get in the way of security? I’ve seen quotes for a pair of firewalls so high that, at retail prices, you could purchase the same pair once a month for an entire year and still have enough money left over to cover my salary, albeit without benefits. I suppose if they bought only the one firewall pair, the cost of the other 11 could go toward covering my benefits.
But I did more at my job than manage a single pair of firewalls – how could this be an actual savings? It was only a cost savings if we never purchased the gear in the first place!
And that ain’t good security…
Integrators also introduce non-monetary costs (and if I sound like an economist now, it’s because I used to teach Economics…) in the form of the time and effort it takes for their customers to get the paperwork put together to submit to them whenever a new system is introduced to the network. Does the product also need to access network equipment? Oh dear, oh dear, oh dear, that may be a problem…
… because the integrator uses the same management environment for multiple customers. If my product can access customer AAA in the integrator’s environment, it is only a few lines of configuration away from accessing everything from AAA to ZZZ in that environment.
That also ain’t good security…
Then there’s the time I submitted the request to have a firewall rule added to permit a group of 5 source addresses to talk to a group of 3 destination addresses over a group of 10 TCP ports.
Did the integrator create one rule and three groups?
No. 5 times 3 times 10 equals 150 – the actual number of rules created by the integrator for my request.
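Under a per-line contract, the billing difference between those two implementations is easy to sketch:

```python
# Billing arithmetic for the firewall request above: 5 sources,
# 3 destinations, 10 TCP ports.
sources, destinations, ports = 5, 3, 10

# With object groups: one rule referencing three groups, plus one
# group-member line per address or port.
grouped_lines = 1 + sources + destinations + ports   # 19 lines of config

# Fully expanded: one rule per (source, destination, port) combination.
expanded_lines = sources * destinations * ports      # 150 lines of config

print(grouped_lines, expanded_lines)  # → 19 150
```

Exact line counts vary by firewall syntax, but the shape of the incentive doesn’t: the multiplicative version bills nearly eight times as many lines for the identical policy.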
Walk with me through a thought exercise… let’s say that two nations are at a high level of international tensions, just less than a full declaration of war. Let’s also say that one nation’s Internet access is tightly controlled and the other’s is widely available. What happens when the two nations engage in a dark war in cyberspace?
By “dark war”, I mean one in which the nations can be pretty sure about who is sending cyberattacks, but they can’t prove it. They can’t prove attribution because proving it would mean revealing sources and methods of intelligence collection, or because they simply don’t have any permanent, tangible evidence to work with. As such, the attacks go forward, as do the responses, but there is no public attribution of them, so they stay in the dark.
Back to the nations in the hypothetical example, the one that has tightly controlled Internet access is already set for cyberdefense. Its commerce and government likely do not rely upon Internet connectivity in order to run normally. Or, if they do require connectivity, it’s only with internal IP addresses, nothing or very little extraterritorial. As such, it is not much of a target for the other nation. Its networks are difficult to get into and wrecking them is little more than an inconvenience.
For the nation with widely available Internet access, commerce and government services depend upon the Internet as a lifeline. Without it, activity halts and few organizations are prepared for long-term offline activity. It is a target-rich environment in a dark war.
So let us say that attacks against the Internet-rich nation are increasing in frequency and cost. How can it protect itself?
First, it bans traffic from the attacking nation. Nothing allowed to or from it. This then leads the attacking nation to shift to another path, that of compromising systems in other countries, perhaps starting with those in the allies of the nation they’re attacking. This is where the Internet-rich nation then asks all its allies to cut off traffic to and from the attacking nation. Let us assume that they all comply. What next?
Next are neutral nations. Their PCs are compromised, the botnets made of their PCs then launch attacks. This is where things will get complicated, so I’ll start using fictional names for these nations. We’ll call the Internet-rich nation the United States of Shamerica, or Shamerica for short, and the nation with its local networks separated from the world Shiran. Shamerica has its allies Shanada, Shengland, Shermany, Shrance, and The Shetherlands all cut off traffic from Shiran.
But many of those nations have outsourced their IT needs to nations such as Shindia, Shungary, Shulgaria, Shalaysia, and The Shech Republic. If Shiran attacks through those nations, what does Shamerica do if only half of those agree to cut off traffic from Shiran? Companies with outsourced IT in nations that don’t cut that traffic, like Shindia and Shalaysia, will be ruined if their access to those outsourcers is suddenly terminated – and that will be a victory for Shiran.
But if the traffic isn’t blocked, then that will also be a victory for Shiran when it results in yet another major cyberattack successfully getting through.
Meanwhile, firms in Shamerica are dealing with a higher and higher likelihood of cyberattack from Shiran’s indirect methods. At what point is the likelihood of attack high enough to justify spending appropriately on security? And how much can appropriate security cost before those firms decide to disconnect entirely from the Internet and return to the days of paper ledgers and mail-order business? Would customers be receptive to such things, given that they potentially promise less Big Data tracking of their lives and maybe even a lower likelihood of identity theft?
Drastic, I know, to suggest people return to physical mail and magazines and buying goods in stores, but we have to ask: at what point is the certainty of a successful attack high enough that being connected to the Internet is too great a risk relative to the costs of mitigation? Would the hypothetical nations of Shussia and Shina have to be involved as well, or can Shiran reach this point all on its own? How do we correctly calculate the risk of being connected to the Internet – or will connectivity always be a given, right up until the Internet becomes so clogged with attack traffic as to be rendered useless?
Because, in a dark war, the only true equivalent of a bomb shelter is to unplug from the Internet. Any connectivity is making a bet that your defenses are better than the attacker’s weapons. Miscalculate, and you are damaged.
The city of New Orleans just got attacked, and that made me think of the song about a train by the same name, whose chorus opens with that line… but this time, the question lacks the soft charm and slow nostalgia of Steve Goodman’s folk song. This time, the question is cold, jarring, unnerving. New Orleans is not the first major US city to be attacked and made to go dark, and it won’t be the last. The cities and other local governments of the USA simply aren’t going to be able to deal with cyberattacks on their own, so they’re going to be target-rich environments for state actors and the criminals they hire to detonate hand grenades to cover their tracks… or just the criminals who blow things up, you never can tell.
We can tell the cities and counties and states of the USA all we want about security and be met with the tired, nodding heads and empty eyes of IT staff that tried to tell the same message to their higher-ups. They know. They’re not idiots. They’re just faced with small budgets and political imperatives to get stuff done, no matter what. They know that when their town / county / state experiences a major breach, it will be the first time that entity seriously considers spending time and money on security measures. It will be the first time IT is allowed to do what it knows needs to be done, even if it’s done on top of the rubble and ruin of the past.
Do they have a perimeter firewall? Sure, but there was the time somebody high up got mad about traffic being blocked, so it’s set to permit all traffic by default. Do they have a datacenter firewall? Yes, indeed, right here in this box in the storeroom. It is fresh and ready to go. Do they have antivirus running on every PC? Absolutely. Well, we can only tell for sure on PCs that have antivirus running on them… we don’t know about the ones that have fallen out of communication with our software maintenance platforms.
Need I continue? Some of you are already at the point where you can bear the horror no more, but I must press on! You must see more, that you know the depths of their helplessness! Do you see the unsecured Internet line in that office, terminating on a Windows server with RDP running, no limit on logon attempts? Do you see the flat network, with telnet still running on switches and routers? Do you see massive file shares with no permissions set to halt normal users from deleting or changing files? Do you see the backup server that constantly fails its nightly backups, with the backup operator simply clicking through the errors on his shift because he was told long ago to just ignore them? Do you see the gear that all responds to the SNMP community “public”?
And there is more horror in there, I say. I didn’t even get to the Windows NT 4.0 server that’s still on the network. Why? Well, the payroll application couldn’t upgrade to run on Windows 2000, so we keep it going on that server over there… and there is yet more, deeper and deeper into hell.
Who knows what static routes lurk deep within the network, routes that bypass the firewall entirely for special IP addresses in faraway lands where the US lacks extradition rights? And are there programs on unsuspected and unsuspecting systems that are just counting down the days until the dust settles, things revert to normal, and the problems of the past make themselves available for mayhem once again? Clean up all you want, but what do you do if that payroll server on NT 4.0 is infected? The only person who could rebuild that system died 3 years ago. If it’s infected, maybe we can just put it behind a firewall and only open the ports needed for Windows and Active Directory. Oh wait, that’s all of them…
So what is the solution? Is this where the federal government steps in and supplements the IT budgets of local government entities? Or would that lead only to swollen management salaries with pittances spent on actual new technical hires? Is this where the feds create a system of firewalls to filter all traffic entering and leaving the nation, such as the Chinese do?
Actually, that might be what we need. It wouldn’t do anything for completely domestic attacks, but it could do at least something to halt attacks from outside the USA, right?
Except… how do we know the difference between legitimate traffic from abroad and traffic with malicious intent? Encryption doesn’t allow one to peek into the packets very easily. Banning known bad source IP addresses just leads to attackers compromising systems with other IP addresses and then launching attacks from there.
But maybe the protection is on the outbound side, with a massive proxy server cutting communications with scam sites and other evil online in other countries. But for how long would the proxy server be protecting us only from malware and fraud? Wouldn’t law enforcement argue that we need to be protected from terrorist propaganda? How broad is that classification? Wouldn’t entertainment firms want to protect us from download sites? Would they also want to “protect” us from foreign entertainment outlets that didn’t allow them to act as middlemen brokers for their content? Would we also be “protected” from foreign news sources that didn’t go along with the administration’s views? Blocking Russian state news propaganda I wouldn’t mind, but I sure would mind if a CBC or BBC investigative journalism programme that was critical of a US firm or governmental policy was blocked.
I hate to suggest this, as it’s highly exploitative, but we could allow recent grads to learn IT and then work for pathetic, near-volunteer wages for local government entities in order to pay off their student debts. I hesitate to introduce a scheme to offer pardons for nonviolent offenders that do pro bono IT work, since fraud and cyberattacks are, themselves, nonviolent crimes…
The City of New Orleans owns Louis Armstrong International Airport. Did this recent attack penetrate into the airport? Or was the firewall that is supposed to sequester it also permitting all traffic because there’s a full trust between its AD domain and the City’s? Or for some other reason, I don’t care. It’s all a nightmare, and when I wake up, there’s some shadow moving across my screen, saying, “g00d m0rn1ng 4m3r1c4, h0w r u?”
I don’t know how to answer that question. I normally don’t want to curse the darkness without lighting a candle, but I’m at a loss for answers to all the questions I asked. Cyberattacks can produce near-nuclear results, if done on a sufficient scale and with intent to destroy, not just encrypt and demand ransom. Perhaps lasers and hypersonic missiles can defend the USA from sudden attacks launched from bombers, ICBM silos, or nuclear submarines. What good are those against cyberattacks that target our highly vulnerable small government entities?
I’ll open with my premise: if security requires imagination, then we’re in for trouble. So we need an answer to the question of whether it does, and I’m afraid the answer is “yes.” Let me explain…
I was recently chatting with a colleague about how I enjoy my job. I thought I was talking about my passion for security, but he heard differently. He heard how my imagination and curiosity were prerequisites for my successes. He pointed out, “If someone doesn’t have the intuition that you have, how is he going to do security successfully? He can fill out a requirements list, do an audit checklist, follow regulations, but how is a person without that imagination going to be able to go beyond that and really get security done?”
In my role, I sometimes get a chance to deliver training for the product I support at $VENDOR. In those classes, I always enjoy a good discussion, when the participants are lively and engaged. But that’s not every class I’ve taught. I’ve taught classes where I had to help winkle out the answers from the students with leading questions. I’ve had students that may have been innovative and clever, but who did not see their future at the company that paid for their training. Demoralized and discouraged, they had no interest in applying their wits and insight to their current employers’ needs.
So, we need imaginative *and* motivated employees to do security right. Great, that makes my premise even more worrying. Adding that “motivated” adjective cuts deep into the “imaginative” group. The imaginative ones tend also to be the ones that need the best motivations to stick with their roles in security, which makes the effective security professional even more of an endangered species, if not an outright unicorn.
I’m not going to go deep into the game theory of career path decisions. If one threatens to quit over an issue at work, one either gets passed over for promotions and opportunities because one is seen as a short-timer, or that threat becomes stale if used more than once or twice. Therefore, one doesn’t threaten to quit, one simply quits and moves on. If firms want to retain the imaginative by keeping them motivated, then those firms have to be proactive.
But back to those imaginative people… do firms really want to retain them? Those imaginative people can be high maintenance types, you know. Is it better to keep the “bread-and-butter” types on the payroll and let vendors, VARs, and outsourcers worry about managing the artistes of our profession? After all, we don’t need imagination all of the time. Quite a lot of work in security is simply painting by numbers. What are the vendor best practice recommendations? Follow those. What are the regulatory requirements? Implement those. Maintaining code blocks, IP address assignments, switch configurations, application stores, document libraries – you and I both know that there’s drudgery in those tasks, and any level 1 tech with a runbook can handle them.
So when, exactly, do we need the imagination? I know we need it when analyzing the data. Yes, algorithms can sort through quite a lot of noise to get to the signal, but what does the algorithm know about things it could not have been programmed to handle? Leaving zero-day exploits aside, we still have to know what to do when there’s a new production application in play! It takes imagination and initiative to think of what that new signal might be and who to ask about it so that it can be exempted from blocking rules.
We also need imagination after a breach. There’s chaos and mayhem all around, and it takes some proper cleverness to think of all the other evil that could be taking root as that chaos and mayhem distracts our attention. We need multiple imaginations here, not just one. Different eyes, different minds, different experiences can inform a broad range of responses that build off of each other.
But before the breach, we could certainly use imagination in red and blue teams experimenting with both ways to penetrate and ways to mitigate. Someone has to ask the questions about the environment that lead to fuzz testing and investigations. There’s no way to put “think of something new” in a runbook, the human mind just doesn’t work that way.
There’s also a call for imagination not on the technical side, but on the process and procedure side. We have to be creative in how we submit requests and apply for resources so that we don’t get shot down or delayed. This isn’t out of the box thinking – the people on the other end of the request will reject anything that doesn’t conform to their box. This is inside the box thinking, except with the ability to somehow merge normal spacetime into a singularity that allows for bypassing internal red tape while still, overall, complying with corporateprocesses and procedures.
So, we’ve got a problem, as I mentioned at the outset. We need creative, imaginative people, and those types simply do not grow on trees. (In point of fact, no humans grow on trees, it’s something to do with our mammalian biology, as I understand…) And while we can encounter a few natural gifted visionaries in the wild, there simply aren’t enough to go around for all the needs of all the firms in the world.
That leads to the question: can we teach people to be creative?
And if so, who is responsible for that?
While my education experience gives a firm “yes” in answer to the first question, I’ve got no answer from experience to deal with the second. I would suppose that the firm that desires creative people needs to be about the business of teaching them, but I don’t see any programs that are geared for that. Let’s face it, most security training deals with learning the tools – technical stuff. Where in our profession do we see training that gets people to think creatively?
As I typed that, the answer came to me – look at our end-user security training. We teach people how to spot phishing attacks, social engineering, things like that. Not everyone passes that training brilliantly, but enough people do to show that it has value not just in and of itself, but also as creativity training. To successfully deal with a phishing attack, for example, we tell people how to analyze certain data and evaluate it. We don’t provide a list of all possible bad links to click, but we do have a few short rules on how to spot them. And, unlike an algorithm, the human mind can adapt and extend lessons to new situations with ease.
Maybe, then, we don’t have trouble. We just have a need to perhaps change our accounting rules and consider people as unique assets that can be improved, not identical widgets that can be swapped interchangeably. But I can guarantee that it’ll take some imagination to close the imagination gap where you work.
More than once, I’ve been in the meeting where someone is questioning whether or not to get a particular security system. This someone asks, “OK, so if someone has the CEO at gunpoint and forces him to log in to his PC and then takes pictures of the documents visible on his screen, then blackmails the CEO to say nothing to the local police as he slips away into the shadows and to a foreign nation where extradition is difficult, will you be able to stop that data exfiltration?”
And then that someone crosses arms and boldly states, “Then why bother with all this trouble if it’s useless against a *real* hacker?”
Now, maybe it’s not exactly that scenario. But whatever’s offered up is an advanced use case that even the tightest of security nets would have trouble catching. And if the current state of the IT environment is where someone could bring a PC from home and copy all the files off the main server, maybe that group of advanced use cases isn’t what anyone should be worrying about right now.
Which is why it’s important to consider such exotic cases, but rate them for what they are – exotic. When someone brings up a basic use case that is well within the capabilities of the security product to restrict, rate that as a basic case that will be among the first to be dealt with as the system is introduced. As the system matures, the more advanced cases can be considered.
I deal with NAC in my role, so I see the range of use cases all the time in my meetings with customers. Block a PC that isn’t part of your firm? This is not difficult to do. Block someone spoofing the MAC address of a printer? Well, that’s more than a basic task. I have to ask how we can tell a legitimate printer apart from a spoofed device. If there is no way to tell, then we have to ask if it’s possible to treat all printers as outsiders and restrict their access. This is where maturity comes into consideration.
Maybe we just proceed forward with the PC use case and think some more about that printer issue. Perhaps once we have the PC use case dealt with, there may have been time enough to set up an SNMPv3 credential to use to log on to legitimate printers. Maybe there was enough time to determine how to set up printer VLANs and restrict them. If so, then we’re ready to deal with that printer issue. While we’re doing that, we could be thinking about how to handle the security camera issue, or something like that.
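That layered printer check can be sketched in a few lines. The vendor OUI prefixes and the SNMPv3 probe result below are hypothetical – in practice the probe result would come from the NAC or a polling tool – but the triage logic is the point:

```python
# Illustrative triage for the spoofed-printer problem: a device claiming
# a printer's MAC is trusted only if it also answers a printer-only
# SNMPv3 credential. OUIs and probe results here are made up.
PRINTER_OUIS = {"00:1b:a9", "3c:2a:f4"}   # example printer-vendor MAC prefixes

def classify(mac, snmpv3_ok):
    """Return how a NAC policy might treat a device claiming to be a printer."""
    oui = mac.lower()[:8]                 # first three octets of the MAC
    if oui not in PRINTER_OUIS:
        return "not-a-printer"            # regular endpoint policy applies
    if snmpv3_ok:
        return "printer"                  # looks like a printer AND can prove it
    return "suspected-spoof"              # printer MAC, no printer credentials
```

The credential probe is what moves this from a basic use case to a mature one: until the SNMPv3 setup exists, there is nothing for the second check to ask.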
Each environment will have different levels of maturity for its use cases. Perhaps at one firm, it is easier to deal with securing Windows PCs than Macs. At the next one, they could have a better handle on their macOS management than they do on their PCs. Maturity could simply be deciding which of two equally difficult tasks will be done first.
Maturity can also be seen in calling out when a use case goes beyond the capabilities of the product under consideration. A proxy server does not provide its own physical security system, for example. So, if we entertain scenarios in which physical security is defeated, we should table those until we’re looking at a physical security system. By the same token, if a scenario is only plausible once another security system has been defeated, then that invites an argument about the safeguards and durability of the system that has to be defeated, not the one under current consideration.
We also see maturity in getting different systems to work together. Being able to automate responses from one system to another gives firms the ability to deal with increasingly advanced threats. All the while, as long as we keep a perspective on how mature our security systems are, we know what level of threat we can deal with.
A major reason people don’t want to buy more home automation technology is security. Not only is this a response given by 42% of respondents to the question “Why don’t you want to buy more home automation devices?”, it’s also my response.
When I get a device that will be internet-enabled, I agonize about how soon it will be before that device becomes a botnet host or worse. I do a little pen testing, I change default passwords, and I’m happy to say that my existing devices are either pretty darn secure or at least more secure now than when I first plugged them in. While I’m sure that there’s a person with at least above average intelligence out there that can pop these devices if given local access, I’m also sure that their traffic isn’t exposed to the Internet and I’ve got reasonable security with these things.
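My pen testing is nothing elaborate. A minimal sketch of the kind of check I mean – the host and port list below are placeholders, and you should only point this at your own gear – just probes a device for listening TCP services:

```python
# Minimal TCP service probe for a home-network device: try connecting
# to a handful of common ports and report which ones accept.
import socket

def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:   # 0 means connect succeeded
                found.append(port)
    return found

# Example (placeholder address): open_ports("192.168.1.1", [22, 23, 80, 443, 8080])
```

If telnet (23) or an unexpected admin port answers, that is the moment to go change defaults before the device ever reaches the Internet.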
That being said, I don’t want to go through that for any other devices. My televisions have to stream content, my security system needs to connect to the monitoring back-end, and, uh… that’s all I’ve got. My robotic vacuum cleaner has no Internet – I paid more for that lack of feature, as it happens. My appliances all keep to themselves. I work from home, so my thermostats are right where I’d like them to be, no need to be online with those. It looks like I’m also in line with the 49% of respondents who indicated that they’re not buying more home automation because they don’t see a use case for the technology.
But even if I did, I’d have to ask, “is it secure?” And that’s not just the device itself. Maybe that new Internet-enabled barbeque grill is locked down tight, but what about the app that runs it? Or the app that runs any other system in my house?
Security doesn’t just mean making sure the kid down the road from me doesn’t pwn me when he does his daily wardriving. It also means that when I do something with a device, it doesn’t suddenly affect my Google search results or trigger a Facebook ad. It’s bad enough that when my kid sends me a link to a stupid YouTube video, I have to spend the next few weeks telling algorithms that, no, I am NOT a fan of Korean boy bands. I don’t need this to happen because I change my thermostat or order groceries. Yes, there’s also the concern about private information. And while I can change default passwords and block ports, that does nothing about my info going into advertisers’ data lakes.
In fact, what other reason is there to have an Internet-enabled dishwasher except to send me more ads? I mean, if I forget to run the dishwasher before I leave home for the day, I can run it at night. If it’s before a big vacation, I can text the person that’s going to feed my cats to punch the “start” button. I’m happy to pay more for an airgapped dishwasher precisely because I want informational security, not just device security. Remember my comment about the vacuum cleaner? That applies to any other appliance. I want to keep that stuff to myself, thank you very much.
Let me set the scene: a customer asks about being able to track users that bring up unauthorized VMs on Windows machines. He explains that he’d like to look at the 192.168.0.0/16 RFC 1918 range to see how many addresses we see in that range. That’s OK by me, all I have to do is add that to the scope of the networks we track…
At that moment, we only looked at 10.0.0.0/8. I added the 192.168.0.0/16 range and we watched the new devices pop up into the discovery window.
And then we watched as those devices started to churn… the IP addresses stayed the same, but the MAC addresses kept changing. Loads of Netgear, Arris, Cisco-Linksys, Belkin, TP-Link devices… what was causing all this?
The horror! The horror of the home networks!
And then it dawned on us: these were all teleworker home networks bleeding into the corporate network estate! The traffic to and from 192.168 networks wasn’t supposed to be routable, but here it was, coming and going and getting picked up on the SPAN session monitoring north-south traffic at the datacenter gateway.
192.168.1.1 and 192.168.0.1 were the addresses that changed MAC addresses most frequently. No surprise there, as those are default gateways on oh-so-many home networking products. 192.168.1.254 changed less often, as that was the default gateway on Arris routers used for AT&T broadband networks (I used to have one, so I know) and only a handful of other home devices. I saw Nest controls, Roku streamers, gaming systems, the works. And all of this was exposed to the customer network, and all of the customer network was exposed to these environments.
Granted, routing to any given endpoint for any length of time was going to be a mess, but the IP addresses that were less commonly used were also the ones with the most persistent MAC addresses and connections. The biggest concern was that the customer didn’t allow any guest traffic on the wired network – yet here were untold numbers of guest devices, the kind that don’t usually show up on BYOD networks!
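That one-IP-many-MACs churn is easy to surface once you know to look for it: just count distinct MAC addresses per IP from ARP or capture data. A minimal sketch, with made-up data and names (not from the actual customer network):

```python
from collections import defaultdict

def churn_report(observations):
    """Count distinct MAC addresses seen per IP address.

    `observations` is an iterable of (ip, mac) pairs, e.g. harvested
    from ARP tables or a SPAN capture. An IP that cycles through many
    MACs is almost certainly a default gateway address repeated across
    overlapping home networks.
    """
    macs_per_ip = defaultdict(set)
    for ip, mac in observations:
        macs_per_ip[ip].add(mac.lower())
    # Churniest addresses first -- the likely default gateways.
    return sorted(((ip, len(macs)) for ip, macs in macs_per_ip.items()),
                  key=lambda pair: pair[1], reverse=True)

seen = [
    ("192.168.1.1", "aa:bb:cc:00:00:01"),    # one vendor's router
    ("192.168.1.1", "aa:bb:cc:00:00:02"),    # ...another's
    ("192.168.1.1", "aa:bb:cc:00:00:03"),    # ...and another's
    ("192.168.1.254", "aa:bb:cc:00:00:04"),  # stable: fewer vendors use .254
    ("192.168.1.254", "aa:bb:cc:00:00:04"),
]
# churn_report(seen) ranks 192.168.1.1 (3 MACs) over 192.168.1.254 (1).
```

Sorting by churn puts the overloaded default-gateway addresses at the top, which is exactly where the investigation should start.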
Moral of the story? Those teleworker devices for home office networks are part of your perimeter. Make sure you keep an eye on those points of entry, as well as the big one you pay the ISP for.
I remember the first remote management and monitoring (RMM) solution ever, the venerable and wonderful “ping”. We would use it all the time to see if a remote host was up and responding. And then, one day, someone wrote a program for Windows, WhatsUp, and the world was changed forever. With that program, we admins could enter multiple IP addresses and that tool would ping them all day and night! It could even be set up to generate alerts.
We thought we had it made until someone asked, “Hey, I know I can ping the SQL Server box, but is it responding on TCP 1433?” At that point, we knew both that we needed more in our app and that there would be other admins, with other network ports, who would make similar requests. And so began the development of RMM tools.
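That step beyond ping is only a few lines in any language. A sketch of the idea in Python (the host name in the watch list is a made-up placeholder):

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if `host` accepts a TCP connection on `port`.

    This is the check ping can't do: a box can answer ICMP just fine
    while the service listening on, say, TCP 1433 is dead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical watch list -- the host name is illustrative.
targets = [("sqlserver01.invalid", 1433)]
for host, port in targets:
    state = "up" if port_is_open(host, port, timeout=1.0) else "DOWN"
```

Loop that over a list of (host, port) pairs on a timer and you have, in miniature, the core of every RMM uptime check since.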
At small companies, RMM may very well be not much more than a shareware ping/telnet suite that checks for hosts being up and responding on critical ports. It may involve learning multiple suites of RMM tools, roughly in conjunction with the trial period for one tool ending and a download for the new tool being complete. Most of what goes on is just monitoring, not management (does that mean they consume R_M products?), as there are few enough systems to manage that ssh and RDP sessions to the several devices needing management are sufficient.
Once we get to a medium company with multiple sites, that SSH/RDP solution for everything simply fails to scale. It’s time to lay some money out and actually pay for an RMM solution that will track those uptimes as well as do some kind of configuration management. Everyone makes demands of that config management solution – will it do rollbacks? Will it do point-in-time recovery? Will it track changes made outside the product? Will it enforce certain configuration parameters? Will it integrate with the helpdesk ticketing system?
The answer to all of those questions is either “no” or “yes, at an additional cost.” Nobody rides the RMM train for free.
And it’s not like that RMM will magically never make mistakes. We’re still in a garbage in, garbage out world. More than once, I was working on a project to integrate our routers and switches with a tool by pushing code to them with the RMM solution… only to have that code get overwritten because a different team pushed a change with an outdated template. So what’s the policy and procedure for undoing a change that was done in error? I found that part out the hard way as I waited for the next change window to get my changes put back into the environment.
I’ve seen RMM tools that can’t push version-specific code. Well, they can, but they don’t keep track of versions, so it’s a guess or a logic problem to figure out which devices are on which version. One solution I came up with was to push one line of code to all devices, knowing that it would fail for devices on the older version. The next push checked the config to see if that line I previously pushed was in the config. If so, skip the device. If not, then push a line of code compatible with the older versions. Would I have preferred that the tool have the intelligence to do a version check and then push the appropriate line of code, all in one go? Yes. Yes, I would. The biggest irony to me in this particular case was that the RMM tool was made by the vendor of the devices that the tool couldn’t track the version on. Very disappointing…
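The two-pass trick is simple enough to sketch. This is my workaround as described above, not a feature of any particular product, and the device API below is a toy stand-in:

```python
def two_pass_push(devices, new_line, old_line):
    """Push version-appropriate config without knowing versions up front.

    Pass 1 pushes the new-version syntax everywhere, knowing that
    older devices silently reject it. Pass 2 reads each config back:
    if the line stuck, the device is done; otherwise it gets the
    old-version syntax instead.
    """
    for dev in devices:
        dev.push(new_line)              # fails quietly on old versions
    for dev in devices:
        if new_line in dev.get_config():
            continue                    # new version: already configured
        dev.push(old_line)              # old version: compatible syntax

class FakeDevice:
    """Toy stand-in for whatever push/read API the RMM exposes."""
    def __init__(self, accepts_new_syntax):
        self.accepts_new_syntax = accepts_new_syntax
        self.config = []
    def push(self, line):
        # Old-version devices reject lines using the newer syntax.
        if self.accepts_new_syntax or not line.startswith("new-syntax"):
            self.config.append(line)
    def get_config(self):
        return self.config

fleet = [FakeDevice(True), FakeDevice(False)]
two_pass_push(fleet, "new-syntax acl 10", "old-syntax acl 10")
# fleet[0] ends up with the new syntax, fleet[1] with the old.
```

It works, but notice that it costs two change windows' worth of pushes to do what one version-aware push could do in a single pass.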
And then there’s RMM at the large corporation. Thousands of switches and routers, some on very dodgy Internet connections, all of them being monitored. This means the poor sap with the on-call phone is constantly answering when the NOC calls in to say that the Dakar site is down. Or the Guadalajara site. Or the Noida site. Or the Ho Chi Minh City site. Or the Chengdu site. Or the Narvik site. Or the Deadhorse site. And the NOC guy reads out the entire device name and IP address, letter and number by letter and number, so one has to sit and wait through it all before saying, “Acknowledged. Please open a ticket with the ISP.” I can’t remember a happier day than when the policy was finally re-done so that the NOC would just open the blasted ticket on their own without requiring acknowledgement from engineering.
Still, we were blessed in that we had nearly every switch under management. This did have one side effect, however… we wouldn’t believe a switch existed if it wasn’t in the RMM tool, at least until we saw it listed as a neighbor on another switch and pinged it. That’s when we discovered that some switches couldn’t be brought into our RMM tool because they didn’t support SNMPv2. Or because nobody could remember the password to get local access and nobody had the nerve to take it to ROMMON mode to break into it. Or because the local support contract kept that gear out of our global tools.
Those problems were relatively straightforward compared to getting gear from specialty vendors into the RMM tool. Not all of them had the same implementation when it came to reporting, even things as simple as disk space and CPU usage. For disk space, does the vendor report total available space, across all volumes, or will it send an alert when one particular volume hits 95% capacity? Will it report overall CPU utilization or will it fire an alert when one of 16 CPUs goes over 90%? The answer is, of course, “It depends.” That means that alerts from some vendors actually aren’t alerts, they’re more like transient conditions of no great importance. It also means that some vendor gear could be in an alert state, but it doesn’t actually report it as such, given how it implements a particular SNMP MIB.
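The disk example is worth making concrete, because the two reporting styles can flatly disagree about the same machine. A toy illustration (the numbers, shape, and threshold are mine, not taken from any vendor's MIB):

```python
def disk_alerts(volumes, threshold=0.95):
    """Two vendors' notions of a 'disk alert' over the same data.

    `volumes` maps volume name -> (used, total). One vendor alerts on
    aggregate utilization across all volumes; another alerts the moment
    any single volume crosses the threshold.
    """
    used = sum(u for u, _ in volumes.values())
    total = sum(t for _, t in volumes.values())
    aggregate_alert = used / total >= threshold
    per_volume_alerts = [name for name, (u, t) in volumes.items()
                         if u / t >= threshold]
    return aggregate_alert, per_volume_alerts

vols = {"/": (20, 100), "/var/log": (97, 100)}
# Aggregate utilization is 58.5%, so the first vendor stays quiet;
# /var/log is at 97%, so the second vendor fires. Same box, two
# contradictory answers to "is the disk in an alert state?"
```

Until the monitoring team normalizes those semantics per vendor, some "alerts" are noise and some genuine alert states never surface.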
At all companies, there’s the issue with keeping the tools up-to-date. The day that the tool is launched for general use is such a bright, shining moment in the history of the progress of humanity, with all the devices that need monitoring in that tool, right where they should be. Within a very short time – overnight, in some cases – the information in it is obsolete. New devices aren’t added and decommissioned devices are showing red because nothing is reporting back at that IP address… and then they go green again when that IP is re-used, but we just haven’t realized yet that it’s a security camera now, not a loopback address.
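Catching that drift is, at bottom, a set difference between what the tool believes exists and what discovery actually sees. A toy sketch (sets of bare IPs are my simplification; keying on IP alone is exactly how a re-used address gets mistaken for the old device):

```python
def inventory_drift(rmm_inventory, discovered):
    """Diff the RMM's device list against live discovery results.

    Both arguments are sets of IP addresses. Real tools would key on
    more than IP -- serial numbers, MAC addresses -- precisely to avoid
    trusting a green dot on a recycled address.
    """
    ghosts = rmm_inventory - discovered   # in the tool, not answering: red dots
    strays = discovered - rmm_inventory   # answering, but never added to the tool
    return ghosts, strays

rmm = {"10.0.0.1", "10.0.0.2", "10.0.0.9"}
live = {"10.0.0.1", "10.0.0.2", "10.0.0.50"}
# ghosts: 10.0.0.9 (decommissioned?); strays: 10.0.0.50 (unmonitored)
```

Running a diff like this on a schedule is cheap; the expensive part, as always, is the human follow-up on every ghost and stray.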
Finally, there’s the issue of access. Even at the small company, not everyone who wants to know if a system is up will have access to the RMM dashboard. At larger and larger companies, access to that dashboard can get limited to the point where even the network engineers can’t look at it… or the tool is so cumbersome, there’s severe mental pain involved in getting information out of it.
And that’s why, even at a massively huge global megacorporation, I still got plenty of use out of running a shareware app that would ping a list of devices, so I’d know if they were up… it wasn’t an official tool with management and headcount assigned to it. It just ran on my desktop and running it meant I wouldn’t have to open a service ticket to ask someone if they could check to see if the RMM had a green dot by my device or not.