Patching is not easy. First you need a thorough, real-time inventory of every system, and then you need to monitor and be alerted whenever a new critical patch is released for any system you have. Once you know what to patch, the patching process starts with finding the right patch and trying it in Development and Staging, making sure the system still works as intended, which means having a full set of tests to run (automated and/or manual). Then you patch Production while trying to minimize downtime: patch one server at a time, making sure each server being patched is removed from production traffic first, then test with one patched server/canary in production for some time to see how it behaves in real conditions, and then patch the rest. Of course the best scenario would be to rebuild the servers from scratch with the patches applied, but for old systems that's probably not an option.
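As a rough illustration, here is a minimal sketch of that rolling/canary patch loop; the load-balancer client, patch command and test suite are hypothetical stand-ins for whatever tooling you actually use, not a real API:

```python
import time

# Hypothetical helpers: replace with your load balancer API,
# configuration management tool and test suite.
from ops_tooling import load_balancer, apply_patch, run_smoke_tests

CANARY_SOAK_SECONDS = 4 * 60 * 60  # let the canary take real traffic for a while

def patch_one(server):
    load_balancer.drain(server)       # remove from production traffic first
    apply_patch(server)               # apply the patch that passed Dev/Staging
    if not run_smoke_tests(server):   # automated checks before it takes traffic again
        raise RuntimeError(f"{server} failed post-patch tests; leave it drained")
    load_balancer.add(server)

def patch_fleet(servers):
    canary, rest = servers[0], servers[1:]

    # Patch the canary first and let it serve real traffic for a while.
    patch_one(canary)
    time.sleep(CANARY_SOAK_SECONDS)
    if not run_smoke_tests(canary):
        raise RuntimeError(f"canary {canary} misbehaved; stop the rollout")

    # Then roll through the rest, one server at a time.
    for server in rest:
        patch_one(server)
```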
Equifax should have done all of that; they could even have turned off the system so the attack window was minimized. But we don't live in a perfect world: the inventory system might not work properly, the patching process could take more time if something breaks, and the attack could have happened the day after the vulnerability was published, leaving no actual time to patch, or, in the worst-case scenario, it's a zero-day being exploited first on your infrastructure.
So, what should Equifax, or anyone else in the same situation, do?
I created a list of recommendations. It's not everything, but it's something, and of course not everything is applicable to every organization, due to resources or criticality.
To capitalize on all the work and investigation I had done, I figured I would turn it into an exercise with the software/infrastructure engineers I work with.
I gathered them and proposed that they come up with a list, while I acted as a facilitator.
The list I came up with is split into three categories: Design, Preventive and Detective controls. They are mostly geared at one scenario: a zero-day gives an attacker remote code execution on an application server.
Design
- App servers should not have direct access to the DB
- Access should be through fully maintained APIs
- Those internal APIs should require authentication; a single failed authentication should alert the SIRT.
- Sensitive data should be field-level encrypted, HMAC'd or tokenized.
- Not all internal customers need all the sensitive data.
- For example, to perform a search by full SSN you don't need the plaintext SSN in the database; it can be HMAC'd or tokenized (see the first sketch after this list).
- Encryption keys used by the API that has access to the DB should not be stored on the instance; they should be injected when the App Server is created and destroyed afterwards (see the second sketch after this list).
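A minimal sketch of the HMAC'd SSN idea, assuming a keyed hash fits your threat model and that the key lives in a KMS/secrets manager rather than next to the data (nothing here is a specific vendor API):

```python
import hmac
import hashlib

def ssn_fingerprint(ssn: str, key: bytes) -> str:
    """Deterministic keyed hash of an SSN, usable as a search index.

    The plaintext SSN is never stored: only the fingerprint goes into the
    database, and searches recompute the fingerprint at query time.
    """
    normalized = ssn.replace("-", "").strip()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Write path:  store ssn_fingerprint(ssn, key) in the ssn_hmac column.
# Search path: SELECT ... WHERE ssn_hmac = ssn_fingerprint(user_input, key)
```

The key is what makes this work: SSNs have very little entropy, so a plain unkeyed hash could be brute-forced, while an HMAC with a well-protected key (or tokenization) cannot.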
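And a sketch of the key-injection idea: the key is fetched from a secrets manager when the App Server instance is created, lives only in memory, and is dropped when the instance goes away. The `secrets_manager` client is a stand-in for whatever KMS/secret store you actually use:

```python
class DataKeyHolder:
    """Keeps the data-encryption key in memory only.

    The key is fetched once when the instance is created and is never
    written to disk, environment files or the instance's configuration.
    """

    def __init__(self, secrets_manager, key_name: str):
        # Hypothetical client call: replace with your KMS/secret store SDK.
        self._key = secrets_manager.get_secret(key_name)

    def key(self) -> bytes:
        if self._key is None:
            raise RuntimeError("data key has already been destroyed")
        return self._key

    def destroy(self) -> None:
        # Drop the reference once the instance no longer needs the key.
        self._key = None
```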
Preventive
- WAF with virtual patching. Legacy systems should sit behind a WAF that learns legitimate traffic and blocks traffic that deviates from it.
- Was the site/API intended for the public? If not, it should have been protected behind a VPN.
- The information that can be searched and retrieved should depend on the caller.
- The API should rate-limit its internal customers and limit the number of results it returns.
- There can be rate limits that only alert and rate limits that also block (see the rate-limit sketch after this list).
- Instances should be destroyed and recreated often (weekly at most), so attackers cannot persist on them.
- Have code and dependency scanners check libraries for known vulnerabilities in code repositories and/or CI.
- Run library inventories on production instances and check them against known vulnerabilities (see the inventory sketch after this list).
- Proactive patching: patch often. For this you need proper integration tests, so you can patch with confidence. Ideally, deploy the new infrastructure in steps while the old one still works, and when done, leave the old one inactive for a few days in case you need to roll back.
- Run external and internal vulnerability scanners for infrastructure and web apps.
- Limit the number and size of results of DB queries (via DB configuration and/or the queries themselves; see the query-cap sketch after this list).
- The network must be segmented on a need-to-have basis.
- Restrict outbound internet connections; ideally make them go through proxies with whitelisted domains/IPs.
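A minimal sketch of the two-tier rate limit, with a lower threshold that only alerts the SIRT and a higher one that also blocks. The thresholds, the in-memory counters and the `alert_sirt` hook are assumptions for illustration; a real service would keep the counters in something shared like Redis:

```python
import time
from collections import defaultdict

ALERT_THRESHOLD = 100   # requests per window that trigger a SIRT alert
BLOCK_THRESHOLD = 500   # requests per window that are also blocked
WINDOW_SECONDS = 60

_recent_requests = defaultdict(list)  # caller -> timestamps of recent requests

def check_rate_limit(caller: str, alert_sirt) -> bool:
    """Return True if the request is allowed, False if it should be blocked."""
    now = time.time()
    window = [t for t in _recent_requests[caller] if now - t < WINDOW_SECONDS]
    window.append(now)
    _recent_requests[caller] = window

    count = len(window)
    if count > BLOCK_THRESHOLD:
        alert_sirt(f"{caller} exceeded the blocking limit: {count} requests/{WINDOW_SECONDS}s")
        return False
    if count > ALERT_THRESHOLD:
        alert_sirt(f"{caller} exceeded the alerting limit: {count} requests/{WINDOW_SECONDS}s")
    return True
```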
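A sketch of the production library inventory, using importlib.metadata to list what is actually installed on the instance; the known-vulnerable dictionary is a placeholder, in practice it would come from an advisory feed (OSV, your vendors, an internal list):

```python
from importlib import metadata

# Placeholder data: feed this from real vulnerability advisories, not hard-coded values.
KNOWN_VULNERABLE = {
    "example-lib": {"1.0.0", "1.0.1"},
}

def installed_packages():
    """Return {package_name: version} for every package installed on this instance."""
    return {dist.metadata["Name"]: dist.version for dist in metadata.distributions()}

def vulnerable_packages():
    return [
        (name, version)
        for name, version in installed_packages().items()
        if version in KNOWN_VULNERABLE.get(name, set())
    ]

if __name__ == "__main__":
    for name, version in vulnerable_packages():
        print(f"VULNERABLE: {name}=={version}")
```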
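And a sketch of capping DB query results at the application layer, on top of whatever the database itself enforces. It assumes a DB-API cursor with %s-style parameters (psycopg2-style) and a hypothetical `alert_sirt` hook:

```python
MAX_ROWS = 1000  # no internal caller should legitimately need more rows per query

def fetch_capped(cursor, query: str, params=(), alert_sirt=None):
    """Run a query with an enforced LIMIT and alert if the cap is hit."""
    cursor.execute(f"{query} LIMIT %s", (*params, MAX_ROWS + 1))
    rows = cursor.fetchall()
    if len(rows) > MAX_ROWS:
        if alert_sirt:
            alert_sirt(f"query hit the {MAX_ROWS}-row cap: {query!r}")
        rows = rows[:MAX_ROWS]
    return rows
```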
Detective
- If an internal customer tries to get more information than it is allowed to, it must alert the SIRT.
- All instances should run software like auditd/go-audit as a HIDS, or have some other HIDS.
- Unexpected instance behaviour should alert the SIRT.
- Attempting to get the encryption keys through non-approved means should alert the SIRT.
- Network traffic should be monitored; a deviation from the typical traffic for an instance at a specific date and time should alert the SIRT.
- The SIRT should have clear runbooks on what to do when any of these alerts comes up, e.g. isolate the instance, review logs, turn off a system's access to sensitive data, etc.
- Have DLP solutions (internal/external). Monitor DB queries (DB firewalls), looking for abnormal queries, sensitive data in responses and abnormal response sizes. Each service should have its own DB user, so the solution can profile by user.
- Have data canaries: records that are never legitimately read, so any read alerts the SIRT (see the sketch after this list).
- Install honeypots.
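A minimal sketch of a data canary check, assuming the canary records were planted with known IDs and that the DB audit/query log is available to scan; the IDs and the `alert_sirt` hook are made up for illustration:

```python
# IDs of planted canary records: they look like real customer rows,
# but no legitimate code path ever reads them.
CANARY_IDS = {
    "00000000-0000-4000-8000-000000000001",
    "00000000-0000-4000-8000-000000000002",
}

def scan_query_log(log_lines, alert_sirt):
    """Scan DB audit/query log lines and alert the SIRT if a canary record was touched."""
    for line in log_lines:
        for canary_id in CANARY_IDS:
            if canary_id in line:
                alert_sirt(f"data canary {canary_id} was read: {line.strip()}")
```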
A few weeks after I did this exercise, I found these recommendations, which I think are quite good and aligned with mine:
https://icitech.org/wp-content/uploads/2017/09/ICIT-Analysis-Equifax-Americas-In-Credible-Insecurity-Part-One.pdf
Of course, what I wrote is nothing new; it's just applying standard InfoSec common knowledge.
Let me know if something else should be here or if you have any questions.