Hacker's Guide to the Galaxy

Or correctly guessing your SO's creds based on the information they share with you

15-min read

I should probably focus on Cracking the Coding Interview. Instead, I came across this tweet — spurring the idea of detailing my approach to determining a user’s credentials via various means, whether that is through FLIR cameras, general stupidity, wordlists, or OSINT.
Disclaimer: I do not support penetration testing into national systems or any systems for that matter without authorized permission. This was purely an academic exercise.

Hint: focus on the Twitter username.

Extracting ATM pins w/ Infrared Cameras

Forward-looking infrared cameras aren’t particularly new: they’ve been around for ages, conveniently classified for military applications until the technology was deemed appropriate for general use. Even then, FLIR only reached retail in the past four decades, first focusing on more profitable (and frankly more suitable) endeavors and settling into the niche B2B services industry of electrical and pipeline maintenance, outdoor adventures, and emergency rescues. Fast forward to 2014, when industry leader and aptly named FLIR Systems launched its first consumer-focused attachment, built into a mobile case. Skimming the technical specifications, one can see the Imaging & Optical Palette used to color-code the images:

Gray (white hot), Hottest, Coldest, Iron, Rainbow, Rainbow HC, Arctic, Lava and Wheel

A picture here might be worth more than the above dozen colorful words:

Instead of dropping €300 on an entire smartphone-compatible system to catch your dog red-handed, put UV pigment in a bottle of hand sanitizer, then use a blacklight to trace its last steps.

Trust me, I love the resurgence of this technology. On my very own platforms, I’ve implemented Hotjar, which analyzes site traffic to generate heatmaps and recordings.

Armed with Google Analytics, I can, with ~80% confidence, identify who is visiting our websites according to their frequency, activity, and location, even with anonymized data.

And this entire flow is automated, neatly packaged, and shipped to my inbox week after week for me to pore over. If a lowly web developer can wield this kind of power, imagine what governments (NSA) and corporations (Palantir) are capable of achieving. But what if you’re looking for a more professional (or more nefarious) solution?

In Heat of the Moment: Characterizing the Efficacy of Thermal Camera-Based Attacks, a handful of UC San Diego researchers could theoretically guess the right pin code (typed into a keypad) 80% of the time if the image was taken immediately after. If the image was taken a minute later, they had a fifty percent chance of guessing correctly.

Another unnerving aspect of this potential vulnerability is that, as the heat signature fades with time, one can figure out the order in which the keys were pressed: the dimmest (coolest) button was pressed first and the warmest (hottest) last. With relatively few permutations left and a bit of solid guesswork, you can predict the code and gain access. For safety deposit boxes use the opposite element (perspiration, i.e. water or sweat) and reverse the methodology.
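To put a number on "relatively few permutations", here is a rough sketch of how many four-digit PINs remain once a thermal image shows which keys were touched. The function name and the example keys are my own, not the paper's, and the sketch assumes the image reveals the keys but not yet their order.

```python
from itertools import product

def candidate_pins(pressed_keys, length=4):
    """All PINs of the given length that use every pressed key at least once."""
    keys = set(pressed_keys)
    return [
        "".join(combo)
        for combo in product(sorted(keys), repeat=length)
        if set(combo) == keys
    ]

# Four distinct warm keys, order unknown: 4! = 24 candidates out of 10,000.
four_keys = candidate_pins("1379")

# Three warm keys means one digit was pressed twice: 36 candidates.
three_keys = candidate_pins("137")
```

If the fading gradient also gives away the order, the first list collapses to a single guess.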

Should you now start removing rubber and plastic buttons, replacing them with metal keys? Are we always going to be defenseless to high-functioning sociopaths who measure our body temperature to see if we are lying about our unprofessed love?

TL;DR? You’re fine. Regular image sensors only detect near-infrared and not the mid to far IR radiated by hot objects. I’ve run ‘tests’ on DSLRs with the filter removed and wasn’t able to capture anywhere near the level of heat on plastic buttons that the paper did.

Or you can just follow my handy guide to safeguarding the (hopefully less than) ten bands in your checking account:

  • An attacker needs physical access to the keypad within a minute of you using it for a decent success rate. Cherish the card machine for a minute longer. Maybe kiss it for good luck, but remember to be COVID-friendly.

  • This attack vector won’t work on all keypads — metal keypads reflect IR like a mirror, are highly thermally conductive, and dissipate heat quickly, which doesn’t allow for a thermal signature to be left behind.

  • One can easily thwart attempts to extract the passcode by resting their fingers on the remaining keys. This simple precaution produces a meaningless thermal signature. This isn’t even particularly hard: just smash the keypad a few times, or pass the bill to your inebriated friend right after.

Stupidity — The U1timat3 Anaphrodisiac

According to DataGenetics, 10.7% of the population chose 1234 as their ATM pin code. Ignoring this egregious error, there are a number of ways a person is vulnerable to credential theft, besides one’s own foolishness.

Statistically, one third of all codes can be guessed by trying just 61 distinct combinations

If you’re looking to choose the world’s least common pin then you could potentially go with 8068, which is statistically the least likely pin code (based on frequency). But that might have some unintended consequences, for instance, the least popular four-digit sequence might end up bubbling up to the top, like some sick twisted game of Nash Equilibrium, or for more gamer-inclined minds, akin to Gandhi in Civilization 1:

Allegedly, there was a bug that caused Mahatma Gandhi’s aggression to go to the maximum value of 255 due to an integer underflow error. In short, his aggressiveness stat is low enough that, upon adopting the democracy option and incurring the government’s inherent drop in militancy (-2), his default one point of aggression would sum to a negative value. This would cause it to wrap around to the highest value and subsequently turn Gandhi into a genocidal maniac. I’ve solved the complex math:

1 + (-2) = -1
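For the non-gamers, a tiny sketch of the wraparound the story describes, assuming the aggression stat lives in an unsigned 8-bit value (that storage detail is the legend's claim, not something I can confirm from the game's code):

```python
def to_uint8(value):
    """Wrap an integer the way an unsigned 8-bit register would."""
    return value % 256

aggression = 1
democracy_modifier = -2

# -1 cannot be represented in 8 unsigned bits, so it wraps to the top of the range.
wrapped = to_uint8(aggression + democracy_modifier)  # 255
```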

‘It’s fair to say that Gandhi could, on occasion, seem a little unnecessarily zealous,’ Sid Meier, main developer and designer of the turn-based strategy video game, concludes.

The existence of this bug was denied by Meier, and there is no mention of it on the internet prior to the fifth installment of the award-winning game, Civ V, where Gandhi has an unnervingly high nuke rating (possibly as an homage).

Assumptions based on data and empirical evidence can backfire and instead make one shoot themselves in the foot. Don’t be a sheep. There is always a relevant xkcd:

I am a sheep. I shot myself in the foot. Enter Security.org to call me out:

In a paper, the Computer Laboratory team at the University of Cambridge used a regression model to identify the factors that influence which pin code combination consumers pick. I’ll append parts of the abstract for the click-lazy:

We find that guessing PINs based on the victims’ birthday, which nearly all users carry documentation of, will enable a competent thief to gain use of an ATM card once for every 11–18 stolen wallets, depending on whether banks prohibit weak PINs such as 1234. The lesson for cardholders is to never use one’s date of birth as a PIN. The lesson for card-issuing banks is to implement a denied PIN list, which several large banks still fail to do. However, blacklists cannot effectively mitigate guessing given a known birth date, suggesting banks should move away from customer-chosen banking PINs in the long term.

People are stupid. Or naive. I prefer to lean toward the latter for the sake of humanity. We choose repeating digits for consistency (2244), patterns that can be etched into memory (2864: north, south, east, west on the keypad), or important dates (birthdays, most likely our mothers’). How hard would it be for an acquaintance to ask about your mother’s birthday? Or know about your affinity for compasses? Or slight OCD tendencies?

Know your customer. Their likes, dislikes, their innermost desires, and some sequence of numbers that might make them exploitable.

According to our computed model, the following blacklist of 100 PINs is optimal: 0000, 0101–0103, 0110, 0111, 0123, 0202, 0303, 0404, 0505, 0606, 0707, 0808, 0909, 1010, 1101–1103, 1110–1112, 1123, 1201–1203, 1210–1212, 1234, 1956–2015, 2222, 2229, 2580, 3333, 4444, 5252, 5683, 6666, 7465, 7667.

Remember to swap these out a year down the line.
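For a bank that wants to act on the paper's list, a minimal sketch of a denied-PIN check might look like this. The names are mine, and I have swapped the en dashes in the quoted ranges for plain hyphens so the string is easy to parse.

```python
# The paper's blacklist as quoted above, with ranges written using hyphens.
BLACKLIST_SPEC = (
    "0000, 0101-0103, 0110, 0111, 0123, 0202, 0303, 0404, 0505, 0606, "
    "0707, 0808, 0909, 1010, 1101-1103, 1110-1112, 1123, 1201-1203, "
    "1210-1212, 1234, 1956-2015, 2222, 2229, 2580, 3333, 4444, 5252, "
    "5683, 6666, 7465, 7667"
)

def expand_blacklist(spec):
    """Turn the comma-separated list (with inclusive ranges) into a set of PINs."""
    pins = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            start, end = item.split("-")
            for number in range(int(start), int(end) + 1):
                pins.add(f"{number:04d}")
        else:
            pins.add(item)
    return pins

DENIED_PINS = expand_blacklist(BLACKLIST_SPEC)

def is_allowed(pin):
    """True if the PIN is not on the denied list."""
    return pin not in DENIED_PINS
```

Expanding the ranges does give exactly 100 entries, and the 1956-2015 block is the part that needs shifting as the years roll on.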

Enough numbers, let’s move onto characters. In 2015, security researcher Mark Burnett released a list of 10 million passwords. The data consisted of:

  • Portions of most major data leaks in the last five years where there were plaintext passwords leaked or published later

  • Thousands of smaller leaks found on Pastebin and similar text-storage sites

  • Google Searches/ Alerts for leaked databases

  • Forums that share premium file-sharing and gaming accounts + porn passwords

  • Dumps discovered through torrent searches

There is a defunct subreddit dedicated to sharing research on the list. Some of the most popular posts computed the Levenshtein distance between usernames and passwords, checked patterns and character counts, and listed potential flaws of the set.
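For anyone who hasn't met Levenshtein distance before, it counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Here is a plain-Python sketch of the textbook dynamic-programming version (not the code those posts used, which I haven't seen):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]

# A username and a password that differ by one appended digit.
distance = levenshtein("alice", "alice1")  # 1
```

A small distance between username and password is exactly the kind of laziness an attacker hopes for.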


Slight digression: 69 was the 3rd most used pair-combination. A commenter replied:

I assume because most users were born in 1969?

This reminds me of when my mother wished her neighbor a happy birthday on April 20th because his Wi-Fi network name had 420 appended to the end of it. Another one:

Unrelated, but did anyone else notice that the number 5683 was one of the top 10 most common PINs? I can’t think of a single reason why that would be.

It spells ‘LOVE’ using the alphanumeric code on the keypad.
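You can check the mapping yourself with a few lines. This sketch assumes the standard phone keypad layout; the helper name is my own.

```python
# Standard letter groups printed on a phone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}

def word_to_pin(word):
    """Translate a word into the digits you would press to spell it."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())

pin = word_to_pin("love")  # "5683"
```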

The patterns of numbers in passwords offer a brief glimpse into human psychology, similar to the insights gleaned from the pin code paper. Birth years are a common trope, as well as repetitions (111/ 999). 786 is an important number for many Muslims.

Another insightful comment from Hacker News:

When sites require a digit, everybody appends ‘1’ to their usual password. The exponential declining frequency of subsequent digits is because when passwords ‘expire’ folks just add 1. The short lifetime of site usage results in that decline. Just thinking out loud.


But the onus isn’t only on users. Strings like Password2020 used to score high on password strength meters because they fulfilled the abjectly bad base criteria:

  • Longer than 8 characters, and

  • At least one lowercase letter, one uppercase letter, and one digit or symbol.
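As a rough sketch, a meter built on nothing but those two rules could be as small as this (my reading of the criteria, not any particular site's code):

```python
import string

def naive_strength_check(password):
    """Pass anything longer than 8 characters with mixed case and a digit or symbol."""
    long_enough = len(password) > 8
    has_lower = any(c.islower() for c in password)
    has_upper = any(c.isupper() for c in password)
    has_digit_or_symbol = any(
        c.isdigit() or c in string.punctuation for c in password
    )
    return long_enough and has_lower and has_upper and has_digit_or_symbol

passes = naive_strength_check("Password2020")        # True
fails = naive_strength_check("mvqzhtplawkcrnoyebd")  # False: all lowercase
```

The guessable string sails through while a long random lowercase string is rejected, which is the problem in a nutshell.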

Take the Password Test. If you’re curious about whether your password or email has been compromised check HaveIBeenPwned. The website doesn’t store any passwords — their Privacy Policy has a brief on how they accomplish this. Technicals for nerds.
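For the nerds: as I understand it, the Pwned Passwords lookup uses k-anonymity. Your password is hashed with SHA-1 locally, only the first five hex characters go to the range endpoint (https://api.pwnedpasswords.com/range/ followed by that prefix), and the returned hash suffixes are compared on your machine. The sketch below only does the local half and makes no network call; the function name is mine.

```python
import hashlib

def split_for_range_query(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest of a password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix would ever leave your machine.
    return digest[:5], digest[5:]

prefix, suffix = split_for_range_query("password")
```

You would then look for the suffix among the lines the service returns for that prefix.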

HSBC forces you to use a six-digit pin code compared to the customary four digits, increasing the number of combinations hundred-fold (10,000 -> 1MM). But they also renege on other responsibilities — like providing an appropriate login user interface.

And God save you if a bank limits the max length or bans the use of special characters.

TL;DR? Rotate passwords regularly, definitely don’t reuse them, and forget your mother’s birthday for the sake of your bank balance. And don’t use xkcd’s correcthorsebatterystaple. Do we really need to worry though? Not really.

  1. The dataset is not representative of all passwords. The list also has no indication of any source or user attitude. Awful websites with awful security will store awful passwords. If one were to visit a flower company’s website, they wouldn’t care to use a strong password, especially if it’s a one-off visit. Their bank credentials and social media accounts will most likely have a more well-thought-out password.

  2. password is #2 on the list only because frequency can’t possibly decrease. In retrospect, it might be beneficial to have a list of common passwords that people don’t end up using, the ones that crackers keep regurgitating when attempting to access your account. Your unique passwords could theoretically never reach the top of the list because they’re unique. Plus, the dataset is incorrigible:

Many dumps include passwords in a hashed format that requires you to crack them yourself.

  1. Use LastPass or any other password manager to securely store and generate high-entropy strings. My hot take is you shouldn’t even remember your credentials. The less human intervention the better. People introduce errors.

  2. Research suggests that users’ first choice of language for their password is their mother tongue. Add some French in there. I can help.

What can companies do on their side?

  1. Like the Cambridge team suggested, companies can take the leaked dataset, run it through sort -u to filter out duplicates (uniq on its own only collapses adjacent repeats), and create a blacklist to deter people from choosing the easiest path. That’ll easily make the list redundant.

  2. User-accessible features that assist in pasting passwords from password managers instead of creating one’s own convoluted login system. HSBC, please.
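The blacklist idea from the first point fits in a few lines. This is a sketch with a made-up six-entry leak standing in for the real dataset, and the function name is hypothetical:

```python
from collections import Counter

def build_blacklist(leaked_passwords, top_n=None):
    """Deduplicate a leak; optionally keep only the top_n most frequent entries."""
    counts = Counter(p.strip() for p in leaked_passwords if p.strip())
    if top_n is None:
        return set(counts)
    return {password for password, _ in counts.most_common(top_n)}

# Toy stand-in for a real dump.
leak = ["123456", "password", "123456", "dragon", "password", "123456"]
blacklist = build_blacklist(leak)
worst_offender = build_blacklist(leak, top_n=1)
```

At signup, rejecting anything found in the set is a single membership test.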

Big Black Box

Let’s say you’re tech-savvy enough that you follow all the above precautions. You’re emailing confidential information and have redacted the necessary fields. You’re still not technically safe. There have been many advancements in software tools that make it surprisingly easy to recover text from pixelated images or PDF files.

Not only can the length of the redaction give an estimate on the password length, but for variable width fonts, one could even rule out many passwords using pixel measurements between the text on either side.
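A sketch of the pixel-measurement trick: given a table of glyph widths for the font, you can throw out every candidate whose rendered width doesn't match the redacted box. The widths below are invented for illustration; real ones would come from the actual font.

```python
# Hypothetical glyph widths in pixels for a proportional font (illustrative only).
GLYPH_WIDTHS = {
    "i": 3, "l": 3, "r": 5, "s": 6, "a": 7, "d": 7,
    "o": 7, "p": 7, "w": 10, "m": 11,
}
DEFAULT_WIDTH = 7  # assumed width for any character not in the table

def rendered_width(text):
    """Approximate width of the text in pixels."""
    return sum(GLYPH_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)

def plausible_candidates(candidates, redaction_width, tolerance=2):
    """Keep only candidates whose width is within tolerance of the redacted box."""
    return [
        c for c in candidates
        if abs(rendered_width(c) - redaction_width) <= tolerance
    ]

survivors = plausible_candidates(["password", "illill", "mmmmmmmm"], 55)
```

With these toy widths, only "password" fits a 55-pixel box.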

It took a while to convince people to not use a bit of Gaussian Blur because it’s extremely insecure. Well, get ready for round 2. Case in point: the child abuser who was caught by simply reversing a Photoshop filter.

Police [essentially 4Chan] took a photo with a ‘swirl’ effect of the paedophile’s face and reversed it to reveal a very usable picture. So good in fact he was found and arrested.

One can ostensibly reverse-engineer the underlying blur encoder and in turn, decode any screenshot. It’s a cat-and-mouse chase × arms-race. Why don’t people use black boxes properly? Simple: it attracts more attention. A redacted rectangle with sharp edges has much more contrast than all the other elements in the image and becomes visually dominant, taking the front-row seat.

A pixelated area communicates more clearly that there’s information present that is hidden from you, but not from others. It also can’t be mistaken for a design element.

In 2017, two French ‘hackers’ reconstructed a blurred-out cryptocurrency QR code on TV and claimed bitcoin cash worth $1,000 (or £760 at the time of discovery).

Sassano and Storck, [the hackers], have explained the process in detail in a blog.

As always, you gotta share the content for those clicks and page views. Cough.

One can even decensor Hentai with Deep Neural Networks. A project humorously titled DeepCreamPy handles the dirty work so you can take care of your own business. Remember the markup tool debacle in iPhones? It was actually mostly transparent and barely hid any information, and ended up becoming a meme to slightly hide people’s names but in actuality dox them. You can see many instances of this on the /r/Tinder subreddit. The funniest part is that it’s the most commonly used pen tool because Apple purposely chose to place it front-and-center in their user interface and gave it the widest stroke. People can’t be bothered, and end up picking the easy option. Image metadata (if not properly stripped) can geolocate you to a very accurate degree.

So that begs the question, can you do the same with faces? Sure the contrast is a lot worse but the added frames from the video might serve as a sort of interpolation? In some cases even blurring faces might be a bad idea. Just because we are unable to unblur a face today doesn’t mean we are unable in 10 or 100 years. Food for thought.

TL;DR? Be careful with everything you put up online. The age-old adage rings true:

what you put on the internet stays there forever, etched/ carved in stone.

  1. I pipe all of the images I upload online through ImageOptim first.

  2. Use the Adobe Acrobat Redaction tool. It’ll obliterate everything + the metadata.

  3. Image formats like JPEG have a thumbnail (part of the EXIF metadata) stored in them, which may not be updated when you edit the image. Also, leave generous margins. JPEGs can have compression artifacts that leak information outside the boundaries of the object. I’m sure you can sense my hatred towards this format.

  4. Are you printing a document? Follow these instructions to safeguard yourself —

    Redact -> print -> scan -> distribute.

    US courts and lawyers are finally starting to learn. Or for the paperless moguls:

    Redact -> convert to image -> convert to PDF -> distribute.

And companies?

  1. In Photoshop, the Filter > Pixelate > Mosaic option should have a checkbox called Secure? or Security Noise. Or there should be a separate filter called Pixelate > Redact. Ideally, it would use some intelligence to figure out the size of characters/ symbols in the selected area and automatically figure out the right combination of pixel size and noise.

  2. Or, Keep It Simple Stupid (KISS): flood the whole rectangle with black. If that’s the only layer of the picture, there’s no need to worry if the ‘secure noise’ is secure enough, or if it keeps staying secure two years down the line.

  3. Pixelization algorithms should implement some degree of brightness and chromatic random noise in order to defeat this new wave of attacks.

Might be overkill.
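Overkill or not, the noise idea from the last suggestion is easy to sketch. Below, an image is just a grid of 0-255 grayscale values; each block is averaged as in a normal mosaic and then nudged by a random amount. The function and its parameters are my own illustration, it only handles brightness (no chroma), and I make no claim that this much noise is enough to defeat a real attack.

```python
import random

def pixelate_with_noise(pixels, block=4, noise=0, rng=None):
    """Mosaic a 2D grid of 0-255 values, adding random per-block brightness noise."""
    rng = rng or random.Random()
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (y, x)
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            average = sum(pixels[y][x] for y, x in cells) // len(cells)
            jitter = rng.randint(-noise, noise) if noise else 0
            value = max(0, min(255, average + jitter))  # clamp to valid range
            for y, x in cells:
                out[y][x] = value
    return out

image = [[0, 0, 255, 255], [0, 0, 255, 255]]
plain = pixelate_with_noise(image, block=2)
noisy = pixelate_with_noise(image, block=2, noise=20, rng=random.Random(1))
```

Without noise, every block is a deterministic function of what was underneath it, which is what makes a plain mosaic guessable.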

Open-source intelligence (OSINT)

Won’t share too much here because I might lose access to this tool, but there are a number of completely legal websites out there that one can use to perform KYC. The paid solutions provide a trove of data to:

  1. Locate persons of interest

  2. Uncover associations between people, addresses, phones, and social handles

  3. Determine the credibility of sources, witnesses, or suspects

  4. Track changes in historical online and offline identity information

  5. Connect personal, professional, and social information

Think of the site as a search engine for people. Social media resources like Twitter and Facebook are especially useful, wherein you can programmatically access or publicly scrape information on target users. Not using API keys grants you an additional level of security as well as the possibility of bypassing developer limitations.

TL;DR? Keep a tight check on your social media presence and purge old content regularly, after saving it somewhere. The 3-2-1 backup rule might work well here.

Please Charge Your Phone with My Wire

Hey, we’re back to iPhone accessories. Last year, at the annual Def Con hacking conference, security researcher MG introduced an Apple charging cable capable of remotely connecting to a computer. Talking to Vice, the devilish creator suggested you may even give the malicious version as a gift to the target — the cables even come with some of the correct little pieces of packaging holding them together. MG said:

It’s like being able to sit at the victim’s computer but without actually being there.

The cable comes with various payloads, or scripts and commands that an attacker can run on the victim’s machine. A hacker can also remotely ‘kill’ the USB implant, hopefully hiding some evidence of its use or existence.

He is selling the cables for $200 each. Pricey, but Apple has always thought differently.

TL;DR? Don’t use random people’s wires, and if you have to, don’t ‘trust’ the wire.

Back to Basics (ABC’s)

Let’s bring everything we’ve learned today, together. Once you’ve figured out the specifics of a user, all you really need to crack their password is a wordlist along with the internet’s most popular cracking tool, John the Ripper. He supports four modes:

  1. Single crack mode: Tries mangling usernames obtained from the GECOS field, and tries them as possible passwords

  2. Wordlist mode: Tries all words in the provided wordlist (see above dataset)

  3. Incremental mode (aka Brute-Force attack): Tries all possible character combinations.

  4. External mode: Optional mode in which John may use program code to generate words. One can add their own rules to drastically decrease solve time. What rules, you might ask? All the exploitable details about the target user you collected (including, but not limited to, their parents’ or pets’ names, birth year, school, etc.)
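To get a feel for what such rules produce, and to check whether your own password would fall out of them, here is a plain-Python illustration. It is not John's external-mode syntax (that is written in John's own C-like language in its config file); the function and the example inputs are hypothetical.

```python
from itertools import product

def mangle(words, years, suffixes=("", "!", "1")):
    """Combine personal words with years and common suffixes into candidate strings."""
    candidates = set()
    for word, year, suffix in product(words, years, suffixes):
        for base in (word.lower(), word.capitalize()):
            candidates.add(f"{base}{suffix}")                   # rex!
            candidates.add(f"{base}{year}{suffix}")             # rex1990!
            candidates.add(f"{base}{str(year)[-2:]}{suffix}")   # rex90!
    return candidates

# A pet's name and a birth year are enough to cover a lot of real-world choices.
guesses = mangle(["rex"], [1990])
```

If your current password shows up in a set like this, it is time to let a password manager pick the next one.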

Now, let’s create a hypothetical situation:

Using a wordlist dictionary (sourced from the internet) that has approximately 1,493,677,782 words (sized at 15GB) of the most commonly used phrases and database leak of existing user passwords, one would assume that it would take forever to crack the password and gain access. Combine this with external mode, and you’re exponentially multiplying the number of words. You’re safe right? Nope.

According to JtR benchmarks, an AWS c5a.24xlarge instance can perform ~408 billion checks per second (the c/s figure; I believe the c stands for crypts rather than cracks). Even if you multiply your standard wordlist by 100,000, it would take a little over 366 seconds*, or about six minutes, to run through the entire set. Spread and parallelize that across multiple instances and you have a formidable password cracking tool. Now, not everyone has access to a beefy c5a, which costs ~$3.696 per hour in the cloud. What about running the cracking tool on your own laptop? You’ll likely easily reach 5MM c/s on a single core, which isn’t too bad. On an 8-core machine, the same job would take about 43 days. Probably best to fork over that cash to our friendly neighborhood book rental company.

For the time-strapped but cash-heavy individuals, please run John on x1e.32xlarge and share the results. At 7 times the hourly cost of the archaic c5a, I can’t wait for an optimized program to cross 1 trillion c/s, but for someone else to front the dough.

* my math might be egregiously wrong here, I wrapped up this section at six in the morning.
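Here is the arithmetic redone, using only the figures quoted in this section (wordlist size, the 100,000x multiplier, the benchmark rate, and the hourly price). Treat it as a back-of-the-envelope sketch; the variable names are mine.

```python
WORDLIST_SIZE = 1_493_677_782      # words in the 15GB wordlist
MANGLING_FACTOR = 100_000          # multiplier from extra rules
CLOUD_RATE = 408e9                 # checks per second quoted for c5a.24xlarge
LAPTOP_RATE = 8 * 5e6              # 8 cores at 5MM c/s each
CLOUD_PRICE_PER_HOUR = 3.696       # USD, as quoted above

def seconds_to_exhaust(candidates, checks_per_second):
    """Time to try every candidate once at a constant rate."""
    return candidates / checks_per_second

candidates = WORDLIST_SIZE * MANGLING_FACTOR

cloud_seconds = seconds_to_exhaust(candidates, CLOUD_RATE)            # roughly 366 s
laptop_days = seconds_to_exhaust(candidates, LAPTOP_RATE) / 86_400    # roughly 43 days
cloud_cost = cloud_seconds / 3600 * CLOUD_PRICE_PER_HOUR              # well under a dollar
```

So the cloud run comes out at about six minutes and well under a dollar, while the laptop figure of roughly 43 days holds up.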

Epilogue

So what can be done? Are we out of luck and is privacy dead? The famous (and somewhat obligatory) quote from Edward Snowden, the NSA whistleblower:

Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don't care about free speech because you have nothing to say.

TL;DR? Follow the tl;dr’s and you should honestly be fine.

Further Reading

  1. Google Wallet pin code can be cracked in as little as 18 minutes (Sept ‘11)

  2. Smartwatches Can Be Used to Spy on Your Card’s PIN Code (Jan ‘16)

  3. Is 123456 Really The Most Common Password? by security researcher Mark Burnett.

  4. A Glimpse Into the World of Internet Password Dumps, also published by Burnett.

  5. Analyzing the Patterns of Numbers in 10 Million Passwords by Max Woolf, insightful.

  6. PasswordsCon — brilliant research papers presented here.

  7. Unmasked by WPEngine. The psychological reasons a person chooses a password.

  8. The Secret Lives of Numbers currently hosted at Turbulence. An exhaustive 2002 empirical study to determine the relative popularity of every integer between 0 and one million. The authors themselves pitch the project best:

    We surmise that our dataset is a numeric snapshot of the collective consciousness.

  9. Recovering passwords from pixelized screenshots by Sipke Mellema, creator of Depix.

  10. Why Blurring Sensitive Information is a Bad Idea by Dheera Venkatraman.

  11. Refocus-it can be used to refocus images acquired by a defocused camera, blurred by Gaussian or motion blur, or any combination of these.

  12. Why You Should Stop Using Other People’s iPhone Cables (Sep ‘20)

  13. Hashcat — advanced password recovery. World’s fastest password cracker.

  14. Privacy matters even if ‘you have nothing to hide’