Passwords - Part 2 of 2 - Leaked passwords
Monday, June 14. 2021
This is the second part of my passwords series. It is about leaked passwords. See the previous part for passwords in general.
Your precious passwords get lost, stolen and misplaced all the time. Troy Hunt runs a website, Have I been Pwned (pwn is computer slang meaning to conquer or to gain ownership, see PWN for details). His service typically tracks leaked email addresses and phone numbers, which leak even more often than your passwords, but he also has a dedicated section for passwords: Pwned Passwords. At the time of writing, his database has over 600 million known passwords. So, by any statistical guess, he has your password. If you're unlucky, he has all of them in his system. The good thing about Mr. Hunt is that he's one of the good guys. He wants to educate and inform people about their information being leaked into the wrong hands.
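For the curious, here is a minimal sketch of how a single password could be checked against Pwned Passwords without ever sending the password itself. It assumes the public k-anonymity range endpoint at api.pwnedpasswords.com, which takes the first five hex characters of the SHA-1 hash and returns matching hash suffixes with counts. The function names and the User-Agent string are made up for illustration.

```python
import hashlib
import urllib.request
from typing import Tuple

# Assumption: the public Pwned Passwords range endpoint.
API_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> Tuple[str, str]:
    """Return the SHA-1 hash of the password as (5-char prefix, 35-char suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Find the suffix in a 'SUFFIX:COUNT' response body, 0 if it is absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Ask the service how many times the password appears in known leaks."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        API_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_response(suffix, body)
```

Calling pwned_count("password") needs network access. Only the five-character prefix leaves your machine, the comparison against the suffix happens locally.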
8.4 billion leaked passwords in a single .txt-file
Even I have a bunch of leaked and published sets of passwords. A couple of days ago, a user with the alias kys234 published a compilation of 8.4 billion passwords and made that database publicly available. More details are at RockYou2021: largest password compilation of all time leaked online with 8.4 billion entries and rockyou2021.txt - A Short Summary. Mr. Partridge even has the download links to this enormous file. Go download it, but prepare 100 GiB of free space first. Uncompressed, the file is huge.
How long are the leaked passwords?
From those two articles, we learn that there are plenty of passwords to analyse. As I wrote in my previous post, my passwords are long. Super-long, as in 60-80 characters. So, I don't think any of my passwords are in the file. Still, I was interested in what the typical password length would be.
Running a one-liner (broken into multiple lines for readability):
perl -ne 'chomp; ++$cnt; ++$lens{length($_)};
END {printf("Count: %d\n", $cnt);
for my $k (sort {$a <=> $b} keys(%lens)) {printf("Len %d: %d\n", $k, $lens{$k});}}' \
rockyou2021.txt
This results in the following:
Count: 8459060239
Len 6: 484236159
Len 7: 402518961
Len 8: 1107084124
Len 9: 1315444128
Len 10: 1314988168
Len 11: 1071452326
Len 12: 835365123
Len 13: 613654280
Len 14: 436652069
Len 15: 317146874
Len 16: 215720888
Len 17: 131328063
Len 18: 97950285
Len 19: 65235844
Len 20: 50282947
Visualization of the above table:
It would be safe to say that the typical password is 9 or 10 characters long. Something a human being can remember and type easily into a login prompt.
Based on leaked material, how long should a password be?
The next obvious question is: well then, if not 10 characters, how long should the password be?
The instant answer is: 21 characters. The file doesn't contain any of those.
Doing a little bit of statistical analysis: if you're at 13 characters or more, your password is in the top 23% by length. At 15 or more, you're in the top 10%, and at 16 or more in the top 7%. So, the obvious thing is to aim for 15 characters, no less than 13.
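The shares can be recomputed directly from the length table above. This is a small sketch that copies the counts from the Perl output and sums the tail of the distribution. The function name share_at_least is made up for illustration.

```python
# Length histogram copied from the Perl output above.
LENGTH_COUNTS = {
    6: 484236159, 7: 402518961, 8: 1107084124, 9: 1315444128,
    10: 1314988168, 11: 1071452326, 12: 835365123, 13: 613654280,
    14: 436652069, 15: 317146874, 16: 215720888, 17: 131328063,
    18: 97950285, 19: 65235844, 20: 50282947,
}


def share_at_least(min_length, counts=LENGTH_COUNTS):
    """Fraction of all passwords that have at least min_length characters."""
    total = sum(counts.values())
    tail = sum(n for length, n in counts.items() if length >= min_length)
    return tail / total


for min_length in (13, 15, 16):
    print(f"{min_length}+ characters: {share_at_least(min_length):.1%}")
```

With the table above, this prints roughly 22.8% for 13 or more characters, 10.4% for 15 or more, and 6.6% for 16 or more.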
Given the lack of super-long passwords, I went a bit further with RStudio. I went with the assumption that the password lengths would form a Gaussian bell curve. I managed to fit the data points into a semi-accurate model which, unfortunately for me, is more inaccurate at 18, 19 and 20 characters than with the shorter ones.
If you want to improve my model, there is the human-readable HTML version of the R notebook. The R Markdown-formatted source is also available.
The red line is the actual measured data points. The blue bars are what my model outputs.
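If you don't have R at hand, the bell-curve idea can be sketched in a few lines of Python. Note that this is not the model from the R notebook: it is a plain method-of-moments fit, which simply takes the weighted mean and standard deviation of the length table and plugs them into the normal density. Because the table is cut off at 6 and 20 characters, the estimate is biased, and a fit like this tends to underestimate the long tail. All function names are made up for illustration.

```python
import math

# Length histogram copied from the Perl output above.
LENGTH_COUNTS = {
    6: 484236159, 7: 402518961, 8: 1107084124, 9: 1315444128,
    10: 1314988168, 11: 1071452326, 12: 835365123, 13: 613654280,
    14: 436652069, 15: 317146874, 16: 215720888, 17: 131328063,
    18: 97950285, 19: 65235844, 20: 50282947,
}


def fit_gaussian(counts):
    """Return (total, mean, sigma) of the length distribution."""
    total = sum(counts.values())
    mean = sum(length * n for length, n in counts.items()) / total
    variance = sum(n * (length - mean) ** 2 for length, n in counts.items()) / total
    return total, mean, math.sqrt(variance)


def predicted_count(length, total, mean, sigma):
    """Number of passwords the fitted bell curve expects at a given length."""
    z = (length - mean) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return total * density


total, mean, sigma = fit_gaussian(LENGTH_COUNTS)
print(f"mean={mean:.2f} sigma={sigma:.2f}")

# Compare the measured counts with what the model outputs.
for length in sorted(LENGTH_COUNTS):
    model = predicted_count(length, total, mean, sigma)
    print(f"Len {length}: actual {LENGTH_COUNTS[length]}, model {model:.0f}")

# Extrapolate beyond the data, which is the interesting part.
for length in (21, 25, 30):
    print(f"Len {length}: model {predicted_count(length, total, mean, sigma):.3g}")
```

The extrapolated numbers should be read as a rough indication only, since the bell-curve assumption itself is a guess.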
The result is obvious: longer is better! If you're at 30 characters or more, your passwords can be considered unique. Typical systems encrypt or hash the passwords in storage, which makes it infeasible to brute-force a 30-character password. This is also the reason why the leaked RockYou2021 list doesn't contain any passwords of 21 or more characters: THEY ARE SO RARE!
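To put "not feasible" into numbers, here is a back-of-the-envelope sketch. Both constants are assumptions picked for illustration: an alphabet of the 95 printable ASCII characters and an attacker testing 10^12 guesses per second. Real guess rates depend heavily on the hash in use.

```python
ALPHABET_SIZE = 95            # assumption: printable ASCII characters
GUESSES_PER_SECOND = 10**12   # assumption: a hypothetical, well-funded attacker
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(length, alphabet_size=ALPHABET_SIZE):
    """Number of possible passwords of exactly the given length."""
    return alphabet_size ** length


def years_to_exhaust(length, guesses_per_second=GUESSES_PER_SECOND):
    """Years needed to try every password of the given length."""
    return keyspace(length) / guesses_per_second / SECONDS_PER_YEAR


for length in (8, 10, 15, 20, 30):
    print(f"{length} chars: {years_to_exhaust(length):.3g} years")
```

Under these assumptions an 8-character password falls in hours, while 30 characters would take vastly longer than the age of the universe.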
Looks like me going for 60+ chars in my passwords is a bit of an overkill. But hey! I'm simply future-proofing my passwords. If/when they leak, they should be out of reach of a brute-force attack, unless a super-weak crypto is used.
Wrap up
The key takeaways are:
- The password, a memorized secret, is archaic and should be made obsolete, but this cannot be achieved anytime soon.
- Use password vault software that will suit your needs and you feel comfortable using.
- Never ever try to remember your passwords!
- Make sure to use long passwords! Any password longer than 20 characters can be considered a long one.
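A password vault will normally generate passwords for you, but the idea behind a long random password can be sketched with Python's standard secrets module. The default length of 30 and the choice of alphabet are assumptions taken from the discussion above, and the function name is made up for illustration.

```python
import secrets
import string

# Assumption: letters, digits and punctuation are all accepted by the target system.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=30, alphabet=ALPHABET):
    """Build a random password using a cryptographically secure generator."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())
```

Store the result in your vault, not in your head.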