If any of our hashes appears in this database, the corresponding password has been "leaked", and we will treat it as weak. Our goal is to force users to use a unique password for their work account.
Download and unpack the database (as of February 2021, it is about 22 GB and constantly growing). The first step is to convert all the hashes to lowercase and drop the counter (i.e. 0717B19E4348445872D8BB57D5E562B7:14 becomes 0717b19e4348445872d8bb57d5e562b7). This is done with the following command:
cat pwned-passwords-ntlm-ordered-by-hash-v7.txt | cut -d":" -f 1 | awk '{print tolower($0)}' > hash_db.lst
In 5-7 minutes, you should have a sorted file of hashes, hash_db.lst, in the required format.
From our own dump (hashes_sorted.lst), you also need to keep only the sorted unique hashes:
cat hashes_sorted.lst | cut -d":" -f 2 | sort -u > hashes_only_sorted.lst
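For readers who prefer Python to shell pipelines, the two preparation steps above can be sketched roughly as follows. This is only an illustration of what the cut, awk and sort commands do, not a replacement for them. The function names are hypothetical, and the dump is assumed to keep the hash in the second ":"-separated field, as the cut -f 2 call suggests.

```python
def normalize_db_line(line):
    """Turn 'HASH:COUNT' into a lowercase hash without the counter.

    Mirrors: cut -d":" -f 1 | awk '{print tolower($0)}'
    """
    return line.split(":", 1)[0].strip().lower()


def unique_dump_hashes(lines):
    """Return the second ':'-separated field of each line, deduplicated and sorted.

    Mirrors: cut -d":" -f 2 | sort -u
    Assumption: lines without a ':' are skipped here, whereas cut would
    pass them through unchanged.
    """
    hashes = set()
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) > 1 and fields[1]:
            hashes.add(fields[1])
    return sorted(hashes)
```

On the full 22 GB database the shell pipeline is likely to be the more practical choice; the sketch is mainly useful for checking the format on a small sample.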
Now we need to compare two arrays of hashes, one of which is ~ 20 GB.
On my laptop, this turned out to be an overwhelming task. Python ate all the memory and did nothing. Then I had to split hash_db into 3-5 GB pieces, and things got off the ground.
So the algorithm now looks like this: we split the 20 GB file into 3-5 GB pieces (each a multiple of 33 bytes, i.e. 32 hex characters plus a newline, so that no hash is cut in half). We compare each piece in turn with the set of hashes from the dump using Python's set.intersection. The output is a file containing the intersection of the hashes from the dump and the leaked passwords database.
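The idea can be sketched in a few lines of Python. This is a minimal illustration rather than the actual script: the function name and the default piece size are assumptions, and every line of the database file is assumed to be exactly 32 hex characters followed by a single "\n". Instead of writing the pieces to disk, the sketch reads the file in chunks whose size is a multiple of 33 bytes, which has the same effect on memory use.

```python
HASH_LINE_BYTES = 33  # 32 hex characters plus one newline (assumed LF line endings)


def find_leaked(db_file, dump_hashes, chunk_bytes=HASH_LINE_BYTES * 1_000_000):
    """Intersect dump_hashes with the hash database, one piece at a time.

    db_file     -- file object opened in binary mode, one hash per line
    dump_hashes -- set of bytes objects (lowercase hex hashes from the dump)
    """
    if chunk_bytes < HASH_LINE_BYTES:
        raise ValueError("chunk_bytes must hold at least one hash line")
    # Round down to a multiple of 33 so that no hash is cut in half.
    chunk_bytes -= chunk_bytes % HASH_LINE_BYTES
    leaked = set()
    while True:
        chunk = db_file.read(chunk_bytes)
        if not chunk:
            break
        # Only the current piece is held in memory next to the (small) dump set.
        leaked |= dump_hashes.intersection(chunk.split())
    return leaked


# Example use (file names as in the text above):
# with open("hashes_only_sorted.lst", "rb") as f:
#     dump = set(f.read().split())
# with open("hash_db.lst", "rb") as db:
#     leaked = find_leaked(db, dump)
```

The default piece here is about 33 MB, far smaller than the 3-5 GB pieces mentioned above. Any size that fits in memory should work, since the set built from the dump is the only thing kept between pieces.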
A small Python script that does all of the above is available here.
It takes 5-20 minutes on an average computer. It is possible, of course, to optimise it, but why? Just enjoy making a cup of tea or coffee and a sandwich while the script works.
I load the resulting file into Excel and filter it by the enabled_account flag. For clarity, I take a few of the company's most common leaked hashes and turn them back into passwords using sites like hashes.com or crackstation.net. This is for illustration only and is not the goal of the project. We are the good guys, and hashes are good enough for us.
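Picking the most common leaked hashes does not have to be done by hand. Here is a rough sketch, assuming the dump lines keep the hash in the second ":"-separated field (as in the cut command earlier) and that the leaked hashes are available as a set of lowercase strings. The function name and the sample data in the comment are hypothetical.

```python
from collections import Counter


def most_common_leaked(dump_lines, leaked_hashes, top=5):
    """Count how many dump entries share each leaked hash.

    dump_lines    -- iterable of text lines with the hash in field 2 (assumed)
    leaked_hashes -- set of lowercase hex strings found in the leak database
    """
    counts = Counter()
    for line in dump_lines:
        fields = line.strip().split(":")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ntlm_hash = fields[1].lower()
        if ntlm_hash in leaked_hashes:
            counts[ntlm_hash] += 1
    return counts.most_common(top)


# Example with made-up data:
# most_common_leaked(["alice:aa", "bob:bb", "carol:aa"], {"aa"})
```

Filtering by the enabled_account flag would still happen in the spreadsheet, since the sketch does not know where that flag is stored.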
The second half is also ready. Now the fun begins.