# PasswordAnalysis **Repository Path**: mirrors_CamDavidsonPilon/PasswordAnalysis ## Basic Information - **Project Name**: PasswordAnalysis - **Description**: This is a description of human-created passwords using markov models - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-09-24 - **Last Updated**: 2026-03-07 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [Analysis of 14 million passwords](http://camdp.com/blogs/modeling-password-creation) ================ The coding accompanying the analysis of human-created passwords using Markovian models. See [this article](http://camdp.com/blogs/modeling-password-creation) for a detailed blog post on the subject. --------- encoding.py and EncodingScheme() -------------------------------- This module contains the *EncodingScheme* class to create computer readable data from a multinomial time series (that means it has finite support). From the docs: EncodingScheme is a class to make Markov model data out of raw data. EncodingScheme( list_of_regex_bins=[], to_append_to_end = None, garbage_bin=False ) Input: list_of_regex_bins: a list of regular expressions, as strings, representing how to "bin" the raw data. eg: [ '[0-9]', '[a-z]', '[A-Z]' ] Notes: -Try not to overlap bins, as it will bin the item into the first bin. -To specify all unique bins, leave the list empty. -An exception is thrown if a item is not able to be binned and garbage_bin is false to_append_to_end: if the series data is not the same length ( eg: password data), this specifies what to append to end before performing analysis. If not needed, leave as None. This is still buggy. garbage_bin: a boolean to include a garbage bin, ie a bin that collects everything not collected. Notes: having garbage_bin to True is pretty much useless if all unique bins is set, ie. having list_of_regex_bins = [] attributes: self.unique_bins: a dictionary of the bins used to encode the data and the encode mapping. self.realized_bins: a dictionary of the bins used to encode the realized data values that satisfy the bins. Useful for debugging and seeing what garbage is collected with realized_bins['garbage'] Methods: encode(raw_data): returns the encoded data as a generator mulitnomialMM.py and MultinomialMM() ------------------------------------ From the docs: Create and learn a multinomial Markov model MultinomialMM( encoding=None ) Input: encoding (optional): a EncodingScheme class that will process the data prior to fitting. If no scheme is given, and the data is inputed without encoding, a default encoding will be used (all unique binning). Attributes: self.data: the data used to fit the model self.unique_elements: the found unique elements of the data self.init_probs_esimate: the probability vector of inital emissions self.trans_probs_estimate: the transmission probability matrix of going from emission [row] to emission [col]. Methods: self.fit(data, encoded=True) self.sample( n=1) self.decoded_sample(n=1)