Nov 24

I've seen several different approaches to dictionary generators over time, and this is one method that, as far as I can tell, nobody else has used. If someone else has used it, I apologize and you deserve full credit, but I came up with this particular version completely independently. The idea is to keep all the words of the previous length in a list and prefix each one with every character in the charset to build the next length, swapping between two lists as it goes. I first wrote it in JavaScript (if you want that version, feel free to leave me a comment, but it was slow as fuck), then ported it to Python for better speed. I even wrote comments for once so that you can follow what I did (it's not very hard to understand, I just need to get into the habit). Enjoy it :)

def generate(size):
	f = open(raw_input("Output file: "), "w") # Set the output file
	y = [] # Create an empty list
	z = True # Set the toggle boolean to "True" (use the x list)
	charset = "0123456789abcdefghijklmnopqrstuvwxyz" # Charset for generating dictionary
	charlen = len(charset) # Length of charset
	x = list(charset) # Make a list with all the characters in the charset string
	for i in xrange(charlen): f.write(x[i] + "\n") # Write the character set to the output file
	for i in xrange(1,size+1): # Loop until you have generated to the length specified (the first pass does nothing because y starts empty)
		if not z: # If the toggle boolean is "False", append to y
			xlen = len(x) # Length of list x
			for i in xrange(0,charlen): # Loop through all the characters in charset
				for c in xrange(0,xlen): # Loop through all the items in x and prefix them with the appropriate character
					y.append(charset[i] + x[c]) # Append results to y
					f.write(charset[i] + x[c] + "\n") # Write the results to file
			x = [] # Clear out list x

		else: # If the toggle boolean is "True", append to x
			ylen = len(y) # Length of list y
			for i in xrange(0,charlen): # Loop through all the characters in charset
				for c in xrange(0,ylen): # Loop through all the items in y and prefix them with the appropriate character
					x.append(charset[i] + y[c]) # Append results to x
					f.write(charset[i] + y[c] + "\n") # Write the results to file
			y = [] # Clear out list y
		z = not z # Flip the toggle boolean
	f.close() # Close the output file
generate(4) # Call with largest character length you want to generate
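
For comparison, here is a minimal sketch that should produce the same words in the same order using itertools.product from the standard library. To be clear, this is not the toggle-list method above but a swapped-in technique, it assumes Python 3 (the code above is Python 2), and the function name words is made up for illustration. It yields one word at a time instead of keeping the whole previous length in a list.

```python
from itertools import product

def words(charset, max_len):
    """Yield every word of length 1..max_len over charset, shortest first."""
    for length in range(1, max_len + 1):
        # product(charset, repeat=length) yields tuples of characters,
        # with the rightmost position changing fastest
        for combo in product(charset, repeat=length):
            yield "".join(combo)

if __name__ == "__main__":
    # Tiny charset so the output stays readable
    for w in words("ab", 2):
        print(w)
```

With the two-character charset "ab" and a maximum length of 2, this gives a, b, aa, ab, ba, bb. Writing to a file instead is just a matter of looping over words(...) and calling f.write(w + "\n").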

4 Responses

  1. mc0 Says:

    RUBY IS BETTAR

  2. admin Says:

    I'LL KILL YOU, BETCH

  3. Stephen Paulger Says:

    (sorry admin for the multiple posts, I'm trying to get the code to be formatted correctly otherwise it won't work)

    You should find this code faster, especially when your maximum word length is greater than 4.

    I hope the formatting comes out ok.

    def basencount(n, size):
        r=[0]
        i=-1
        while i*-1 <= size:
            yield r
            i = -1
            r[i] += 1
            while r[i] >= n:
                try:
                    r[i-1] += r[i]/n
                except IndexError:
                    r.insert(0,1)
                r[i] %= n
                i -= 1

    def gendict(ch, s):
        c=len(ch)
        for r in basencount(c,s):
            print "".join([ch[n] for n in r])

    if __name__ == "__main__":
        import string
        charset = string.digits + string.lowercase
        gendict(charset, 4)
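
One thing to watch with a plain base-n counter like basencount: it counts the way ordinary numbers do, so it never produces leading "zeros". With the charset above, words that start with "0" and are longer than one character (such as "00" or "0a") are skipped, and the word after "z" is "10". Below is a sketch of a fixed-width odometer that includes them. It assumes Python 3, and the names odometer and gendict_padded are hypothetical, not part of the code above.

```python
def odometer(n, width):
    """Yield every list of `width` digits in base n, from all zeros upward."""
    r = [0] * width
    while True:
        yield list(r)  # hand out a copy so callers can keep it
        i = width - 1
        # Increment the rightmost digit, carrying left while a digit overflows
        while i >= 0:
            r[i] += 1
            if r[i] < n:
                break
            r[i] = 0
            i -= 1
        if i < 0:
            # Carried past the leftmost digit: every combination has been produced
            return

def gendict_padded(ch, s):
    """Yield all words of length 1..s over ch, leading 'zeros' included."""
    for width in range(1, s + 1):
        for r in odometer(len(ch), width):
            yield "".join(ch[d] for d in r)

if __name__ == "__main__":
    for word in gendict_padded("01", 2):
        print(word)
```

Running one counter per word length keeps the width fixed, which is what lets the leading characters through.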

  4. admin Says:

    Very nice, and much shorter than mine. Here's another approach I've seen (link):

    def names(length, prefix=''):
    	alphabet = tuple('abcdefghijklmnopqrstuvwxyz')
    	for char in alphabet:
    		print prefix + char
    		if length > 1:
    			names(length - 1, prefix + char)
    
    names(5)
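
Note that the recursive version walks depth-first, so the output is not grouped by length: it goes a, aa, aaa, and so on before it ever reaches b. Here is a sketch of the same recursion as a generator, so the caller decides whether to print the words or write them to a file. It assumes Python 3 (for yield from); the name iter_names and the alphabet parameter are additions for illustration.

```python
def iter_names(length, prefix="", alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Yield every word of 1..length characters over alphabet, depth-first."""
    for char in alphabet:
        word = prefix + char
        yield word
        if length > 1:
            # Extend this word fully before moving on to the next character
            yield from iter_names(length - 1, word, alphabet)

if __name__ == "__main__":
    # Tiny alphabet so the depth-first order is easy to see
    print(list(iter_names(2, alphabet="ab")))
```

With alphabet "ab" and length 2 this yields a, aa, ab, b, ba, bb.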
    
