I've seen several different approaches to dictionary generators over time, and this is one method that, as far as I can tell, nobody else has used. If someone else has used it, I apologize and you deserve full credit, but I came up with this particular version completely independently. I first wrote it in JavaScript (if you want that version, feel free to leave me a comment, but it was slow as fuck), then ported it to Python for better speed. I even wrote comments for once, so that you can understand what I did (it's not very hard to understand; I just need to get into the habit). The code is Python 2 (it uses raw_input and xrange). Enjoy!
def generate(size):
    f = open(raw_input("Output file: "), "w")  # Set the output file
    y = []  # Create an empty list
    z = False  # Toggle boolean: "False" means extend the words in x into y
    charset = "0123456789abcdefghijklmnopqrstuvwxyz"  # Charset for generating dictionary
    charlen = len(charset)  # Length of charset
    x = list(charset)  # Make a list with all the characters in the charset string
    for i in xrange(charlen):  # Write the character set (all length-1 words) to the output file
        f.write(x[i] + "\n")
    for length in xrange(2, size + 1):  # Build each word length from 2 up to the length specified
        if not z:  # If the toggle boolean is "False", append to y
            xlen = len(x)  # Length of list x
            for i in xrange(0, charlen):  # Loop through all the characters in charset
                for c in xrange(0, xlen):  # Loop through all the items in x and prefix them with the appropriate character
                    y.append(charset[i] + x[c])  # Append results to y
                    f.write(charset[i] + x[c] + "\n")  # Write the results to file
            x = []  # Clear out list x
        else:  # If the toggle boolean is "True", append to x
            ylen = len(y)  # Length of list y
            for i in xrange(0, charlen):  # Loop through all the characters in charset
                for c in xrange(0, ylen):  # Loop through all the items in y and prefix them with the appropriate character
                    x.append(charset[i] + y[c])  # Append results to x
                    f.write(charset[i] + y[c] + "\n")  # Write the results to file
            y = []  # Clear out list y
        z = not z  # Flip the toggle boolean
    f.close()  # Close the output file

generate(4)  # Call with the largest word length you want to generate
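The idea behind generate() is that every word of length k is some charset character prefixed onto a word of length k-1, so only the previous length's list is ever needed. A minimal sketch of that same idea in Python 3, assuming you want the words back from a generator rather than written straight to a file (the name `prefix_words` is made up for illustration), keeps one "previous" list that gets replaced each round instead of two lists and a toggle:

```python
def prefix_words(charset, size):
    """Yield every word of length 1..size over charset, shortest first.

    Hypothetical helper: same prefixing order as generate() above
    (first character varies slowest), but one list instead of x/y.
    Assumes size >= 1; length-1 words are always produced, as in generate().
    """
    previous = list(charset)          # all words of length 1
    for word in previous:
        yield word
    for _ in range(2, size + 1):
        current = []
        for ch in charset:            # prefix character varies slowest
            for word in previous:     # every word of the previous length
                current.append(ch + word)
        for word in current:
            yield word
        previous = current            # the older list can now be freed
```

To get the same file generate(4) writes, loop over `prefix_words("0123456789abcdefghijklmnopqrstuvwxyz", 4)` and write each word followed by "\n".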




November 26th, 2007 at 4:05 pm
RUBY IS BETTAR
November 26th, 2007 at 7:28 pm
I'LL KILL YOU, BETCH
December 27th, 2007 at 4:44 pm
(Sorry, admin, for the multiple posts; I'm trying to get the code formatted correctly, since otherwise it won't work.)
You should find this code faster, especially when your maximum word length is greater than 4.
I hope the formatting comes out OK.
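def basencount(n, size):
    # Step through digit lists like a base-n counter, shortest first,
    # stopping once a word would need more than `size` digits.
    # The same list r is yielded and then mutated, so use each value
    # before asking for the next one.
    r = [0]
    i = -1
    while i * -1 <= size:
        yield r
        i = -1
        r[i] += 1  # Increment the last digit
        while r[i] >= n:  # Propagate carries leftward
            try:
                r[i - 1] += r[i] // n
            except IndexError:
                # Ran out of digits: start the next word length. Insert 0,
                # not 1, so words beginning with ch[0] (like "00") aren't skipped.
                r.insert(0, 0)
            r[i] %= n
            i -= 1

def gendict(ch, s):
    c = len(ch)
    for r in basencount(c, s):
        print("".join([ch[n] for n in r]))

if __name__ == "__main__":
    import string
    charset = string.digits + string.ascii_lowercase
    gendict(charset, 4)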
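The counting approach treats each word as a number written with n digits, where n is the charset length. If you number the words 0, 1, 2, ... in shortest-first order, the k-th word is k+1 written in bijective base n rather than ordinary base n; ordinary base n never writes leading zeros, so a plain base-n counter would skip words such as "00". A small Python 3 sketch of that mapping, using a hypothetical helper `nth_word` that is not part of either comment's code:

```python
def nth_word(charset, k):
    """Return the k-th word (0-based) in shortest-first, charset order.

    Hypothetical helper: converts k + 1 to bijective base n, where digit
    d (0..n-1 after the shift) selects charset[d].
    """
    n = len(charset)
    k += 1                        # bijective numeration counts from 1
    chars = []
    while k > 0:
        k, d = divmod(k - 1, n)   # shift so every digit lands in 0..n-1
        chars.append(charset[d])
    return "".join(reversed(chars))
```

Because it maps an index straight to a word, something like this could let you split a run into ranges or resume from a saved position without regenerating everything before it.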
December 28th, 2007 at 11:12 pm
Very nice, and much shorter than mine. Here's another approach I've seen (link):
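def names(length, prefix=''):
    alphabet = tuple('abcdefghijklmnopqrstuvwxyz')
    for char in alphabet:
        print(prefix + char)  # Prints depth-first: a, aa, aaa, aaaa, aaaaa, aaaab, ...
        if length - 1:  # More characters allowed: extend this prefix
            names(length - 1, prefix + char)

names(5)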
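For comparison, Python's standard library can do the same enumeration directly: `itertools.product(charset, repeat=k)` yields every length-k tuple over the charset, with the last position varying fastest. A minimal Python 3 sketch, assuming you want shortest-first output like the first two programs (names() above prints depth-first instead, so "aa" comes right after "a"); `product_words` and `word_count` are illustrative names:

```python
import itertools


def product_words(charset, size):
    """Yield every word of length 1..size over charset, shortest first."""
    for length in range(1, size + 1):
        # Within one length, words come out in charset order
        for combo in itertools.product(charset, repeat=length):
            yield "".join(combo)


def word_count(n, size):
    """Number of words of length 1..size over an n-character charset."""
    return sum(n ** k for k in range(1, size + 1))
```

`word_count(36, 4)` is 1,727,604. Like the base-n counter, this holds only one word at a time, whereas generate() keeps every word of the newest length in a list (36^5 = 60,466,176 strings when size is 5), which may be part of why the counter holds up better past length 4.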