Initial obtaining of user data.

nopraw
David Trail 13 years ago
parent f8a66beb6b
commit 0308e02608

.gitignore

@@ -0,0 +1 @@
data.json

@@ -6,8 +6,12 @@ Details
When one deletes their account on Reddit it does nothing with their comment history other than
obscure the author (replaces with [deleted]) which is not good enough for some people.
Usage
-----------
python2 shreddit.py UserName
Caveats
-----------
- Only your previous 1,000 comments are accessible on Reddit. So good luck deleting the others. There may be ways to hack around this by iterating over the top/best/controversial/new sortings (see the sketch after this list), but for now I am unsure.
- Would make life easier if Reddit just did a "DELETE * FROM abc_def WHERE user_id = 1337"
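A rough sketch of that sort-iteration idea, assuming reddit's user comment listing accepts a sort query parameter alongside the after cursor; the parameter name, the set of sort values, and whether this actually surfaces anything beyond the 1,000-item cap are all assumptions, not something this commit verifies:

#!/usr/bin/env python2
# Hypothetical work-around sketch: walk the listing under several sortings
# and union the comment ids, hoping to surface items past the 1,000 cap.
from json import loads
from urllib2 import urlopen
from time import sleep

def collect_ids(user, sort):
    # The 'sort' query parameter is an assumption about reddit's listing API.
    ids = set()
    after = ''
    while after is not None:
        url = 'http://www.reddit.com/user/%s/comments/.json?sort=%s&after=%s' % (user, sort, after)
        listing = loads(urlopen(url).read())['data']
        for child in listing['children']:
            ids.add(child['data']['id'])
        after = listing['after']
        sleep(2)  # same throttle as shreddit.py uses
    return ids

all_ids = set()
for sort in ('new', 'top', 'controversial'):
    all_ids |= collect_ids('UserName', sort)
print "Collected %d unique comment ids" % len(all_ids)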

shreddit.py

@@ -0,0 +1,46 @@
#!/usr/bin/env python2
import sys

from json import loads, dumps
from urllib2 import urlopen
from time import sleep

if len(sys.argv) != 2:
    raise Exception("You must specify a user")

user = sys.argv[1]
sub_section = 'comments'
after = ''

# Listing URL for the user's comments; 'after' is reddit's pagination cursor.
init_url = 'http://www.reddit.com/user/{user}/comments/.json?after=%s'.format(user=user)

next_url = init_url % after
http = urlopen(next_url).read()
json = loads(http)

datum = []
while True:
    print "Grabbing IDs for after: ", after
    after = json['data']['after']
    children = json['data']['children']

    # This bit fills datum with the id (for removal) and the date (for saving recent posts)
    for child in children:
        child_data = child['data']
        if 'id' in child_data:
            datum.append({'id': child_data[u'id'], 'date': child_data['created_utc']})

    # Reddit returns a null 'after' on the last page of the listing.
    if after is None:
        break

    next_url = init_url % after
    http = urlopen(next_url).read()
    json = loads(http)
    sleep(2)  # don't want to hammer reddit too hard

print "Script collected all available data."

f = open('data.json', 'w')
f.write(dumps(datum))
f.close()
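For what the saved file is good for downstream, a small hedged sketch of reading data.json back and splitting the entries by age; the one-week cutoff and the idea that the older ids would feed a later deletion step are illustrative assumptions, not part of this commit:

#!/usr/bin/env python2
# Hypothetical consumer of the data.json written by shreddit.py.
from json import loads
from time import time

WEEK = 7 * 24 * 60 * 60  # example cutoff only; not defined anywhere in the repo

entries = loads(open('data.json').read())
old = [e['id'] for e in entries if e['date'] < time() - WEEK]
print "%d of %d saved comments are older than a week" % (len(old), len(entries))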