Data Set | Size | Torrent
November 2015 | 5.2 GiB | https://mega.nz/#!odAxwbQA!Fc8PTZpzCqrmLToU02z8vSlX4Ba_2NIcm6l1Ip2FmfQ
December 2015 | 4.8 GiB | magnet:?xt=urn:btih:8a6100df38765c94289b129c4477d0a44f8fa9b2&dn=2015-12&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80&ws=http%3a%2f%2fweb.lrussell.net%2f~hexagon%2f2015-12%2f
January 2016 | 4.0 GiB | https://mega.nz/#!5Vg0ETLT!OzA5Ec_iMq5krxVD0lo_89zOVnjLrO6HwaGwwzxwLGY
February 2016 | 4.2 GiB | https://mega.nz/#!FJwz2DrB!1giLyyuQSikWPVzawGsIxTVslfcXvExlccnnHwYW944
It's very important for the history of EE to be archived--most of the funny chats, worlds, lobby/players online, usernames, player movements, and much more. These records provide the social fabric for EE, during the highs and lows. Unfortunately, the time has come and the space available for EE Analytics is shrinking. After going through a couple of hard drives and some SD cards, I have only one copy of the data, and not much space left.
I'd like to upload the data to archive.org, and am asking permission from the community to upload the data as-is (plaintext with usernames), because encrypting the data would mean that it could get removed. The usernames could be stripped, but that would mean that anyone else who would like to use it would not be able to customize an experience for a specific user. Another benefit is that the community can do what they want with the data, like making visualizations and predictions.
Offline
User Generated Material
In this agreement you give Everybody Edits the right, without special compensation, to publish the Worlds, Games and Material you make for an unlimited time throughout the world on the internet, on TV or any other medium. Thus you give Everybody Edits the right to use, copy, present, reproduce, display, edit, integrate, license as well as distribute the Games and Material which you have created, for both commercial and noncommercial use.
I think EE servers should be in charge of that data not archive.org
Everybody edits, but some edit more than others
Offline
Remove old data and replace it with newer data.
I have never thought of programming for reputation and honor. What I have in my heart must come out. That is the reason why I code.
Offline
Do you really think the data is worth pushing to archive.org?
It doesn't really have much historical significance...
Offline
Remove old data and replace it with newer data.
I'll have to resort to this if there are no more options left. It removes the historical value, though.
Do you really think the data is worth pushing to archive.org?
It doesn't really have much historical significance...
Well, depending on how long EE lives on for it could be historical in many many years from now. It's not really historical to society at large but historical to the EE community.
User Generated Material
In this agreement you give Everybody Edits the right, without special compensation, to publish the Worlds, Games and Material you make for an unlimited time throughout the world on the internet, on TV or any other medium. Thus you give Everybody Edits the right to use, copy, present, reproduce, display, edit, integrate, license as well as distribute the Games and Material which you have created, for both commercial and noncommercial use.
I think EE servers should be in charge of that data not archive.org
I'd be happy to give it to one/some of the moderators/admins. The TOS just says that EE has complete control over your data, and they can pretty much do whatever they want with it. An interesting section is:
You may not use any of Everybody Edits material, such as avatars, blocks, images, logos, icons, audio files etc. for commercial purposes outside of the Everybody Edits website. This means that you may not copy, distribute, sell, publish, send or otherwise recirculate Everybody Edits material to a third party without the written consent of Everybody Edits. You may not change, revise or replace any of the material found in the game or website, either in its entirety or parts thereof.
Since I'm not selling the data, and the website is non-profit (i.e. non-commercial), it could be okay, as it's user-generated material (which the TOS doesn't say anything about recirculating).
Offline
How much space does this data require? I have a NAS drive that I'm planning on upgrading. I'm just not sure how my ISP would feel about say... a terabyte of data being downloaded.
Offline
How much space does this data require? I have a NAS drive that I'm planning on upgrading. I'm just not sure how my ISP would feel about say... a terabyte of data being downloaded.
It's only like 1.5TB (I think) uncompressed so it's not too bad. If you could store it that would be awesome!
Offline
I'm not sure why you recorded this data and want it to be posted. Could you tell us what this massive data set consists of?
Offline
I'm not sure why you recorded this data and want it to be posted. Could you tell us what this massive data set consists of?
I'm very interested in data and statistics, which is why I recorded it. I'd like to post it so that other people can use it (as it's just sitting on my hard drive, unable to be accessed), and I don't really have enough space left.
It consists of (roughly):
- (95%) World event data (five or so of the most popular worlds, determined on a minute basis). Event data is everything that your EE client receives when you join a room (i.e. *everything*--movements, blocks, usernames, join, init, left, potions, zombie, trolling, chats, but nothing private, since clients don't get that data). This data runs from August 2014 to the present, recorded virtually 24/7.
- (1%) Old EEforum posts and user avatars; EE user profiles
- (1%) lobby data (on a per-second basis) and some corrupted lobby data that I couldn't figure out how to decompress
- (0.5%) a ping log to playerio, and website logs
- (0.49%) a bunch of EE worlds
- (2%) some world data that is in an extremely bloated XML format (not sure if I still have it)
- (0.05%) almost every EE swf file from August 2014, recorded every six hours, except I'm missing one for December 2015, I think. Probably not 0.05% but something very small
- (0.05%) Source code for many projects, including unreleased ones, for EE. Also block data for those
- (?%) lrussell's eeindex, preprocessed data, server vm, other third-party EE related source code, png minimaps of thousands of worlds, programmatically generated EE chat screenshots
Offline
I bet you have some kind of google glass and record every moment of your life.
Or you are more interested in others?
Offline
I personally would only ever be interested in:
• Chats;
• versions of worlds;
• forum archives (which for the record aren't yet being archived by the awesome Wayback Machine (archive.org));
• I guess EE projects.
Of the events, I personally think only the block placement events might be fun, so that players can request a speedbuild of rooms.
Old EE versions aren't interesting, as you can't use them: they depend on the server, which will block them for being an outdated version.
I personally think a lot can be thrown out, but far from everything. That's just my opinion.
Offline
lrussell wrote: How much space does this data require? I have a NAS drive that I'm planning on upgrading. I'm just not sure how my ISP would feel about say... a terabyte of data being downloaded.
It's only like 1.5TB (I think) uncompressed so it's not too bad. If you could store it that would be awesome!
How much is it compressed?
"Sometimes failing a leap of faith is better than inching forward"
- ShinsukeIto
Offline
So are you the one behind those bot accounts randomly connecting to worlds?
For the most part, yeah.
There was a period of time when other random accounts were connecting to worlds for a couple of days, but that was a long time ago. That wasn't me.
Offline
Here's an idea. If you get rid of that 95% you'll save a lot of space.
But seriously though that 95% seems entirely worthless, unless it partially includes some world storage. Keeping records of chat, movement, and block placements isn't worth much besides some line graphs. I don't think people will be so keen on releasing chat records either, and this has been strongly argued after ninjasup's huge chat-recording dilemma. I know you spent a lot of time hoarding it but it's time to let it go man.
Offline
I would archive it.
EDIT: I mean make it public to keep the data
Offline
I'd be willing to store all of it compressed; however, transferring that much over my internet connection isn't doable. I can take everything except the World Event data (which isn't even useful, really). My current VPS has 2TB of bandwidth if you want me to give you credentials to upload it there to distribute to others, but it only has ~16GB of HDD space left.
Offline
I have an idea, but it's a little "overkill":
- I have a world file, and it contains all of the events in the world from a certain time period.
- I duplicate that file, and remove the sensitive portions of the chat (i.e. usernames, maybe more)
- Then, I create a binary diff between the two files, keep the diff on my HD, and upload the "sanitized" version to archive.org
Now, I have a very small file which is able to completely reconstruct the original from the file downloaded from archive.org. If someone downloaded the public file, all they would really get is a bunch of block commands with opaque ids. This means that I can save ~48x as much data.
It is trivial to take a screenshot of the chat and post it everywhere (i.e. put it in your user signature), but I do understand that chats might be private.
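A minimal sketch of the sanitize-and-diff idea above, in Python. It assumes a hypothetical plain-text event log with one event per line and usernames appearing as literal text; the real recording format isn't described in this thread. It also swaps the binary diff for a simple line-level delta built with the standard library's difflib, and the function names (sanitize, make_delta, apply_delta) are made up for illustration.

```python
import difflib
import re


def sanitize(event_lines, usernames):
    """Replace each known username with an opaque id such as 'user0'."""
    if not usernames:
        return list(event_lines)
    ids = {name: f"user{i}" for i, name in enumerate(sorted(usernames))}
    # Try longer names first so a short name never matches inside a longer one.
    pattern = re.compile("|".join(re.escape(n) for n in sorted(ids, key=len, reverse=True)))
    return [pattern.sub(lambda m: ids[m.group(0)], line) for line in event_lines]


def make_delta(public, private):
    """Describe how to rebuild the private lines from the public ones."""
    matcher = difflib.SequenceMatcher(a=public, b=private, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))           # reuse public lines as-is
        else:
            delta.append(("data", private[j1:j2]))   # keep only the changed lines
    return delta


def apply_delta(public, delta):
    """Rebuild the original lines from the public file plus the private delta."""
    restored = []
    for op in delta:
        if op[0] == "copy":
            restored.extend(public[op[1]:op[2]])
        else:
            restored.extend(op[1])
    return restored


# Hypothetical event lines, not the real EE message format.
events = ["join alice", "move alice 3 4", "block 10 12 9", "say bob hi alice"]
public = sanitize(events, {"alice", "bob"})
delta = make_delta(public, events)
print(public)
print(apply_delta(public, delta) == events)
```

A line-level delta like this only stays small when most lines are untouched (for example block events that carry no username); a real binary diff tool would likely be needed to get a small file when usernames show up in most events.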
Offline
Different55 wrote: How much is it compressed?
About 150GB, which is very manageable; however, I have a lot of my own stuff, which takes up a lot of space.
In that case I can definitely save a few compressed copies with my regular backups. Make a torrent to host it and I will download and seed whenever I can.
Offline
Hexagon wrote: Different55 wrote: How much is it compressed?
About 150GB, which is very manageable; however, I have a lot of my own stuff, which takes up a lot of space.
In that case I can definitely save a few compressed copies with my regular backups. Make a torrent to host it and I will download and seed whenever I can.
Actually, a torrent wouldn't be a bad idea. It would make the overall download speed faster because of multiple seeders/leechers. I suppose I could store the full data-set if I downloaded it in bursts to not scare my ISP (150 GB in 6 hours). I'm upgrading my NAS to 2TB of space so it's doable for me I suppose. But it would probably be better to have a "base" data-set that's accessible to more people. Perhaps there's a way to compress it more, 7-Zip Ultra maybe?
Offline
In that case I can definitely save a few compressed copies with my regular backups. Make a torrent to host it and I will download and seed whenever I can.
Sounds good, I'll begin the process of making one as soon as I get things cleaned up. Since I'm giving this data to a few people, I might upload it to archive.org first then use them as a tracker if that's okay.
I suppose I could store the full data-set if I downloaded it in bursts to not scare my ISP (150 GB in 6 hours). I'm upgrading my NAS to 2TB of space so it's doable for me I suppose. But it would probably be better to have a "base" data-set that's accessible to more people. Perhaps there's a way to compress it more, 7-Zip Ultra maybe?
I could give 7z ultra a shot, and see if that works. I'd have to tar it first, though. By a base set, do you mean splitting the data up into smaller chunks?
EDIT: it looks like zipping with 7z can get it under ~120GB, which looks good. I can compress it with xz on extreme to about ~107GB; however, it will take ~35 days. I'll have to start zipping shortly.
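Before committing to a month-long compression run, it may be worth measuring ratios on a small sample first. Here is a rough Python sketch using only the standard library's gzip, bz2 and lzma modules (lzma writes the same .xz format, and 7-Zip's default method is in the same LZMA family). The sample data is made up; real event logs will compress differently, so treat the numbers as illustrative only.

```python
import bz2
import gzip
import lzma


def compare_codecs(sample: bytes) -> dict:
    """Return the compressed size in bytes for each codec at its strongest setting."""
    return {
        "gzip -9": len(gzip.compress(sample, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(sample, compresslevel=9)),
        "xz -9e": len(lzma.compress(sample, preset=9 | lzma.PRESET_EXTREME)),
    }


# Hypothetical stand-in for a slice of move events; not the real format.
sample = b"".join(b"m %d %d %d\n" % (i % 7, i % 400, i % 200) for i in range(20000))

sizes = compare_codecs(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({len(sample) / size:.1f}x smaller)")
```

Running this on a few real day-sized slices would give a better guess at the final archive size and at whether the extra time for the extreme preset is worth it.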
Offline
By base set I mean the most important data. 107 GB is still out of reach for people who want to help keep the data safe. I'd say it should be no more than 15 GB. Knowing what block someone placed, when, and where or if they used the Grinch smiley isn't all that important. It might be fun if someone made a "player" for it, showing everything as it happened. But did you store the timing between each message to do even that?
Offline
By base set I mean the most important data. 107 GB is still out of reach for people who want to help keep the data safe. I'd say it should be no more than 15 GB. Knowing what block someone placed, when, and where or if they used the Grinch smiley isn't all that important. It might be fun if someone made a "player" for it, showing everything as it happened. But did you store the timing between each message to do even that?
I agree that 107GB is still very large, and even if I could compress it down to 1GB, nobody would be able to do anything with it because it would expand to 1.5TB in order to read it. What I could do is split the data into months, or even into days, in gzip (each compressed month is about 10GB and each compressed day is about 250MB, which is manageable). As long as I get enough volunteers to each host a section of the data, the entire data set can be reassembled.
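The per-day split could look something like this Python sketch. It assumes each event is available as a (Unix timestamp, text line) pair, which is a guess at the recording format, and the file naming (YYYY-MM-DD.log.gz) and function names are hypothetical.

```python
import gzip
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def split_by_day(events, out_dir):
    """Write events into one gzip file per UTC day and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    by_day = defaultdict(list)
    for timestamp, line in events:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
        by_day[day].append(f"{timestamp}\t{line}\n")
    paths = []
    for day in sorted(by_day):
        path = out_dir / f"{day}.log.gz"
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.writelines(by_day[day])
        paths.append(path)
    return paths


def read_all(paths):
    """Reassemble the full event list from whichever day files are available."""
    lines = []
    for path in sorted(paths):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            lines.extend(handle.read().splitlines())
    return lines
```

Because the file names sort by date, anyone who collects the chunks from different hosts can put them back in order with a plain sort. This version holds everything in memory, so for the real data set it would have to stream one day at a time instead.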
Most of the data is move events, and I could remove them (or compress them separately in a numerical format) to see if I can get the file sizes down a bit more. As for the timing between each message, it was recorded; however, a fatal error in my implementation for at least a couple of months means that some timings are off by a lot (when played back, it will look like time is slowing down during high activity and then suddenly speeding up when the activity stops).
Offline