
Downloading your posts


null

A few people have been interested in this, so I thought I would create a thread to share how some of us have managed to do it. I know @laridae has a PowerShell script you can run, and hopefully she will repost it here.

Another more manual way to do it for those who want to understand what they are running ...

  • Run Chrome on a desktop PC
  • Create an activity stream in Essential Baby with all your posts from a specific date range. Start with a year at a time because I don't know how Chrome will go loading 50000 or so posts.
  • Click on the Activity tab
  • Click on My Activity Streams
  • Click on Create New Stream
  • Set the ownership to your username
  • Set the time range you want to download
  • Click Save Changes
  • Give the stream a title like "My posts from 2011"
  • Click Save Changes
  • Now for the hard bit, right-click on the page showing the result of the stream and select Inspect
  • You are now presented with a programming interface to allow you to manipulate the page. Please note you should generally not run code you don't understand in the console!
  • In the Console tab of the developer view, paste the following code ... 
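async function clickity() {
    let ct = 0; // number of clicks so far
    // Keep going while the "Load More Activity" button is still on the page
    while ($("[data-action=loadMore]").length > 0) {
        await new Promise(r => setTimeout(r, 1000)); // wait 1 second between clicks
        console.log("clicking " + ct++);
        $("[data-action=loadMore]").click();
    }
    console.log("FINISHED");
}
clickity();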
  • The above code will look for the "Load More Activity" button and while it finds it in the page, it will wait for 1 second, log in the console that it is going to click on it, then click it. It will complete once the button is no longer found (i.e. the end of the date range).
  • Next run the following code to replace the snippets with the full text ...
function subsy() {
    let ct = 1;
    const total = $("a[data-searchable]").length;
    $("a[data-searchable]").each(async function() {
        const index = ct;
        const href = $(this).attr('href');
        // The comment number is the "comment" parameter in the link's query string
        const comment = new URL(href, location.href).searchParams.get('comment');
        const snippet = $('.ipsType_richText', $(this).closest('li'));
        // Stagger the requests: the nth link waits n seconds before fetching
        await new Promise(r => setTimeout(r, 1000 * ct++));
        console.log("Getting " + index + " of " + total + " posts; " + href);
        fetch(href).then(function(response) { return response.text(); }).then(function(html) {
            const doc = new DOMParser().parseFromString(html, "text/html");
            // Swap the snippet for the full comment from the fetched topic page
            snippet.html($('#comment-' + comment + "_wrap", doc).html());
            console.log("Replaced with full comment identifier " + comment);
        }).catch(function(err) { console.log("Failed to get " + href + ": " + err); });
    });
}
subsy();
  • Once the script has completed, save the page by right-clicking on the page.
  • Click Save page as
  • Enter the name of the page (e.g. MyPostsFrom2009.mhtml)
  • Select Webpage, Single file
  • Click Save
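For anyone who wants to see the idea behind the first script on its own, here is a minimal sketch of the "keep clicking until the button is gone" loop. It is not a replacement for the script above. The names clickUntilGone, hasMore, clickMore and maxClicks are made up for illustration, and the safety limit is an addition that the original does not have. The page is faked with a counter so the sketch can be run anywhere, not just in the forum's console.

```javascript
// Generic form of the "keep clicking until the button is gone" loop.
// hasMore and clickMore are hypothetical callbacks standing in for the
// jQuery calls in the first script; maxClicks is an added safety limit.
async function clickUntilGone(hasMore, clickMore, delayMs = 1000, maxClicks = 5000) {
    let clicks = 0;
    while (hasMore() && clicks < maxClicks) {
        // Pause so the page has time to load the next batch
        await new Promise(resolve => setTimeout(resolve, delayMs));
        clickMore();
        clicks++;
    }
    return clicks;
}

// Stand-in for the page: pretend there are 3 more batches to load.
let remaining = 3;
const done = clickUntilGone(() => remaining > 0, () => { remaining--; }, 10);
done.then(n => console.log("FINISHED after " + n + " clicks"));
```

On the stream page itself, the two callbacks would presumably be () => $("[data-action=loadMore]").length > 0 and () => $("[data-action=loadMore]").click(), which is what the first script does inline.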

If you have any easier suggestions then feel free to share. I can write a script to do it but I thought people might be wary of running code written by a random (and rightly so).

Edited by null
Had trouble with forbidden errors while writing the post, and added a progress message. Also made changes so it can work with Firefox.

laridae

Here is my script. Nothing sinister in it (it's pretty simple really), and it opens an IE window so you can see it at work anyway.

 

It's not that hard to use, but you do need a Windows PC (not a Mac) as it's PowerShell. So, what you do is open Windows PowerShell ISE and paste this (everything from the # to the end) into the top box (it's white on mine, not the blue one).

It uses Internet Explorer to extract the data, so you do need to open that first and log into EB (make sure you tick the box to save your login details).

There are a few settings to change at the top: the path to save the files to, your userid, the number of pages to extract and the page to start at (I suggest just trying it with 1 page first to make sure it works). Then click the Run button.

PowerShell may complain about running it though. There is a setting you need to switch on when you first use it to allow you to run unsigned scripts or something, but I can't remember what it is. Just google the error and you should be able to find out how if it doesn't say.

I did find that a few pages didn't extract properly, but you can look through the log (in the blue pane at the bottom) for errors and retry those pages. So if page 255 fails, you can set both startpage and maxpages to 255 and it will do just that page. Don't post anything while you are running it though, or your posts will shift to different pages and it will miss stuff.

Also, keep in mind that if you have a lot of posts, it could take a while. It took nearly a whole day to extract mine.

 

# Note - Windows PC only. You must first open Internet Explorer, log in to EB and save your credentials. Edit the following values.
# path to save files to
$savepath = "D:\EBExtractor\"
# userid - this can be found if you click on your profile; the format is usernumber-username
$user = "179977-zzzzzz"
# number of pages of history (25 posts per page) - check your profile for how many pages
$maxpages = 1
# page to start at (in case you need to start again)
$startpage = 1

$count=$startpage
$link = [System.Collections.ArrayList]@()
$ie = new-object  -com "InternetExplorer.Application"
while ($count -le $maxpages)
{
    "page " + $count

    $ie.navigate("https://www.essentialbaby.com.au/forums/profile/" + $user + "/content/?all_activity=1&page=" + $count)
    $ie.visible=$true
    $link.Clear()
    while($ie.Busy) { Start-Sleep -Milliseconds 100 }
    try {
        $rand = Get-Random -minimum 1000 -maximum 2000
        Start-Sleep -Milliseconds $rand
        $doc = $ie.Document
        # [void] stops ArrayList.Add from printing the index of each link into the log
        $doc.Links | Where-Object {$_.href -like "*findComment*" -and $_.className -ne "ipsType_blendLinks"} | ForEach-Object {[void]$link.Add($_.href)}
        $count=$count+1
    }
    catch {
        "Something strange occurred: $_"
        try {$ie.Quit() } catch{}
        $ie = new-object  -com "InternetExplorer.Application"
    }

    $link | ForEach-Object {
        $path = $savepath + $_.replace("https://www.essentialbaby.com.au/forums/topic/","").replace("/?do=findComment&comment=","-") + ".txt"
    
        $path
        $ie.navigate($_)
        while($ie.Busy) { Start-Sleep -Milliseconds 100 }
        $rand = Get-Random -minimum 1000 -maximum 2000
        Start-Sleep -Milliseconds $rand
        try {
            $ie.Document.activeElement.outertext | Out-File -FilePath $path -Append
        }
        catch {
            "Something strange occurred: $_"
            try {$ie.Quit() } catch{}
            $ie = new-object  -com "InternetExplorer.Application"
        }
     }
}


$ie.Quit() 

DirtyStreetPie

@null Ooooh I just love it when you get all codey! I'm jealous - actually jealous! - that my Javascript is not up to your standard. Is this jQuery? I dabbled a tiny bit about a year ago, but ended up focusing on 'vanilla' JS.

I paused my Javascript learning before I made it to Promises, so I've never even done that before. It's cool seeing a real example of it in the wild. :)

Anyway, back to the issue at hand, it never occurred to me to use JS to fetch posts. A few days ago, I looked into creating a Python script using Scrapy, a library I've never used before, but I figured I wouldn't have time to figure out the library, write the script and test it out on time hehe. ETA: We now have all of November, but I'm worried about scraping password-protected areas, so I'm not doing anything.

Oh, and I see @laridae has contributed her script! So exciting to see different solutions. :D

Edited by DirtyStreetPie

laridae

@null did you check that all the posts are there in full? I had a look and it looks like it's only saving the first few lines of the longer posts if you save it as an mhtml. Though I was using Edge, not Chrome.

Edited by laridae

LouCu3

I found this too @laridae

@null - I am totally not codey so if you can work out a way to get the posts to display in full before I save that would be fab!

null

Okay good pick up there Laridae and LouCu3. I modified the original post with a second stage that will replace the snippets with the full post. Let me know how you go!
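In case it helps to see what the second stage relies on: each link in the stream ends with something like ?do=findComment&comment=12345, and the script uses that number to find the element with the id comment-12345_wrap on the topic page. Below is a minimal sketch of just that lookup step, under the assumption that the links always follow that pattern. The function name commentIdFromHref and the example.com addresses are made up for illustration, and it uses the standard URL class to read the query string.

```javascript
// Hypothetical helper (not part of the scripts in the first post):
// pull the comment number out of a "findComment" link.
function commentIdFromHref(href, base) {
    // base is only used when href is relative; in the browser you could
    // pass location.href. example.com is a placeholder address.
    const url = new URL(href, base || "https://example.com/");
    // Returns null when the link has no "comment" parameter
    return url.searchParams.get("comment");
}

// Made-up link in the same shape as the ones in the activity stream.
const example = "https://example.com/forums/topic/123-some-title/?do=findComment&comment=456";
const id = commentIdFromHref(example);
console.log(id);                          // 456
console.log("#comment-" + id + "_wrap");  // #comment-456_wrap
```

If a link has no comment number, the helper returns null, which would be a sensible point to skip that link instead of fetching it.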

Edited by null

null
10 hours ago, DirtyStreetPie said:

@null Ooooh I just love it when you get all codey! I'm jealous - actually jealous! - that my Javascript is not up to your standard. Is this jQuery? I dabbled a tiny bit about a year ago, but ended up focusing on 'vanilla' JS.

I paused my Javascript learning before I made it to Promises, so I've never even done that before. It's cool seeing a real example of it in the wild. :)

I am not really a Javascript programmer - I just use stackoverflow.com until I get something that works! 😄 This is what happens in the office I work in when stackoverflow is down ...

giphy.gif  

LouCu3

Thanks @null - worked a treat 😄

Paddlepop

@null: Help! The second code won't run/action or whatever the word for working is. The first one worked perfectly. Does something need to be rearranged? I have no idea what I'm doing or what it means but I'll really appreciate it if I can easily get 9 years of posts.

null

Hi Paddlepop.

Okay a few things to check:

  • Are you using Chrome?
  • Have you selected the "Expanded" view in the stream results page?

tempsnip.png

Paddlepop

null: Yes to both. 

TigerQueenofSheeba

Omg.... gobbledegook LOL

null
27 minutes ago, Paddlepop said:

null: Yes to both. 

Okay. Are you getting any messages output to the console like "Replaced ..."?

null

Also try resizing your Chrome browser window so it is half the size of your screen.

Paddlepop

null: When I paste the second code in it doesn’t seem to do anything. The little blue arrow prompt doesn’t appear in the console box after I’ve put the code. It did after the first part and had a few lines of something come up in there so it was obvious that it had done something. I can’t remember what it was. 
Just saw your next reply. I will try that when I’m on my laptop later today. Currently on my phone. 

DirtyStreetPie
13 hours ago, null said:

I am not really a Javascript programmer - I just use stackoverflow.com until I get something that works! 😄

I'm on there quite often when I'm working on something (I'm actually a schoolteacher, former software engineer, so all the code I write these days is for the love of it). I can't do anything on StackOverflow though, because I don't have enough reputation points. Boooo!

purpleduck

I'm on a Mac with Firefox and Safari - any ideas on how to extract with these browsers? Thanks!

Paddlepop

@null: Here's a screenshot of what happens with the second code:

67138090_screenshotebcode.png.c0ff892bf5a14288431b116242ba2ef3.png

It doesn't do anything. The blue prompt arrow thingy doesn't reappear and the cursor just sits there blinking away. I reduced the window size like you suggested. 

ETA Laptop using Chrome. 

Edited by Paddlepop

null
16 hours ago, Paddlepop said:

@null: Here's a screenshot of what happens with the second code:

67138090_screenshotebcode.png.c0ff892bf5a14288431b116242ba2ef3.png

It doesn't do anything. The blue prompt arrow thingy doesn't reappear and the cursor just sits there blinking away. I reduced the window size like you suggested. 

ETA Laptop using Chrome. 

How many posts are on the page you are trying to download? 

Paddlepop

I don't know but it's less than a month's worth when I was brand new to posting, so probably not many. I just checked it now and it won't display any results. Says that there are no results to show in this activity stream yet. All day I've also had that annoying bloody message at the top of EB of "The search index is currently processing. Activity stream results may not be complete." The same message that was up the top for the first few weeks after the software update. It's meant that I couldn't find this thread and I only knew about it because I got a notification for it. 

laridae

I've only got 168 pages of posts now. I had about 260 pages when I extracted them all the other day. Good thing I did it then! Hopefully they'll all come back when it finishes reindexing, but yes, I remember it took ages to finish when the site was upgraded, like weeks. That's going to hamper people who are trying to download their posts.

chillipeppers

Just out of curiosity. What are you guys going to do with downloaded posts?

Paddlepop

Keep them in a file on my computer, and backed up on a hard drive. I'm especially after the posts from when DD was diagnosed with ASD. Lots of important dates and things to keep note of in those posts. 

purpleduck

I've checked my profile content and I have 67 pages.... I've tested just saving the pages as web page complete and as html, and while it's not pretty, it gets the text and dates down ok. It will take a while, but it is doable....

 

Now to go check my other accounts :ninja:

null
12 hours ago, Paddlepop said:

I don't know but it's less than a month's worth when I was brand new to posting, so probably not many. I just checked it now and it won't display any results. Says that there are no results to show in this activity stream yet. All day I've also had that annoying bloody message at the top of EB of "The search index is currently processing. Activity stream results may not be complete." The same message that was up the top for the first few weeks after the software update. It's meant that I couldn't find this thread and I only knew about it because I got a notification for it. 

If you are online this weekend, PM me and we can debug the problem.
