Downloading your PDFs from CiteULike using python and selenium

CiteULike, an online bibliography manager, is unfortunately shutting down. They have a service to export your bibliography as a BibTeX file, but this does not include the PDFs you have uploaded to the site. Having web access to the PDFs is one of the main reasons I liked CiteULike (along with the tag cloud).

I have too many PDFs to download manually (over 2,000), so I wrote a Python script to download them. Unlike prior scraping examples I've written about, you need to be signed into your CiteULike account to download the files, so I use the selenium library to mimic what you would normally do in a web browser.
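The login-then-download flow can be sketched as below. Since the site is now offline, the CiteULike login URL, form-field names, and link text here are all assumptions for illustration, not the exact code from my script. One common trick shown here is copying the browser's session cookies into a requests session so the PDF files can be fetched directly.

```python
import os
import re


def safe_filename(title, ext=".pdf"):
    """Turn an article title into a filesystem-safe file name."""
    clean = re.sub(r"[^\w\- ]", "", title).strip().replace(" ", "_")
    return (clean[:80] or "untitled") + ext


def download_pdfs(username, password, article_urls, out_dir="pdfs"):
    """Sign in with selenium, then save the PDF attached to each article page.

    Imports are deferred so the pure helper above can be used without
    selenium or a browser installed.
    """
    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    os.makedirs(out_dir, exist_ok=True)
    driver = webdriver.Firefox()
    try:
        # Hypothetical login page and form-field names.
        driver.get("http://www.citeulike.org/login")
        driver.find_element(By.NAME, "username").send_keys(username)
        pw_box = driver.find_element(By.NAME, "password")
        pw_box.send_keys(password)
        pw_box.submit()

        # Reuse the signed-in browser's cookies for the file downloads.
        session = requests.Session()
        for c in driver.get_cookies():
            session.cookies.set(c["name"], c["value"])

        for url in article_urls:
            driver.get(url)
            # Hypothetical link text for the attachment.
            link = driver.find_element(By.PARTIAL_LINK_TEXT, "pdf")
            resp = session.get(link.get_attribute("href"))
            fname = safe_filename(driver.title)
            with open(os.path.join(out_dir, fname), "wb") as f:
                f.write(resp.content)
    finally:
        driver.quit()
```

The same skeleton works for any members-only site: log in once with the real browser, then let requests do the bulk fetching with the borrowed cookies.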

So let me know what bibliography manager I should switch to. One of the main factors will be whether I can automate the conversion, including the PDFs (even if that just means pointing to where each PDF is stored on my local machine).

This is a good tutorial to know even if you have nothing to do with CiteULike. Various web services require you to sign in or mimic a browser like this to download data repeatedly. For example, a police department (PD) may have a system where you input a set of dates to get back crime incidents, and it limits the number of records returned, so you need to submit many small queries to get a full sample. The selenium library can be used in a similar fashion to this tutorial in that circumstance.
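A minimal sketch of that repeated date-range pattern: split a long period into short chunks so each request stays under the record cap. The portal URL and form-field names would depend on the site, so only the chunking logic (pure Python) and a hypothetical selenium call are shown.

```python
from datetime import date, timedelta


def date_chunks(start, end, days=7):
    """Yield (chunk_start, chunk_end) pairs covering [start, end] inclusive."""
    cur = start
    step = timedelta(days=days - 1)
    while cur <= end:
        chunk_end = min(cur + step, end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)


# Break January 2019 into week-long queries.
chunks = list(date_chunks(date(2019, 1, 1), date(2019, 1, 31), days=7))

# Each pair would then be typed into the portal's form via selenium,
# e.g. (field names hypothetical):
# driver.find_element(By.NAME, "start_date").send_keys(str(chunk_start))
# driver.find_element(By.NAME, "end_date").send_keys(str(chunk_end))
```

Looping the chunks through the form and appending each batch of results gives you the full sample the portal would not return in one query.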


9 Comments

  1. It also appears that there is an API script using mechanize (and a CiteULike API I was not familiar with): https://github.com/AceCentre/ace-search-engine/blob/master/scripts/citeusync.py.

    Via Will Wade on the CiteULike discussion forums, http://www.citeulike.org/groupforum/4546?highlight=57847#msg_57847

  2. B / March 5, 2019

    Does your method also download the tags? I too am a heavy user of the “tag cloud” and would like to download my entire library *including tags* and attachments (PDFs mostly). Many thanks.

    • You can get those when you download the bibtex file, they are plopped into the keywords field.

      • B / March 6, 2019

        Many thanks. I had searched my .bib for certain keywords and didn’t find them, probably this was an export or other user error on my part. I redid it and now see the tags in the keywords field, just as you described.

  3. Louis Guilbault / May 5, 2019

    I have been using CiteULike for the past 10 years or so. I just realized this week (May 3rd, 2019) that CiteULike closed recently. Is it still possible to download all of my references that were on CiteULike? If so, could you please let me know how to proceed. Thanks, Louis

  4. B / May 5, 2019

    Louis, the citeulike.org website is currently unreachable for me in a browser, which does not bode well. However, still worth trying to export your data using scripts.

    In case you are not successful using Andrew Wheeler’s neat script from this page, you may also wish to try another Python script (from Will Wade) that worked for me. It can handle cases where multiple PDFs are attached to a single CiteULike entry. You’ll need to pass your username and password as command-line arguments:

    https://github.com/AceCentre/ace-search-engine/blob/master/scripts/citeusync.py

    citeusync.py -u yourusername -p yourpassword

  5. Hi, I think I am quite late, but I just realised that CiteULike has been discontinued… I tried the citeusync script and it doesn’t work, as it throws this error:
    > urllib2.URLError:

    I guess that this is because the website/server/database is not reachable any more.

    Do you know if the CiteULike owners can be contacted in any way?

Pingback: Web scraping police data using selenium and python | Andrew Wheeler
