IMAP Email Backup (smalldata.tech)
125 points by wheresvic4 on June 4, 2020 | hide | past | favorite | 80 comments


Recommend IMAPSync: https://imapsync.lamiral.info/

Have used it at pretty massive scale and it handles a hell of a lot of different options.


This one... and I used it to migrate 20 years of email from a G Suite account to Gmail account.

https://medium.com/@buro9/one-account-all-of-google-4d292906...


You freaked me out with the 20 years - I didn't think it'd been _quite_ that long. 2006 as "Google Apps for Your Domain", in case anyone else is wondering.


In my Gmail I have more than 20 years, all email I've sent and received since 1999. I've maintained a single source of emails and moved it forward as I've changed providers.


+1 for IMAPSync, used it a few times to migrate business and personal email, including into G Suite. It enabled me to offer a very smooth transition for people, entirely hiding from them the complexities of what "move my email" really entails!


I tried this recently for moving from outlook.com to mailbox.org and it was painfully slow. Because it fetched every message individually, it would get throttled at about 10 messages/second.

I wrote my own thing using IMAPclient in python and it was a hundred times quicker.
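Roughly, the trick is to fetch UID ranges in one round trip instead of one FETCH per message. A stdlib-only sketch of the same idea (host, credentials, and batch size are placeholders; my actual script used IMAPClient, but imaplib keeps this dependency-free):

```python
import imaplib

def chunk_ranges(uids, size):
    """Group a sorted UID list into IMAP sequence-set strings like '1:500'."""
    ranges = []
    for i in range(0, len(uids), size):
        batch = uids[i:i + size]
        if len(batch) == 1:
            ranges.append(str(batch[0]))
        else:
            ranges.append("%d:%d" % (batch[0], batch[-1]))
    return ranges

def fetch_all(host, user, password, folder="INBOX", batch=500):
    """Download every message in `folder`, one FETCH per `batch` messages."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)  # read-only: no flags get touched
        typ, data = conn.uid("SEARCH", None, "ALL")
        uids = [int(u) for u in data[0].split()]
        messages = []
        for seq in chunk_ranges(uids, batch):
            # one round trip per batch instead of one per message
            typ, parts = conn.uid("FETCH", seq, "(RFC822)")
            messages.extend(p[1] for p in parts if isinstance(p, tuple))
        return messages
    finally:
        conn.logout()
```

A range like "3:7" only matches UIDs that actually exist on the server, so gaps in the UID sequence are harmless.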


I have moved more than 2000 mailboxes with over a terabyte of data with it. Worked well for me.


From Outlook.com? It was probably Outlook.com doing the throttling.


Another happy user of imapsync. The best thing IMO is that you can stop or get disconnected for some reason and it will effectively resume the sync, without having to pray for the run to complete without errors.


Garbage. Tried it with Fastmail from several different machines and it just locked up at the "Host1: connecting on host1 [imap.fastmail.com] port [993]" stage

...telnet to the same host:port on same machine works fine.


This. I've used it extensively and it's simply unparalleled.


Thunderbird is a desktop client that can archive, compress, and store all emails offline. I've even used Thunderbird for migrating all my email history, folders, etc. between servers: simply add the new account and drag and drop folders between accounts. Simple enough to be used by non-technical people who do not want to:

>./imapgrab.py -d -v -f ~/user@email.com -s imap.mail.server -S -p 993 -u user@email.com -p password -m "_ALL_,-INBOX.Trash,-INBOX.Spam,-INBOX.Junk,-Trash,-Junk,-Spam"


Be careful using Thunderbird to move mail between IMAP4 servers, on Linux at least - I've had it lose messages on me, deleting them from the source before they're successfully written to the destination.

I was moving mail from a Gmail account to a Gandi account about a year ago, before the recent spurt of development activity on Thunderbird.


The only time I've ever lost mail was while trying to move a bunch of messages in Postbox (shares Thunderbird's backend). Wasn't even moving between servers-- this was just a folder-to-folder move.

Always copy, verify that the messages really are at the destination, then delete. I don't know why this isn't what Thunderbird does on its own when you move mail, but it isn't.
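A rough sketch of that copy-verify-delete pattern with Python's stdlib imaplib (the folder name and connection setup are placeholders, not any particular tool's behavior):

```python
import imaplib

def verified_for_delete(copied_ids, found_ids):
    """Pure helper: only messages whose Message-ID was confirmed at the
    destination are safe to remove from the source."""
    return sorted(set(copied_ids) & set(found_ids))

def move_message(src, dst, uid, msg_id, dest_folder="Archive"):
    """src/dst are logged-in imaplib.IMAP4 connections; uid is a str."""
    # 1. Copy: append the raw message to the destination folder
    typ, data = src.uid("FETCH", uid, "(RFC822)")
    raw = data[0][1]
    dst.append(dest_folder, None, None, raw)
    # 2. Verify: search the destination for the Message-ID
    dst.select(dest_folder, readonly=True)
    typ, data = dst.uid("SEARCH", None, 'HEADER Message-ID "%s"' % msg_id)
    found = data[0].split()
    # 3. Delete from the source only if the copy is really there
    if found:
        src.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
        src.expunge()
```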


>I've had it lose messages on me, deleting them from the source before they're successfully written to the destination.

Copy folders then?


Am I the only one still using Evolution? I think its IMAP implementation is a lot better than Thunderbird's. I think this could be due to the fact that Evolution was initially written by people experienced in writing mail _servers_ but at this point I'm not sure how much of Evolution 1.0 code survives in 3.0.


The trick here is to not close the program while the move is taking place (or so I've found). Also using copy over move is a bit better. Stupid bug though, I know :p.


I always had horrible experience with thunderbird and any amount > 500 emails.

Thunderbird would start doing its thing (moving emails, for example), then freeze for a while, and then all of a sudden stop in the middle of the operation, complaining that it's a lot of work and asking me if I want to continue.

It's enraging every time. More so if you consider that IMAP is not transactional afaik.


Unfortunately, IMAP is basically entirely ephemeral.

The message UIDs are mostly permanent, but the RFC says they can change, so clients have to revalidate them on each connection.

This means every time you check your mail with an IMAP client it's syncing some of the headers for every message.
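Concretely, the "revalidate per connection" part is the UIDVALIDITY check: if a folder's UIDVALIDITY changed since the last connection, every cached UID is void and a full resync is required (RFC 3501, section 2.3.1.1). A minimal sketch with stdlib imaplib:

```python
import imaplib
import re

def parse_uidvalidity(status_line):
    """Extract UIDVALIDITY from an IMAP STATUS response line."""
    m = re.search(rb"UIDVALIDITY (\d+)", status_line)
    return int(m.group(1)) if m else None

def cache_is_valid(conn, folder, cached_uidvalidity):
    """conn is a logged-in imaplib.IMAP4 connection (placeholder here)."""
    typ, data = conn.status(folder, "(UIDVALIDITY)")
    return parse_uidvalidity(data[0]) == cached_uidvalidity
```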


I have really bad memories of a Thunderbird migration. I tried to move mails from ~Exchange-2003 times, but they were on a modern Dovecot server; after I moved them everything was completely scrambled. I tried with Sylpheed and everything was perfect, and it was even much faster.


You can just put that command in a cron job, though. Or (and I assume the above command is just a layer on top of this) just write a short getmail config file and set cron to run the simple command

    getmail
once in a while.
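For reference, a minimal getmailrc might look like this (server, credentials, and paths are made up; check the getmail docs for the full option list):

```ini
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.example.com
username = user@example.com
password = hunter2

[destination]
type = Maildir
path = ~/Maildir/

[options]
# leave mail on the server; this is a backup, not a move
delete = false
read_all = true
```

Then a crontab entry like `0 3 * * * getmail --quiet` does the rest.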


Around two years ago Thunderbird crapped out on me and deleted all my archives. I've also lost emails when moving large (O(10^3)) IMAP folders.

Not sure how much work has been done on Thunderbird but I do an IMAP backup before archiving or making structural changes.

I should probably have an automated download for backup purposes set up ...


I've used the ImportExportTools NG Thunderbird addon to do the migration. I.e. right-click on all folders I want to download, start a new Thunderbird for the new mail provider, right-click on the new, empty folders and upload to them.


We've had lots of problems trying to use Thunderbird to migrate to Google suite (mostly network / TLS errors when connecting to Gmail).


Same. I ended up rate limiting my connection to 256kbps and it stayed stable.


Last time I tried doing an IMAP backup using... hmmm, some command-line tool (imapsync maybe?), I discovered that Gmail has some kind of rate limiting baked in and starts throwing errors after that. This would, naturally (/s), cause the tool to restart the process. Limiting your connection speed probably caused the backup to go slow enough that you didn't hit the request limit.


Gmail has an abundance of rate limits and, in general, you cannot expect one account to sustain more than about 1 request per second. You might get more but 1 QPS is easy to remember.

https://developers.google.com/gmail/api/v1/reference/quota
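If you're scripting against a limit like that, exponential backoff with jitter is the standard answer. This wrapper is a generic sketch, not part of any tool mentioned here; `base` is a knob so the pattern can be exercised with tiny delays:

```python
import random
import time

def with_backoff(fn, retriable=(Exception,), max_tries=6, base=1.0):
    """Call fn(), retrying on `retriable` errors with full-jitter backoff."""
    for attempt in range(max_tries):
        try:
            return fn()
        except retriable:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the last error
            # sleep a random amount up to base * 2^attempt seconds
            time.sleep(random.uniform(0, base * 2 ** attempt))
```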


I also use Thunderbird Portable to back up my G Suite. Great tool, I've been using it for years without issue. I open it once a week and just let it sync.


I use OfflineIMAP: https://www.offlineimap.org/


I also use offlineIMAP, but I find that every few months it freezes and needs a kill -9 to exit. My current workaround is killing it once per week and letting my service manager restart it.


Yep, works very well for me, too.


I was never able to back up my Gmail successfully (~10 GB of mail right now). Every time it fails somewhere in between, for the reasons mentioned by various people elsewhere in this thread.

Every year I try and fail.

This year with the comments herein I again tried and get invalid credentials error with my Gmail.

1) Used mutt/getmail/python script ( password is correct verified it )

2) Even changed password and still same issue

3) I also tried enabling the "less secure apps" access option in Google settings, but I keep getting the authentication error. Can someone please point me to a good note on how to get this authentication error sorted (maybe there is more to take care of)? (I use Ubuntu Bionic, Python 2.7.17, getmail 5.5.)

Error:

    rahu@rahu:~/0del/_working/_backupEmail$ ./ss.sh
    IMAP Grab 0.1.4
    --- List option selected
    Connecting to IMAP server via SSL
    Logging into IMAP server
    Traceback (most recent call last):
      File "./imapgrab.py", line 444, in <module>
        imapgrab()
      File "./imapgrab.py", line 150, in imapgrab
        ig_list = IG_list_mailboxes(ig_options)
      File "./imapgrab.py", line 193, in IG_list_mailboxes
        ig_imap.login(ig_options.username,ig_options.password)
      File "/usr/lib/python2.7/imaplib.py", line 523, in login
        raise self.error(dat[-1])
    imaplib.error: [AUTHENTICATIONFAILED] Invalid credentials (Failure)


Have you tried Google Takeout?

https://takeout.google.com


I've found GYB works quite well:

https://github.com/jay0lee/got-your-back

It's created by the developer of GAM - a popular tool you may have heard of, for managing G Suite.


> I also tried to enable less secure apps to access email option in google settings. But keep getting the Authentication error.

Turning on 2FA and using Google's "app-specific password" eliminated errors of this sort for me recently.


I've used gmvault to backup ~3GB with no issues.

https://github.com/gaubert/gmvault


Look for imapsync on other comments


In the same vein, the push towards not owning your data also comes with issues[0].

I get along well with a tool called offlineimap[1] though it's python2; you can even use it to sync imap backends[2]. I use it simply to populate a Maildir on my machines.

Sadly I have not found it to work with Office 365 (then again, what does, other than Outlook and Thunderbird's proprietary subscription-based extension?).

[0]: shameless plug: https://blog.dijit.sh/importance-of-self-hosted-backups

[1]: http://www.offlineimap.org/doc/quick_start.html

[2]: http://www.offlineimap.org/doc/backups.html


Hmm, my school uses office365 for its email, and I've managed to successfully hook it up to offlineimap for my email. I think the only real "special" thing that I had to do was have it use my email address (i.e. user@university.edu) as my username to authenticate with.

The relevant section of my .offlineimaprc (trimmed for brevity, edited for privacy):

  [Repository remote]
  type = IMAP
  remotehost = outlook.office365.com
  remoteuser = "user@university.edu"
  remoteport = 993


I think the admin might have enabled some compatibility features for you, I get:

    ~ $ offlineimap
    OfflineIMAP 7.3.2
      Licensed under the GNU GPL v2 or any later version (with an OpenSSL exception)
    imaplib2 v2.101 (bundled), Python v2.7.18, OpenSSL 1.1.1g  21 Apr 2020
    Account sync work:
     *** Processing account work
     Establishing connection to outlook.office365.com:993 (work-remote)
    Enter password for user '<user>@massive.se':
     PLAIN authentication failed: AUTHENTICATE failed.
     LOGIN authentication failed: LOGIN failed.
     ERROR: All authentication types failed:
     PLAIN: AUTHENTICATE failed.
     LOGIN: LOGIN failed.
     *** Finished account 'work' in 0:11
    ERROR: Exceptions occurred during the run!
    ERROR: All authentication types failed:
     PLAIN: AUTHENTICATE failed.
     LOGIN: LOGIN failed.
    
    Traceback:
      File "/usr/lib/python2.7/site-packages/offlineimap/accounts.py", line 293, in syncrunner
        self.__sync()
      File "/usr/lib/python2.7/site-packages/offlineimap/accounts.py", line 369, in __sync
        remoterepos.getfolders()
      File "/usr/lib/python2.7/site-packages/offlineimap/repository/IMAP.py", line 452, in getfolders
        imapobj = self.imapserver.acquireconnection()
      File "/usr/lib/python2.7/site-packages/offlineimap/imapserver.py", line 586, in acquireconnection
        self.__authn_helper(imapobj)
      File "/usr/lib/python2.7/site-packages/offlineimap/imapserver.py", line 459, in __authn_helper
        "failed:\n\t%s"% msg, OfflineImapError.ERROR.REPO)


Every IMAP client I've tried so far works with office365, though I haven't tried offlineIMAP with it.

Ones I've tried:

fetchmail

sylpheed-claws

evolution

mutt


I wrote a small tool in Go which lets you delete email in bulk by sender. It uses the Gmail API and conforms to the rate limits. It's pretty neat! Slow, perhaps, the first time you run it, since it fetches threads one at a time.

https://github.com/poundifdef/gmail-deleter

The next step would be to have it save and archive email threads. (Would this be useful to anyone?)


Reminder: You likely don't need your email provider to store years and years of old emails. Download them locally, then remove them from the service provider.

https://en.wikipedia.org/wiki/Third-party_doctrine

Thanks to the third-party doctrine, messages stored on the provider can be disclosed to the government without a warrant. Your emails have contact history, location histories, attachments, plans, all sorts of stuff.

Don't leave it on the server, where it's available for the taking. Even a password breach will deliver the entirety of it to an attacker.


Honestly, I'm a real proponent of keeping a clean digital life. I practice inbox zero and aggressively delete old documents and photos on a regular basis. If I haven't touched something in a few years, the chances that I'm going to need it later on down the line are slim, and so off it goes.

Important documents like tax info etc, are obviously kept for the minimum legal period but then again they don't take up all that much space.

I've been doing this for around 10 years now and so far I have not had any problems with missing old stuff, _touchwood_.

Long story short, just delete stuff you don't need altogether, makes backups a breeze and frees you from worrying about hackers exposing your musings from when you were a teenager :)


I have a 900GB FLAC music library, because cloud music services can disappear tracks down the memory hole from the middle of your playlists your now-dead loved ones made for you years ago[1], and also because they simply don't have a lot of the latest/newest interesting/underground music in their cloud catalogs.

Also:

    nostromo:~/Pictures$ du -sh LightroomMasters
    1.5T LightroomMasters
    nostromo:~/Pictures$ find LightroomMasters -type f | wc -l
      207655
    nostromo:~/Pictures$ ls -1d LightroomMasters/* | sort -n | head -1
    LightroomMasters/1997
    nostromo:~/Pictures$
I've been taking photographs of events in my life for a long time, and I look at them regularly, which brings me joy. Sometimes I send loved ones photos I took of them 3, 5, 10, 15 years ago that they've never seen, and it brings them joy as well.

There are different paths, and digital minimalism is only suitable for a subset of the population. Many of us derive immense value from having high fidelity records of our lives and the lives of those close to us.

Also, if you store your historical data for a long time, you can do super cool shit like this: https://writings.stephenwolfram.com/2012/03/the-personal-ana...

[1]: In case you can't tell, this happened to me.


Depends on what counts as an old document. The glass surface of my 8-year-old induction stove broke by accident, and I got the full price back from my home insurance. They were OK with photos of the device and the PDF purchase receipt from my email archive.


Agreed, I also have a yearly reminder to move all e-mails older than X months to my local computer (of which I make backups).


I use DevonThink (Mac-only). In addition to continuous archiving of email, it indexes and stores attachments, files all my receipts, and keeps my PDFs and archived notes there. Very powerful and reliable search.

https://www.devontechnologies.com/apps/devonthink


I've been using a similar setup for a few years, and can totally vouch for it. Emails are probably one of the most important things to back up. Not sure what `imapgrab` adds, since fetching mail over IMAP is exactly what getmail does.

That being said, if you go on and set up getmail, I'm not sure what the point is of continuing to use mutt over IMAP (the author mentions decrypting the archive before reading it, so I assume it's not their daily driver and they keep using a direct IMAP connection otherwise). IMAP is still useful if you want to read mail on your mobile.

For the encryption part, I found cryfs [1] to be very useful. It's a FUSE-based app that encrypts files on the fly as they are written to the pseudo-fs, storing them in partitioned, small encrypted files. Basically, it means you can use the pseudo-fs as a regular directory, store your mail in it and read it from there, and the special layout of the encrypted directory with its small files is especially rsync-friendly (outside of email: if I store a big file where only a small part has changed, only the files related to those small changes will be uploaded).

Another advantage of going fully local: you can use notmuch [2]. It's a mail indexer that allows fast and advanced search within mails. I have all my mail since 2013 locally, and it takes less than a second to do a full-text search across it.

The last thing that I love is the ability to use maildrop [3]. It's regexp-based filtering that can forward mail to anything (a mailbox or a program), so you can have advanced filters that do quite crazy things (my phone sends me mail when its battery is depleted or fully charged, and that triggers desktop notifications on my computer). Not to be abused, obviously. Don't allow just anyone to shut down your computer by sending an email :)

[1] https://github.com/cryfs/cryfs

[2] https://notmuchmail.org/

[3] https://www.courier-mta.org/maildrop/
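For the curious, a maildrop filter along those lines might look roughly like this (the patterns and notify command are made up for illustration):

```
# ~/.mailfilter -- maildrop reads this at delivery time
if (/^Subject:.*battery fully charged/)
{
    # pipe the message to a program instead of a mailbox
    to "| notify-send 'phone: battery fully charged'"
}

# everything else lands in the regular Maildir
to "Maildir/"
```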


First: I use neomutt + mbsync (formerly isync) + notmuch. The local searching was at first a nerd special interest project but has actually become very useful for searching, filtering and otherwise managing my mail.

Second: the button on the article to submit to HN seems to be somewhat... off? I wouldn't mind one that would submit in the first instance and then link to the HN thread on subsequent clicks but having a button that just loads a round into the chamber for all users clicking on it seems like incitement to blogspam.


The issue with local search is that you anyways end up doing search on the server when on mobile or when you are not on your home computer. But yes, definitely useful. I personally do 0 inbox so I hardly have anything to search, heh.

Regarding the submit to HN button: if the article has already been posted, HN will itself just link you to it rather than make a new submission. So it is a bit of an intelligent hack where it piggybacks on HN's own code, hope that assuages you :)


>it piggybacks on HN's own code, hope that assuages you :)

Haha, it does actually. That is very cunning!


I’m trying the same setup as you, but I cannot figure out how to use notmuch from within neomutt. I feel like I am missing something fairly straightforward because apparently notmuch is integrated into neomutt. Any pointers would be great!


I got it set up with Luke Smith's mutt-wizard [0]. The actual integration feels a bit clunky (the UI closes when entering a query, but then results are returned to the UI) but it does work quite well. I think the interesting corollary one finds is that a normal filter (I think it's called 'limit' in mutt?) often does the job, and one doesn't have to think about how the inverted index is going to treat the query.

[0] https://github.com/LukeSmithxyz/mutt-wizard


Hello there,

I've maintained https://github.com/rcarmo/imapbackup since 2013 (and used early versions of it for years before that), so feel free to try it out and see if it fits your use case (mine was snapshotting existing IMAP mailboxes into mbox format, which was what we could use for migrating mailboxes across systems then, and also fit my personal use case for backups).


Thanks so much for crafting and maintaining imapbackup, Rui! It worked a treat in backing up tens of thousands of messages that Mail.app refused to archive properly.

The feature set is likely to appeal to many HN users:

* ZERO dependencies. (other than Python 2.5 or higher)

* Copies every single message from every single folder (or a subset of folders) in your IMAP server to your disk.

* Does incremental copying (i.e., tries very hard to not copy messages twice).

* Tries to do everything as safely as possible (only performs read operations on IMAP).

* Generates mbox formatted files that can be imported into Mail.app (just choose "Other" on the import dialog).

* Optionally compresses the result files on the fly (and can append to them).

* Is completely and utterly free (distributed under the MIT license).


If you're a gmail and a notmuch user, consider using lieer - https://github.com/gauteh/lieer. It uses the Gmail API to synchronize mail bidirectionally so that you can handle mail locally (except for muting threads).

If any Googler is reading this, why oh why is muting not part of the API?


The tool I used to use for this years ago is isync/mbsync, which worked really well. Nowadays I just use rsync on Maildirs because I don't have any IMAP mail on servers where I don't have shell access, so I'm a bit out of date what the modern state of these tools is. Interesting to hear of a variety of other options in this thread that I'd not come across back then.


> Nowadays I just use rsync on Maildirs because I don't have any IMAP mail on servers where I don't have shell access, so I'm a bit out of date what the modern state of these tools is. Interesting to hear of another option I'd not come across back then

If you use rsync through ssh, I would say that's a very good option, if anything because on your server, you only need ssh to be exposed to the internet rather than ssh + an imap server. Given how servers are constantly attacked, this is way safer.
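Agreed. For reference, the whole backup can be a one-liner of roughly this shape (hypothetical host and paths):

```shell
# -a preserves the flags Maildir encodes in filenames; --delete mirrors
# removals too, so keep versioned snapshots of the destination as well.
rsync -a --delete -e ssh user@mail.example.com:Maildir/ ~/backups/Maildir/
```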


imapsync is my personal favorite approach to this problem.

In addition to imap -> filesystem backups it supports synchronizing mail in both directions. And both source and destination can be remote systems.


The imapsync page https://imapsync.lamiral.info/ states "imapsync is not suitable for maintaining a synchronization between two active imap accounts while the user is working on both sides. Use offlineimap (written by John Goerzen) or mbsync (written by Michael R. Elkins) for bidirectionnal (2 ways) synchronizations."


Shout out to office365 for being a piece of crap when it comes to IMAP.

If you launch something like fetchmail or imapsync at it, it will close the connection in the middle of the sync because, in its opinion, you're doing something nasty.

The last time I switched jobs and had to download 18+ GB of emails, it took me days to get them all off Office 365, and I had to run the tool in a `while true` loop in screen. Luckily I didn't wait until the last day to start downloading my emails.


> Luckily I didn't wait for the last day to start downloading my emails.

And also luckily, your company didn't sue you or try to have you arrested for extracting 18+ GB of company property.


That is also 18+ GB of my life.


wait, you use work email for personal stuff??


No, but those years of work are part of my life.

I mean: my colleagues and I faced countless issues, we had both problems and celebrations, and I want to be able to remember all of that in the future.


I've used MailStore Home[1] for a few years and I really like it. It allows me to search across all of my mailboxes, even ones that have since been shut down, for emails that I only remember a partial subject for.

[1]: https://www.mailstore.com/en/products/mailstore-home/


For gmail, gmvault is great:

http://gmvault.org/


I was in a similar situation a few years ago and decided to build an app for that: https://thehorcrux.com

My goal is to make it simple enough for my grandpa to use it. However, I'm still not there yet. Let me know what you guys think.


Just a bit of website critique: your menu wraps on my 1920x1080 screen.

"Contact" wraps to the next line.


Is there a way I can view such backups? Search in them? Something like offline read-only maildir viewer?

I made a separate thread here: https://news.ycombinator.com/item?id=23425995


For those using Protonmail: I did a little writeup on using offlineimap to keep a local backup.

https://github.com/peterrus/protonmail-export-linux


For me there's really no need to specifically backup my email. Instead, my IMAP client is configured to download all messages and attachments to local disk, and regular disk backups should take care of that.


I use offlineimap to sync mail locally and then periodically backup the mail folders to an encrypted USB drive. Using Protonmail with ProtonBridge to pull it down.


It's the reason that I still use POP. Full stop.


Same here. Happy POP user for decades. How do you deal with accessing mail from multiple devices and when away from your main machine tho?


I have my mailbox settings set to retain mail on the server for 30 days.

The fact that they are marked as unread from other machines is a pain, but not insuperable.


I have a cronjob on my NAS that runs mbsync every night. Really simple and efficient.


> python 2.7.x (does not work with python3)

Python 2 has a very, very long tail.


fetchmail is the tool of the trade, and it does not need python.



