Version 2.71 Release plus Other Major Updates

Good news, everyone! Two important tools that rss2email depends on have recently received major upgrades: feedparser and html2text. These should improve rss2email’s ability to handle feeds with poorly formed HTML and other weirdness.

The rss2email application itself also needed some changes to support them. Changes in this version:

  • Upgraded to feedparser v5.0.1! (http://code.google.com/p/feedparser/)
  • Upgraded to html2text v3.01! (https://github.com/aaronsw/html2text)
  • Potentially safer method for writing feeds.dat on UNIX
  • Handle via links with no title attribute
  • Handle attributes more cleanly with OVERRIDE_EMAIL and DEFAULT_EMAIL

70 Responses to “Version 2.71 Release plus Other Major Updates”

  1. Alaska Jack says:

    Trying to install now, on an Xubuntu 10.10 system. Can extract the folder out of the archive, but then the folder won’t let me do anything with it. It keeps saying “permission denied.” It won’t even let me delete it!

  2. Alaska Jack says:

    OK, well, after several frustrating hours, I think I’m about ready to throw in the towel on this.

    Essentially, I’ve been trying to upgrade to 2.70 on my linux machine, so I could then export the opml, and bring that over to my Mac OS X machine.

    After a lot of hair-pulling (typical of Linux novices attempting non-intuitive procedures with minimal documentation), I eventually got to the point where the machine at least seemed to understand that I wanted to export. But it wouldn’t let me export to a file. I tried

    ./r2e opmlexport

    and

    ./r2e opmlexport file.opml

    and all manner of variations, but rss2email continually just dumped the output to the terminal itself, instead of to a file.

    OK. Well, I tried simply copying and pasting the output from the terminal to a text file. But neither the Mac nor the Linux install could understand it. “E: Unable to parse OPML file,” they both keep telling me.

    Jeesh, OK. Whatever.

  3. AJ: You’re almost there. Do this:

    ./r2e opmlexport > export.xml

    Then you move the export.xml file to your other rss2email install and import that.

  4. Alaska Jack says:

    Oh, thanks Lindsey. I am just learning my way around the terminal. Sorry for the frustration, and thanks for the help. – aj

  5. No problem. I just want you to be able to use and get value from the tool in the end.

  6. Alaska Jack says:

    Success.

    Now to figure out cron. Thanks again!

    – aj

  7. Alaska Jack says:

    Lindsey, has anyone else reported permissions problems with 2.71? I never did solve them. In contrast, 2.70 installed easily. Is it just me?

  8. AJ: no reports so far. If v2.70 works for you, stick with it for a while.

  9. Erik says:

    Permissions are all wrong on the .tar.gz.

  10. Erik: meaning you are unable to download the .tar.gz from here?

  11. Erik says:

    Hi Lindsey,

    No, the problem is after the tar.gz file is extracted. They need to be chmodded.

    $ ls -ld rss2email-2.71
    d--------- 2 egh egh 4096 2011-03-04 09:40 rss2email-2.71
    $ chmod 755 rss2email-2.71
    $ cd rss2email-2.71
    $ ls -l
    total 248
    ---------- 1 egh egh 7300 2011-03-04 09:11 CHANGELOG
    ---------- 1 egh egh 3267 2010-11-12 10:34 config.py.example
    ---------- 1 egh egh 168065 2011-02-20 14:41 feedparser.py
    ---------- 1 egh egh 14847 2011-02-17 13:42 html2text.py
    ---------- 1 egh egh 43 2006-03-16 15:43 r2e
    ---------- 1 egh egh 58 2006-03-16 15:43 r2e.bat
    ---------- 1 egh egh 7208 2009-12-21 14:04 readme.html
    ---------- 1 egh egh 31352 2011-03-04 08:51 rss2email.py
    $ chmod 644 *
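The same repair can be scripted. The following is only a sketch, assuming the archive unpacks to a single folder whose directories should be 755 and whose files should be 644; the function name `fix_permissions` and the choice to mark `r2e` executable (as a later comment in this thread suggests) are illustrative assumptions, not part of rss2email.

```python
import os

def fix_permissions(root, executables=("r2e",)):
    """Set directories to 755 and files to 644; launcher names get 755.

    `executables` is an assumed list of file names that should stay runnable.
    """
    # The top-level folder may itself be mode 000, so repair it before walking.
    os.chmod(root, 0o755)
    for dirpath, dirnames, filenames in os.walk(root):
        # Fix subdirectories first so os.walk can descend into them.
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), 0o755)
        for name in filenames:
            mode = 0o755 if name in executables else 0o644
            os.chmod(os.path.join(dirpath, name), mode)
```

Calling `fix_permissions("rss2email-2.71")` from the folder that contains the unpacked archive touches only that folder and its contents, never sibling directories.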

  12. Alaska Jack says:

    Lindsey -

    The problem I was having before was my own fault, and I figured out what I was doing wrong. But I don’t think I’m doing anything wrong with the permissions issue. I can download and unpack the archive fine, but then I can’t access the files within. Keep getting a permissions error. I couldn’t even delete them until I used sudo. Might be something you want to check.

    – aj

  13. AJ: yeah, that was reported recently. Something like this should do the trick. First change to your rss2email folder, then:

    sudo chmod 644 *

  14. Alaska Jack says:

    Thanks! I’ll try that when I get home. – aj

  15. Alaska Jack says:

    Ok, well, now that I’ve tried that, and spent the entire rest of the evening researching chmod and experimenting with fixes, allow me to provide some helpful advice for any other newbie struggling with this.

    (1) DO NOT FOLLOW THE INSTRUCTIONS LINDSEY GIVES, ABOVE. YOU WILL LOCK YOURSELF OUT OF PARTS OF YOUR COMPUTER. IF YOU TRY THIS WITH THE RSS2EMAIL-2.71 FOLDER IN YOUR /APPLICATIONS FOLDER, LIKE I DID, YOU WILL LOCK YOURSELF OUT OF THE TERMINAL.APP, WHICH YOU WILL NEED TO *FIX* THE PROBLEM.

    (2) Put the rss2email-2.71 folder where you want it; say, /Applications

    (3) From the terminal, navigate into /Applications.

    (4) Type “sudo chmod 755 rss2email-2.71” (without quotes). What you are doing here is giving yourself permission to get into the rss2email folder. In a nutshell, chmod is the command that sets these permissions, and 755 is the level of permissions you want.

    (5) Now, go into the rss2email folder by typing “cd rss2email-2.71” (again, no quotes)

    (6) Now that you are in, type, without quotes, “sudo chmod 644 *” This assigns level 644 permissions to everything in this directory.

    You should now be good to go. Good luck! Enjoy rss2email. In the meantime, I’ll be spending the rest of my evening booting from an external drive and trying to figure out how to fix my hosed system.

    -aj

  16. Sorry if there was miscommunication. You ran chmod * in the folder above rss2email? I thought I made it clear to change to the rss2email folder and then run chmod. If the rss2email folder itself had 000 permissions, then you may have needed to chmod that folder individually first.

  17. mcepl says:

    I am sorry, but since I haven’t found a better (should I say just “good”?) place to file a bug, I have to bother you here. Could you please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=677095 ? It seems so obvious that, if this is a real bug, you should be flooded with complaints, but I haven’t found any reason why this would be a mistake on our part. Do you have any idea what’s going on?

  18. Data says:

    Thank you for this great software.
    But I have one question: Is it possible to send all the feed entries in a single mail instead of a separate mail for every feed entry?

    Thank you very much for a reply!

  19. That’s usually referred to as “digest mode” and is something that we want to do but haven’t had a chance yet.

  20. Data says:

    Thank you very much for your fast response. I think it would be a nice feature to have. Do you have any estimates when this feature will come?

  21. jalex says:

    “it just works” – TM

    :)

  22. Dieter_be says:

    Permissions is one of the things handled pretty badly within rss2email.
    It used to be that *everything* was executable (which is of course bad), but with this version *everything* is _not_ executable (which is also bad). If you want to execute a file – say r2e – you need to run chmod +x on it.

    Lindsey, I’ve mailed you about this before (but never got a response), but I strongly recommend that you start applying better practices (using a version control system, a bugtracker, watching out for bogus whitespace changes, being more careful with permissions, etc)

    I’m actually maintaining a git repository where one branch contains a copy of official rss2email versions, another branch contains non-intrusive patches that clean up the code base (fixed permissions, correct whitespacing, etc) and another branch with various (IMHO) useful (but compatibility breaking) changes (proper logging, using xdg for configuration, make packaging easier for distributions, etc), and where I remove some useless clutter (such as overriding filesystem fsync behavior, introduced in 2.71, or the inclusion of dependencies)

    I started doing this because I really like the rss2email project, but was frustrated with the actual rss2email codebase and practices. I hope that Lindsey will adopt a source control system (preferably git, that way you can benefit back from some of my patches)

    For more information:
    See http://dieter.plaetinck.be/an_rss2email_fork_that_sucks_less
    https://github.com/Dieterbe/rss2email

    I’m willing to help you out setting up some stuff (like a repository on github), just let me know.

    Dieter

  23. JWells says:

    Again, many thanks for providing and maintaining rss2email! I’ve been using it (under Windows) for some time for many different feeds, and I’m currently using 2.71 (though the 2011/03/04 version of rss2email.py is still versioned as 2.70).

    However, I’ve just run into a problem with one set of feeds, e.g., http://feeds.thebrowser.com/BestOfTheMoment?format=xml where I’m getting 3 to 6 separate emails for each RSS entry. The emails are all sent at the same time, and each has the same X-RSS-URL: but a different X-RSS-ID:. For example the RSS entry:
    X-RSS-URL: http://feeds.thebrowser.com/~r/BestOfTheMoment/~3/3AiKmnn2A9Y/ieHl6d
    led to 3 emails with
    X-RSS-ID: http://thebrowser.com/best/41181 at http://thebrowser.com
    X-RSS-ID: http://thebrowser.com/best/feed/4682/41181 at http://thebrowser.com
    X-RSS-ID: http://thebrowser.com/best/feed/5594/41181 at http://thebrowser.com

    I would blame the feed, except that when I plug the URL into Thunderbird’s Feeds I don’t get any duplication.

    Is this a feed problem, or an rss2email problem?

    Thanks.

  24. JWells says:

    Update: My mistake! The emails aren’t all sent at the same time; they have the same Date: header, but the Received headers show they’re sent on successive runs of rss2email. Still, TB’s Feeds don’t show any duplicates. So, I’m back to wondering if this is a feed problem that rss2email should deal with?

  25. JWells says:

    Update 2: I see the problem as feedparser expanding GUIDs that have isPermaLink=”false”. I’ve opened an issue there, and I’ll see if they agree.

  26. Benjamin says:

    I am currently using rss2email to push feed updates to my gmail account every 2 minutes using the Windows task scheduler. My problem is that the emails are coming in “marked as read”, so I see them only if I proactively check my email. The stranger thing is that it was working initially. I set up the updates to send to my gmail account and made sure that I have no filters that could possibly be marking these emails as read. What could be causing this?

  27. KT. says:

    I’ve been running rss2email for about a year now. It is quite a neat tool for me. Thank you Lindsey!

    Today rss2email has started to dump error message on every run. Try this feed:
    http://www.computerworld.jp/rss/rss.rdf
    I found that today’s feed includes a ^M (control-M, 0x0d, CR) code in some entries, and it seems rss2email does not like it.

    Is it possible to work around this error?
    Thanks.

    === rss2email encountered a problem with this feed ===
    === See the rss2email FAQ at http://www.allthingsrss.com/rss2email/ for assistance ===
    === If this occurs repeatedly, send this to lindsey@allthingsrss.com ===
    E: could not parse http://www.computerworld.jp/rss/rss.rdf
    Traceback (most recent call last):
      File "/usr/local/share/rss2email/rss2email.py", line 639, in run
        id = getID(entry)
      File "/usr/local/share/rss2email/rss2email.py", line 404, in getID
        content = getContent(entry)
      File "/usr/local/share/rss2email/rss2email.py", line 385, in getContent
        return html2text(c.value)
      File "/usr/local/lib/python2.7/site-packages/html2text-3.02-py2.7.egg/html2text.py", line 450, in html2text
        return optwrap(html2text_file(html, None, baseurl))
      File "/usr/local/lib/python2.7/site-packages/html2text-3.02-py2.7.egg/html2text.py", line 447, in html2text_file
        return h.close()
      File "/usr/local/lib/python2.7/site-packages/html2text-3.02-py2.7.egg/html2text.py", line 185, in close
        HTMLParser.HTMLParser.close(self)
      File "/usr/local/lib/python2.7/HTMLParser.py", line 112, in close
        self.goahead(1)
      File "/usr/local/lib/python2.7/HTMLParser.py", line 164, in goahead
        self.error("EOF in middle of construct")
      File "/usr/local/lib/python2.7/HTMLParser.py", line 115, in error
        raise HTMLParseError(message, self.getpos())
    HTMLParseError: EOF in middle of construct, at line 2, column 2
    rss2email 2.70
    feedparser 5.0.1
    html2text 3.02
    Python 2.7.2 (default, Jun 29 2011, 20:49:35)
    [GCC 4.2.2 20070831 prerelease [FreeBSD]]
    === END HERE ===

  28. KT,

    I am afraid that this is a known bug in the html2text project, which I don’t maintain myself.

    https://github.com/aaronsw/html2text/issues/10

    I reported this a few months ago, but it hasn’t been fixed yet.

    However, I think if you install BeautifulSoup it may be able to handle the feed:
    http://www.crummy.com/software/BeautifulSoup/

  29. Miranda says:

    I’ve been trying to get rss2email to work with a particular site, but no matter what I do, I get a 403 error. Here is the URL I am trying to add: http://forums.arcade-museum.com/external.php?type=RSS

    Any help would be greatly appreciated.

  30. Miranda: put quotes around the feed url as in:
    ./r2e add "http://forums.arcade-museum.com/external.php?type=RSS"

  31. Mike says:

    W: error 403 [32] “http://www.infoworld.com/news/feed”

    Any ideas – works in other clients?

  32. Mike: bizarre! the infoworld site is blocking rss2email based on its user-agent string. If you want a quick workaround, comment out the following line in rss2email.py:
    feedparser.USER_AGENT = "rss2email/"+__version__+ " +http://www.allthingsrss.com/rss2email/"
    Comment that line out by adding a # character to the beginning of the line like this:
    #feedparser.USER_AGENT = "rss2email/"+__version__+ " +http://www.allthingsrss.com/rss2email/"
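To check whether a site is rejecting requests based on the user-agent string, one option is to request the feed with two different strings and compare the HTTP status codes. This is a minimal sketch using Python 3's standard library (not the Python 2 that rss2email 2.71 runs on); the helper names `build_request` and `status_for` are made up for illustration, and the user-agent values in the usage note are examples only.

```python
import urllib.error
import urllib.request

def build_request(url, user_agent):
    """Return a Request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def status_for(url, user_agent, timeout=10):
    """Fetch the URL and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A blocked client typically shows up here, for example as 403.
        return err.code
```

Comparing `status_for(url, "rss2email/2.71")` with `status_for(url, "Mozilla/5.0")` for the same feed URL would show whether the server treats the two clients differently.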

  33. Mike says:

    Thanks Lindsey that fixed it – very bizarre.

    Is there a way to output to a local mail file instead of to an email server?

    Thanks again, great job!

  34. Mike: not really, rss2email only supports sendmail and SMTP sending.

  35. AR says:

    r2e’s output to the local MDA results in corrupt MBOX files where the ‘From ‘ line is not separated by a ‘\n’ from the preceding message. Because of this, mailx can’t parse r2e’s messages correctly.

    Can you please look into this? Also see my message from February 14, 2011 at 4:33 pm.
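For comparison, Python's standard `mailbox` module writes mbox files with the separator handled for you: it ends each message with a newline and adds a blank line before the next 'From ' line. The sketch below is not how rss2email delivers mail; it only illustrates writing well-formed mbox entries, and the function name `append_to_mbox` and its parameters are assumptions for this example.

```python
import mailbox
from email.message import EmailMessage

def append_to_mbox(path, sender, recipient, subject, body):
    """Append one message to an mbox file, creating the file if needed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    box = mailbox.mbox(path)
    box.lock()  # avoid interleaving with another writer
    try:
        # mailbox writes the 'From ' line and the trailing blank line itself.
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

A file written this way can be reopened with `mailbox.mbox(path)` and iterated message by message, which is a quick check that the separators are intact.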

  36. Remco Rijnders says:

    Hi Lindsey,

    Thanks for this nice program, which I’ve been using over the past year, first in Debian, now with Mageia. For the latter, I’ve worked on packaging it into an RPM and have also bumped into issues with file permissions on the archive (basically, there are no permissions at all on any of the files after unpacking the archive; one can’t even cd into the unpacked directory without doing a chmod first). I’ve been able to work around this, but for future versions it would really help if the permissions on the published .tar.gz were correct :-)

    Thanks!

  37. JWells says:

    Update: Yes, my duplicate email problem is due to feedparser expanding IDs to URIs when it shouldn’t do so:
    http://code.google.com/p/feedparser/issues/detail?id=296
    Unfortunately it’s unlikely to get fixed soon…

  38. Bryan says:

    Hello Lindsey,

    I’ve been using this program for about a year now, and I love it. It works a lot better than a few of the commercial applications I’ve tried.

    I recently had to move all my email off of my ISP’s server and on to a paid email provider due to my ISP’s dynamic pool getting a minor blacklist. Not a big deal, but I’m having a problem getting rss2email to authenticate with my mail service. I get:

    Fatal error: could not authenticate with mail server “smtpout.secureserver.net:25” as user “myemail@mydomain.com”
    Check your config.py file to confirm that SMTP_SERVER and other mail server settings are configured properly

    The settings are correct, I use them with other programs that require outgoing email. I have a feeling it’s something with the python smtp library, but I don’t know enough to track it down…

    Any ideas here? Yeah, it’s a GoDaddy account…

  39. Bryan,

    Try using “myemail” instead of “myemail@mydomain.com”. Also try “smtpout.secureserver.net” without the port number. Honestly I don’t think it’s an issue with the python SMTP library. It is very mature at this point.

  40. Bryan says:

    Hi Lindsey.

    I tried the email server without a port, and tried just “myemail” instead of the full name with server. To make sure that it wasn’t something else, I also tried setting up a special account with a simple name and an 8-char password, all letters. I also tried double quotes around everything instead of singles. All give me the same error.

    I’m pretty sure there’s no other authentication required with this account (SSL, etc.) since I’m using these same settings with fake sendmail (http://glob.com.au/sendmail/) and Thunderbird is happy with “password, transmitted insecurely” and no security.

    Only thing I can think of otherwise is, since I am running Windows, it’s something to do with that. If you need temporary access to the account in question, please let me know…

  41. Bryan: I actually wrote a mail server testing tool that you could try on your Windows box:
    https://github.com/turbodog/python-smtp-mail-sending-tester

    Using this you could pretty easily try different combinations of settings (e.g. use TLS, don’t use TLS) to see what works.

  42. Bryan says:

    Hi Lindsey.

    I tried your email script, it works great. I’m going to stash that away for later use. I did discover the problem with it – I noticed that you had options for both TLS and SSL in the email test script, and they could be turned on or off regardless of if you need to authenticate with your email server. I took a look at your code and found this section:

    if AUTHREQUIRED:
        try:
            smtpserver.ehlo()
            if not SMTP_SSL: smtpserver.starttls()
            smtpserver.ehlo()
            smtpserver.login(SMTP_USER, SMTP_PASS)

    If I read that correctly (I’m new to Python), when authentication is turned on, the program assumes you want to start TLS after the first EHLO unless you tell it to use SSL. I deleted the first two lines under try: so it would not use any transport security. That worked, and I received the emails as expected. GoDaddy’s mail servers do not use any kind of transport security, so the TLS request was tripping it up.

    Thanks for putting up with my questions.
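One way to avoid tying STARTTLS to authentication is to make each a separate option. The following is a sketch of that idea using `smtplib`, not the code from the tester tool or from rss2email; the function name `connect_and_login` and the `smtp_cls` parameter (which lets a different connection class be substituted) are assumptions for illustration.

```python
import smtplib

def connect_and_login(host, port, user, password,
                      use_ssl=False, use_starttls=False, smtp_cls=None):
    """Connect to an SMTP server and log in, with transport security optional."""
    if smtp_cls is None:
        # SMTP_SSL wraps the whole connection; plain SMTP can upgrade later.
        smtp_cls = smtplib.SMTP_SSL if use_ssl else smtplib.SMTP
    server = smtp_cls(host, port)
    server.ehlo()
    if use_starttls and not use_ssl:
        # Only request TLS when asked; some servers reject STARTTLS outright.
        server.starttls()
        server.ehlo()  # re-identify after the TLS upgrade
    server.login(user, password)
    return server
```

With both flags left at their defaults, the function logs in over an unencrypted connection, which matches the configuration Bryan reports working; that sends the password in the clear, so it is only appropriate when the server offers nothing better.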

  43. no__1__here says:

    Sorry if this is a duplicate, not sure the other arrived…

    I am new to rss2email and was wanting to have two different sets of feeds, each going to different email addresses. I don’t see an easy way to do this? Doing a “r2e new” replaces the old setup. I suppose I could do a “new/import/run” every time to alternate between the two but that seems a bit kludgy. Thanks!

  44. The easiest thing would be to have separate folders for each email address.

    Alternately, if you are comfortable with this, look at the r2e script and it will show you how you could use a separate feed file for each email address.
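As a rough sketch of the second suggestion: this assumes the r2e script does nothing more than call rss2email.py with a feeds file name as its first argument, followed by the usual command. Confirm that against your own copy of r2e before relying on it. The helper names and the file names `work.dat` and `personal.dat` are hypothetical.

```python
import subprocess

def r2e_command(feeds_file, *args, python="python", script="rss2email.py"):
    """Build the command line for one feed file (assumed argument order)."""
    return [python, script, feeds_file] + list(args)

def run_profile(feeds_file, *args):
    """Run rss2email against one feed file and return its exit code."""
    return subprocess.call(r2e_command(feeds_file, *args))
```

Under that assumption, `run_profile("work.dat", "run")` and `run_profile("personal.dat", "run")` would process two independent sets of feeds, each created beforehand with its own `new` command and email address.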

  45. Legrostdg says:

    Hi,

    I have a bug with “http://www.monde-diplomatique.fr/rss”. rss2email-2.71 gives me:

    W: feed [26] “http://www.monde-diplomatique.fr/rss” timed out
    Exception in thread Thread-26 (most likely raised during interpreter shutdown):

  46. Is that still happening with monde-diplomatique? It worked for me and may have been a temporary issue with their website.

  47. Legrostdg says:

    No, I still have the same error. Would it be possible for you to use github or so for rss2email, as Dieter_be said? It would be easier to track bugs, post my config, allow modifications by other coders…

  48. Legrostdg says:

    Thanks a lot! :-)

  49. Mehmedov says:

    I’m getting duplicate emails. Any idea why? Or is it a bug? Interestingly, the mails are not exactly the same. They differ in letter case: some are in all capitals, some have only initial capitals. Sometimes I even get 3 mails.
