Feedbag: a RubyGem that detects RSS feeds from a URL

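The other day, while browsing the news, I came across this article.

Revealed! 44 media sites and blogs that ferret checks every day | ferret

ferret always delivers really fresh information, so if they recommend these sites, I have no choice but to follow them!!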

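So I figured I'd click through and subscribe to each RSS feed one by one, but.....

What a hassle (´・ω・`)

There are only about 40 of them, so it probably wouldn't take that long, but....

Still a hassle (´・ω・`)

So I went looking for a way to detect RSS or Atom feeds from a URL in Ruby, and it turns out there's a gem for that!!

( ･`д･´) Just what I'd expect!!

So that's what today's post is about.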

The gem is called feedbag.

Its GitHub page is here.

github.com

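Updates stopped quite a while ago, but the short answer is that it still works as of January 2017.

Let's try it out.

Full code

Here's what I tried:

Put the URLs whose feeds I want to find into a CSV file
Read the CSV file and detect the feeds

That's all.

Let's get right to it! Here's the Gemfile.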

# frozen_string_literal: true
source "https://rubygems.org"

# gem "rails"
gem 'feedbag'

Then run bundle install. I only ran this as a quick test on Cloud9, so I didn't pass --path or any other options; adjust the settings to suit your own environment.

$ bundle install
Fetching gem metadata from https://rubygems.org/..............
Fetching version metadata from https://rubygems.org/.
Resolving dependencies...
Installing mini_portile2 2.1.0
Installing open_uri_redirections 0.2.1
Using bundler 1.13.7
Installing nokogiri 1.7.0.1 with native extensions
Installing feedbag 0.9.6
Bundle complete! 1 Gemfile dependency, 5 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.
$

Next, save the URLs whose feeds you want to find in a file named list.txt (one URL per line). The code looks like this.

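require 'feedbag'
require 'csv'

# list.txt has one URL per line, so each parsed row is a one-element Array.
csv_data = CSV.read('list.txt', headers: false)

csv_data.each do |row|
  url = row.join                 # convert the row (Array) to a String for Feedbag
  feed_urls = Feedbag.find(url)  # Array of detected feed URLs
  puts "#{url} : #{feed_urls}"
end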

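One thing to watch out for: when you read the file with CSV, each row comes back as an Array. Passing that array straight to Feedbag raises an error (bad URI), so I convert it to a String with .join.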
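To see why the conversion is needed, here is a minimal sketch that parses two sample lines the same way list.txt would be parsed. The example.com and example.org URLs are placeholders I made up for illustration, and CSV.parse on an inline string stands in for CSV.read on the file.

```ruby
require 'csv'

# Two made-up URLs, one per line, parsed like the contents of list.txt.
rows = CSV.parse("http://example.com/\nhttp://example.org/\n")

rows.each do |row|
  # Each row is an Array holding a single String, e.g. ["http://example.com/"].
  # row.join (or row.first) gives back the plain URL String.
  puts "#{row.class}: #{row.inspect} -> #{row.join.inspect}"
end
```

Since every row has exactly one column here, row.first would work just as well as row.join.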

I haven't dressed up the output at all, so it's pretty bare, but it prints like this.

http://googlejapan.blogspot.jp/ : ["https://japan.googleblog.com/feeds/posts/default", "https://japan.googleblog.com/feeds/posts/default?alt=rss", "http://googlejapan.blogspot.com/atom.xml"]
http://googlewebmastercentral-ja.blogspot.jp/ : ["https://webmaster-ja.googleblog.com/feeds/posts/default", "https://webmaster-ja.googleblog.com/feeds/posts/default?alt=rss", "http://googlewebmastercentral-ja.blogspot.com/atom.xml"]
https://www.ja.advertisercommunity.com/t5/%E3%83%96%E3%83%AD%E3%82%B0%E8%A8%98%E4%BA%8B/bg-p/adwords_blog# : ["https://www.ja.advertisercommunity.com/googleja/rss/board?board.id=adwords_blog", "https://www.ja.advertisercommunity.com/googleja/rss/boardmessages?board.id=adwords_blog"]
http://youtubejpblog.blogspot.jp/ : ["https://youtube-jp.googleblog.com/feeds/posts/default", "https://youtube-jp.googleblog.com/feeds/posts/default?alt=rss", "http://youtubejpblog.blogspot.com/atom.xml"]

The feeds were detected. Nice, nice.

Some sites return several feeds and some return none at all, so I guess those need to be sorted out and specified by hand.
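One possible way to cut down the manual work is sketched below. It assumes that simply taking the first detected feed is good enough and flags sites with no feed for a manual check. pick_feed is a hypothetical helper of mine, not part of feedbag, and the example.com entry is made up to show the empty case.

```ruby
# Hypothetical helper: returns the first detected feed, or nil when none was found.
def pick_feed(feeds)
  feeds.first
end

# Sample data shaped like the output above; the empty entry is invented.
results = {
  'http://googlejapan.blogspot.jp/' => [
    'https://japan.googleblog.com/feeds/posts/default',
    'https://japan.googleblog.com/feeds/posts/default?alt=rss'
  ],
  'http://example.com/' => []
}

results.each do |url, feeds|
  feed = pick_feed(feeds)
  if feed
    puts "#{url} : #{feed}"
  else
    puts "#{url} : no feed detected, check manually"
  end
end
```

Whether the first entry is really the feed you want depends on the site, so treat this as a starting point rather than a rule.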

For now it just prints to the screen, but since this is all done in code, I think the results can be processed and put to further use.
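As one example of that kind of processing, here is a rough sketch that turns the results into CSV text, one line per site with the site URL followed by its feeds. results_to_csv is a hypothetical helper name, and the hash is sample data shaped like the output above rather than a live Feedbag call.

```ruby
require 'csv'

# Hypothetical helper: converts { site_url => [feed_url, ...] } into CSV text.
def results_to_csv(results)
  CSV.generate do |csv|
    results.each do |url, feeds|
      csv << [url, *feeds] # site URL first, then every detected feed
    end
  end
end

# Sample data taken from the output above.
results = {
  'http://youtubejpblog.blogspot.jp/' => [
    'https://youtube-jp.googleblog.com/feeds/posts/default',
    'https://youtube-jp.googleblog.com/feeds/posts/default?alt=rss'
  ]
}

puts results_to_csv(results)
```

The returned string could then be saved with File.write if you want a file to import somewhere else.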