[Node-RED] Web scraping with Selenium ~ Operating it from Node-RED ~
2020/04/25 12:00
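category: General server topics
The previous post got the server side ready, so this time we'll drive it from Node-RED and actually do some scraping.
Once the Selenium nodes are added to Node-RED, it's just a matter of clicking the flow together.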
For Remote URL, specify the Selenium server. If it runs on the same machine as Node-RED, use:
http://localhost:4444/wd/hub
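Before wiring any nodes, it can help to confirm that the Remote URL actually answers. The following is a minimal Python sketch, assuming the server exposes the W3C WebDriver `/status` endpoint under the Remote URL and replies with a body shaped like `{"value": {"ready": ...}}`; the function names are hypothetical and not part of Node-RED or Selenium.

```python
import json
import urllib.request

# Remote URL as configured in the selenium-server config node.
REMOTE_URL = "http://localhost:4444/wd/hub"


def status_url(remote_url: str) -> str:
    """Build the WebDriver status endpoint under the given Remote URL."""
    return remote_url.rstrip("/") + "/status"


def is_ready(status_body: str) -> bool:
    """Return True if a W3C-style status body reports value.ready == true.

    Assumption: the server answers {"value": {"ready": ..., ...}}.
    Any other shape (or invalid JSON) is treated as "not ready".
    """
    try:
        data = json.loads(status_body)
    except json.JSONDecodeError:
        return False
    value = data.get("value") if isinstance(data, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


def check_server(remote_url: str = REMOTE_URL, timeout: float = 3.0) -> bool:
    """Fetch the status endpoint; connection errors count as not ready."""
    try:
        with urllib.request.urlopen(status_url(remote_url), timeout=timeout) as resp:
            return is_ready(resp.read().decode("utf-8"))
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```

Calling `check_server()` on the Node-RED host should return `True` once the server from the previous post is running; `False` usually means the URL or port is wrong or the server is not up yet.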
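After that, place nodes that trace the actual web operations one step at a time.
For text boxes, buttons and the like, I find specifying them by XPath the easiest.
To get an XPath in Firefox:
・Press F12 to open the developer tools pane at the bottom of the window.
・Select the Inspector tab. Clicking elements in the HTML source highlights the matching spot on the actual page, so use that to navigate to the element you want.
・Right-click it and choose Copy → XPath; the XPath is copied to the clipboard.
・Press F12 again to close the tools.
That gives you the XPath.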
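Before pasting a copied XPath into a node, you can sanity-check it offline against a saved copy of the page. This is a minimal Python sketch using only the standard library's ElementTree, which supports a small XPath subset; the sample page is hypothetical (only the field names come from the flow), and it assumes the page is well-formed XHTML. Real-world HTML usually needs a proper HTML parser such as lxml, and Selenium itself evaluates full XPath inside the browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical login page, written as well-formed XHTML so ElementTree can parse it.
# Field names match the set-value nodes in the flow; everything else is assumed.
SAMPLE_PAGE = """
<html>
  <body>
    <form action="/login" method="post">
      <input type="text" name="in_data_id"/>
      <input type="password" name="in_data_pass"/>
      <input type="submit" value="Login"/>
    </form>
    <div id="last_data">sample text</div>
  </body>
</html>
"""


def to_etree_path(xpath: str) -> str:
    """Convert the two XPath shapes Firefox copies into ElementTree's subset.

    ElementTree cannot evaluate absolute paths from an element, so
    "/html/..." becomes "./..." relative to the <html> root, and
    "//..." becomes ".//...".
    """
    if xpath.startswith("//"):
        return "." + xpath
    if xpath.startswith("/html/"):
        return "./" + xpath[len("/html/"):]
    raise ValueError(f"unsupported XPath: {xpath}")


def find(page: str, xpath: str):
    """Return the first element matching xpath, or None."""
    root = ET.fromstring(page.strip())  # root element is <html>
    return root.find(to_etree_path(xpath))
```

For example, `find(SAMPLE_PAGE, "/html/body/form/input[3]")` returns the submit button (`input[3]` is the third `input` sibling, counting from 1), and `find(SAMPLE_PAGE, '//*[@id="last_data"]').text` returns the text the get-text node would read.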
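The flow below performs this sequence:
[Connect to the Selenium server] → [Enter the ID and password and press the button] → [Print the target text field on the resulting page to debug] → [Take a screenshot] → [Close the browser] → [Send an email with the screenshot file attached]
[Confirm the email was sent] → [Delete the screenshot file]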
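For reference, the mail part of this sequence (attach the screenshot, send it, and delete the file only after sending succeeded) maps to a few lines of Python's standard library. This is a minimal sketch of the same idea, not what the Node-RED e-mail node does internally; the host, port, addresses and subject are the placeholder values from the flow, and `build_message` / `send_and_cleanup` are hypothetical names.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

# Placeholder values mirroring the e-mail and change nodes (example.com addresses).
SMTP_HOST = "mail.example.com"
SMTP_PORT = 25
MAIL_FROM = "test@example.com"
MAIL_TO = "test@example.com"


def build_message(png_bytes: bytes, filename: str = "web.png",
                  subject: str = "Subject") -> EmailMessage:
    """Build a plain-text mail with the screenshot attached as image/png."""
    msg = EmailMessage()
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg["Subject"] = subject
    msg.set_content("Screenshot attached.")
    msg.add_attachment(png_bytes, maintype="image", subtype="png", filename=filename)
    return msg


def send_and_cleanup(path: Path) -> None:
    """Send the screenshot, then delete it only if sending did not raise."""
    msg = build_message(path.read_bytes(), filename=path.name)
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as smtp:
        smtp.send_message(msg)
    # Counterpart of the complete -> exec "rm" step; missing_ok needs Python 3.8+.
    path.unlink(missing_ok=True)
```

Keeping the delete after `send_message` plays the same role as hanging the `rm` off a complete node: the file is only removed once the mail step has finished.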
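If you're logged in to the server's GUI over VNC or similar, you can watch the browser actually open, navigate, and close, which makes checking the behavior fun.
Here is the flow: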
[{"id":"34b503c5.e6dc44","type":"comment","z":"3d611774.0e7aa8","name":"Selenium web scraping ~ email the screenshot ~ delete the file when done","info":"","x":320,"y":60,"wires":[]},{"id":"5e38438b.9ab55c","type":"open-web","z":"3d611774.0e7aa8","name":"","browser":"firefox","weburl":"http://localhost/scraping/","width":1024,"height":768,"webtitle":"Google","timeout":3000,"maximized":true,"server":"a7760f89.e7ace8","x":340,"y":100,"wires":[["9473ee5c.85d2e8"]]},{"id":"8e54eb0.cfe4b18","type":"inject","z":"3d611774.0e7aa8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":100,"wires":[["5e38438b.9ab55c"]]},{"id":"e3b1fe50.d3a028","type":"close-web","z":"3d611774.0e7aa8","name":"","waitfor":500,"x":650,"y":160,"wires":[["b32b4520.88796","f5fc5a87.ba7a78"]]},{"id":"b32b4520.88796","type":"debug","z":"3d611774.0e7aa8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":810,"y":160,"wires":[]},{"id":"9473ee5c.85d2e8","type":"set-value","z":"3d611774.0e7aa8","name":"user","text":"user","selector":"name","target":"in_data_id","timeout":1000,"waitfor":500,"x":510,"y":100,"wires":[["1acaf6d2.919db9"]]},{"id":"1acaf6d2.919db9","type":"set-value","z":"3d611774.0e7aa8","name":"passwd","text":"passwd","selector":"name","target":"in_data_pass","timeout":1000,"waitfor":500,"x":660,"y":100,"wires":[["787f6c0b.1f518c"]]},{"id":"787f6c0b.1f518c","type":"click-on","z":"3d611774.0e7aa8","name":"","selector":"xpath","target":"/html/body/form/input[3]","timeout":1000,"waitfor":500,"clickon":false,"x":120,"y":160,"wires":[["c6b6cae8.6a9bc8"]]},{"id":"c6b6cae8.6a9bc8","type":"get-text","z":"3d611774.0e7aa8","name":"Get text","expected":"","selector":"xpath","target":"//*[@id=\"last_data\"]","timeout":1000,"waitfor":500,"savetofile":false,"x":300,"y":160,"wires":[["4994609e.07b9c8"]]},{"id":"4994609e.07b9c8","type":"screenshot","z":"3d611774.0e7aa8","name":"Screenshot","filename":"/z-node-red/web.png","selector":"","target":"","timeout":1000,"waitfor":500,"x":490,"y":160,"wires":[["e3b1fe50.d3a028"]]},{"id":"7578b32a.f987e4","type":"e-mail","z":"3d611774.0e7aa8","server":"mail.example.com","port":"25","secure":false,"tls":false,"name":"test@example.com","dname":"Scraping mail sent","x":650,"y":220,"wires":[]},{"id":"d96942e0.7d9078","type":"change","z":"3d611774.0e7aa8","name":"Mail data","rules":[{"t":"set","p":"payload","pt":"msg","to":"","tot":"msg"},{"t":"set","p":"topic","pt":"msg","to":"Subject","tot":"str"},{"t":"set","p":"from","pt":"msg","to":"test@example.com","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":400,"y":220,"wires":[["7578b32a.f987e4"]]},{"id":"f5fc5a87.ba7a78","type":"file in","z":"3d611774.0e7aa8","name":"Image file to binary","filename":"/z-node-red/web.png","format":"","chunk":false,"sendError":false,"encoding":"none","x":170,"y":220,"wires":[["d96942e0.7d9078"]]},{"id":"aaac53e1.50e998","type":"complete","z":"3d611774.0e7aa8","name":"","scope":["7578b32a.f987e4"],"uncaught":false,"x":110,"y":280,"wires":[["3b778f05.0427a"]]},{"id":"f83251da.c7b278","type":"debug","z":"3d611774.0e7aa8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":570,"y":280,"wires":[]},{"id":"3b778f05.0427a","type":"exec","z":"3d611774.0e7aa8","command":"rm -rf /z-node-red/web.png","addpay":false,"append":"","useSpawn":"false","timer":"","oldrc":false,"name":"","x":340,"y":280,"wires":[["f83251da.c7b278"],[],[]]},{"id":"a7760f89.e7ace8","type":"selenium-server","z":"","remoteurl":"http://localhost:4444/wd/hub"}]
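To double-check that an imported flow is wired in the order described earlier, you can walk its `wires` arrays. A minimal Python sketch, assuming the export is a flat JSON array of node objects as above; `execution_order` is a hypothetical helper, not part of Node-RED. It is a breadth-first walk, so a node with several targets (close-web feeds both debug and file in) has its targets listed level by level, and the complete → exec branch does not appear because the complete node is triggered by the e-mail node finishing, not by a wire.

```python
import json
from collections import deque
from typing import List


def execution_order(flow_json: str, start_type: str = "inject") -> List[str]:
    """Breadth-first walk of the wires from the first node of start_type.

    Returns "type:name" labels in the order messages reach them.
    Config nodes (no "wires") and nodes not reachable by wires are skipped.
    """
    nodes = {n["id"]: n for n in json.loads(flow_json)}
    start = next(n["id"] for n in nodes.values() if n.get("type") == start_type)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = nodes[queue.popleft()]
        order.append(f'{node["type"]}:{node.get("name", "")}')
        # "wires" is a list of outputs, each output a list of target node ids.
        for output in node.get("wires", []):
            for target in output:
                if target not in seen and target in nodes:
                    seen.add(target)
                    queue.append(target)
    return order
```

Saving the flow above to a file and passing its text to `execution_order` should list inject, open-web, the two set-value nodes, click-on, get-text, screenshot and close-web first, followed by the debug / file in / change / e-mail branch.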