Tuesday, October 30, 2018

My text scraper works fine on large HTML files

I can take professional web pages and run them through the text scraper, and it works fine. One bug: I have not yet implemented the singleton tags, so it generates the occasional error message. The extracted text is exactly the text observed on the page. And I can group the text a bit according to the tag that marked the text.

So I will go ahead and connect this to the join system. This is how bots read web pages: no need for all that formatting, just roughly group the plain text. The bots will want a scraper and join machine in every server.
Other than singletons, I have not yet done escaped characters. Working, minus two deficiencies soon to be fixed.
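
For illustration only, here is a rough Python sketch of the same idea: pull the plain text out of a page, group it by the tag that enclosed it, skip over singleton (void) tags, and decode escaped characters. This is not my scraper's actual code. The names TextScraper, scrape, VOID_TAGS and SKIP_TAGS are made up for the example, and it leans on Python's standard html.parser module rather than a hand-written tokenizer.

```python
from collections import defaultdict
from html.parser import HTMLParser

# HTML void ("singleton") elements: they never take a closing tag,
# so they must not be pushed on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Tags whose contents are code or styling, not readable text (assumed choice).
SKIP_TAGS = {"script", "style"}


class TextScraper(HTMLParser):
    """Hypothetical scraper: collects visible text, grouped by enclosing tag."""

    def __init__(self):
        # convert_charrefs=True decodes escapes such as &amp; and &lt;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.stack = []                   # currently open tags
        self.groups = defaultdict(list)   # tag name -> list of text runs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())     # collapse runs of whitespace
        if not text or any(t in SKIP_TAGS for t in self.stack):
            return
        tag = self.stack[-1] if self.stack else "(root)"
        self.groups[tag].append(text)


def scrape(html):
    """Return {tag: [text, ...]} for the given HTML string."""
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return dict(parser.groups)


if __name__ == "__main__":
    page = ("<html><body><h1>Fish &amp; Chips</h1>"
            "<p>First line<br>second line</p><img src='x.png'>"
            "<p>Price &lt; 5</p></body></html>")
    for tag, texts in scrape(page).items():
        print(tag, texts)
```

The void-tag set is what keeps a bare br or img from leaving a tag open forever, which is the sort of thing that would otherwise produce the occasional error message. The grouped dictionary is the kind of plain, loosely sorted text a join stage could take as input.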
