Update 6 readDocx.py --- an xml parser missing by ZhanpengZhang · Pull Request #50 · REMitchell/python-scraping

ZhanpengZhang · 2017-09-03T06:36:50Z

In Python 3.5, I tried commonly used parsers such as "html.parser" and 'lxml', but neither worked.
With either of them, the call wordObj.findAll("w:t") always returns an empty list [], whereas 'xml' returns what I expect:

[<w:t>A Word Document on a Website</w:t>, <w:t>This is a Word document, full of content that you want very much. Unfortunately, it’s difficult to access because I’m putting it on my website as a .</w:t>, <w:t>docx</w:t>, <w:t xml:space="preserve"> file, rather than just publishing it as HTML</w:t>].
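A minimal sketch of the parser choice described above, assuming `bs4` and `lxml` are installed (BeautifulSoup's 'xml' parser depends on lxml). `SAMPLE_XML` and `extract_text_runs` are hypothetical names used only for illustration; the sample is a cut-down stand-in for the `word/document.xml` entry of a .docx archive, not the actual file from the book.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for word/document.xml (bytes, as
# zipfile would return it). The real file contains far more markup.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b'<w:document xmlns:w="http://schemas.openxmlformats.org/'
    b'wordprocessingml/2006/main">'
    b'<w:body>'
    b'<w:p><w:r><w:t>A Word Document on a Website</w:t></w:r></w:p>'
    b'<w:p><w:r><w:t>docx</w:t></w:r></w:p>'
    b'</w:body>'
    b'</w:document>'
)


def extract_text_runs(xml_content, parser="xml"):
    """Return the text of every <w:t> element found with the given parser."""
    word_obj = BeautifulSoup(xml_content, parser)
    # find_all is the current name for findAll; both select the same tags.
    return [tag.get_text() for tag in word_obj.find_all("w:t")]


# Naming the parser explicitly avoids depending on whichever default
# BeautifulSoup would otherwise pick on a given machine.
runs = extract_text_runs(SAMPLE_XML)
print(runs)
```

Passing a different parser name as the second argument, for example `extract_text_runs(SAMPLE_XML, "html.parser")`, makes it easy to compare results on your own setup; what the non-XML parsers return may vary between library versions.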

Looking forward to your reply.
This is a great book, and let's make it even better!



ZhanpengZhang wants to merge 1 commit into REMitchell:master from ZhanpengZhang:patch-1
