Chapter 05. Data and Feature Preparation


$ cd ~
$ source kfvenv/bin/activate


$ {
  pip3 install lxml
  pip3 install pandas
  pip3 install scikit-learn
  pip3 install scipy
  pip3 install tables
}


$ cd ~/tmp/Kubeflow-for-Machine-Learning-From-Lab-to-Production/ch05/data-extraction/python-notebook/
$ python3 MailingListDataPrep.py


Traceback (most recent call last):
  File "MailingListDataPrep.py", line 118, in <module>
    records += scrapeMailArchives("spark-dev", y, m)
  File "MailingListDataPrep.py", line 41, in scrapeMailArchives
    root = etree.fromstring(r.text.replace('encoding="UTF-8"', ""),
  File "src/lxml/etree.pyx", line 3254, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 40
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: link line 32 and head, line 40, column 10
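

The traceback above suggests that the response body was HTML rather than well-formed XML: an unclosed `<link>` inside `<head>` is legal HTML, but the strict XML parser behind `etree.fromstring` rejects it. The following is a minimal sketch of one possible workaround, not the script's actual fix. It passes lxml's lenient `etree.HTMLParser` to `etree.fromstring`. `SAMPLE`, `is_well_formed_xml`, and `parse_lenient` are hypothetical names introduced for illustration, and the sample markup is an assumption, not the real archive response.

```python
from lxml import etree

# Hypothetical stand-in for the fetched page (assumption, not the real response).
# The <link> tag is never closed: fine for HTML, a syntax error for XML.
SAMPLE = """<html>
<head>
<link rel="stylesheet" href="style.css">
</head>
<body><a href="msg1.html">First message</a></body>
</html>"""


def is_well_formed_xml(text):
    """Return True if the strict XML parser accepts the text."""
    try:
        etree.fromstring(text)
        return True
    except etree.XMLSyntaxError:
        return False


def parse_lenient(text):
    """Parse the text with lxml's HTML parser, which tolerates unclosed tags."""
    parser = etree.HTMLParser(recover=True)
    return etree.fromstring(text, parser=parser)


if __name__ == "__main__":
    # The strict parser fails on the sample, as in the traceback.
    print("well-formed XML:", is_well_formed_xml(SAMPLE))
    # The HTML parser builds a tree that can still be queried with XPath.
    root = parse_lenient(SAMPLE)
    print("links:", root.xpath("//a/@href"))
```

If the lenient parse succeeds on the real response, it is worth inspecting the resulting tree before changing `scrapeMailArchives`: an HTML body may mean the request returned an error or landing page instead of the expected archive data, in which case the parser is not the underlying problem.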