Parsing improvements for #31 #32

SL-Gundam · 2018-02-04T02:01:30Z

This fully fixes Variant5 in #31

I also added an option to ConverterExtra to disable adding the CSS class after a tag. I need ConverterExtra for table support, so this CSS class was an annoying addition for me.

Let me know what you think

SL-Gundam · 2018-02-04T02:30:36Z

Here is an explanation of the commits that fix some handling issues:
cee262f - Fixes the handling of the font tag for Variant5 in #31
d0da9e0 - Fixes one cause of extra line endings that should not be present in various examples in #31

tzi · 2018-02-04T12:20:57Z

I'm OK with merging the following features if you add associated test cases:

Adjust fixInlineElementSpacing to not trigger for emptyTags
Allow to disable adding the CSS class after the tag

I'm not sure about the numeric arguments, because the HTML spec also allows string attribute values without quotes.
So it seems to be more a bug in the attribute value parsing than a matter of allowing numeric attribute values.

What do you think?

Cheers,
Thomas.

tzi · 2018-02-04T12:56:25Z

I split 2 of your commits about inline spacing fixes out into #33, because they're valuable! 🎉

SL-Gundam · 2018-02-04T15:27:31Z

Regarding the test cases: I am planning on adding those, but before I spend time on that I wanted to at least have your opinion on the work in progress.
I'm still working on the other examples in #31, so I might work on them first and add the test cases after that.

Regarding attribute values:
Variant2 (#31) has an unquoted string attribute value, so you're right that the current fix is not enough
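For reference, the HTML syntax allows an attribute value to be double-quoted, single-quoted, or unquoted (an unquoted value cannot contain whitespace, quotes, =, <, > or backticks), which is why special-casing numeric values is not enough. Below is a minimal Python sketch of that rule, not Markdownify's PHP parser; the names ATTR_RE and parse_attributes are hypothetical. Because > cannot appear in an unquoted value, the pattern also stops correctly when an unquoted value is the last thing before the tag's closing character.

```python
import re

# Hypothetical pattern: attribute name, then an optional "=" and a value that
# is double-quoted, single-quoted or unquoted. Per the HTML syntax, an unquoted
# value may not contain whitespace, quotes, "=", "<", ">" or backticks.
ATTR_RE = re.compile(
    r"""([^\s"'>/=]+)          # attribute name
        (?:\s*=\s*
           (?:"([^"]*)"        # double-quoted value
             |'([^']*)'        # single-quoted value
             |([^\s"'=<>`]+)   # unquoted value (words and numbers alike)
           )
        )?""",
    re.VERBOSE,
)


def parse_attributes(attr_text: str) -> dict:
    """Map lower-cased attribute names to values ('' if no value is given)."""
    attrs = {}
    for m in ATTR_RE.finditer(attr_text):
        # Exactly one of the three value groups is set, or none for bare names.
        value = next((g for g in m.group(2, 3, 4) if g is not None), "")
        attrs[m.group(1).lower()] = value
    return attrs


print(parse_attributes('face="Arial" size=3 color=red'))
# {'face': 'Arial', 'size': '3', 'color': 'red'}
```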

SL-Gundam · 2018-02-07T23:21:15Z

Variant4 (#31) is now fixed

6f40bcc - Ignore a trailing / when comparing the URL and title (Variant5)
56f4424 - Changed the numeric attribute check from cee262f to handle both quoted and unquoted values
ea771c3 - Variant4 had "*** This message was automatically generated by Microsoft Dynamics CRM ***" escaped as "\*\\*\* This message was automatically generated by Microsoft Dynamics CRM \*\**". This fixes that

Because of ea771c3 the test cases are failing. Do you have any objections to this commit? If not, I will adjust the test cases as required. Otherwise, I would like your input on how to handle this.
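A rough Python sketch of the two behaviours described above (Markdownify itself is PHP, and the helper names same_link and escape_emphasis are hypothetical): comparing a URL with a title while ignoring a trailing slash, and escaping every * and _ rather than only runs of one or two.

```python
import re


def same_link(url: str, title: str) -> bool:
    """Compare a URL with a title, ignoring a single trailing "/" on either."""
    def strip_slash(s: str) -> str:
        return s[:-1] if s.endswith("/") else s
    return strip_slash(url) == strip_slash(title)


def escape_emphasis(text: str) -> str:
    """Backslash-escape every * and _, not just runs of one or two."""
    return re.sub(r"([*_])", r"\\\1", text)


print(same_link("http://example.com/", "http://example.com"))  # True
print(escape_emphasis("*** generated by CRM ***"))  # \*\*\* generated by CRM \*\*\*
```

With every character escaped individually, a line such as "*** This message ... ***" comes out as \*\*\* This message ... \*\*\*, which Markdown renders back as the literal asterisks.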

SL-Gundam · 2018-02-09T00:58:14Z

e024197 - Cleans up redundant spaces using rtrim. Since a lot of tests expect these spaces, they are currently failing
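A minimal Python sketch of a per-line rtrim cleanup, assuming it runs on the generated Markdown (cleanup_trailing_spaces is a hypothetical helper, not the actual e024197 change). One caveat: in Markdown a line ending in two or more spaces is a hard line break, so the sketch can optionally keep exactly two trailing spaces on such lines instead of trimming them all.

```python
def cleanup_trailing_spaces(markdown: str, keep_hard_breaks: bool = True) -> str:
    """Right-trim every line; optionally keep Markdown hard line breaks.

    A non-empty line that ended in two or more spaces keeps exactly two
    spaces when keep_hard_breaks is True.
    """
    cleaned = []
    for line in markdown.split("\n"):
        stripped = line.rstrip(" ")
        if keep_hard_breaks and stripped and len(line) - len(stripped) >= 2:
            stripped += "  "
        cleaned.append(stripped)
    # Final trim of trailing whitespace (including newlines) for the document.
    return "\n".join(cleaned).rstrip()
```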

SL-Gundam · 2018-02-15T22:10:45Z

That should fix all test cases for the previously proposed changes

SL-Gundam · 2018-02-17T21:20:48Z

Some improvements for handling Variant1 in #31

7a0f6a1 - Markdownify only works properly with line feed EOLs, so all EOLs in incoming HTML are now converted to line feeds
e4a1f9b - After a closing p tag, the line breaks need to be flushed before any text is output
a72f108 - Variant1 uses div tags around various lines. Div tags only take up one empty line, so I decreased the number of line breaks here to improve the layout of the from, to, sent and subject lines.
75cf897 - No ltrim was performed after a closing p tag, which caused any text following it to start with a space. Since an ltrim was already performed after br tags, I added the same for p tags
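A minimal Python sketch of the EOL normalization from 7a0f6a1, assuming it runs on the raw HTML before parsing (normalize_eols is a hypothetical name; Markdownify's actual change is in PHP). CRLF has to be replaced before a lone CR, otherwise each CRLF would turn into two line feeds.

```python
def normalize_eols(html: str) -> str:
    """Convert Windows (CRLF) and old Mac (CR) line endings to LF."""
    # Replace CRLF first; doing CR first would turn each CRLF into two LFs.
    return html.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_eols("<p>a</p>\r\n<p>b</p>\r")))  # '<p>a</p>\n<p>b</p>\n'
```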

Variant1 is not perfect, but it is good enough, and further improvements might affect other situations undesirably

SL-Gundam · 2018-02-26T23:02:17Z

552ba1b - Ignores the Office namespace tag <o:p xmlns:o="#unknown"></o:p>.
I could not find any documentation on other possible Office namespace tags, so further tags will have to be added as we encounter them
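One way to ignore these tags, sketched in Python under the assumption that stripping them before conversion is acceptable (O_P_TAG and strip_office_p_tags are hypothetical names; the actual 552ba1b change lives in Markdownify's PHP parser): remove opening, closing and self-closing o:p tags regardless of their attributes, while keeping any text they wrap.

```python
import re

# Matches <o:p ...>, </o:p> and <o:p/> with any attributes, case-insensitively.
# The \b stops it from matching longer tag names such as <o:pre>.
O_P_TAG = re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE)


def strip_office_p_tags(html: str) -> str:
    """Drop Office <o:p> tags but keep whatever text they wrap."""
    return O_P_TAG.sub("", html)


print(strip_office_p_tags('<p>Hi<o:p xmlns:o="#unknown"></o:p></p>'))  # <p>Hi</p>
```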

SL-Gundam · 2018-02-27T22:47:57Z

0bea029 - I had an issue where plain-text content was parsed as Markdown, with undesired results. This addition allows me to pull the escaping regular expressions out of Markdownify and apply only the escaping to my content.
475af00 - Header escaping did not work as desired if the header was not on the first line of the text. This makes it work properly on any line
dc77382 - This escapes any header Markdown that uses = (setext-style underlines)
27131ac - Trivial change, but it escapes the regular expressions properly (the correct number of backslashes)
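A Python sketch of line-anchored header escaping (escape_headers and both patterns are hypothetical, not the Markdownify regexes): with re.MULTILINE, ^ matches at the start of every line, so a # on any line is escaped, not just one on the first line. A line consisting only of = characters would turn the previous line into a setext header; escaping its first = is enough to prevent that.

```python
import re

# "^" with re.MULTILINE matches at the start of every line, so a "#" that
# begins any line (after up to 3 spaces) is escaped, not only on line one.
ATX_HEADER = re.compile(r"^( {0,3})#", re.MULTILINE)
# A line made only of "=" (optionally indented or followed by spaces) would
# turn the line above it into a setext header.
SETEXT_UNDERLINE = re.compile(r"^( {0,3})(=+)([ \t]*)$", re.MULTILINE)


def escape_headers(text: str) -> str:
    text = ATX_HEADER.sub(r"\1\\#", text)
    # Escaping the first "=" is enough to stop the line being an underline.
    return SETEXT_UNDERLINE.sub(r"\1\\\2\3", text)
```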

SL-Gundam · 2018-03-03T01:42:38Z

Some more improvements

f6a3290 - Without this, the tag's closing character was not detected properly when an unquoted attribute value ended the tag
9efd59c - Some trivial PSR coding standard fixes, plus conversion of &nbsp; to a normal space
fc34c27 - The pipe character (|) is now escaped as well
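A small Python sketch of the last two changes (normalize_nbsp and escape_pipes are hypothetical helpers; whether the PHP code sees the &nbsp; entity or the decoded U+00A0 character is an assumption here, so both are handled). Escaping | keeps it from being read as a table cell separator in flavours with pipe tables, such as PHP Markdown Extra.

```python
def normalize_nbsp(text: str) -> str:
    """Replace the &nbsp; entity and the U+00A0 character with a plain space."""
    return text.replace("&nbsp;", " ").replace("\u00a0", " ")


def escape_pipes(text: str) -> str:
    """Backslash-escape | so it is not taken as a table cell separator."""
    return text.replace("|", "\\|")
```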

nlisgo · 2018-04-23T11:08:38Z

src/Converter.php

+     *
+     * @return array escapeInText
+     */
+    public function getescapeInText()


Nitpick: should this be getEscapeInText?

Most of the variables use a construction where the first word is not capped like here https://github.com/Elephant418/Markdownify/blob/master/src/Converter.php#L169 and here https://github.com/Elephant418/Markdownify/blob/master/src/Converter.php#L185

SL-Gundam · 2018-04-24

I kept the function name identical to the name of the variable being retrieved. @tzi has not commented on this so far, but I think he will go over the code once I've finished all of my "improvements" and added the associated test cases, as he requested.

So the short question is: should the name of the function be exactly the same as the variable being retrieved, or are there guidelines that decide the function name?

SL-Gundam · 2018-09-01T22:40:52Z

@tzi How should the following code best be handled if it is encountered in HTML?

<![if !supportLists]>
<![endif]>

It's comparable to <o:p></o:p>, but that one at least starts and ends with the same tag.
Would it be best to make these part of $emptyTags, like:

'![if'
'![endif]'
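An alternative to $emptyTags, sketched in Python as an assumption rather than what Markdownify does: strip these downlevel conditional markers before parsing and keep the content between them (CONDITIONAL_MARKER and strip_conditional_markers are hypothetical names).

```python
import re

# Matches Microsoft downlevel conditional markers such as <![if !supportLists]>
# and <![endif]>. Other "<![" constructs, e.g. CDATA sections, are left alone.
CONDITIONAL_MARKER = re.compile(r"<!\[(?:if\b[^\]]*|endif)\]>", re.IGNORECASE)


def strip_conditional_markers(html: str) -> str:
    """Remove the markers themselves; the content between them is kept."""
    return CONDITIONAL_MARKER.sub("", html)


print(strip_conditional_markers("<![if !supportLists]>1.<![endif]> Item"))  # 1. Item
```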

SL-Gundam · 2019-07-12T18:11:40Z

Apparently xmlns attribute names can contain numbers (Outlook 2010):

xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart"
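A Python sketch of a namespace-attribute pattern that allows digits after the first character of the prefix (XMLNS_ATTR and namespace_prefixes are hypothetical; only double-quoted values are handled to keep it short, and Markdownify's actual PHP regex may differ).

```python
import re

# The prefix must start with a letter or "_", but may contain digits after
# that (e.g. xmlns:udcp2p), so a letters-only prefix pattern is not enough.
# Only double-quoted values are matched in this sketch.
XMLNS_ATTR = re.compile(r'\bxmlns(?::([A-Za-z_][\w.-]*))?\s*=\s*"[^"]*"')


def namespace_prefixes(tag: str) -> list:
    """Return the declared prefixes in a tag ('' for a default xmlns)."""
    return [m.group(1) or "" for m in XMLNS_ATTR.finditer(tag)]
```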

SL-Gundam · 2020-05-01T11:53:58Z

I had a situation where an HTML email (a Microsoft OneDrive email) contained an empty table tag. b9b3f41 makes sure this does not result in an error
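A hypothetical Python sketch of the kind of guard b9b3f41 describes: when an empty table produces no rows, return an empty string instead of indexing the first row (which would raise IndexError). The function name and row format are assumptions, not Markdownify's PHP API.

```python
def rows_to_markdown_table(rows: list) -> str:
    """Render rows (lists of cell strings) as a pipe table; row 0 is the header."""
    # An empty <table> yields no rows (or only empty rows); bail out early
    # instead of indexing rows[0].
    if not rows or not any(rows):
        return ""
    width = max(len(r) for r in rows)
    padded = [r + [""] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(padded[0]) + " |",
             "|" + "|".join([" --- "] * width) + "|"]
    lines += ["| " + " | ".join(r) + " |" for r in padded[1:]]
    return "\n".join(lines)
```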

SL-Gundam added 4 commits February 3, 2018 22:57

Handle numeric argument values without quotes

cee262f

Adjust fixInlineElementSpacing to not trigger for emptyTags

d0da9e0

Allow to disable adding the CSS class after the tag

d64cd74

Adjusted test case for this commit d0da9e0

d2eeedc

tzi mentioned this pull request Feb 4, 2018

Fix inline spacing fixer for empty tag #33

Merged

SL-Gundam added 3 commits February 7, 2018 18:49

Fix URL difference on ending slash presence

6f40bcc

Handle unquoted attribute values

56f4424

Escape all * and _ instead of just 1 or 2

ea771c3

SL-Gundam added 2 commits February 9, 2018 01:04

Cleanup redundant spaces

e024197

One str_replace instead of three

9ab0dc9

SL-Gundam and others added 4 commits February 15, 2018 22:48

Adjust testcase for ea771c3

6c9d979

Add back the final rtrim that was removed in e024197

eea8837

Correct test case for e024197

55e5f54

Merge branch 'master' into Parsing_Improvements

84d75cc

SL-Gundam added 4 commits February 17, 2018 22:05

Change all html EOLs to line feeds

7a0f6a1

flushLinebreaks added before handling text

e4a1f9b

Decreased the number of lineBreaks after blockelements

a72f108

Added ltrim for html content after closing p tag

75cf897

SL-Gundam mentioned this pull request Feb 18, 2018

Strip all the remaining HTML tag when keepHtml option is disabled #31

Open

Ignore Office namespace o:p tags

552ba1b

Add function getescapeInText

0bea029

SL-Gundam added 3 commits February 27, 2018 20:45

Fix header markdown escaping

475af00

Add escaping for = markdown headers

dc77382

Add proper amount of slashes for escape regex's

27131ac

SL-Gundam added 3 commits March 3, 2018 00:42

Correction incase the last attribute is unquoted

f6a3290

Replace   with normal space

9efd59c

Add more character to escapeInText

fc34c27

nlisgo reviewed Apr 23, 2018

SL-Gundam added 3 commits February 1, 2019 18:23

Merge branch 'master' of github.com:Elephant418/Markdownify into Parsing_Improvements

2c329f1

Merge branch 'master' of github.com:Elephant418/Markdownify into Parsing_Improvements

e4f91ce

Allow numbers in xmlns attributes names

fd6763e

SL-Gundam added 2 commits May 1, 2020 13:41

Fix empty table tag

b9b3f41

Correct indentation

9623ae4

tzi and others added 2 commits February 23, 2024 17:49

Fix PHP8.3 support

7db946f

Merge pull request #1 from Elephant418/fix-php8.3

7c97008
