Use case: read PDF, DOC, CSV, etc. from a buffer or string, without needing a file system (fs).
I think there's also a need for a new CSVReader that can output one-dimensional lists like this:
A1: A2
B1: B2
A1: A3
B1: B3
And so on, still either as one row per doc or joined into one doc.
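To make the requested output concrete, here is a minimal sketch of the flattening step. It assumes the CSV has already been parsed into rows of strings and that the first row is the header; `rowsToKeyValueTexts` and its `joinRows` option are hypothetical names for illustration, not part of PapaCSVReader.

```typescript
// Hypothetical helper, not an existing reader API.
// Assumes `rows` is already-parsed CSV and rows[0] is the header row.
function rowsToKeyValueTexts(
  rows: string[][],
  options: { joinRows?: boolean } = {},
): string[] {
  const header = rows[0] ?? [];
  const dataRows = rows.slice(1);

  // Emit one "header: value" line per cell, grouped by data row.
  const perRow = dataRows.map((row) =>
    header.map((key, i) => `${key}: ${row[i] ?? ""}`).join("\n"),
  );

  // Either one text per row (one doc each) or everything joined into one text.
  if (options.joinRows) {
    return perRow.length > 0 ? [perRow.join("\n")] : [];
  }
  return perRow;
}

const exampleRows: string[][] = [
  ["A1", "B1"],
  ["A2", "B2"],
  ["A3", "B3"],
];
const perRowTexts = rowsToKeyValueTexts(exampleRows);
const joinedTexts = rowsToKeyValueTexts(exampleRows, { joinRows: true });
```

Each returned string would become the text of one doc, so `perRowTexts` maps to one doc per CSV row and `joinedTexts` to a single doc.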
I don't think modifying PapaCSVReader for this is feasible, because its constructor already has too many boolean arguments, and adding one more would make it even more confusing. It would need to change to a single config object with clear overloads, which would be a breaking change.
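As a rough sketch of what a single config object could look like: every name below (`CsvReaderOptions`, `outputFormat`, `resolveOptions`, and the individual fields and defaults) is an assumption made up for illustration and does not describe the current PapaCSVReader constructor.

```typescript
// Hypothetical options shape; field names and defaults are illustrative only.
interface CsvReaderOptions {
  // Join all rows into one doc instead of producing one doc per row.
  concatRows?: boolean;
  // "table" keeps cells side by side; "keyValue" emits "header: value" lines.
  outputFormat?: "table" | "keyValue";
  colJoiner?: string;
  rowJoiner?: string;
}

const defaultCsvReaderOptions: Required<CsvReaderOptions> = {
  concatRows: true,
  outputFormat: "table",
  colJoiner: ", ",
  rowJoiner: "\n",
};

// Merge caller-supplied options over the defaults so every field is defined.
function resolveOptions(
  options: CsvReaderOptions = {},
): Required<CsvReaderOptions> {
  return { ...defaultCsvReaderOptions, ...options };
}

const resolved = resolveOptions({ outputFormat: "keyValue" });
```

With named fields, a call site states which behavior it changes, and a new option can be added later without shifting positional arguments.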
Off topic, but something to think about: it's confusing that the PDF reader assigns id_ to the docs it produces while other readers do not. I think either all readers should do it (if configured to do so in some clear manner) or none should. Hidden functionality like that is potentially dangerous.
Agreed. I just unified ID and metadata handling across the readers and added support for reading a file's content from a Buffer; see 73819bf.
Luckily, this is also a non-breaking change.
About your other request: agreed, PapaCSVReader has too many parameters, and a single config object would help. I think it's an acceptable breaking change. You're welcome to send a PR and add your feature.
marcusschiesser changed the title from "Extract utility functions from file readers for standalone usage and make a better CSVReader" to "Output one-dimensional lists for CSVReader" on Jun 6, 2024