Dumpling: support specifying the character set of the output files #54217
Labels
component/dumpling
This is related to Dumpling of TiDB.
found/customer
Customers have encountered this bug.
type/feature-request
This is a feature request on the product.
Feature Request
Is your feature request related to a problem? Please describe:
Currently Dumpling assumes the output is always utf8mb4. If a user wants the result in a different character set such as GBK, they must either run `iconv` on the output or change `@@character_set_results`. The former requires 2x the storage space, while the latter may cause unexpected transformations such as the insertion of unwanted `\` and `"`.
Describe the feature you'd like:
Add flags that specify the output encoding (naming and options follow those of Lightning):
- `--data-character-set=«encoding»`, should support:
- `--schema-character-set=«encoding»`, defaults to `--data-character-set` if unspecified
- `--data-invalid-char-replace='?'`, what to do with characters outside of the specified charset.

Both CSV and SQL output formats should be supported, and should round-trip with the corresponding Lightning settings (including cases with a custom CSV separator/terminator, etc.).
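
To make the intended behavior of `--data-invalid-char-replace` concrete, here is a minimal Go sketch of "replace characters outside the target charset". It is not Dumpling code: the function name `encodeLatin1` is hypothetical, and ISO-8859-1 is used only as a stand-in target because the Go standard library ships no GBK encoder. A real implementation would presumably rely on a full encoding library instead.

```go
package main

import "fmt"

// encodeLatin1 is a hypothetical helper (not part of Dumpling).
// It converts UTF-8 text to ISO-8859-1 bytes, substituting `replacement`
// for every rune that has no ISO-8859-1 representation, and reports how
// many runes were substituted. Invalid UTF-8 input decodes to U+FFFD,
// which is also outside ISO-8859-1 and is therefore replaced as well.
func encodeLatin1(s string, replacement byte) (out []byte, replaced int) {
	out = make([]byte, 0, len(s))
	for _, r := range s {
		if r <= 0xFF {
			// ISO-8859-1 maps code points U+0000..U+00FF to the same byte value.
			out = append(out, byte(r))
		} else {
			out = append(out, replacement)
			replaced++
		}
	}
	return out, replaced
}

func main() {
	out, n := encodeLatin1("café 中文", '?')
	fmt.Printf("%q replaced=%d\n", out, n)
}
```

Returning the replacement count is a design choice of this sketch: it would let a dump tool warn the user that the output is lossy and will not round-trip.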
Describe alternatives you've considered:
Nothing; document that Dumpling can only output utf8mb4 and require users to use `iconv` to perform the conversion.
Teachability, Documentation, Adoption, Migration Strategy: