-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add display capabilities to tokenizers objects #1542
base: main
Are you sure you want to change the base?
Conversation
ArthurZucker
commented
Jun 3, 2024
•
edited
Loading
edited
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
…add-display fix git suggestion nit __repr__ should use Debug? small updates Simple lazygit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused as to why in some cases you impl Display for MyStruct
, use derive_more::Display; #[derive(Display)] struct MyStruct
and then use StructDisplay
.
#[serde(untagged)] | ||
pub(crate) enum PyDecoderWrapper { | ||
#[display(fmt = "{}", "_0.as_ref().read().unwrap().inner")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this native rust or are these capabilities from the derive_more
crate?
(the display
macro)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, are you sure .unwrap
is the right thing? Perhaps an .unwrap_or_else(some_default_display_fn)
would work best?
Custom(Arc<RwLock<CustomDecoder>>), | ||
#[display(fmt = "{}", "_0.as_ref().read().unwrap()")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
/// Base class for all models | ||
/// | ||
/// The model represents the actual tokenization algorithm. This is the part that | ||
/// will contain and manage the learned vocabulary. | ||
/// | ||
/// This class cannot be constructed directly. Please use one of the concrete models. | ||
#[pyclass(module = "tokenizers.models", name = "Model", subclass)] | ||
#[derive(Clone, Serialize, Deserialize)] | ||
#[derive(Clone, Serialize, Deserialize, Display)] | ||
#[display(fmt = "{}", "model.as_ref().read().unwrap()")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -220,6 +221,12 @@ impl PyModel { | |||
fn get_trainer(&self, py: Python<'_>) -> PyResult<PyObject> { | |||
PyTrainer::from(self.model.read().unwrap().get_trainer()).get_as_subtype(py) | |||
} | |||
fn __str__(&self) -> PyResult<String> { | |||
Ok(format!("{}", self.model.read().unwrap())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If read()
returns a Result
, then you can probably convert it to a PyResult
here rather than unwrapping it.
If it returns an Option
, then perhaps returning a default value rather than unwrapping
would be preferable.
Ok(format!("{}", self.model.read().unwrap())) | ||
} | ||
fn __repr__(&self) -> PyResult<String> { | ||
Ok(format!("{}", self.model.read().unwrap())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
#[pyo3(signature = ())] | ||
#[pyo3(text_signature = "(self)")] | ||
#[getter] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code equivalent?
tokenizers/display_derive/src/lib.rs
Outdated
_ => unimplemented!(), | ||
} | ||
}, | ||
_ => unimplemented!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to handle errors in macros, but I'd take a look rather than leaving a call to unimplemented!
use crate::utils::SysRegex; | ||
use serde::{Deserialize, Serialize}; | ||
|
||
use crate::tokenizer::{ | ||
Decoder, Encoding, PostProcessor, PreTokenizedString, PreTokenizer, Result, | ||
SplitDelimiterBehavior, | ||
}; | ||
use crate::utils::macro_rules_attribute; | ||
use crate::utils::SysRegex; | ||
use display_derive::StructDisplay; | ||
use serde::{Deserialize, Serialize}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you using rustfmt?
tokenizers/display_derive/src/lib.rs
Outdated
Fields::Named(fields) => { | ||
// If the struct has named fields | ||
let field_names = fields.named.iter().map(|f| &f.ident); | ||
let field_names2 = field_names.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let field_names2 = field_names.clone(); | |
let field_names_clone = field_names.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why do you need to clone?
tokenizers/display_derive/src/lib.rs
Outdated
let mut prefix = (&mut chars).take(100 - 1).collect::<String>(); | ||
if chars.next().is_some() { | ||
prefix.push('…'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't that what the ellipse crate was for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah but it was too annoying to use 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then remove it from your Cargo.toml
file 😉
I think what you wrote is perfectly fine and does not require bringing in the extra crate!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I thought I removed it lol on it!
I wanted to remove the derive more crate and implement stuff
no it's not optimal but I need to go